Description

This track shows structural variants (SVs) from the second phase of the Human Genome Structural Variation Consortium (HGSVC2). The callset is derived from 32 haplotype-resolved diploid genomes (64 phased haplotypes) spanning five 1000 Genomes superpopulations (African, Admixed American, East Asian, European, South Asian). Each genome was sequenced with PacBio long reads (continuous long-read and HiFi) and phased with Strand-seq, enabling comprehensive characterization of SVs that short-read approaches miss.

The track merges the two SV annotation tables from the HGSVC2 v2.0 integrated callset freeze 4: 111,330 insertions/deletions and 416 inversions, for a total of 111,746 SVs. Each row is a site-level variant with per-site allele count, carrier haplotypes, population-scale allele frequencies (imputed from the phased callset back into 1000 Genomes, insertions and deletions only) and structural annotations.

Display Conventions and Configuration

Items are colored by SV type:

Insertions are placed at the insertion site with a width of 1 bp; deletions and inversions span the affected reference interval. Filters are available for SV type, SV length, carrier-haplotype count, distinct sample count, whether the site falls in a Tandem Repeat Finder region and the fraction of the variant overlapping segmental duplications.
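The placement rule above can be sketched in a few lines. This is only an illustration, not the track's build code: the function name display_interval is hypothetical, the type labels INS/DEL/INV follow the table names used elsewhere on this page, and 0-based half-open (BED-style) coordinates are an assumption.

```python
def display_interval(sv_type, start, sv_len):
    """Return the (chromStart, chromEnd) a site would occupy on the reference.

    Assumes 0-based, half-open coordinates (BED convention).
    """
    if sv_type == "INS":
        # Inserted sequence is absent from the reference, so draw a 1 bp marker.
        return (start, start + 1)
    if sv_type in ("DEL", "INV"):
        # Deletions and inversions cover the affected reference bases.
        return (start, start + sv_len)
    raise ValueError(f"unexpected SV type: {sv_type!r}")


print(display_interval("INS", 1000, 350))
print(display_interval("DEL", 1000, 350))
```

A practical consequence is that the drawn width of an insertion says nothing about its size, so the SV length filter is the way to select insertions by length.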

The detail page shows, where available:

Methods

Ebert et al. (2021) produced phased, haplotype-resolved de novo assemblies for 32 diploid samples (64 unrelated haplotypes) across five 1000 Genomes superpopulations on the PacBio Sequel II platform, using continuous long-read sequencing (CLR, >40x) and high-fidelity sequencing (HiFi, >20x). Single-cell Strand-seq data from the same samples were used to phase the assemblies without parental trios, yielding contig N50 values >25 Mbp at a base accuracy of QV > 40. SVs were discovered from the two haplotype assemblies of each sample with the Phased Assembly Variant (PAV) caller against GRCh38, and candidate SVs were retained only if orthogonally supported by at least one of seven other sources (read-based callers MELT, PBSV and PALMER; Bionano optical mapping; breakpoint k-mer analysis; PAV replication with LRA). This yielded the integrated nonredundant callset of 107,590 insertion/deletion SVs and 316 inversions. Population-scale allele frequencies (POP_*_AF) were obtained by graph-based re-genotyping of the HGSVC2 SVs into the 3,202-sample 1000 Genomes short-read cohort with PanGenie (insertions and deletions only).

For display, the HGSVC2 v2.0 freeze-4 annotation tables variants_freeze4_sv_insdel.tsv.gz (111,330 DEL+INS) and variants_freeze4_sv_inv.tsv.gz (416 INV) were downloaded from the IGSR HGSVC2 v2.0 integrated-callset directory and merged into a single bigBed; type-specific columns (POP_*_AF for insdel, RGN_REF_INNER for inversions) are empty on the detail page when they do not apply.
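The merge described above, where type-specific columns stay empty for rows they do not apply to, can be illustrated roughly as follows. This is a sketch under assumptions and not the conversion script from makeDb/scripts/lrSv: the function merge_sv_tables is hypothetical, the toy tables are made up, and the column names ID, SVTYPE and POP_ALL_AF are placeholders (only RGN_REF_INNER and the POP_*_AF pattern are named on this page).

```python
import csv
import io


def merge_sv_tables(*tsv_texts):
    """Merge TSV tables into one row list over the union of their columns.

    Cells for columns a table does not have are filled with ''.
    """
    columns, rows = [], []
    for text in tsv_texts:
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        # Keep first-seen column order while building the union of headers.
        for name in reader.fieldnames or []:
            if name not in columns:
                columns.append(name)
        rows.extend(reader)
    merged = [{c: row.get(c, "") for c in columns} for row in rows]
    return columns, merged


# Toy stand-ins for the two annotation tables (hypothetical values).
insdel = "ID\tSVTYPE\tPOP_ALL_AF\nsv1\tINS\t0.12\n"
inv = "ID\tSVTYPE\tRGN_REF_INNER\nsv2\tINV\tchr1:100-200\n"

columns, merged = merge_sv_tables(insdel, inv)
print(columns)
for row in merged:
    print(row)
```

The real tables are gzip-compressed, so in practice the text would be read with gzip.open(path, "rt") before being passed in.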

The step-by-step build commands (download, format conversion, bigBed build) are recorded in the UCSC makeDoc for this track container: doc/hg38/lrSv.txt. The conversion scripts and autoSql schemas live in makeDb/scripts/lrSv.

Data Access

The data can be explored interactively in table format with the Table Browser or the Data Integrator, and accessed programmatically through our API using track=hgsvc2Sv.
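As a rough illustration of programmatic access, the sketch below builds a request URL for the track and, optionally, fetches the JSON response. It assumes the public REST endpoint https://api.genome.ucsc.edu/getData/track with genome, track, chrom, start and end parameters separated by semicolons; the helper names track_query_url and fetch_track are hypothetical, and the region shown is only an example.

```python
import json
from urllib.request import urlopen

# Assumed endpoint of the UCSC REST API for track data.
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_query_url(track, chrom, start, end, genome="hg38"):
    """Build a getData/track request URL for one region."""
    params = [
        ("genome", genome),
        ("track", track),
        ("chrom", chrom),
        ("start", start),
        ("end", end),
    ]
    return API_BASE + "?" + ";".join(f"{key}={value}" for key, value in params)


def fetch_track(track, chrom, start, end, genome="hg38"):
    """Fetch and decode the JSON response (requires network access)."""
    with urlopen(track_query_url(track, chrom, start, end, genome)) as response:
        return json.load(response)


url = track_query_url("hgsvc2Sv", "chr21", 0, 100000000)
print(url)
```

Calling fetch_track("hgsvc2Sv", "chr21", 0, 100000000) would perform the request; it is left uncalled here so the example runs offline.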

The bigBed is available from our download server as hgsvc2.bb. Individual regions can be extracted with the bigBedToBed utility, for example:

bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/hgsvc2.bb -chrom=chr21 -start=0 -end=100000000 stdout
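For scripted use, the same bigBedToBed call can be wrapped in Python. This is a minimal sketch that assumes the bigBedToBed binary is installed and on the PATH; the helper names bigbed_to_bed_cmd and run_bigbed_to_bed are hypothetical, and the output is split on tabs without interpreting the track's extra fields.

```python
import shutil
import subprocess

BIGBED_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/hgsvc2.bb"


def bigbed_to_bed_cmd(url, chrom, start, end, out="stdout"):
    """Build the argument list for a region-restricted bigBedToBed call."""
    return ["bigBedToBed", url, f"-chrom={chrom}", f"-start={start}", f"-end={end}", out]


def run_bigbed_to_bed(url, chrom, start, end):
    """Run bigBedToBed and return its BED lines as lists of fields."""
    if shutil.which("bigBedToBed") is None:
        raise RuntimeError("bigBedToBed was not found on the PATH")
    result = subprocess.run(
        bigbed_to_bed_cmd(url, chrom, start, end),
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.split("\t") for line in result.stdout.splitlines()]


cmd = bigbed_to_bed_cmd(BIGBED_URL, "chr21", 0, 100000000)
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting issues; run_bigbed_to_bed(BIGBED_URL, "chr21", 0, 100000000) would execute the command when the tool is available.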

The original annotation tables and VCFs are available from the HGSVC2 v2.0 integrated callset on the IGSR FTP site.

Credits

Thanks to the Human Genome Structural Variation Consortium (HGSVC) and the 1000 Genomes Project for releasing this dataset. Later HGSVC releases are also available as UCSC tracks: HGSVC3 65 SVs.

References

Ebert P, Audano PA, Zhu Q, Rodriguez-Martin B, Porubsky D, Bonder MJ, Sulovari A, Ebler J, Zhou W, Serra Mari R et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science. 2021 Apr 2;372(6537). PMID: 33632895; PMC: PMC8026704