Structural variants (SVs) are large changes in DNA — deletions, duplications, inversions, insertions of mobile elements, and translocations — that are at least 50 base pairs in size. They are a major source of genetic variation between individuals and can affect gene dosage, disrupt coding sequence, or rearrange regulatory elements. Because SVs are harder to detect than small variants, population-scale SV maps lag behind single-nucleotide variant resources.
This track displays site-frequency data for 737,998 SVs that Abel et al. (Nature 2020) identified in 17,795 deeply sequenced human genomes (mean coverage >20×, Illumina short reads). The samples were sequenced by the four sequencing centers of the NHGRI Centers for Common Disease Genomics (CCDG) program and supplemented with ancestrally diverse samples from the PAGE consortium and the Simons Genome Diversity Project. The cohort is roughly 24% African, 16% Latino, 11% Finnish, 39% non-Finnish European, and 9% other ancestries.
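Two separately called public callsets with non-overlapping variant records are combined in this track: B38, called natively on GRCh38 from 14,623 samples, and B37, called on GRCh37 from 8,417 samples and lifted over to GRCh38. The upstream release contains 738,624 unique primary SV records across the two callsets; 626 B37 records did not lift over to GRCh38, leaving the 737,998 shown here.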
Important: the B38 and B37 callsets share 5,245 samples. For a variant present in both callsets, do not sum the allele counts (AC) or allele numbers (AN): each callset's AC/AN is computed over its own sample set, so summing would count the shared samples twice. Use the callset filter to restrict the display to one source.
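The sample arithmetic behind this warning can be checked directly. The sketch below uses only the counts stated on this page (14,623 B38 samples, 8,417 B37 samples, 5,245 shared); the function names are illustrative, not part of any UCSC tool. It shows that inclusion-exclusion recovers the 17,795 unique genomes, and how many alleles a naive AN sum could double-count at a diploid (autosomal) site.

```python
# Sample counts stated in this track description.
B38_SAMPLES = 14_623
B37_SAMPLES = 8_417
SHARED_SAMPLES = 5_245


def unique_samples(n_a: int, n_b: int, shared: int) -> int:
    """Inclusion-exclusion: individuals in either callset, shared ones counted once."""
    return n_a + n_b - shared


def max_double_counted_alleles(shared: int, ploidy: int = 2) -> int:
    """Upper bound on alleles counted twice if AN values from both callsets are summed."""
    return shared * ploidy


print(unique_samples(B38_SAMPLES, B37_SAMPLES, SHARED_SAMPLES))  # 17795
print(max_double_counted_alleles(SHARED_SAMPLES))  # 10490
```

A pooled frequency that counts every individual once would need per-sample genotypes, which the site-frequency files do not provide; reporting AC/AN separately per callset avoids the problem.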
Items are colored by SV type:
Deletions, duplications, inversions, and mobile-element variants are drawn as intervals spanning from the variant start to its end. Breakend (BND) records are drawn as single-base items at the variant breakpoint; the mate chromosome and position are shown on the details page for each BND. Each BND pair from LUMPY is shown only once (the secondary mate record is suppressed).
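A minimal sketch of these drawing rules, assuming VCF-style input (1-based POS, an END INFO key for non-BND types) and the common convention of displaying POS − 1 to END in 0-based, half-open coordinates. It also assumes the secondary mate of a LUMPY BND pair carries the SECONDARY INFO flag, and reads the mate locus from the VCF breakend ALT notation. `display_interval` is a hypothetical helper, not the track's build script.

```python
import re
from typing import Optional, Tuple

# Mate locus inside VCF breakend ALT notation, e.g. "N[chr2:321682[" or "]chr13:123456]N".
_BND_MATE = re.compile(r"[\[\]]([^\[\]:]+):(\d+)[\[\]]")

Interval = Tuple[str, int, int, Optional[Tuple[str, int]]]


def display_interval(chrom: str, pos: int, alt: str, info: dict) -> Optional[Interval]:
    """Map one SV record to (chrom, start, end, mate) in 0-based, half-open coordinates.

    Returns None for the secondary mate of a BND pair, which is not drawn.
    """
    if info.get("SVTYPE") == "BND":
        if info.get("SECONDARY"):  # assumed LUMPY flag on the second record of a pair
            return None
        match = _BND_MATE.search(alt)
        mate = (match.group(1), int(match.group(2))) if match else None
        return chrom, pos - 1, pos, mate  # single-base item at the breakpoint
    # DEL, DUP, INV and mobile-element variants span from the start to END.
    return chrom, pos - 1, int(info["END"]), None
```

For example, `display_interval("chr1", 5000, "N[chr2:321682[", {"SVTYPE": "BND"})` yields a one-base item on chr1 with mate `("chr2", 321682)`.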
The following filters are available from the track configuration page:
Per-population allele counts and numbers are shown on the details page for 8 ancestry groups: AFR (African), AMR (Latino/Admixed-American), NFE (non-Finnish European), FE (Finnish European), EAS (East-Asian), SAS (South-Asian), PI (Pacific Islander), and Other.
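Per-population allele frequencies follow from these counts as AF = AC / AN. The sketch below assumes, hypothetically, that each record exposes columns named `<POP>_AC` and `<POP>_AN` (with `OTHER` for the "Other" group); check the track's autoSql schema for the real column names before using it.

```python
from typing import Dict, Mapping, Optional

# The 8 ancestry groups listed above; "OTHER" is a hypothetical label for "Other".
POPULATIONS = ("AFR", "AMR", "NFE", "FE", "EAS", "SAS", "PI", "OTHER")


def population_frequencies(record: Mapping[str, str]) -> Dict[str, Optional[float]]:
    """AF = AC / AN per ancestry group; None when the group has no called alleles."""
    freqs: Dict[str, Optional[float]] = {}
    for pop in POPULATIONS:
        ac, an = record.get(f"{pop}_AC"), record.get(f"{pop}_AN")
        if ac is None or an is None or int(an) == 0:
            freqs[pop] = None  # no data: avoid reporting a misleading 0.0
        else:
            freqs[pop] = int(ac) / int(an)
    return freqs
```

Returning `None` rather than 0 keeps "not observed in a called group" distinct from "no alleles called in that group".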
Abel et al. 2020 jointly called SVs in 17,795 genomes from the NHGRI Centers for Common Disease Genomics program, sequenced with Illumina short reads (mean coverage >20×). Per-sample calls from LUMPY v0.2.13, CNVnator v0.3.3 and svtyper v0.1.4 were integrated across the cohort with the svtools pipeline. High- and low-confidence variants were separated using a cutoff on mean sample quality calibrated against Mendelian error rates in 409 CEPH trios, and per-sample calls were validated against a PacBio long-read truth set from three HGSVC samples. Two non-overlapping callsets were released: 458,106 SVs from 14,623 samples called natively on GRCh38 (B38) and 279,892 SVs from 8,417 samples called on GRCh37 (B37). Both site-frequency callsets include deletions (DEL), duplications (DUP), inversions (INV), mobile-element variants and breakends/translocations (BND).
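To illustrate the idea of a Mendelian-error calibration, here is a generic sketch, not the authors' code: genotypes are biallelic alt-allele counts (0, 1, 2), a trio is inconsistent when the child's genotype cannot be built from one allele of each parent, and a quality cutoff is chosen as the smallest value whose retained variants meet a target error rate. The helper names and the `max_error_rate` parameter are hypothetical; only trios with at least one non-reference member are counted.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

Trio = Tuple[int, int, int]  # (father, mother, child) as alt-allele counts 0, 1 or 2

# Alleles (0 = ref, 1 = alt) that a parent with each genotype can transmit.
_TRANSMISSIBLE = {0: (0,), 1: (0, 1), 2: (1,)}


def is_mendelian_consistent(father: int, mother: int, child: int) -> bool:
    """True if the child's genotype can be formed from one allele of each parent."""
    return any(a + b == child for a in _TRANSMISSIBLE[father] for b in _TRANSMISSIBLE[mother])


def mendelian_error_rate(trios: Iterable[Trio]) -> float:
    """Fraction of informative trios (any non-reference member) that are inconsistent."""
    informative = [t for t in trios if any(g > 0 for g in t)]
    if not informative:
        return 0.0
    errors = sum(not is_mendelian_consistent(*t) for t in informative)
    return errors / len(informative)


def pick_msq_cutoff(
    variants: Sequence[Tuple[float, List[Trio]]], max_error_rate: float
) -> Optional[float]:
    """Smallest mean-sample-quality cutoff whose retained variants meet the error target."""
    for cutoff in sorted({msq for msq, _ in variants}):
        kept = [t for msq, trios in variants if msq >= cutoff for t in trios]
        if mendelian_error_rate(kept) <= max_error_rate:
            return cutoff
    return None
```

The real calibration in the paper involves choices (genotype filters, per-SV-type thresholds) that this sketch does not attempt to reproduce.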
The B38 and B37 site-frequency VCFs (plus BEDPE companion files) were downloaded from the authors' supplementary-data GitHub repository, github.com/hall-lab/sv_paper_042020. For the hg38 track, INFO fields were parsed into BED9+ columns; B37 records were lifted to hg38 with the UCSC hg19ToHg38.over.chain.gz chain (626 B37 records failed to lift, leaving 737,998 SVs total in the track).
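The INFO-to-BED9+ conversion can be sketched as follows. This is an illustration under stated assumptions, not the UCSC conversion script: the set of INFO keys copied into the extra columns (`EXTRA_FIELDS`) is hypothetical, the item color is a placeholder rather than the track's actual per-type scheme, and non-BND records are assumed to carry an END key.

```python
from typing import Dict, List, Union

# Hypothetical selection of INFO keys carried into the extra (+) columns.
EXTRA_FIELDS = ("SVTYPE", "SVLEN", "AC", "AN", "AF")


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Split a VCF INFO string into a dict; flags (no '=') map to True."""
    fields: Dict[str, Union[str, bool]] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def vcf_line_to_bed9plus(line: str, rgb: str = "0,0,0") -> List[str]:
    """Convert one VCF data line into BED9 columns plus selected INFO columns."""
    chrom, pos, vid, _ref, _alt, _qual, _filt, info = line.rstrip("\n").split("\t")[:8]
    fields = parse_info(info)
    start = int(pos) - 1  # VCF POS is 1-based; BED starts are 0-based
    end = start + 1 if fields.get("SVTYPE") == "BND" else int(fields["END"])
    bed9 = [chrom, str(start), str(end), vid, "0", ".", str(start), str(end), rgb]
    extras = [str(fields.get(key, "")) for key in EXTRA_FIELDS]
    return bed9 + extras
```

Missing INFO keys become empty strings so every row keeps the same column count, which bedToBigBed requires.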
The step-by-step build commands (download, liftOver, format conversion, bigBed build) are recorded in the UCSC makeDoc for this track: doc/hg38/abelSv.txt. The conversion scripts and autoSql schemas live in makeDb/scripts/lrSv.
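The makeDoc is the authoritative record. As a rough approximation only, the liftOver and bigBed steps could be driven as below. File names and the number of extra columns are hypothetical; the flags used (`-bedPlus`, `-tab` for liftOver; `-type`, `-as`, `-tab` for bedToBigBed) are standard options of those UCSC utilities. The unmapped-record counter relies on liftOver writing a `#` reason line before each record it could not convert.

```python
import subprocess
from typing import List, Sequence


def liftover_cmd(bed_in: str, chain: str, bed_out: str, unmapped: str) -> List[str]:
    # Usage: liftOver oldFile map.chain newFile unMapped
    # -bedPlus=9 treats the first 9 columns as BED9 and carries the rest through.
    return ["liftOver", "-bedPlus=9", "-tab", bed_in, chain, bed_out, unmapped]


def bigbed_cmd(bed_sorted: str, as_file: str, n_extra: int, chrom_sizes: str, bb_out: str) -> List[str]:
    return ["bedToBigBed", f"-type=bed9+{n_extra}", f"-as={as_file}", "-tab",
            bed_sorted, chrom_sizes, bb_out]


def count_unmapped(unmapped_text: str) -> int:
    """Count records in a liftOver unMapped file (skip '#' reason lines)."""
    return sum(1 for line in unmapped_text.splitlines() if line and not line.startswith("#"))


def sort_bed_rows(rows: Sequence[Sequence[str]]) -> List[Sequence[str]]:
    """Order rows like `sort -k1,1 -k2,2n`, as bedToBigBed expects."""
    return sorted(rows, key=lambda r: (r[0], int(r[1])))


def run(cmd: List[str]) -> None:
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Counting the unmapped records this way is how a figure such as the 626 failed B37 records could be verified against a local rebuild.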
The data can be explored interactively in table format with the Table Browser or the Data Integrator and exported from there to spreadsheet or tab-sep tables. From scripts, the data can be accessed through our API, track=abelSv.
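A minimal sketch of a scripted query against the UCSC REST API's `getData/track` endpoint for this track. It assumes, as for other annotation tracks, that a region query returns its items under a key named after the track; `track_url` and `fetch_items` are illustrative names. Start and end are 0-based and end-exclusive.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_url(chrom: str, start: int, end: int, genome: str = "hg38", track: str = "abelSv") -> str:
    """Build a getData/track request URL for one region."""
    query = urllib.parse.urlencode(
        {"genome": genome, "track": track, "chrom": chrom, "start": start, "end": end}
    )
    return f"{API_BASE}?{query}"


def fetch_items(chrom: str, start: int, end: int, track: str = "abelSv") -> List[Dict[str, Any]]:
    """Download items in a region (requires network access)."""
    with urllib.request.urlopen(track_url(chrom, start, end, track=track)) as resp:
        payload = json.load(resp)
    # Assumption: items are listed under a key named after the track.
    return payload.get(track, [])
```

For example, `fetch_items("chr21", 0, 10_000_000)` returns the SV items in the first 10 Mb of chr21 as dictionaries.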
For automated download and analysis, the annotation is stored in a bigBed file, abelSv.bb, on our download server. Individual regions or the whole genome annotation can be extracted with our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system; instructions for both are on the UCSC Genome Browser downloads page. The tool can also retrieve features within a given range, e.g. bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/abelSv/abelSv.bb -chrom=chr21 -start=0 -end=100000000 stdout
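The same bigBedToBed call can be wrapped from Python. This sketch assumes `bigBedToBed` is on the PATH and only parses the nine standard BED9 columns; the remaining columns are returned as raw strings because their meaning is defined by the track's autoSql schema. `fetch_region` and `parse_bed9_plus` are illustrative names.

```python
import subprocess
from typing import Iterator, List, Tuple

BB_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/abelSv/abelSv.bb"

Row = Tuple[str, int, int, str, str, List[str]]  # chrom, start, end, name, itemRgb, extras


def parse_bed9_plus(text: str) -> Iterator[Row]:
    """Yield the BED9 core fields of interest plus any extra columns."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        yield cols[0], int(cols[1]), int(cols[2]), cols[3], cols[8], cols[9:]


def fetch_region(chrom: str, start: int, end: int, bb: str = BB_URL) -> List[Row]:
    """Run bigBedToBed for one region and parse its output."""
    cmd = ["bigBedToBed", bb, f"-chrom={chrom}", f"-start={start}", f"-end={end}", "stdout"]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return list(parse_bed9_plus(out))
```

`fetch_region("chr21", 0, 100_000_000)` mirrors the command-line example above.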
The original site-frequency VCF and BEDPE files are distributed by the authors from their supplementary-data GitHub repository.
Thanks to Haley J. Abel, David E. Larson, Ira M. Hall and colleagues at the McDonnell Genome Institute (Washington University in St. Louis), the Broad Institute, Baylor College of Medicine, the New York Genome Center, and the University of Washington for producing this resource and making the site-frequency callsets publicly available.
Abel HJ, Larson DE, Regier AA, Chiang C, Das I, Kanchi KL, Layer RM, Neale BM, Salerno WJ, Reeves C et al. Mapping and characterization of structural variation in 17,795 human genomes. Nature. 2020 Jul;583(7814):83-89. PMID: 32460305; PMC: PMC7547914