This track shows structural variants (SVs) from the Consortium of Long-Read Sequencing database (CoLoRSdb). The sequencing data were contributed by labs and research groups around the world and cover 1,427 individuals in total, all sequenced with PacBio HiFi. The track contains 426,239 SVs: 232,973 insertions, 192,534 deletions and 732 inversions, with per-site allele frequencies, genotype counts and Hardy-Weinberg statistics across the cohort.
Note that CoLoRSdb also published short variants. In the Genome Browser, these can be found in the Variants Frequencies track.
Items are colored by SV type:
Insertions are placed at the insertion site; deletions and inversions span the affected reference interval. Filters are available for SV type, SV length and alternate allele count. Mousing over an item shows the SV type, length, allele frequency, allele counts (homozygous / heterozygous / hemizygous) and the number of carrier samples.
The detail page additionally shows the total allele number (AN), the Hardy-Weinberg equilibrium p-value (HWE), the excess-heterozygosity p-value (ExcHet) and the REF / ALT allele sequences.
SVs were called on each sample's long-read alignments with pbsv and then merged across the CoLoRSdb cohort with Jasmine to produce a site-level joint callset. Per-site allele counts, allele frequencies, HWE and ExcHet p-values were computed from the joint VCF. The VCF was converted to a bigBed for display in the Genome Browser.
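As a rough illustration of how per-site statistics follow from genotype counts, the sketch below derives the alternate allele count (AC), the total allele number (AN) and the allele frequency (AF) from homozygous, heterozygous and hemizygous counts, along with the heterozygote count expected under Hardy-Weinberg equilibrium. The function name, argument names and counting conventions are assumptions made for this example. It is not necessarily how the CoLoRSdb pipeline computed the published values, and it does not reproduce the HWE or ExcHet p-values.

```python
def site_stats(n_hom_alt, n_het, n_hemi_alt, n_hom_ref, n_hemi_ref=0):
    """Hypothetical helper: derive AC, AN, AF and expected hets from genotype counts."""
    # A homozygous-alt diploid sample carries 2 alt alleles;
    # heterozygous and hemizygous-alt samples carry 1 each.
    ac = 2 * n_hom_alt + n_het + n_hemi_alt

    n_diploid = n_hom_alt + n_het + n_hom_ref
    n_haploid = n_hemi_alt + n_hemi_ref
    # Diploid samples contribute 2 alleles, hemizygous samples 1.
    an = 2 * n_diploid + n_haploid
    af = ac / an if an else 0.0

    # Expected heterozygotes under HWE, using diploid samples only:
    # 2 * q * (1 - q) * n_diploid, where q is the diploid alt allele frequency.
    if n_diploid:
        q = (2 * n_hom_alt + n_het) / (2 * n_diploid)
        expected_het = 2 * q * (1 - q) * n_diploid
    else:
        expected_het = 0.0

    return {"AC": ac, "AN": an, "AF": af, "expected_het": expected_het}
```

Comparing the observed heterozygote count with expected_het gives an informal sense of whether a site departs from equilibrium; a proper test would be needed to obtain a p-value.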
The data can be explored interactively in table format with the Table Browser or the Data Integrator, and accessed programmatically through our API with track=colorsDbSv.
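A minimal sketch of programmatic access is shown below. It assumes the UCSC REST API endpoint at api.genome.ucsc.edu/getData/track with semicolon-separated parameters, and the hg38 assembly; the helper names build_track_url and fetch_track are hypothetical. The structure of the returned JSON is not described here, so inspect the response before relying on specific keys.

```python
import json
from urllib.request import urlopen

# Assumed endpoint of the UCSC REST API for track data.
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(chrom, start, end, genome="hg38", track="colorsDbSv"):
    """Build a getData/track URL for one genomic interval (hypothetical helper)."""
    params = [
        ("genome", genome),
        ("track", track),
        ("chrom", chrom),
        ("start", start),
        ("end", end),
    ]
    # Join parameters with semicolons, as in the API's documented examples.
    return API_BASE + "?" + ";".join(f"{key}={value}" for key, value in params)


def fetch_track(chrom, start, end):
    """Download and decode the JSON response for one interval (needs network access)."""
    with urlopen(build_track_url(chrom, start, end)) as response:
        return json.load(response)
```

For example, fetch_track("chr21", 0, 100000000) would request the chr21 items in that range. Keeping requests to modest intervals is a reasonable precaution for a track of this size.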
The bigBed is available from our download server as sv.hg38.bb. For example, to write the chr21 items to standard output: bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/colorsDb/sv.hg38.bb -chrom=chr21 -start=0 -end=100000000 stdout
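The same extraction can be scripted. The sketch below assumes bigBedToBed is installed and on the PATH, and it parses only the three standard BED columns (chrom, start, end); the remaining columns are kept as raw strings because their layout is not described here. The helper names are hypothetical.

```python
import subprocess

BIGBED_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/colorsDb/sv.hg38.bb"


def bigbed_command(chrom, start, end, url=BIGBED_URL):
    """Build the bigBedToBed argument list for one region (hypothetical helper)."""
    return [
        "bigBedToBed",
        url,
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        "stdout",
    ]


def parse_bed_line(line):
    """Split one tab-separated BED line into (chrom, start, end, extra_fields)."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[1]), int(fields[2]), fields[3:]


def read_region(chrom, start, end):
    """Run bigBedToBed and return the parsed rows (requires the tool and network access)."""
    result = subprocess.run(
        bigbed_command(chrom, start, end),
        check=True,
        capture_output=True,
        text=True,
    )
    return [parse_bed_line(line) for line in result.stdout.splitlines() if line.strip()]
```

Calling read_region("chr21", 0, 100000000) mirrors the command-line example above.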
The original VCF files and full release documentation are available from the CoLoRSdb v1.2.0 dataset on Zenodo: zenodo.org/records/14814308.
Thanks to Mike Schatz, Evan Eichler, and all CoLoRSdb investigators for generating and making the data publicly available.
Lake JA, Consortium of Long-Read Sequencing (CoLoRS). Consortium of Long-Read Sequencing Database (CoLoRSdb) (v1.2.0) [Data set]. Zenodo. 2025 Feb 5. DOI: 10.5281/zenodo.14814308
Kirsche M, Prabhu G, Sherman R, Ni B, Battle A, Aganezov S, Schatz MC. Jasmine and Iris: population-scale structural variant comparison and analysis. Nat Methods. 2023 Mar;20(3):408-417. PMID: 36658279; PMC: PMC10006329
Eisfeldt J, Ameur A, Lenner F, Ten Berk de Boer E, Ek M, Wincent J, Vaz R, Ottosson J, Jonson T, Ivarsson S et al. A national long-read sequencing study on chromosomal rearrangements uncovers hidden complexities. Genome Res. 2024 Nov 20;34(11):1774-1784. PMID: 39472022; PMC: PMC11610602