Description

This track shows structural variants (SVs) from Oxford Nanopore long-read whole-genome sequencing of 100 individuals in the 1000 Genomes Project, as released by the 1000 Genomes Project ONT Sequencing Consortium and described in Gustafson et al. 2024. The cohort spans all five 1000 Genomes superpopulations and 19 subpopulations. Samples were sequenced with ONT R9.4.1 pores at ~37x coverage with median read N50 of ~54 kb.

The track contains 113,696 SVs (63,177 insertions, 49,704 deletions, 744 inversions, 71 duplications). Each variant was called by up to five independent methods: three alignment-based callers (Sniffles2, cuteSV and SVIM) and two assembly-based callers (hapdiff run on Flye and on Shasta/Hapdup assemblies). The calls were then merged across callers and samples with Jasmine to produce a cross-sample consensus catalog.
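As a quick consistency check, the per-type counts quoted above can be summed and compared with the reported total. The snippet below is only an illustrative sketch; the variable names are made up for this example and are not part of the track or the source pipeline.

```python
# Per-type SV counts as quoted in this description.
SV_COUNTS = {
    "insertions": 63177,
    "deletions": 49704,
    "inversions": 744,
    "duplications": 71,
}
REPORTED_TOTAL = 113696

# Sum the per-type counts and compute each type's share of the catalog.
total = sum(SV_COUNTS.values())
shares = {name: round(100 * n / total, 1) for name, n in SV_COUNTS.items()}

if __name__ == "__main__":
    print("total:", total, "matches reported:", total == REPORTED_TOTAL)
    for name, pct in shares.items():
        print(f"{name}: {pct}%")
```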

This 100-sample Gustafson cohort is distinct from the Vienna 1000-Genomes-ONT release (1KG ONT SVs), which uses different samples, pore chemistry and callers; the two releases share neither samples nor calls.

Display Conventions and Configuration

Items are colored by SV type.

Insertions are placed at the insertion site with a width of 1 bp; deletions, duplications and inversions span the affected reference interval. Filters are available for SV type, SV length and carrier-sample count. The detail page also shows the number of per-caller calls supporting each site (VARCALLS) and whether the source caller marked the breakpoints as precise.
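The placement rule above can be sketched as a small coordinate conversion. This is a minimal illustration, not the actual UCSC conversion script: it assumes 1-based VCF coordinates (POS and INFO/END), assumes SV type codes such as "INS" and "DEL", and assumes the 1 bp insertion item starts at POS minus one in 0-based BED terms. The function name is hypothetical.

```python
from typing import Tuple


def sv_to_bed_interval(sv_type: str, pos: int, end: int) -> Tuple[int, int]:
    """Return a 0-based, half-open (chromStart, chromEnd) pair for one SV.

    Assumption: pos and end are 1-based inclusive VCF coordinates.
    """
    start = pos - 1  # convert 1-based VCF POS to 0-based BED start
    if sv_type == "INS":
        # Insertions have no reference span, so draw a 1 bp item at the site.
        return start, start + 1
    # Deletions, duplications and inversions span the affected interval.
    return start, end


if __name__ == "__main__":
    print(sv_to_bed_interval("INS", 1000, 1000))
    print(sv_to_bed_interval("DEL", 1000, 1500))
```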

Methods

Gustafson et al. 2024 performed Oxford Nanopore long-read sequencing on 100 samples from the 1000 Genomes Project (all five superpopulations and 19 subpopulations) using R9.4.1 flow cells, at a median per-sample coverage of ~37x and read N50 of ~54 kb. Per-sample SV calls were generated through the Napu pipeline with five independent methods: three alignment-based callers (Sniffles2, cuteSV and SVIM run on minimap2 alignments to GRCh38) and two assembly-based callers (hapdiff run on Flye and on Shasta/Hapdup assemblies). The five per-sample VCFs were merged with Jasmine in two stages (intra-sample consensus, then cross-sample merge). The released confident site-level callset is defined as variants supported by hapdiff and at least two unique alignment-based callers, yielding 113,696 SVs (63,177 insertions, 49,704 deletions, 744 inversions, 71 duplications). SV counts per sample and multicaller concordance were benchmarked against the HPRC Sniffles2 truth and the GIAB HG002 Tier1 region with Truvari v4.1.0.
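The confident-callset rule described above (hapdiff support plus at least two distinct alignment-based callers) can be expressed as a simple predicate. The sketch below is hypothetical: the caller labels are invented for illustration, and it assumes that support from hapdiff on either assembly type counts as hapdiff support. It does not reproduce how Jasmine or the source pipeline encodes caller support.

```python
from typing import Set

# Hypothetical caller labels; the real VCF encodes support differently.
ALIGNMENT_CALLERS = {"sniffles2", "cutesv", "svim"}
HAPDIFF_CALLERS = {"hapdiff_flye", "hapdiff_shasta"}


def is_confident(callers: Set[str]) -> bool:
    """True if a site has hapdiff support and >= 2 distinct alignment callers.

    Assumption: hapdiff on either assembly (Flye or Shasta/Hapdup) counts.
    """
    has_hapdiff = bool(callers & HAPDIFF_CALLERS)
    n_alignment = len(callers & ALIGNMENT_CALLERS)
    return has_hapdiff and n_alignment >= 2


if __name__ == "__main__":
    print(is_confident({"hapdiff_flye", "sniffles2", "svim"}))
    print(is_confident({"sniffles2", "cutesv", "svim"}))
```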

The source Jasmine-merged VCF was downloaded from the 1000 Genomes ONT S3 bucket: 20240423_jasmine_intrasample_noBND_custom_suppvec_alphanumeric_header_JASMINE.vcf.gz.

The step-by-step build commands (download, format conversion, bigBed build) are recorded in the UCSC makeDoc for this track container: doc/hg38/lrSv.txt. The conversion scripts and autoSql schemas live in makeDb/scripts/lrSv.

Data Access

The data can be explored interactively in table format with the Table Browser or the Data Integrator, and accessed programmatically through our API using track=gustafsonSv.
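For programmatic access, a request to the UCSC REST API can be built as in the sketch below. It assumes the public endpoint https://api.genome.ucsc.edu/getData/track with the genome, track, chrom, start and end parameters. The region is an arbitrary example, the helper names are hypothetical, and the layout of the returned JSON is not assumed here, so inspect it before relying on particular keys.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(chrom: str, start: int, end: int,
                    genome: str = "hg38", track: str = "gustafsonSv") -> str:
    """Build a getData/track URL for one region of the track."""
    params = {"genome": genome, "track": track,
              "chrom": chrom, "start": start, "end": end}
    return API_BASE + "?" + urllib.parse.urlencode(params)


def fetch_track(chrom: str, start: int, end: int) -> dict:
    """Fetch the region and return the parsed JSON (needs network access)."""
    with urllib.request.urlopen(build_track_url(chrom, start, end)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only prints the URL; call fetch_track() to download the data.
    print(build_track_url("chr21", 0, 100000))
```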

The bigBed file is available from our download server as gustafson.bb. Individual regions can be extracted with the bigBedToBed utility, for example: bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/gustafson.bb -chrom=chr21 -start=0 -end=100000000 stdout
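The bigBedToBed example above can also be driven from a script. The following sketch assumes bigBedToBed is installed and on the PATH, and it parses only the three standard BED columns (chrom, chromStart, chromEnd), because the remaining fields depend on the track's autoSql schema, which is not reproduced here. The helper names are hypothetical.

```python
import subprocess
from typing import Iterator, List, Tuple

BIGBED_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/gustafson.bb"


def bigbed_command(chrom: str, start: int, end: int,
                   url: str = BIGBED_URL) -> List[str]:
    """Build the bigBedToBed argument list, mirroring the example above."""
    return ["bigBedToBed", url, f"-chrom={chrom}",
            f"-start={start}", f"-end={end}", "stdout"]


def parse_bed(text: str) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, chromStart, chromEnd) from tab-separated BED text."""
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        yield fields[0], int(fields[1]), int(fields[2])


def fetch_intervals(chrom: str, start: int, end: int) -> List[Tuple[str, int, int]]:
    """Run bigBedToBed (needs the tool and network access) and parse its output."""
    result = subprocess.run(bigbed_command(chrom, start, end),
                            check=True, capture_output=True, text=True)
    return list(parse_bed(result.stdout))
```

Calling fetch_intervals("chr21", 0, 100000000) would run the same query as the command-line example, provided the tool and a network connection are available.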

The original VCF is available from the 1000 Genomes ONT S3 bucket: 20240423_jasmine_intrasample_noBND_custom_suppvec_alphanumeric_header_JASMINE.vcf.gz.

Credits

Thanks to Gustafson and colleagues and the 1000 Genomes Project ONT Sequencing Consortium for releasing this dataset.

References

Gustafson JA, Gibson SB, Damaraju N, Zalusky MPG, Hoekzema K, Twesigomwe D, Yang L, Snead AA, Richmond PA, De Coster W et al. High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation. Genome Res. 2024 Nov 20;34(11):2061-2073. PMID: 39358015; PMC: PMC11610458