Description

This track shows structural variants (SVs) derived from the Human Pangenome Reference Consortium (HPRC) release-2 pangenome graph. The graph was built with Minigraph-Cactus from PacBio HiFi haplotype-resolved assemblies of 233 samples, including T2T-CHM13 and a diverse 1000 Genomes Project sample set. HPRC releases one VCF per reference path (GRCh38 and T2T-CHM13); each is displayed natively on its corresponding UCSC assembly (hg38 and hs1, respectively). Variants were extracted from the graph with vg deconstruct and decomposed into atomic alleles with vcfwave (WFA2-lib).

The hg38 track contains 1,483,114 SV alleles (length ≥ 50 bp, or flagged as inversions), split by type into 1,106,190 insertions, 192,597 deletions, 178,178 complex alleles and 6,149 inversions. The hs1 track is built from the parallel T2T-CHM13 wave VCF. Each row carries the allele count, the allele frequency, the number of samples with data and the snarl-nesting level of the variant in the pangenome decomposition tree.

Display Conventions and Configuration

Items are colored by SV type.

Insertions are placed at the insertion site with a width of 1 bp; deletions, complex alleles and inversions span the affected reference interval. Filters are available for SV type, SV length, allele frequency and snarl level (0 = top-level bubble; higher values are nested within parent bubbles).
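The filters described above can be reproduced offline on exported rows. The following is a minimal sketch, not the browser's own filter code. The dictionary keys (svType, svLen, af, level) are hypothetical names chosen for illustration and may not match the actual bigBed field names.

```python
def passes_filters(row, sv_types=None, min_len=0, max_len=None,
                   min_af=0.0, max_af=1.0, max_level=None):
    """Return True if one SV row satisfies every active filter.

    `row` is a dict with hypothetical keys: svType, svLen, af, level.
    A filter set to None is treated as inactive.
    """
    if sv_types is not None and row["svType"] not in sv_types:
        return False
    length = abs(row["svLen"])
    if length < min_len or (max_len is not None and length > max_len):
        return False
    if not (min_af <= row["af"] <= max_af):
        return False
    # Snarl level 0 is a top-level bubble; larger values are nested deeper.
    if max_level is not None and row["level"] > max_level:
        return False
    return True


# Made-up example rows, for illustration only.
rows = [
    {"svType": "ins", "svLen": 320, "af": 0.12, "level": 0},
    {"svType": "del", "svLen": 75, "af": 0.002, "level": 2},
    {"svType": "complex", "svLen": 1500, "af": 0.40, "level": 1},
]

# Keep common alleles (AF >= 5%) at snarl level 0 or 1.
common_shallow = [r for r in rows if passes_filters(r, min_af=0.05, max_level=1)]

# Keep only top-level deletions.
top_level_dels = [r for r in rows
                  if passes_filters(r, sv_types={"del"}, max_level=0)]
```

Treating None as "filter off" keeps the defaults permissive, so each filter can be switched on independently, as in the track configuration page.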

Methods

HPRC release-2 is an open data release that is not yet accompanied by a formal peer-reviewed publication. It is built from PacBio HiFi haplotype-resolved assemblies of 233 samples, including T2T-CHM13 and a diverse 1000 Genomes Project panel. The pangenome graph was built with Minigraph-Cactus against both the GRCh38 and T2T-CHM13 reference paths. Variants were extracted from the graph with vg deconstruct and then decomposed into atomic alleles with vcfwave / WFA2-lib, yielding per-allele TYPE and LEN fields.

For this track, each ALT in the wave VCF was emitted as its own BED row, retaining alleles with |LEN| ≥ 50 bp or the INV flag. Allele counts, frequencies, sample counts and snarl levels are taken directly from the per-allele INFO fields. On hg38 this yields 1,483,114 SV-sized alleles (1,106,190 insertions, 192,597 deletions, 178,178 complex alleles and 6,149 inversions); the hs1 track is built from the parallel T2T-CHM13 wave VCF. Sample-list and assembly provenance for the graph are maintained at HPRC in hprc_intermediate_assembly/alignments_v2.0.csv.
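The per-allele conversion described above can be sketched as follows. This is an illustration under stated assumptions, not the conversion script used for the track (that script lives in makeDb/scripts/lrSv). TYPE, LEN and INV are named in the text; the INFO keys AC, AF, NS and LV, the lowercase type labels, the "inv" label and the choice of the anchor base as the 1-bp insertion site are assumptions.

```python
def parse_info(info_str):
    """Split a VCF INFO string into a dict; valueless flags map to True."""
    info = {}
    for item in info_str.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def allele_value(info, key, index, default="."):
    """Return the value of `key` for ALT number `index` (0-based).

    Per-site fields with a single value are reused for every allele.
    """
    value = info.get(key)
    if value is None or value is True:
        return default
    parts = value.split(",")
    return parts[index] if index < len(parts) else parts[0]


def vcf_line_to_bed_rows(line, min_len=50):
    """Emit one BED-like tuple per ALT with |LEN| >= min_len or the INV flag."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, alts, info_str = (
        fields[0], int(fields[1]), fields[3], fields[4], fields[7])
    info = parse_info(info_str)
    is_inv = info.get("INV") is True
    start = pos - 1  # VCF positions are 1-based; BED starts are 0-based
    rows = []
    for i, _alt in enumerate(alts.split(",")):
        sv_len = abs(int(allele_value(info, "LEN", i, "0")))
        if sv_len < min_len and not is_inv:
            continue
        # Assumption: inversions are labeled "inv" whenever the INV flag is set.
        sv_type = "inv" if is_inv else allele_value(info, "TYPE", i)
        # Insertions get a 1-bp item; other types span the REF allele.
        end = start + 1 if sv_type == "ins" else start + len(ref)
        rows.append((chrom, start, end, sv_type, sv_len,
                     allele_value(info, "AC", i), allele_value(info, "AF", i),
                     allele_value(info, "NS", i), allele_value(info, "LV", i)))
    return rows


# Made-up records for illustration (not taken from the HPRC VCF).
ins_line = "\t".join(["chr21", "1000", ".", "A", "A" + "C" * 60, "60", ".",
                      "AC=3;AF=0.01;NS=200;LV=0;TYPE=ins;LEN=60"])
del_line = "\t".join(["chr21", "2000", ".", "A" + "T" * 70, "A", "60", ".",
                      "AC=5;AF=0.02;NS=210;LV=1;TYPE=del;LEN=70"])
ins_rows = vcf_line_to_bed_rows(ins_line)
del_rows = vcf_line_to_bed_rows(del_line)
```

A real pipeline would also skip header lines, sort the output and pass it to bedToBigBed with a matching autoSql schema.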

The HPRC v2.0 Minigraph-Cactus graph and wave-decomposed VCFs were downloaded from the HPRC S3 release bucket: hprc-v2.0-mc-grch38.wave.vcf.gz (hg38) and hprc-v2.0-mc-chm13.wave.vcf.gz (hs1).

The step-by-step build commands (download, format conversion, bigBed build) are recorded in the UCSC makeDoc for this track container: doc/hg38/lrSv.txt and doc/hs1/lrSv.txt. The conversion scripts and autoSql schemas live in makeDb/scripts/lrSv.

Data Access

The data can be explored interactively in table format with the Table Browser or the Data Integrator, and accessed programmatically through our API using track=hprc2Sv.
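As a rough illustration of programmatic access, the sketch below builds a request URL for the UCSC REST API getData/track endpoint and optionally fetches it. The base URL and the genome, chrom, start and end parameters come from the general UCSC API documentation rather than from this page; only track=hprc2Sv is given here, and the layout of the returned JSON is not assumed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public UCSC REST API endpoint for track data.
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_url(genome, chrom, start, end, track="hprc2Sv"):
    """Build a getData/track URL for one region of one assembly."""
    query = urlencode({
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    })
    return f"{API_BASE}?{query}"


def fetch_track(genome, chrom, start, end, track="hprc2Sv"):
    """Fetch the region and return the decoded JSON (requires network access)."""
    with urlopen(track_url(genome, chrom, start, end, track)) as response:
        return json.load(response)


# Build (but do not fetch) a URL for a 1-Mb window on chr21 in hg38.
url = track_url("hg38", "chr21", 20000000, 21000000)
```

Calling fetch_track("hg38", "chr21", 20000000, 21000000) would perform the actual request; small windows are advisable because the API limits the number of items returned per call.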

The bigBed files for both assemblies are available from our download server. Regions can be extracted with the bigBedToBed utility, for example:

bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/hprc2.bb -chrom=chr21 -start=0 -end=100000000 stdout

The original pangenome graph and the wave-decomposed VCF are available from the HPRC public S3 bucket, as linked from the HPRC release-2 announcement.

Credits

Thanks to the Human Pangenome Reference Consortium for building and publicly releasing the release-2 Minigraph-Cactus pangenome.

References

HPRC release-2 data is not yet described in a formal peer-reviewed publication. See the Human Pangenome Project release announcement for background and data-access details: HPRC data release 2.