This track displays structural variants (SVs) — deletions, insertions, and complex substitutions of at least 50 bp — from the Arabic Pangenome Reference (APR), a pangenome graph built from 53 UAE-resident Arab individuals drawn from eight countries (UAE, Saudi Arabia, Oman, Jordan, Egypt, Morocco, Syria, Yemen). Each bubble in the graph that contains an SV-sized alternative allele is shown as a single variant site, with allele counts aggregated across the 53 samples (the GRCh38 reference haplotype, present as an extra sample column in the source VCF, is excluded from the aggregation).
The APR pangenome was built on the T2T-CHM13v2 reference. Variants are shown natively on the hs1 browser and lifted to hg38 using the UCSC hs1ToHg38.over.chain.gz chain; variants that do not lift cleanly (often in T2T-added euchromatic sequence) are omitted from the hg38 version of the track.
Items are colored by SV type:
Each item spans the REF allele, from its start to its end on the reference. The name field is the graph snarl ID (e.g. <951452<1012008), which identifies the variant site in the APR pangenome graph.
The source VCF is multi-allelic: a single graph snarl appears as one row with a comma-separated ALT list. For this track, each ALT is classified individually using the 50 bp threshold, and the row is emitted as a single BED item with:
Rows whose ALT alleles are all smaller than 50 bp are not shown.
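The per-ALT classification described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the track's conversion code: the function names, the type labels (DEL, INS, COMPLEX), and the rule that separates a complex substitution from a plain insertion or deletion are hypothetical. Only the 50 bp threshold and the comma-separated ALT list come from the description.

```python
SV_MIN_LEN = 50  # size threshold stated in the track description


def classify_alt(ref, alt, min_len=SV_MIN_LEN):
    """Classify one REF/ALT pair; return None if it is not SV-sized.

    Assumed rule (hypothetical, not from the track's scripts): an allele is
    SV-sized when the REF and ALT lengths differ by at least min_len. If the
    shorter allele is longer than a single anchor base, the event is
    labeled COMPLEX; otherwise it is an insertion or a deletion.
    """
    diff = len(alt) - len(ref)
    if abs(diff) < min_len:
        return None
    if min(len(ref), len(alt)) > 1:
        return "COMPLEX"
    return "INS" if diff > 0 else "DEL"


def classify_row(ref, alt_field):
    """Classify every ALT of a multi-allelic row (comma-separated ALT list).

    Returns the list of SV types found, or None when no ALT reaches the
    threshold, in which case the row would not be shown.
    """
    types = [classify_alt(ref, alt) for alt in alt_field.split(",")]
    sv_types = [t for t in types if t is not None]
    return sv_types or None


if __name__ == "__main__":
    # One 1 bp insertion and one 60 bp insertion at the same site.
    print(classify_row("A", "AT," + "A" + "G" * 60))
```

The actual scripts in makeDb/scripts/lrSv may define the types and the size test differently; the sketch only shows how a single row can carry several ALTs that are judged one at a time.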
Nassir et al. 2025 built the Arabic Pangenome Reference (APR) from 53 UAE-resident Arab individuals drawn from eight countries. Samples were sequenced with ~35x PacBio HiFi reads on Sequel IIe/Revio (30-h movies), ~54x Oxford Nanopore ultralong reads on R10.4.1 PromethION flow cells (96-h runs), and ~65x Hi-C (Illumina NovaSeq 6000). Haplotype-phased de novo assemblies were produced with hifiasm v0.19.5 (primary) and Verkko v1.3.1 (for comparison), with a median N50 of 124 Mb.
The pangenome graph was built with Minigraph-Cactus seeded on T2T-CHM13v2 and augmented with GRCh38, and SVs were extracted by graph deconstruction. The released decomposed VCF (apr_review_v1_2902_chm13.vcf.gz) contains ~21 million variants on CHM13v2 contigs. The APR SV track was obtained from it by keeping ALT alleles with a ≥50 bp length difference and collapsing the ALTs of each snarl into a single site. Variants are shown natively on hs1 and lifted to hg38 with the UCSC hs1ToHg38.over.chain.gz chain; variants that do not lift cleanly are omitted from the hg38 version.
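The per-site allele counts are aggregated across the 53 samples with the GRCh38 column left out. A minimal Python sketch of that aggregation for one VCF row is given below. The function name, the sample names, and the column name "GRCh38" are assumptions made for illustration; the real column name in the source VCF and the logic of the conversion scripts may differ.

```python
def count_alleles(sample_names, genotypes, n_alts, exclude=("GRCh38",)):
    """Sum allele observations across samples for one VCF row.

    sample_names: sample column names from the VCF header
    genotypes:    GT strings in the same order, e.g. "1|0", "0|2", ".|1"
    n_alts:       number of ALT alleles in the row
    exclude:      sample columns to skip ("GRCh38" is an assumed name)

    Returns a list where index 0 is the REF count and index i is the
    count of ALT allele i.
    """
    counts = [0] * (n_alts + 1)
    for name, gt in zip(sample_names, genotypes):
        if name in exclude:
            continue
        # Treat phased ("|") and unphased ("/") separators the same way.
        for allele in gt.replace("/", "|").split("|"):
            if allele.isdigit():  # skip missing calls written as "."
                counts[int(allele)] += 1
    return counts


if __name__ == "__main__":
    names = ["S1", "S2", "GRCh38"]   # hypothetical sample columns
    gts = ["1|0", ".|2", "0"]        # hypothetical genotypes
    print(count_alleles(names, gts, n_alts=2))
```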
The source APR VCF was downloaded from the Mohammed Bin Rashid University SharePoint page, mbru.ac.ae/the-arab-pangenome-reference; the accompanying project source code is at github.com/muddinmbru/arab_pangenome_reference.
The step-by-step build commands (download, graph-VCF conversion, liftOver, bigBed build) are recorded in the UCSC makeDoc for this track container: doc/hg38/lrSv.txt. The conversion scripts and autoSql schemas live in makeDb/scripts/lrSv.
The data can be explored interactively with the Table Browser or Data Integrator, and accessed from scripts via our API (track=aprSv).
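As a hedged illustration of script access, the Python sketch below builds a request URL for the UCSC REST API endpoint getData/track and, optionally, fetches it. The track name aprSv and the genome names hs1 and hg38 come from this page; the helper names and the example region are hypothetical, and the layout of the returned JSON should be checked against the API documentation rather than assumed.

```python
import json
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_query(genome, chrom, start, end, track="aprSv"):
    """Return a getData/track URL for one region of the given track."""
    params = {
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    # Parameters are joined with ";" as in the UCSC API examples.
    return API_BASE + "?" + ";".join(f"{k}={v}" for k, v in params.items())


def fetch_items(genome, chrom, start, end, track="aprSv"):
    """Fetch the region and return the parsed JSON response (network call)."""
    with urlopen(build_query(genome, chrom, start, end, track)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical example region; only the URL is printed, nothing is fetched.
    print(build_query("hs1", "chr1", 1000000, 2000000))
```

Calling fetch_items("hg38", ...) with the same arguments would query the lifted version of the track instead of the native hs1 one.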
For automated download, the bigBed files are at http://hgdownload.soe.ucsc.edu/gbdb/hs1/lrSv/apr.bb (native) and http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/apr.bb (lifted).
The original APR pangenome VCF and assemblies can be downloaded from https://www.mbru.ac.ae/the-arab-pangenome-reference/, and the project source code is at https://github.com/muddinmbru/arab_pangenome_reference.
Thanks to the Arabic Pangenome Reference team at Mohammed Bin Rashid University (Dubai), led by Mohammed Uddin, for producing and releasing the pangenome and its variant calls.
Nassir N, Almarri MA, Kumail M, Mohamed N, Balan B, Hanif S, AlObathani M, Jamalalail B, Elsokary H, Kondaramage D et al. A draft UAE-based Arab pangenome reference. Nat Commun. 2025 Jul 24;16(1):6747. PMID: 40707445; PMC: PMC12290100