Description

Preliminary data. This callset is a pre-publication release that will be updated before the final publication. Before using these data for analysis or in a paper, please contact the authors at the Aligning Science Across Parkinson's (ASAP) consortium / the Kim lab to check for the latest version and for guidance on appropriate use.

This track shows structural variants (SVs) identified by PacBio HiFi long-read whole-genome sequencing of 100 post-mortem human brain samples, split across three diagnostic groups: Parkinson's disease (PD), incidental Lewy body disease (ILBD) and healthy controls (HC). The high-confidence catalog contains 74,552 SVs: 34,056 insertions, 29,545 deletions, 9,707 duplications and 1,244 inversions.

The dataset accompanies Kim et al. (2026), which combines the long-read SV catalog with single-nucleus RNA-seq from the same donors to identify SVs associated with cell-type-specific gene expression, including variants near genes nominated as causal targets of PD GWAS loci.

Display Conventions and Configuration

Items are colored by SV type:

Insertions are placed at the insertion site with a width of 1 bp; deletions, duplications and inversions span the affected reference interval. Filters are available for SV type, SV length, variant quality and allele frequencies in each of the three cohorts (PD, HC, ILBD), as well as the case-minus-control carrier-rate differential.
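The placement rule above can be sketched in a few lines of Python. This is only an illustration, not the track's actual conversion script: the function name, the type labels (INS, DEL, DUP, INV) and the assumption that input coordinates are 1-based and inclusive (VCF-style) are all hypothetical. The output follows the standard BED convention of 0-based, half-open intervals.

```python
from typing import Tuple

# Hypothetical type labels; the real table may spell these differently.
SPANNING_TYPES = {"DEL", "DUP", "INV"}


def sv_to_bed_interval(sv_type: str, pos: int, end: int) -> Tuple[int, int]:
    """Map an SV to a 0-based, half-open (chromStart, chromEnd) pair.

    Assumes `pos` and `end` are 1-based, inclusive reference coordinates.
    """
    start0 = pos - 1  # convert the 1-based position to a 0-based start
    if sv_type == "INS":
        # Insertions are drawn at the insertion site with a width of 1 bp.
        return (start0, start0 + 1)
    if sv_type in SPANNING_TYPES:
        if end < pos:
            raise ValueError("end must not precede pos")
        # Deletions, duplications and inversions span the reference interval.
        return (start0, end)
    raise ValueError(f"unknown SV type: {sv_type}")
```

For instance, under these assumptions an insertion at position 1000 would be drawn as the interval (999, 1000), while a deletion covering positions 1000 to 1500 would be drawn as (999, 1500).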

The detail page shows, for each variant:

Methods

Kim et al. (2026) performed PacBio HiFi long-read whole-genome sequencing on 100 post-mortem cerebellum samples from the Arizona Study of Aging and Neurodegenerative Disorders / Brain and Body Donation Program cohort (35 Parkinson's disease, 31 incidental Lewy body disease, 34 healthy controls). Genomic DNA was isolated with either the Qiagen DNeasy or PacBio Nanobind PanDNA kit, sheared on a Megaruptor 3 to 10-23.5 kb, built into SMRTbell libraries (Prep Kit 3.0) and sequenced on PacBio Revio (25M SMRT cells, 2-h pre-extension, 24-h movies) to ~17x per-sample coverage.

Reads were processed with the Broad long-read WDL pipelines (CCS v6.2.0, pbmm2 v1.4.0 aligned to GRCh38, SAMtools v1.13 merge/sort), and an ensemble of three callers was run per sample: Sniffles2 v2.0.6, PBSV v2.9.0 (with GRCh38 tandem-repeat context) and Cue2 v2.0.0 (a deep-learning, image-based long-read caller).

Per-caller VCFs were restricted to FILTER=PASS calls of ≥40 bp, split by SV type with BCFtools, and merged by type across the 100 individuals and across the three callers with SURVIVOR v1.0.7 (1 kb distance, strand-match, min 50 bp). SVs in centromeres, reference gaps, segmental duplications and sex chromosomes were excluded.

The high-confidence catalog contains 74,552 SVs (34,056 insertions, 29,545 deletions, 9,707 duplications and 1,244 inversions) and is released as Supplementary Table 13 (media-13.txt), with per-cohort AF / AC / AN, Hardy-Weinberg statistics and case/control carrier differentials.
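As a rough guide to reading the per-cohort columns, the relationship between AC, AN and AF, and one plausible definition of a carrier differential, can be sketched as follows. The AF = AC / AN relationship is the usual VCF convention. The carrier-differential formula (case carrier rate minus control carrier rate) is an assumption made here for illustration and may not match the exact definition used in media-13.txt; the function names and the carrier counts in the usage note are likewise hypothetical.

```python
def allele_frequency(ac: int, an: int) -> float:
    """Allele frequency as alternate allele count over total called alleles."""
    if an <= 0:
        raise ValueError("AN must be positive")
    return ac / an


def carrier_rate_differential(
    case_carriers: int, case_total: int,
    control_carriers: int, control_total: int,
) -> float:
    """Case carrier rate minus control carrier rate.

    ASSUMPTION: this simple difference of proportions is illustrative only;
    the paper's table may define its differential differently.
    """
    if case_total <= 0 or control_total <= 0:
        raise ValueError("cohort sizes must be positive")
    return case_carriers / case_total - control_carriers / control_total
```

With the cohort sizes given above (35 PD, 34 HC) and made-up carrier counts of 7 and 3, `carrier_rate_differential(7, 35, 3, 34)` would be about 0.11, meaning the variant is carried by a larger share of cases than controls.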

The supplementary table media-13.txt was downloaded from the Kim et al. 2026 bioRxiv preprint (doi:10.64898/2026.03.20.713192).

The step-by-step build commands (download, TSV parsing, bigBed build) are recorded in the UCSC makeDoc for this track container: doc/hg38/lrSv.txt. The conversion scripts and autoSql schemas live in makeDb/scripts/lrSv.

Data Access

The data can be explored interactively in table format with the Table Browser or the Data Integrator, and accessed programmatically through our API using track=kwanhoSv.
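A minimal Python sketch of an API request for this track is shown below. It assumes the public UCSC REST endpoint https://api.genome.ucsc.edu/getData/track with semicolon-separated genome, track, chrom, start and end parameters; the helper names and the chr21 region are illustrative choices, and the layout of the returned JSON is not described on this page, so the sketch returns it unparsed.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed public endpoint of the UCSC REST API.
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(genome: str, track: str, chrom: str, start: int, end: int) -> str:
    """Build a getData/track URL with semicolon-separated parameters."""
    params = [
        ("genome", genome),
        ("track", track),
        ("chrom", chrom),
        ("start", str(start)),
        ("end", str(end)),
    ]
    query = ";".join(f"{key}={quote(value)}" for key, value in params)
    return f"{API_BASE}?{query}"


def fetch_track(genome: str, track: str, chrom: str, start: int, end: int) -> dict:
    """Fetch the region and return the decoded JSON response as a dict."""
    url = build_track_url(genome, track, chrom, start, end)
    with urlopen(url) as response:  # performs a network request
        return json.load(response)
```

For example, `fetch_track("hg38", "kwanhoSv", "chr21", 0, 100000000)` would request the track's chr21 items from the hg38 assembly.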

The bigBed file is available from our download server as kwanho.bb. For example, to extract the features on chr21, run:

bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/kwanho.bb -chrom=chr21 -start=0 -end=100000000 stdout
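The bigBedToBed example above can also be driven from a script. The sketch below assumes that the bigBedToBed utility is installed and on the PATH, and it reads only the three standard BED fields (chrom, chromStart, chromEnd); the remaining columns follow the track's autoSql schema, which is not reproduced here, so they are returned as raw strings. The function names are hypothetical.

```python
import subprocess
from typing import Iterator, List, Tuple

BB_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/kwanho.bb"

BedRow = Tuple[str, int, int, List[str]]


def bigbed_command(chrom: str, start: int, end: int, url: str = BB_URL) -> List[str]:
    """Build the bigBedToBed argument list for one region, writing to stdout."""
    return ["bigBedToBed", url, f"-chrom={chrom}", f"-start={start}", f"-end={end}", "stdout"]


def parse_bed_lines(text: str) -> Iterator[BedRow]:
    """Yield (chrom, chromStart, chromEnd, extra_fields) from tab-separated BED text."""
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        yield fields[0], int(fields[1]), int(fields[2]), fields[3:]


def fetch_region(chrom: str, start: int, end: int) -> List[BedRow]:
    """Run bigBedToBed (requires the UCSC tool and network access) and parse its output."""
    result = subprocess.run(
        bigbed_command(chrom, start, end),
        check=True, capture_output=True, text=True,
    )
    return list(parse_bed_lines(result.stdout))
```

Calling `fetch_region("chr21", 0, 100000000)` would then return the chr21 items as Python tuples, ready for filtering on the extra columns once their order has been checked against the schema.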

The full supplementary data for the paper (including media-13.txt) is available from the Kim et al. 2026 preprint.

Credits

Thanks to Kim, Levin and colleagues at the Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, the Broad Institute, Yale School of Medicine, Banner Sun Health Research Institute and their collaborators for releasing this dataset.

References

Kim K, Lin Z, Simmons SK, Parker J, Kearney M, Liao Z, Haywood N, Zhang J, Cline MP, Tuncali I et al. Integrating Long-Read Structural Variant Analysis with single-nucleus RNA-seq to Elucidate Gene Expression Effects in Disease. bioRxiv. 2026 Mar 23. PMID: 41929179; PMC: PMC13041997