Description

This track shows mobile element insertions (MEIs) identified in the Human Genome Structural Variation Consortium phase 3 (HGSVC3) callset. These insertions were detected in 65 long-read assembled samples relative to the reference assembly: at each site, at least one of the 65 samples carries an inserted mobile element that is absent from the reference. Two parallel callsets were released against the two human reference assemblies:

Class    GRCh38 / hg38    T2T-CHM13 / hs1
Alu      10,270           10,458
L1       1,604            1,664
SVA      764              791
HERVK    3                5
snRNA    1                1
Total    12,642           12,919

For each MEI, the track lists the element class and family, the length of the inserted sequence, the discovery sample, the number of carrier haplotypes/samples, the alt-allele frequency, the number of MEI callers (out of two) that supported the call, separate L1ME-AID and PALMER validation flags, overlap with reference segmental duplications and tandem repeats, and the full DNA sequence of the inserted mobile element (the ALT allele minus the anchor base).

Display Conventions and Configuration

An insertion has zero length on the reference: it attaches between two adjacent reference bases without replacing any of them. Following the VCF convention used by the underlying PAV calls and by the other long-read SV tracks, each MEI is drawn as a 1-bp block sitting on the anchor base — the reference base immediately to the left of the insertion attachment point. The inserted mobile element itself is not present in the reference and is therefore not drawn; its length is reported on the detail page (svLen) and in the item label (INS-svLen-carrierCount). bigBed does support truly zero-width features between nucleotides, but for consistency with the long-read SV tracks, this track uses the 1-bp anchor representation instead.
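The 1-bp anchor representation and the item label can be sketched in a few lines of Python. This is only an illustration of the convention described above, not the track's actual conversion code; the function name anchor_block and the example values are hypothetical.

```python
def anchor_block(pos, sv_len, carrier_count):
    """Return (chromStart, chromEnd, label) for an insertion anchored at a 1-based VCF POS.

    Hypothetical helper: the names and the example values are illustrative only.
    """
    if pos < 1:
        raise ValueError("VCF POS is 1-based and must be >= 1")
    chrom_start = pos - 1        # 0-based start of the anchor base
    chrom_end = chrom_start + 1  # half-open end, so the block is exactly 1 bp wide
    label = f"INS-{sv_len}-{carrier_count}"  # item label: INS-svLen-carrierCount
    return chrom_start, chrom_end, label


print(anchor_block(1000, 311, 4))  # (999, 1000, 'INS-311-4')
```

The inserted sequence never contributes to the drawn width: a 311-bp Alu and a 6-kb L1 both occupy a single reference base.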

Items are colored by element class (Alu, L1, SVA, HERVK, snRNA).

The score column encodes the alt-allele frequency on a 0-1000 scale. Filters allow restricting to specific element classes, length ranges, allele frequency, carrier counts, supporting callers, validation status, and reference repeat overlap.
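As a rough illustration of the 0-1000 score scale, an allele frequency between 0 and 1 could be mapped as below. The rounding rule is an assumption (the track may truncate instead), and af_to_score is a hypothetical name.

```python
def af_to_score(allele_freq):
    """Map an alt-allele frequency in [0, 1] to a BED score in [0, 1000].

    Assumption: rounding to the nearest integer; the real conversion may differ.
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return int(round(allele_freq * 1000))


print(af_to_score(0.5))  # 500
```

Under this mapping, a filter of score >= 50 would correspond to an alt-allele frequency of roughly 5% or more.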

Methods

The HGSVC3 study sequenced and de novo assembled 65 individuals (30 males, 35 females) representing five continental groups and 28 populations: 30 of African, 9 of Admixed American, 8 of European, 10 of East Asian and 8 of South Asian descent, with three parent-child trios included. Each sample was sequenced to ~47-fold coverage of PacBio HiFi and ~56-fold coverage of Oxford Nanopore long reads (~36-fold ultra-long), and complemented with Strand-seq, Bionano optical mapping, Hi-C, Iso-Seq and RNA-seq. Haplotype-resolved diploid assemblies were produced and structural variants called with PAV.

Mobile element insertions were identified from the union of two independent MEI callsets, L1ME-AID and PALMER; all single-caller calls were manually curated. Orthogonal validation against an independent MELT-LRA callset showed an average concordance of 90.8% on GRCh38. Roughly 93% of MEIs are supported by both callers; the remaining single-caller calls split about 6:1 in favour of PALMER (PALMER-only ~6%, L1ME-AID-only ~1%). The Caller Count, PALMER Validated and L1ME-AID Validated filters can be used to restrict the display to the high-confidence dual-validated subset. Calls are restricted to regions outside the low-confidence intervals (i.e. excluding Yq12 and centromeres).

For each site, per-sample genotypes from all 65 assembled samples are summarized into an alt-allele count, allele number, allele frequency and a list of carrier samples. See Logsdon et al. 2025 (Nature) for full methodological details.
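The per-site genotype summary can be sketched as follows. This is a minimal illustration, not the HGSVC3 or UCSC code: it assumes VCF-style genotype strings such as "1|0" with "." for an uncalled allele, and it assumes uncalled alleles are left out of the allele number. The function name and the sample names are hypothetical.

```python
def summarize_genotypes(genotypes):
    """Summarize per-sample genotypes into (alt count, allele number, AF, carriers).

    genotypes: dict mapping sample name -> genotype string, e.g. "1|0", "0|0", ".|1".
    Assumptions: "0" is the reference allele, "." is an uncalled allele and is
    excluded from the allele number.
    """
    alt_count = 0
    allele_number = 0
    carriers = []
    for sample, gt in genotypes.items():
        alleles = gt.replace("/", "|").split("|")    # accept phased or unphased
        called = [a for a in alleles if a != "."]    # drop uncalled alleles
        n_alt = sum(1 for a in called if a != "0")   # non-reference alleles
        allele_number += len(called)
        alt_count += n_alt
        if n_alt > 0:
            carriers.append(sample)
    allele_freq = alt_count / allele_number if allele_number else 0.0
    return alt_count, allele_number, allele_freq, carriers


example = {"sampleA": "1|0", "sampleB": "0|0", "sampleC": "1|1", "sampleD": ".|0"}
ac, an, af, carriers = summarize_genotypes(example)
print(ac, an, carriers)  # 3 7 ['sampleA', 'sampleC']
```

With 65 fully called diploid samples the allele number would be 130, so a singleton insertion would have an allele frequency of 1/130.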

The original CSVs were downloaded from the HGSVC3 Mobile Elements release directory (files MEI_Callset_GRCh38.ALL.20241211.csv.gz and MEI_Callset_T2T-CHM13.ALL.20241211.csv.gz) and converted to bigBed following the steps described in the makeDoc file. Conversion uses scripts in src/hg/makeDb/scripts/mei: VCF-style positions (1-based POS, anchor base) are converted to half-open BED coordinates (chromStart = POS - 1, chromEnd = chromStart + 1), genotypes are tallied across the 65 samples, and items are colored by mobile element class.

Data Access

The data can be explored interactively in table format with the Table Browser or the Data Integrator and exported from there to spreadsheet or tab-separated tables. From scripts, the data can be accessed through our API, track=meiHgsvc3.
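For scripted access, a request URL for the public UCSC REST API (api.genome.ucsc.edu, getData/track endpoint) might be assembled as in the sketch below. The track name meiHgsvc3 comes from the text above; the helper names are hypothetical, and the structure of the returned JSON is not assumed here, so the fetch helper simply returns the parsed response.

```python
import json
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_url(genome, chrom, start, end, track="meiHgsvc3"):
    """Build a getData/track URL for one region (hypothetical helper)."""
    params = [("genome", genome), ("track", track),
              ("chrom", chrom), ("start", start), ("end", end)]
    # The UCSC API examples separate parameters with semicolons.
    return API_BASE + "?" + ";".join(f"{key}={value}" for key, value in params)


def fetch_track(genome, chrom, start, end, track="meiHgsvc3"):
    """Fetch and parse the JSON response; requires network access."""
    with urlopen(track_url(genome, chrom, start, end, track)) as response:
        return json.load(response)


print(track_url("hg38", "chr21", 0, 100000))
```

Calling fetch_track("hg38", "chr21", 0, 100000) would then return the parsed JSON for that window, provided the track is available through the API under that name.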

For automated download and analysis, the genome annotation is stored in a bigBed file that can be downloaded from our download server. The file for this track is called hgsvc3.bb in /gbdb/hg38/mei/ (GRCh38) or /gbdb/hs1/mei/ (T2T-CHM13). Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain features within a given range, e.g.

bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/mei/hgsvc3.bb -chrom=chr21 -start=0 -end=100000000 stdout
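The same bigBedToBed call can be driven from Python, for example as sketched below. This assumes bigBedToBed is installed and on the PATH; the helper names are hypothetical, and the hs1 URL is an assumption formed by analogy with the hg38 path given above.

```python
import subprocess


def bigbed_region_cmd(assembly, chrom, start, end):
    """Build the bigBedToBed argument list for one region (hypothetical helper).

    Assumption: the hs1 file sits at the same URL pattern as the hg38 file.
    """
    url = f"http://hgdownload.soe.ucsc.edu/gbdb/{assembly}/mei/hgsvc3.bb"
    return ["bigBedToBed", url, f"-chrom={chrom}",
            f"-start={start}", f"-end={end}", "stdout"]


def fetch_region(assembly, chrom, start, end):
    """Run bigBedToBed and return each BED line split into its tab-separated fields."""
    result = subprocess.run(bigbed_region_cmd(assembly, chrom, start, end),
                            check=True, capture_output=True, text=True)
    return [line.split("\t") for line in result.stdout.splitlines()]


print(" ".join(bigbed_region_cmd("hg38", "chr21", 0, 100000000)))
```

Each row returned by fetch_region starts with the standard BED fields (chrom, chromStart, chromEnd, name), followed by the track-specific columns.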

The original annotation source data can be downloaded from the HGSVC3 1000 Genomes FTP site.

Credits

Thanks to the Human Genome Structural Variation Consortium phase 3 (HGSVC3) for releasing the underlying assemblies and MEI callsets used to produce this track.

References

Logsdon GA, Ebert P, Audano PA, Loftus M, Porubsky D, Ebler J, Yilmaz F, Hallast P, Prodanov T, Yoo D et al. Complex genetic variation in nearly complete human genomes. Nature. 2025 Aug;644(8076):430-441. PMID: 40702183; PMC: PMC12350169