Description

This track provides short-read CNV data for comparison with the long-read ToMMo Japanese SVs track. It shows copy number variation (CNV) frequency estimates from short-read whole-genome sequencing of 48,874 Japanese individuals from the Tohoku Medical Megabank Project (jMorp 48KJPN-CNV Frequency Panel, release 20230828).

The callset is binned at ~1 kb resolution. For each bin, the source VCF reports how many of the 48,874 samples are at each observed integer copy number (CN0 through CN5). In an autosomal region the diploid reference state is CN=2; CN<2 indicates a copy-number loss and CN>2 indicates a copy-number gain.
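The loss/gain convention can be made concrete with a minimal sketch. It assumes an autosomal bin (diploid reference CN=2) and a per-bin mapping from integer copy number to sample count. The names `classify_cn`, `summarize_bin`, `DIPLOID_CN` and `COHORT_SIZE` are hypothetical and illustrative only.

```python
from typing import Mapping

DIPLOID_CN = 2          # reference copy number assumed for an autosomal bin
COHORT_SIZE = 48_874    # samples in the 48KJPN-CNV panel


def classify_cn(cn: int) -> str:
    """Label an integer copy-number state relative to the diploid reference."""
    if cn < DIPLOID_CN:
        return "loss"
    if cn > DIPLOID_CN:
        return "gain"
    return "reference"


def summarize_bin(cn_counts: Mapping[int, int]) -> dict:
    """Collapse a {copy number: sample count} mapping into loss/gain carrier counts."""
    loss = sum(n for cn, n in cn_counts.items() if classify_cn(cn) == "loss")
    gain = sum(n for cn, n in cn_counts.items() if classify_cn(cn) == "gain")
    return {
        "loss": loss,
        "gain": gain,
        # Carrier fraction of the whole cohort.
        "loss_freq": loss / COHORT_SIZE,
        "gain_freq": gain / COHORT_SIZE,
    }
```

For a hypothetical bin with 120 samples at CN1, 48,700 at CN2 and 54 at CN3, `summarize_bin({1: 120, 2: 48_700, 3: 54})` reports 120 loss carriers and 54 gain carriers.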

Display Conventions and Configuration

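This track is a composite of two bigWig tracks displayed as a two-color transparent overlay. For each 1 kb bin, one shows the absolute number of samples (out of 48,874) carrying a copy-number loss (CN<2, i.e. CN0 or CN1), and the other shows the number carrying a copy-number gain (CN>2, i.e. CN3 through CN5).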

Peaks in the overlay correspond to genomic regions where many samples show CNVs. Bins where every sample was at CN=2 (no CNV observed) are omitted from the tracks.

The default y-axis range is 0 to ~1,000 carriers, with auto-scale enabled; the largest possible value is 48,874 (every sample). Toggle the Show subtrack colors on UI option on the track settings page to control the visibility of each subtrack individually.

Methods

The ToMMo 48KJPN-CNV Frequency Panel was generated from short-read WGS of 48,874 Japanese individuals (blood buffy coat and saliva samples). According to the jMorp data provider, the analysis starts from the CRAM files produced for the sibling 54KJPN-SNV/INDEL release. For each (sequencer, sequencing institution) combination, 200 samples are used to build a Panel of Normals with the GATK CNV Germline Cohort Workflow on 1 kb intervals of the non-N reference. The full cohort is then processed in 200-sample batches with the matching Case Workflow. Samples whose amplification or loss call counts are outliers under a 1.5×IQR rule are removed, and each remaining sample is tallied per 1 kb bin at its integer copy-number state (CN0 through CN5). The resulting per-bin sample counts (SC) and frequencies (SF) are released as a VCF. For display here, the per-CN counts are collapsed into two per-bin values (the number of samples with CN<2 and the number with CN>2), which are written as two bedGraphs and converted to bigWigs; bins where every sample was CN=2 are omitted. In total, 2,006,905 bins with at least one carrier are retained genome-wide.
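The 1.5×IQR sample filter is a Tukey-fence outlier rule. The sketch below is a minimal illustration, not the jMorp pipeline. It assumes, without confirmation from the source, that the rule is applied separately to each sample's amplification count and loss count, that a sample must pass both, and that quartiles use Python's "inclusive" interpolation. Whether the filter runs per 200-sample batch or cohort-wide is also not stated. `iqr_fences` and `passing_samples` are hypothetical names.

```python
import statistics
from typing import Dict, List, Tuple


def iqr_fences(values: List[float], k: float = 1.5) -> Tuple[float, float]:
    """Return Tukey fences (Q1 - k*IQR, Q3 + k*IQR) for a list of values."""
    # Quartile interpolation method is an assumption; the source does not specify it.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def passing_samples(calls: Dict[str, Tuple[int, int]], k: float = 1.5) -> List[str]:
    """Keep samples whose amplification and loss call counts both lie within the fences.

    `calls` maps a sample ID to (number of amplification calls, number of loss calls).
    """
    amp_lo, amp_hi = iqr_fences([amp for amp, _ in calls.values()], k)
    loss_lo, loss_hi = iqr_fences([loss for _, loss in calls.values()], k)
    return sorted(
        sample
        for sample, (amp, loss) in calls.items()
        if amp_lo <= amp <= amp_hi and loss_lo <= loss <= loss_hi
    )
```

With made-up counts where one sample has 95 amplification calls and the others have 9 to 12, only that sample falls outside the upper fence and is dropped.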

The source VCF tommo-jcnvv1-20230828-GRCh38.vcf.gz was downloaded from the jMorp 48KJPN-CNV download page.

The step-by-step build commands (download, VCF-to-bedGraph conversion, bigWig build) are recorded in the UCSC makeDoc for this track container: doc/hg38/lrSv.txt. The conversion scripts live in makeDb/scripts/lrSv.
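The VCF-to-bedGraph collapse step can be sketched as follows. This is not the script in makeDb/scripts/lrSv. The record layout is an assumption to check against the VCF header: one record per 1 kb bin, POS as the bin's first base, an INFO END key giving its last base, ALT alleles written as symbolic `<CNn>` alleles, and SC holding one comma-separated sample count per ALT allele. `collapse_vcf`, `parse_info` and `write_bedgraph` are hypothetical names.

```python
import re
from typing import Iterable, List, Tuple

# Assumed ALT encoding: symbolic alleles such as <CN0>, <CN1>, <CN3>.
CN_ALT = re.compile(r"<CN(\d+)>")
DIPLOID_CN = 2

Row = Tuple[str, int, int, int]  # (chrom, 0-based start, end, carrier count)


def parse_info(info: str) -> dict:
    """Split a VCF INFO column into a {key: value} dict (flags map to '')."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def collapse_vcf(lines: Iterable[str]) -> Tuple[List[Row], List[Row]]:
    """Collapse per-CN sample counts into loss (CN<2) and gain (CN>2) bedGraph rows."""
    loss_rows: List[Row] = []
    gain_rows: List[Row] = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, _ref, alt, _qual, _filter, info = line.rstrip("\n").split("\t")[:8]
        fields = parse_info(info)
        start = int(pos) - 1          # assumed: POS is the bin's first base (1-based)
        end = int(fields["END"])      # assumed: END is the bin's last base (1-based, inclusive)
        counts = [int(c) for c in fields["SC"].split(",")]  # assumed: one count per ALT
        loss = gain = 0
        for allele, n in zip(alt.split(","), counts):
            match = CN_ALT.fullmatch(allele)
            if match is None:
                continue  # ignore alleles that are not <CNn>
            cn = int(match.group(1))
            if cn < DIPLOID_CN:
                loss += n
            elif cn > DIPLOID_CN:
                gain += n
        # A bin with no losses (or no gains) gets no row in that bedGraph.
        if loss:
            loss_rows.append((chrom, start, end, loss))
        if gain:
            gain_rows.append((chrom, start, end, gain))
    return loss_rows, gain_rows


def write_bedgraph(rows: List[Row], path: str) -> None:
    """Write rows as tab-separated bedGraph, sorted by chrom (byte order) then start."""
    with open(path, "w") as out:
        for chrom, start, end, value in sorted(rows):
            out.write(f"{chrom}\t{start}\t{end}\t{value}\n")
```

Each bedGraph written this way could then be converted with the UCSC utility, e.g. `bedGraphToBigWig loss.bedGraph hg38.chrom.sizes tommoJpCnvLoss.bw`; sorting in Python's code-point order matches the `LC_COLLATE=C sort -k1,1 -k2,2n` order that tool expects.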

Data Access

The data can be explored interactively in table format with the Table Browser or the Data Integrator, and accessed programmatically through our API using track=tommoJpCnv.
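A programmatic query can be sketched with the Python standard library. This assumes the public UCSC REST endpoint https://api.genome.ucsc.edu/getData/track with its genome, track, chrom, start and end parameters. It makes no assumption about the JSON layout returned for this composite track, so inspect the keys of the result before relying on them. `track_url` and `fetch_track` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_url(chrom: str, start: int, end: int,
              genome: str = "hg38", track: str = "tommoJpCnv") -> str:
    """Build a getData/track URL (parameters joined with ';' as in the UCSC API docs)."""
    params = {"genome": genome, "track": track, "chrom": chrom, "start": start, "end": end}
    query = ";".join(f"{key}={urllib.parse.quote(str(value))}" for key, value in params.items())
    return f"{API_BASE}?{query}"


def fetch_track(chrom: str, start: int, end: int, **kwargs) -> dict:
    """Download and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(track_url(chrom, start, end, **kwargs), timeout=60) as resp:
        return json.load(resp)
```

For example, `fetch_track("chr21", 0, 100_000)` requests the first 100 kb of chr21; printing `sorted(result.keys())` is a reasonable first step to see how the subtracks are reported.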

The bigWigs are available from our download server as tommoJpCnvLoss.bw and tommoJpCnvGain.bw. For example, to summarize loss-carrier counts over the intervals in regions.bed:

bigWigAverageOverBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/tommoJpCnvLoss.bw regions.bed regions.tab

or to write the gain counts for chr21 to standard output:

bigWigToWig http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/tommoJpCnvGain.bw -chrom=chr21 -start=0 -end=100000000 stdout
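The regions.tab file written by bigWigAverageOverBed is tab-separated with, per BED name, the columns name, size, covered, sum, mean0 and mean. The minimal reader below assumes that layout; extra columns such as those added by -minMax are ignored. For this track, mean0 is the base-weighted average carrier count over a region with omitted bins counted as zero carriers, which matches their meaning here (every sample at CN=2). `RegionAverage` and `read_average_tab` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RegionAverage:
    name: str       # name field from the input BED
    size: int       # bases in the BED region
    covered: int    # bases with a bigWig value (bins with at least one carrier)
    total: float    # sum of values over covered bases
    mean0: float    # total / size (uncovered bases count as zero)
    mean: float     # total / covered


def read_average_tab(lines: Iterable[str]) -> List[RegionAverage]:
    """Parse bigWigAverageOverBed output lines into RegionAverage records."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        name, size, covered, total, mean0, mean = fields[:6]
        rows.append(RegionAverage(name, int(size), int(covered),
                                  float(total), float(mean0), float(mean)))
    return rows
```

Usage: `with open("regions.tab") as fh: rows = read_average_tab(fh)`; regions with `covered == 0` contain no bin with any loss carrier.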

The original VCF is available from the jMorp 48KJPN-CNV download page (tommo-jcnvv1-20230828-GRCh38.vcf.gz).

Credits

Thanks to the Tohoku Medical Megabank Organization (ToMMo) and the jMorp team for releasing the 48KJPN-CNV Frequency Panel and its detailed methodology.

References

See the jMorp 48KJPN-CNV dataset page for the official description. Earlier ToMMo CNV releases are described in Tadaka et al.; see the dataset page for the current citation list.