Description

The MPRAVarDB track shows 239,028 variants successfully mapped to hg38 (from 242,818 total) across 18 massively parallel reporter assay (MPRA) studies compiled in the MPRAVarDB database (Jin et al., 2024). Each variant was tested experimentally in an MPRA to evaluate whether it affects regulatory activity. The database covers over 30 cell lines and 30 human diseases and traits, including neurodegenerative diseases, immune disorders, melanoma, multiple myeloma, and autoimmune diseases.

Note on cell lines: The cell line shown for each variant is the reporter cell line in which the human regulatory element was assayed. Several studies used mouse cell lines (e.g. Neuro-2a (also listed as N2A), NIH/3T3, MIN6) as reporter systems for human sequences; these variants retain human (hg38) coordinates.

Note on study type: Not all studies measure transcriptional regulation in the same sense. Two of the larger contributors, Griesemer et al., 2021 (72,546 variants) and Schuster et al., 2023 (26,546 variants), test 3'UTR variants placed downstream of the reporter, where the log2 fold change between alleles reflects changes in mRNA stability, decay, RBP or miRNA binding, or translation efficiency rather than transcriptional activation. The remaining studies test 5' regulatory elements (promoters and enhancers) where log2FC reflects changes in transcription. Together, the 3'UTR studies account for 99,092 of the 239,028 variants in the track (~41%).

Display Conventions

Items are colored by statistical significance, using the uniform cutoff described in the Methods section (FDR < 0.05, or nominal p < 0.05).

Each item shows the variant name (rsID when available, otherwise chr:pos:ref>alt), the reference and alternate alleles, the associated disease or trait, cell line, log2 fold change, p-value, and FDR.

Cell-type specificity: MPRA results are typically cell-type-specific, and significance in one cell line does not imply activity in another. For example, Tewhey et al., 2016 found only modest correlation (R ≈ 0.63) between LCL and HepG2 measurements of the same eQTL variants, and McAfee et al., 2023 reported that only 205 of 1,004 HEK293-positive variants overlapped HNP-positive variants. The cell line filter can be used to narrow results to a relevant context.

Note on Kircher et al., 2019: This study contributes 44,647 variants (~19% of the track) using a saturation mutagenesis design that tests nearly every possible nucleotide substitution at each position of 20 disease-associated regulatory elements at single-base-pair resolution. The elements are 10 promoters (TERT, LDLR, HBB, HBG1, HNF4A, MSMB, PKLR, F9, FOXE1, GP1BB) and 10 enhancers (SORT1, ZRS, BCL11A, IRF4, IRF6, MYC tested with two distinct enhancers, RET, TCF7L2, and the UC88 ultraconserved enhancer). Regions over those elements contain many densely packed Kircher variants that may dominate the display at those loci.

Interpreting log2FC

The log2 fold change is computed as log2(alt RNA/DNA) − log2(ref RNA/DNA). A positive value means the alternate allele drove more reporter activity than the reference allele in this assay; a negative value means the reverse. The linear allelic ratio is 2^log2FC: log2FC = 0.5 corresponds to roughly a 1.41× allelic difference, log2FC = 1.0 to 2×, and log2FC = 2.0 to 4×. As noted in the Description section, log2FC reflects transcriptional activation for 5'-regulatory studies and steady-state mRNA abundance, decay, or translation efficiency for 3'UTR studies (Griesemer et al., 2021; Schuster et al., 2023).
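As a small worked sketch of the conversion above, the following Python snippet computes log2FC from a pair of RNA/DNA ratios and converts a log2FC back to a linear allelic ratio. The function names and example values are illustrative only and are not part of the track or of MPRAVarDB.

```python
import math


def log2_fold_change(alt_rna_dna: float, ref_rna_dna: float) -> float:
    """log2(alt RNA/DNA) - log2(ref RNA/DNA), as defined in the text."""
    return math.log2(alt_rna_dna) - math.log2(ref_rna_dna)


def linear_allelic_ratio(log2fc: float) -> float:
    """Convert a log2 fold change back to a linear alt/ref ratio (2 ** log2FC)."""
    return 2.0 ** log2fc


# Positive values favor the alternate allele, negative values the reference.
for value in (0.5, 1.0, 2.0, -1.0):
    print(f"log2FC = {value:+.1f} -> {linear_allelic_ratio(value):.2f}x")
```

A log2FC of -1.0 gives a ratio of 0.50, meaning the alternate allele drove half the reporter activity of the reference allele in that assay.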

Studies

The following table lists the 18 MPRA studies included in MPRAVarDB, with the number of tested variants, diseases/traits, cell lines, and a brief description of the variant selection.

Study | Variants | Disease/Trait | Cell Line(s) | Description
Griesemer et al., 2021 | 72,546 | NHGRI-EBI GWAS catalog | GM12878, HEK293FT, HMEC, HepG2, K562, SKNSH | 3'UTR SNPs and indels in LD with GWAS catalog variants, variants under positive selection, and rare outlier expression variants from GTEx
Kircher et al., 2019 | 44,647 | Various (18 diseases including diabetes, cancer, blood disorders, limb malformations) | HEK293T, HEL92.1.7, HaCaT, HeLa, HepG2, K562, LNCaP, MIN6, NIH/3T3, Neuro-2a, SK-MEL-28, SF7996 | Saturation mutagenesis of 20 disease-associated regulatory elements at single base-pair resolution
Abell et al., 2022 | 29,564 | eQTL (no specific disease) | GM12878 | 30,893 variants in LD with independent, common, top-ranked eQTL across 744 eGenes in the CEU cohort
Tewhey et al., 2016 | 23,430 | eQTL (no specific disease) | GM12878 | 32,373 variants associated with eQTLs in lymphoblastoid cell lines
Schuster et al., 2023 | 26,546 | Prostate cancer | PC3 | 14,497 single-nucleotide mutations enriched in oncogenic pathways and 3'UTR regulatory elements
Mouri et al., 2022 | 14,549 | Autoimmune diseases (Crohn's, IBD, psoriasis, MS, RA, T1D, ulcerative colitis) | Jurkat | GWAS variants from autoimmune disease loci tested for regulatory element activity in T cells
McAfee et al., 2023 | 10,302 | Schizophrenia | HEK293s, HNPS | 5,173 fine-mapped schizophrenia GWAS variants
Cooper et al., 2022 | 5,330 | Alzheimer's disease, Progressive supranuclear palsy | HEK293T | 5,706 noncoding SNVs from 25 AD and 9 PSP genome-wide significant loci
Long et al., 2022 | 3,980 | Melanoma | C283T, UACC903 | 1,992 risk-associated variants in tight LD (r2>0.8) from 54 melanoma risk loci
Myint et al., 2020 | 2,158 | Schizophrenia, Alzheimer's disease | K562, SH-SY5Y | 1,049 SZ and 30 AD variants in 64 SZ loci and 9 AD loci
Choi et al., 2020 | 1,664 | Melanoma | HEK293FT, UACC903 | GWAS melanoma risk variants
Ajore et al., 2022 | 1,582 | Multiple myeloma | L363, MOLP8 | 1,039 variants in high LD (r2>0.8) at 23 MM risk loci
Klein et al., 2019 | 1,119 | Osteoarthritis | Saos-2 | 1,605 SNPs in high LD (r2>0.8) at 35 lead SNPs associated with OA via GWAS
Lu et al., 2021 | 1,036 | Systemic lupus erythematosus | GM12878, Jurkat | 18,312 variants in tight LD (r2>0.8) with 578 GWAS index variants at 531 loci
Mulvey & Dougherty, 2021 | 275 | Major depressive disorder | N2A | Over 1,000 SNPs from 39 neuropsychiatric GWAS loci, selected by overlap with eQTL and histone marks
Ferraro et al., 2020 | 150 | Rare variant expression (no specific disease) | GM12878 | Rare variants contributing to extreme expression, allelic expression, and splicing across 49 GTEx tissues
Rao et al., 2021 | 88 | Alcohol use disorder | BLA, CE, NAC, SFC | SNPs in 3'UTR of 88 genes from allele-specific expression analysis (30 AUD subjects vs 30 controls)
Ulirsch et al., 2016 | 62 | Red blood cell traits | K562, K562+GATA1 | 2,756 variants in strong LD with 75 sentinel variants associated with RBC traits

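The variant counts above sum to 239,028, the number of records mapped to hg38 in this track, out of 242,818 total source variants; see Methods. Numbers quoted in the Description column are taken from the source publications and can differ from the track counts.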

Methods

Data was downloaded from the MPRAVarDB web server. Variants originally mapped to hg19 (213,689 of 242,818) were lifted to hg38 using liftOver. 114 variants could not be mapped and were excluded. The remaining variants were merged with the 29,129 natively hg38-mapped variants to produce a total of 239,028 hg38 records.

Significance thresholds across studies: The source studies in MPRAVarDB do not all use the same significance framework. Most studies apply a Benjamini-Hochberg FDR threshold (commonly 0.05 or 0.10), but some report only nominal regression p-values. For example, Tewhey et al., 2016 uses BH FDR < 0.05 to call "emVars", Griesemer et al., 2021 and McAfee et al., 2023 use BH FDR < 0.10, and Kircher et al., 2019 reports raw regression p-values rather than FDR. The track applies a uniform FDR < 0.05 / nominal p < 0.05 color cutoff for visual consistency, which is the more conservative of the FDR thresholds reported by the source studies. For any variant of interest, consult the source publication for the original significance call.
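The uniform cutoff can be illustrated with a short Python sketch. The exact rule the track uses to combine FDR and nominal p-values is not spelled out here, so the fallback order below (use FDR when a study reports one, otherwise the nominal p-value) is an assumption, and the function name is hypothetical.

```python
from typing import Optional

CUTOFF = 0.05  # uniform color cutoff stated in the text


def passes_track_cutoff(fdr: Optional[float], p_value: Optional[float]) -> bool:
    """Assumed rule: prefer FDR when available, else fall back to nominal p."""
    if fdr is not None:
        return fdr < CUTOFF
    if p_value is not None:
        return p_value < CUTOFF
    return False  # no statistic reported, so not flagged as significant


# A variant with FDR = 0.08 meets a study-level FDR < 0.10 threshold
# but not the uniform 0.05 cutoff.
print(passes_track_cutoff(fdr=0.08, p_value=0.001))
# A study reporting only raw p-values is judged on the nominal p-value.
print(passes_track_cutoff(fdr=None, p_value=0.03))
```

Under this reading, a variant that a source study calls significant at FDR < 0.10 may not be flagged as significant in the track, which is why the original publication remains the reference for each study's own call.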

Data Access

The data can be explored interactively in table format with the Table Browser or the Data Integrator and exported from there to spreadsheets or tab-separated tables. From scripts, the data can be accessed through our API using track=mpraVarDb.

For automated download and analysis, the genome annotation is stored in a bigBed file that can be downloaded from our download server. The file for this track is called mpravardb.bb. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain features within a given range, e.g. bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/mpra/mpravardb/mpravardb.bb -chrom=chr21 -start=0 -end=100000000 stdout
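For scripted use, the bigBedToBed command above could be wrapped in Python roughly as follows. This is a minimal sketch that assumes the bigBedToBed binary is installed and on the PATH; the helper names bigbed_region_command and fetch_region are hypothetical, and the column layout of the returned records is not assumed here.

```python
import subprocess

# URL of the bigBed file, as given in the text above.
BIGBED_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/mpra/mpravardb/mpravardb.bb"


def bigbed_region_command(chrom, start, end, url=BIGBED_URL):
    """Build the bigBedToBed argument list for one genomic range."""
    return [
        "bigBedToBed",
        url,
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        "stdout",  # write BED records to standard output
    ]


def fetch_region(chrom, start, end):
    """Run bigBedToBed and return one list of tab-separated fields per record."""
    result = subprocess.run(
        bigbed_region_command(chrom, start, end),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the tool fails
    )
    return [line.split("\t") for line in result.stdout.splitlines() if line]
```

Calling fetch_region("chr21", 0, 100000000) would run the same query as the example command and return the records as lists of strings.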

The original annotation source data can be downloaded from the MPRAVarDB web server.

Credits

Thanks to Tao Wang and colleagues at the University of Florida for creating and maintaining the MPRAVarDB database.

References

Abell NS, DeGorter MK, Gloudemans MJ, Greenwald E, Smith KS, He Z, Montgomery SB. Multiple causal variants underlie genetic associations in humans. Science. 2022 Mar 18;375(6586):1247-1254. PMID: 35298243; PMC: PMC9725108

Ajore R, Niroula A, Pertesi M, Cafaro C, Thodberg M, Went M, Bao EL, Duran-Lozano L, Lopez de Lapuente Portilla A, Olafsdottir T et al. Functional dissection of inherited non-coding variation influencing multiple myeloma risk. Nat Commun. 2022 Jan 10;13(1):151. PMID: 35013207; PMC: PMC8748989

Choi J, Zhang T, Vu A, Ablain J, Makowski MM, Colli LM, Xu M, Hennessey RC, Yin J, Rothschild H et al. Massively parallel reporter assays of melanoma risk variants identify MX2 as a gene promoting melanoma. Nat Commun. 2020 Jun 1;11(1):2718. PMID: 32483191; PMC: PMC7264232

Cooper YA, Teyssier N, Dräger NM, Guo Q, Davis JE, Sattler SM, Yang Z, Patel A, Wu S, Kosuri S et al. Functional regulatory variants implicate distinct transcriptional networks in dementia. Science. 2022 Aug 19;377(6608):eabi8654. PMID: 35981026

Ferraro NM, Strober BJ, Einson J, Abell NS, Aguet F, Barbeira AN, Brandt M, Bucan M, Castel SE, Davis JR et al. Transcriptomic signatures across human tissues identify functional rare genetic variation. Science. 2020 Sep 11;369(6509). PMID: 32913073; PMC: PMC7646251

Griesemer D, Xue JR, Reilly SK, Ulirsch JC, Kukreja K, Davis JR, Kanai M, Yang DK, Butts JC, Guney MH et al. Genome-wide functional screen of 3'UTR variants uncovers causal variants for human disease and evolution. Cell. 2021 Sep 30;184(20):5247-5260.e19. PMID: 34534445; PMC: PMC8487971

Jin W, Xia Y, Nizomov J, Liu Y, Li Z, Lu Q, Chen L. MPRAVarDB: an online database and web server for exploring regulatory effects of genetic variants. Bioinformatics. 2024 Oct 1;40(10). PMID: 39325859; PMC: PMC11464417

Kircher M, Xiong C, Martin B, Schubach M, Inoue F, Bell RJA, Costello JF, Shendure J, Ahituv N. Saturation mutagenesis of twenty disease-associated regulatory elements at single base-pair resolution. Nat Commun. 2019 Aug 8;10(1):3583. PMID: 31395865; PMC: PMC6687891

Klein JC, Keith A, Rice SJ, Shepherd C, Agarwal V, Loughlin J, Shendure J. Functional testing of thousands of osteoarthritis-associated variants for regulatory activity. Nat Commun. 2019 Jun 4;10(1):2434. PMID: 31164647; PMC: PMC6547687

Long E, Yin J, Funderburk KM, Xu M, Feng J, Kane A, Zhang T, Myers T, Golden A, Thakur R et al. Massively parallel reporter assays and variant scoring identified functional variants and target genes for melanoma loci and highlighted cell-type specificity. Am J Hum Genet. 2022 Dec 1;109(12):2210-2229. PMID: 36423637; PMC: PMC9748337

Lu X, Chen X, Forney C, Donmez O, Miller D, Parameswaran S, Hong T, Huang Y, Pujato M, Cazares T et al. Global discovery of lupus genetic risk variant allelic enhancer activity. Nat Commun. 2021 Mar 12;12(1):1611. PMID: 33712590; PMC: PMC7955039

McAfee JC, Lee S, Lee J, Bell JL, Krupa O, Davis J, Insigne K, Bond ML, Zhao N, Boyle AP et al. Systematic investigation of allelic regulatory activity of schizophrenia-associated common variants. Cell Genom. 2023 Oct 11;3(10):100404. PMID: 37868037; PMC: PMC10589626

Mouri K, Guo MH, de Boer CG, Lissner MM, Harten IA, Newby GA, DeBerg HA, Platt WF, Gentili M, Liu DR et al. Prioritization of autoimmune disease-associated genetic variants that perturb regulatory element activity in T cells. Nat Genet. 2022 May;54(5):603-612. PMID: 35513721; PMC: PMC9793778

Mulvey B, Dougherty JD. Transcriptional-regulatory convergence across functional MDD risk variants identified by massively parallel reporter assays. Transl Psychiatry. 2021 Jul 22;11(1):403. PMID: 34294677; PMC: PMC8298436

Myint L, Wang R, Boukas L, Hansen KD, Goff LA, Avramopoulos D. A screen of 1,049 schizophrenia and 30 Alzheimer's-associated variants for regulatory potential. Am J Med Genet B Neuropsychiatr Genet. 2020 Jan;183(1):61-73. PMID: 31503409; PMC: PMC7233147

Rao X, Thapa KS, Chen AB, Lin H, Gao H, Reiter JL, Hargreaves KA, Ipe J, Lai D, Xuei X et al. Allele-specific expression and high-throughput reporter assay reveal functional genetic variants associated with alcohol use disorders. Mol Psychiatry. 2021 Apr;26(4):1142-1151. PMID: 31477794; PMC: PMC7050407

Schuster SL, Arora S, Wladyka CL, Itagi P, Corey L, Young D, Stackhouse BL, Kollath L, Wu QV, Corey E et al. Multi-level functional genomics reveals molecular and cellular oncogenicity of patient-based 3'-untranslated region mutations. Cell Rep. 2023 Aug 29;42(8):112840. PMID: 37516102; PMC: PMC10540565

Tewhey R, Kotliar D, Park DS, Liu B, Winnicki S, Reilly SK, Andersen KG, Mikkelsen TS, Lander ES, Schaffner SF et al. Direct Identification of Hundreds of Expression-Modulating Variants using a Multiplexed Reporter Assay. Cell. 2016 Jun 2;165(6):1519-1529. PMID: 27259153; PMC: PMC4957403

Ulirsch JC, Nandakumar SK, Wang L, Giani FC, Zhang X, Rogov P, Melnikov A, McDonel P, Do R, Mikkelsen TS et al. Systematic Functional Dissection of Common Genetic Variation Affecting Red Blood Cell Traits. Cell. 2016 Jun 2;165(6):1530-1545. PMID: 27259154; PMC: PMC4893171