Massively Parallel Reporter Assays (MPRAs) and related methods such as STARR-seq enable quantitative testing of thousands of candidate regulatory DNA sequences in parallel by linking each sequence to a reporter gene and measuring transcriptional output using sequencing.
The MPRA Base track shows 41,275 experimentally tested cis-regulatory elements curated from the MPRA Base database (Zhao et al., 2023), drawn from MPRA, STARR-seq, and related reporter assay experiments. The database integrates data from multiple studies, assay platforms (lentiMPRA, plasmidMPRA, STARR-seq, CRE-seq, and others), and cell types while preserving experiment-level resolution. Only elements derived from genomic fragments that can be mapped to the reference genome are included; synthetic or designed oligonucleotide libraries without genomic coordinates are excluded.
The track is a curated union of study-specific libraries rather than a uniform genome-wide enhancer catalog: each contributing study targeted a distinct set of candidate regions, including HepG2 liver-enhancer panels, melanoma GWAS variants, human/mouse pluripotent TSSs, and ASD-associated promoter variants. Each item represents one experimental measurement, not a full enhancer; longer regulatory elements may be represented by multiple adjacent tiles. Item width corresponds to the assayed DNA fragment for tile-based studies (most items, 144–200 bp; some Klein et al., 2020 elements 354–678 bp) but collapses to a single base for variant-centered studies that mark the SNP location rather than the surrounding tested window (Choi et al., 2020; Arnold et al., 2013).
Note on cell lines: The cell line shown for each element is the reporter cell line in which the genomic fragment was assayed. Most rows test human DNA in human cells; the exception is Mattioli et al., 2020, where mESC rows assay the mouse orthologous sequence in mouse cells, with hg38 coordinates derived from the human ortholog by liftOver.
The biological context of each cell line is summarized below:
| Cell line | Biological context |
|---|---|
| HepG2 | Hepatocellular carcinoma; liver enhancer studies |
| HUES64 | Human embryonic stem cells; pluripotent |
| mESC | Mouse embryonic stem cells; pluripotent |
| NPC | H1-derived neural progenitor cells; developing brain |
| HEK293FT | Embryonic kidney; high-transfection-efficiency reference |
| UACC903 | Melanoma cell line |
| HeLa | Cervical adenocarcinoma; STARR-seq proof-of-concept |
Each item represents a genomic fragment tested within a specific experiment, defined as a unique combination of cell line, assay type, and publication (PMID). The same genomic region may appear multiple times if tested in different experiments.
Items are colored by the percentile rank of their mean raw activity score within each experiment.
The mouse-over shows the cell line, assay type, raw activity score, percentile rank, and citation for each element.
For most studies in this track, the raw score is the log2 ratio of reporter RNA to input DNA from the source experiment. A score of 0 means the fragment produced RNA in proportion to the input plasmid copies (no measurable activity above baseline), positive scores indicate the fragment drove the reporter above baseline (enhancer-like activity in the assay), and negative scores indicate sub-baseline output (treated as inactive, not as validated transcriptional repression). Linear fold change relative to baseline is approximately 2^raw_score: for example, a raw score of 0.18 corresponds to roughly 1.13× baseline output, 1.0 to 2×, and 2.0 to 4×.
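The conversion from a log2 raw score to a linear fold change can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above; the function name `fold_change` is hypothetical, and the conversion applies only to studies whose raw score is a log2 RNA/DNA ratio, not to studies that report a different scale.

```python
def fold_change(raw_score: float) -> float:
    """Convert a log2 RNA/DNA raw score to linear fold change over baseline.

    Assumes the raw score is a log2 ratio; it is not meaningful for
    studies that report a different quantity.
    """
    return 2.0 ** raw_score


if __name__ == "__main__":
    # Scores taken from the examples in the text, plus one negative score.
    for score in (0.0, 0.18, 1.0, 2.0, -1.0):
        print(f"raw score {score:+.2f} -> {fold_change(score):.2f}x baseline")
```

A negative score such as -1.0 gives a value below 1 (here 0.5×), which the track treats as inactive rather than as evidence of repression.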
Two studies use a different scale: Mattioli et al., 2020 and Koesterich et al., 2023 report the MPRAnalyze induced-transcription rate (α), which is a positive-only quantity not directly convertible to a fold change. As noted in the Methods section, scoring methodology and the threshold used to call an element "active" differ between studies, so the percentile rank reflects within-experiment ranking only and does not by itself indicate the absolute strength of an element.
Within each experiment, replicate measurements for the same genomic fragment were aggregated by computing the mean raw activity score. The original dataset contained 211,053 replicate-level measurements; after aggregation, the final track contains 41,275 unique experiment-level genomic elements.
Elements are ranked by mean raw activity score independently within each experiment, and a percentile rank (0–100) is computed per experiment to avoid cross-study distortions caused by differing assay dynamic ranges.
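The aggregation and ranking steps can be illustrated with a minimal Python sketch. It is not the MPRA Base pipeline: the input layout, the function names, and the example values are hypothetical, and the percentile formula (the share of elements in the same experiment with a mean score at or below the element's score) is an assumption, since the exact formula is not stated here.

```python
from collections import defaultdict
from statistics import mean


def aggregate_replicates(rows):
    """Average replicate scores per (experiment, fragment).

    rows: iterable of (experiment, fragment, raw_score), where experiment is
    a (cell_line, assay, pmid) tuple. This layout is an assumption.
    """
    grouped = defaultdict(list)
    for experiment, fragment, score in rows:
        grouped[(experiment, fragment)].append(score)
    return {key: mean(scores) for key, scores in grouped.items()}


def percentile_ranks(mean_scores):
    """Rank each element within its own experiment on a 0-100 scale.

    Assumed definition: 100 * (elements with score <= this one) / n.
    """
    by_experiment = defaultdict(list)
    for (experiment, _fragment), score in mean_scores.items():
        by_experiment[experiment].append(score)

    ranks = {}
    for (experiment, fragment), score in mean_scores.items():
        scores = by_experiment[experiment]
        at_or_below = sum(1 for s in scores if s <= score)
        ranks[(experiment, fragment)] = 100.0 * at_or_below / len(scores)
    return ranks


# Made-up replicate measurements for two experiments.
HEPG2 = ("HepG2", "lentiMPRA", "27831498")
NPC = ("NPC", "lentiMPRA", "36834916")
rows = [
    (HEPG2, "chr1:100-271", 0.5),
    (HEPG2, "chr1:100-271", 1.5),
    (HEPG2, "chr2:500-671", -0.2),
    (HEPG2, "chr2:500-671", 0.0),
    (NPC, "chr1:100-271", 3.0),
]
means = aggregate_replicates(rows)
ranks = percentile_ranks(means)
```

Because ranks are computed separately per experiment, the same fragment can receive different percentile ranks in different experiments, and a rank in one experiment says nothing about absolute activity in another.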
Scoring methodology and the threshold used to call an element "active" differ between studies, so percentile-rank comparisons across experiments are approximate. Lower scores indicate that the fragment did not measurably activate transcription in the assay, rather than that it actively represses transcription. For any element of interest, users should consult the source publication for the original significance and effect-size calls.
Original genomic coordinates from the source studies (mostly hg19, with some mm9 and mm10) were lifted to hg38 by the MPRA Base pipeline using the UCSC liftOver tool.
The following table lists the experiments represented in this track.
| PMID | Author | Year | Lab | Cell type | Assay | Elements |
|---|---|---|---|---|---|---|
| 23328393 | Arnold et al. | 2013 | Stark Lab | HeLa | STARR-seq | 1 |
| 27831498 | Inoue et al. | 2017 | Shendure Lab | HepG2 | lentiMPRA | 2,241 |
| 30045748 | Klein et al. | 2018 | Shendure Lab | HepG2 | STARR-seq | 7,064 |
| 32483191 | Choi et al. | 2020 | Brown Lab | HEK293FT | lentiMPRA | 840 |
| 32483191 | Choi et al. | 2020 | Brown Lab | UACC903 | lentiMPRA | 840 |
| 32819422 | Mattioli et al. | 2020 | Mele Lab | HUES64 | plasmidMPRA | 6,954 |
| 32819422 | Mattioli et al. | 2020 | Mele Lab | mESC | plasmidMPRA | 6,954 |
| 33046894 | Klein et al. | 2020 | Shendure Lab | HepG2 | lentiMPRA | 8,116 |
| 33046894 | Klein et al. | 2020 | Shendure Lab | HepG2 | plasmidMPRA | 2,228 |
| 33046894 | Klein et al. | 2020 | Shendure Lab | HepG2 | STARR-seq | 2,230 |
| 36834916 | Koesterich et al. | 2023 | Kreimer Lab | NPC | lentiMPRA | 3,807 |
The data can be explored interactively in table format with the Table Browser or the Data Integrator and exported from there to spreadsheets or tab-separated tables. From scripts, the data can be accessed through our API using track=mprabase.
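As a hedged illustration of scripted access, the Python sketch below builds a request URL for the track and optionally fetches it. The endpoint `https://api.genome.ucsc.edu/getData/track` and the semicolon-separated parameters `genome`, `chrom`, `start`, and `end` are assumptions not stated in the text above; only `track=mprabase` comes from this page. The helper names are hypothetical, and the structure of the returned JSON is not assumed.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumed endpoint of the UCSC REST API; check the API documentation.
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_url(chrom, start, end, genome="hg38", track="mprabase"):
    """Build a getData/track URL for one genomic range."""
    params = [
        ("genome", genome),
        ("track", track),
        ("chrom", chrom),
        ("start", str(start)),
        ("end", str(end)),
    ]
    query = ";".join(f"{key}={quote(value)}" for key, value in params)
    return f"{API_BASE}?{query}"


def fetch_track(chrom, start, end):
    """Fetch the range and return the decoded JSON (requires network access)."""
    with urllib.request.urlopen(track_url(chrom, start, end)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URL; call fetch_track(...) to run the request.
    print(track_url("chr21", 0, 100000000))
```

Inspect the returned JSON before relying on particular field names, since the item schema is not described on this page.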
For automated download and analysis, the genome annotation is stored in a bigBed file that can be downloaded from our download server. The file for this track is called mprabase.bb. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain features within a given range, e.g. bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/mpra/mprabase/mprabase.bb -chrom=chr21 -start=0 -end=100000000 stdout
The original data can be downloaded from the MPRA Base web application.
Thanks to Varda Singhal, Jianyu Zhao, and the Ahituv Lab at the University of California San Francisco for creating and curating MPRA Base and for creating this track.
Arnold CD, Gerlach D, Stelzer C, Boryń ŁM, Rath M, Stark A. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science. 2013 Mar 1;339(6123):1074-7. PMID: 23328393
Choi J, Zhang T, Vu A, Ablain J, Makowski MM, Colli LM, Xu M, Hennessey RC, Yin J, Rothschild H et al. Massively parallel reporter assays of melanoma risk variants identify MX2 as a gene promoting melanoma. Nat Commun. 2020 Jun 1;11(1):2718. PMID: 32483191; PMC: PMC7264232
Inoue F, Kircher M, Martin B, Cooper GM, Witten DM, McManus MT, Ahituv N, Shendure J. A systematic comparison reveals substantial differences in chromosomal versus episomal encoding of enhancer activity. Genome Res. 2017 Jan;27(1):38-52. PMID: 27831498; PMC: PMC5204343
Klein JC, Keith A, Agarwal V, Durham T, Shendure J. Functional characterization of enhancer evolution in the primate lineage. Genome Biol. 2018 Jul 25;19(1):99. PMID: 30045748; PMC: PMC6060477
Klein JC, Agarwal V, Inoue F, Keith A, Martin B, Kircher M, Ahituv N, Shendure J. A systematic evaluation of the design and context dependencies of massively parallel reporter assays. Nat Methods. 2020 Nov;17(11):1083-1091. PMID: 33046894; PMC: PMC7727316
Koesterich J, An JY, Inoue F, Sohota A, Ahituv N, Sanders SJ, Kreimer A. Characterization of De Novo Promoter Variants in Autism Spectrum Disorder with Massively Parallel Reporter Assays. Int J Mol Sci. 2023 Feb 9;24(4). PMID: 36834916; PMC: PMC9959321
Mattioli K, Oliveros W, Gerhardinger C, Andergassen D, Maass PG, Rinn JL, Melé M. Cis and trans effects differentially contribute to the evolution of promoters and enhancers. Genome Biol. 2020 Aug 20;21(1):210. PMID: 32819422; PMC: PMC7439725
Zhao J, Baltoumas FA, Konnaris MA, Mouratidis I, Liu Z, Sims J, Agarwal V, Pavlopoulos GA, Georgakopoulos-Soares I, Ahituv N. MPRAbase: A Massively Parallel Reporter Assay Database. bioRxiv. 2023 Nov 22. PMID: 38045264; PMC: PMC10690217