This track displays 664,558 unique small open reading frames (sORFs) in the human genome from MetamORF, a meta-database that consolidates sORF data identified by both experimental and computational approaches. sORFs are defined as ORFs encoding fewer than 100 amino acids (excluding stop codons and introns).
MetamORF was built by gathering publicly available sORF data from multiple sources, normalizing it, and removing redundancy. From 2,594,154 source ORFs across human and mouse, MetamORF identified 1,162,675 unique ORFs (664,771 human, 497,904 mouse) associated with 153,553 unique transcripts. The database enables comparison of sORFs across distinct original data sources at the ORF, transcript, and gene levels. For full documentation, see the MetamORF documentation page.
The human sORFs in MetamORF were compiled from six primary data sources, one of which (sORFs.org) contributes 46 individual ribosome profiling datasets. The primary sources are:
| Source | Description | Reference |
|---|---|---|
| Erhard et al. 2018 | Union of ORFs detected by PRICE, RP-BP, ORF-RATER, or annotated in Ensembl v75 | Nat Methods 2018 |
| Johnstone et al. 2016 | Location and translation data for analyzed transcripts and ORFs | EMBO J 2016 |
| Laumont et al. 2016 | Cryptic MAPs (MHC class I-associated peptides) with genomic and proteomic features | Nat Commun 2016 |
| Mackowiak et al. 2015 | Systematic identification of sORFs across vertebrate genomes | Genome Biol 2015 |
| Samandi et al. 2017 | Alternative protein predictions based on RefSeq GRCh38 | eLife 2017 |
| sORFs.org | Repository of sORFs from 46 individual ribosome profiling experiments | Olexiouk et al., Nucleic Acids Res 2018 |
ORFs were identified using three main approaches: bioinformatic predictions, ribosome profiling experiments, and mass spectrometry (proteomics, peptidomics, and proteogenomics).
MetamORF classifies ORFs by their position relative to annotated coding sequences:
ORFs are also classified by the biotype of their host RNA: intergenic, ncRNA, pseudogene, NMD (nonsense-mediated decay), or readthrough transcripts.
Items are displayed in BED 12 format showing the exon/intron structure of each sORF.
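As a rough illustration of how the exon/intron structure is encoded, the sketch below parses one BED 12 line and sums the block (exon) sizes to get the spliced length of an item. The record is invented for the example and is not taken from the track. The sketch does not assume whether the stop codon is included in the annotated blocks, so it reports nucleotides and whole codons only.

```python
from typing import List, NamedTuple, Tuple


class Bed12(NamedTuple):
    chrom: str
    start: int                      # chromStart, 0-based
    end: int                        # chromEnd, exclusive
    name: str
    strand: str
    blocks: List[Tuple[int, int]]   # absolute (start, end) of each exon block


def parse_bed12(line: str) -> Bed12:
    """Parse the 12 standard BED fields; any extra fields are ignored."""
    f = line.rstrip("\n").split("\t")
    chrom, start, end, name, strand = f[0], int(f[1]), int(f[2]), f[3], f[5]
    # blockSizes and blockStarts are comma-separated and may end with a comma.
    sizes = [int(x) for x in f[10].split(",") if x]
    rel_starts = [int(x) for x in f[11].split(",") if x]
    # blockStarts are relative to chromStart; convert to absolute coordinates.
    blocks = [(start + s, start + s + size) for s, size in zip(rel_starts, sizes)]
    return Bed12(chrom, start, end, name, strand, blocks)


def spliced_length(rec: Bed12) -> int:
    """Total length of all blocks in nucleotides, i.e. with introns removed."""
    return sum(block_end - block_start for block_start, block_end in rec.blocks)


# Hypothetical two-exon item, made up for illustration only.
example = "chr21\t1000\t1250\tdemo_sORF\t0\t+\t1000\t1250\t0\t2\t60,93,\t0,157,"
rec = parse_bed12(example)
nt = spliced_length(rec)
print(rec.blocks, nt, "nt,", nt // 3, "codons")
```

The gap between the end of one block and the start of the next corresponds to an intron, which is why the genomic span (`end - start`) can be much larger than the spliced length.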
The raw data can be explored interactively with the Table Browser or the Data Integrator. The data can be accessed from scripts through our API; the track name is "metamorf".
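A minimal sketch of script access is shown below, assuming the public REST API endpoint `https://api.genome.ucsc.edu/getData/track` and its `genome`, `track`, `chrom`, `start`, and `end` parameters. The helper names `build_track_url` and `fetch_track` are hypothetical, and the layout of the returned JSON should be checked against the API documentation rather than taken from this example.

```python
import json
import urllib.request

# Assumed endpoint of the UCSC Genome Browser REST API.
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(genome: str, track: str, chrom: str, start: int, end: int) -> str:
    """Build a getData/track request URL with semicolon-separated parameters."""
    params = {"genome": genome, "track": track, "chrom": chrom,
              "start": start, "end": end}
    return API_BASE + "?" + ";".join(f"{k}={v}" for k, v in params.items())


def fetch_track(url: str, timeout: float = 30.0) -> dict:
    """Download and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Track name "metamorf" on hg38, first 100 kb of chr21.
url = build_track_url("hg38", "metamorf", "chr21", 0, 100000)
print(url)
# data = fetch_track(url)   # uncomment to actually query the server
```

Keeping URL construction separate from the download makes it easy to inspect or log the request before sending it.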
For automated download and analysis, the genome annotation is stored in a bigBed file that can be downloaded from our download server. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, e.g.
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/ncOrfs/metamorf/MetamORF.bb -chrom=chr21 -start=0 -end=100000000 stdout
The original data and additional downloads are available from the MetamORF website. Source code is available on GitHub.
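For pipelines, the bigBedToBed call shown above can be driven from a script. The sketch below assumes the `bigBedToBed` binary is on the `PATH`; the wrapper names `bigbed_command` and `run_bigbed` are hypothetical. It uses only the file URL and the `-chrom`, `-start`, and `-end` options given in the command above.

```python
import shutil
import subprocess
from typing import List

BIGBED_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/ncOrfs/metamorf/MetamORF.bb"


def bigbed_command(chrom: str, start: int, end: int, source: str = BIGBED_URL) -> List[str]:
    """Build the argument list for a region query that writes BED to stdout."""
    return ["bigBedToBed", source,
            f"-chrom={chrom}", f"-start={start}", f"-end={end}", "stdout"]


def run_bigbed(cmd: List[str]) -> List[str]:
    """Run the command and return its output as a list of BED lines."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} was not found on PATH")
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.splitlines()


cmd = bigbed_command("chr21", 0, 100000000)
print(" ".join(cmd))
# bed_lines = run_bigbed(cmd)   # uncomment once bigBedToBed is installed
```

Passing the arguments as a list avoids shell quoting problems when the region or file location comes from user input.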
The MetamORF BED 12 data was obtained from the MetamORF track hub and converted to bigBed format at UCSC. Coordinates are on the GRCh38/hg38 assembly (based on Ensembl release 90).
Thanks to the MetamORF team at the TAGC (Theories and Approaches of Genomic Complexity) laboratory, Aix-Marseille University, for creating this resource and making it publicly available.
Erhard F, Halenius A, Zimmermann C, L'Hernault A, Kowalewski DJ, Weekes MP, Stevanovic S, Zimmer R, Dölken L. Improved Ribo-seq enables identification of cryptic translation events. Nat Methods. 2018 May;15(5):363-366. PMID: 29529017; PMC: PMC6152898
Johnstone TG, Bazzini AA, Giraldez AJ. Upstream ORFs are prevalent translational repressors in vertebrates. EMBO J. 2016 Apr 1;35(7):706-23. PMID: 26896445; PMC: PMC4818764
Laumont CM, Daouda T, Laverdure JP, Bonneil É, Caron-Lizotte O, Hardy MP, Granados DP, Durette C, Lemieux S, Thibault P et al. Global proteogenomic analysis of human MHC class I-associated peptides derived from non-canonical reading frames. Nat Commun. 2016 Jan 5;7:10238. PMID: 26728094; PMC: PMC4728431
Mackowiak SD, Zauber H, Bielow C, Thiel D, Kutz K, Calviello L, Mastrobuoni G, Rajewsky N, Kempa S, Selbach M et al. Extensive identification and analysis of conserved small ORFs in animals. Genome Biol. 2015 Sep 14;16:179. PMID: 26364619; PMC: PMC4568590
Olexiouk V, Van Criekinge W, Menschaert G. An update on sORFs.org: a repository of small ORFs identified by ribosome profiling. Nucleic Acids Res. 2018 Jan 4;46(D1):D497-D502. PMID: 29140531; PMC: PMC5753181
Samandi S, Roy AV, Delcourt V, Lucier JF, Gagnon J, Beaudoin MC, Vanderperre B, Breton MA, Motard J, Jacques JF et al. Deep transcriptome annotation enables the discovery and functional characterization of cryptic small proteins. Elife. 2017 Oct 30;6. PMID: 29083303; PMC: PMC5703645