cartVersion cartVersion cartVersion cartVersion 0 0 0 0 0 0 0 0 0 0 0 cartVersion cartVersion cartVersion 0 cartVersion 0 cpgIslandExt CpG Islands bed 4 + CpG Islands (Islands < 300 Bases are Light Green) 3 1 0 100 0 128 228 128 0 0 0

Description

CpG islands are associated with genes, particularly housekeeping genes, in vertebrates. CpG islands are typically common near transcription start sites and may be associated with promoter regions. Normally a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the Cs in such an arrangement tend to be methylated. This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over evolutionary time, methylated Cs tend to turn into Ts because of spontaneous deamination. The result is that CpGs are relatively rare unless there is selective pressure to keep them or a region is not methylated for some other reason, perhaps having to do with the regulation of gene expression. CpG islands are regions where CpGs are present at significantly higher levels than is typical for the genome as a whole.

The unmasked version of the track displays potential CpG islands that exist in repeat regions and would otherwise not be visible in the repeat-masked version.

By default, only the masked version of the track is displayed. To view the unmasked version, change the visibility settings in the track controls at the top of this page.

Methods

CpG islands were predicted by searching the sequence one base at a time, scoring each dinucleotide (+17 for CG and -1 for others) and identifying maximally scoring segments. Each segment was then evaluated for the following criteria:

\

\

The entire genome sequence, including masked regions, was used to construct the Unmasked CpG track. The CpG Islands track is constructed from the sequence after all masked sequence is removed.
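As a rough illustration of the dinucleotide scoring described in this section, the sketch below scores each dinucleotide (+17 for CG, -1 otherwise) and reports the single highest-scoring segment with a running-sum scan. It is not the cpg_lh implementation, which reports many segments and applies further criteria; the function name `best_cpg_segment` and its return shape are assumptions made for this example.

```python
def best_cpg_segment(seq, cg_score=17, other_score=-1):
    """Return (score, start, end) of the highest-scoring dinucleotide run.

    start and end are 0-based, half-open coordinates into seq.
    Illustrative only: a real island finder reports multiple segments.
    """
    seq = seq.upper()
    best = (0, 0, 0)   # best (score, start, end) seen so far
    running = 0        # score of the candidate segment ending here
    seg_start = 0      # start of the current candidate segment
    for i in range(len(seq) - 1):
        score = cg_score if seq[i:i + 2] == "CG" else other_score
        if running <= 0:
            # A non-positive prefix can never help; restart the segment here.
            running = score
            seg_start = i
        else:
            running += score
        if running > best[0]:
            best = (running, seg_start, i + 2)
    return best


if __name__ == "__main__":
    print(best_cpg_segment("AAAACGCGCGAAAA"))
```

Restarting whenever the running score drops to zero or below keeps the scan linear in the sequence length, which matters for genome-sized input.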

The CpG count is the number of CG dinucleotides in the island. The Percentage CpG is the ratio of CpG nucleotide bases (twice the CpG count) to the length. The ratio of observed to expected CpG is calculated according to the formula (cited in Gardiner-Garden and Frommer (1987)):

    Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)

where N = length of sequence.
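The three quantities defined above can be computed directly from a sequence string. The following is a minimal sketch; the function name `cpg_stats` and the dictionary keys are chosen for this example and are not taken from the track's table schema.

```python
def cpg_stats(seq):
    """Compute CpG count, percentage CpG and observed/expected CpG ratio."""
    seq = seq.upper()
    n = len(seq)
    cpg = seq.count("CG")   # "CG" cannot overlap itself, so count() is exact
    c = seq.count("C")
    g = seq.count("G")
    # Percentage CpG: bases in CpGs (twice the CpG count) over the length.
    per_cpg = 100.0 * 2 * cpg / n if n else 0.0
    # Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return {"cpg_num": cpg, "per_cpg": per_cpg, "obs_exp": obs_exp}


if __name__ == "__main__":
    print(cpg_stats("CGCGAT"))
```

For the six-base input `CGCGAT` there are two CpGs, two Cs and two Gs, so the ratio is 2 * 6 / (2 * 2) = 3.0.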

The calculation of the track data is performed by the following command sequence:

    twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \
      | cpg_lh /dev/stdin 2> cpg_lh.err \
        | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\t%d\t%s\t%s %s\t%s\t%s\t%0.0f\t%0.1f\t%s\t%s\n", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \
         | sort -k1,1 -k2,2n > cpgIsland.bed

The unmasked track data is constructed by passing the -noMask option to the twoBitToFa command.
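To make the awk step in the pipeline easier to follow, here is a hedged Python equivalent of that one transformation. It mirrors the awk field arithmetic only; the function name `cpg_lh_to_bed`, the variable names, and the sample input line are illustrative assumptions, since the cpg_lh output format is not documented here.

```python
def cpg_lh_to_bed(line):
    """Reproduce the awk reformatting of one whitespace-separated line."""
    f = line.split()
    start = int(f[1]) - 1          # awk: $2 = $2 - 1 (shift start to 0-based)
    width = int(f[2]) - start      # awk: width = $3 - $2
    count = int(f[5])              # awk $6, used as the CpG count
    percent = float(f[6])          # awk $7, a percentage scaled by width below
    return "\t".join([
        f[0],                                    # $1
        "%d" % start,                            # $2 after the shift
        f[2],                                    # $3
        "%s %s" % (f[4], f[5]),                  # "$5 $6"
        str(width),
        f[5],                                    # $6
        "%0.0f" % (width * percent * 0.01),      # width*$7*0.01
        "%0.1f" % (100.0 * 2 * count / width),   # 100.0*2*$6/width
        f[6],                                    # $7
        f[8],                                    # $9
    ])


if __name__ == "__main__":
    # Hypothetical input line, invented for this example.
    print(cpg_lh_to_bed("chr1 101 300 x CpG: 20 60.0 y 0.85"))
```

The start coordinate is decremented before the width is computed, so the width equals the number of bases in the original 1-based closed interval.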

\ \

Data access

The CpG islands track and its associated tables can be explored interactively using the REST API, the Table Browser or the Data Integrator. All the tables can also be queried directly from our public MySQL servers, with more information available on our help page as well as on our blog.

The source for the cpg_lh program can be obtained from src/utils/cpgIslandExt/. The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file")

\ \

Credits

This track was generated using a modification of a program developed by G. Micklem and L. Hillier (unpublished).

\ \

References

Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J Mol Biol. 1987 Jul 20;196(2):261-82. PMID: 3656447

\ regulation 1 html cpgIslandSuper\ longLabel CpG Islands (Islands < 300 Bases are Light Green)\ parent cpgIslandSuper pack\ priority 1\ shortLabel CpG Islands\ track cpgIslandExt\ rmsk RepeatMasker rmsk Repeating Elements by RepeatMasker 1 1 0 0 0 127 127 127 1 0 0

Description

This track was created by using Arian Smit's RepeatMasker program, which screens DNA sequences for interspersed repeats and low complexity DNA sequences. The program outputs a detailed annotation of the repeats that are present in the query sequence (represented by this track), as well as a modified version of the query sequence in which all the annotated repeats have been masked (generally available on the Downloads page). RepeatMasker uses the Repbase Update library of repeats from the Genetic Information Research Institute (GIRI). Repbase Update is described in Jurka (2000) in the References section below. Some newer assemblies have been made with Dfam, not Repbase. Details of how we build our database data are in our "makeDb/doc/" directory.

\ \

Display Conventions and Configuration

In full display mode, this track displays up to ten different classes of repeats:

\

The level of color shading in the graphical display reflects the amount of base mismatch, base deletion, and base insertion associated with a repeat element. The higher the combined number of these, the lighter the shading.

A "?" at the end of the "Family" or "Class" (for example, DNA?) signifies that the curator was unsure of the classification. At some point in the future, either the "?" will be removed or the classification will be changed.

\ \

Methods

Data are generated using the RepeatMasker -s flag. Additional flags may be used for certain organisms. Repeats are soft-masked. Alignments may extend through repeats, but are not permitted to initiate in them. See the FAQ for more information.

\ \

Credits

Thanks to Arian Smit, Robert Hubley and GIRI for providing the tools and repeat libraries used to generate this track.

\ \

References

Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. http://www.repeatmasker.org. 1996-2010.

Repbase Update is described in:

Jurka J. Repbase Update: a database and an electronic journal of repetitive elements. Trends Genet. 2000 Sep;16(9):418-420. PMID: 10973072

For a discussion of repeats in mammalian genomes, see:

Smit AF. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev. 1999 Dec;9(6):657-63. PMID: 10607616

Smit AF. The origin of interspersed repeats in the human genome. Curr Opin Genet Dev. 1996 Dec;6(6):743-8. PMID: 8994846

\ varRep 0 canPack off\ group varRep\ longLabel Repeating Elements by RepeatMasker\ maxWindowToDraw 10000000\ priority 1\ shortLabel RepeatMasker\ spectrum on\ track rmsk\ type rmsk\ visibility dense\ unipAliSwissprot SwissProt Aln. bigPsl UCSC alignment of SwissProt proteins to genome (dark blue: main isoform, light blue: alternative isoforms) 3 1 0 0 0 127 127 127 0 0 0 genes 1 baseColorDefault genomicCodons\ baseColorTickColor contrastingColor\ baseColorUseCds given\ bigDataUrl /gbdb/aptMan1/uniprot/unipAliSwissprot.bb\ indelDoubleInsert on\ indelQueryInsert on\ itemRgb on\ labelFields name,acc,uniprotName,geneName,hgncSym,refSeq,refSeqProt,ensProt\ longLabel UCSC alignment of SwissProt proteins to genome (dark blue: main isoform, light blue: alternative isoforms)\ mouseOverField protFullNames\ parent uniprot\ priority 1\ searchIndex name,acc\ shortLabel SwissProt Aln.\ showDiffBasesAllScales on\ skipFields isMain\ track unipAliSwissprot\ type bigPsl\ urls acc="https://www.uniprot.org/uniprot/$$" hgncId="https://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=$$" refSeq="https://www.ncbi.nlm.nih.gov/nuccore/$$" refSeqProt="https://www.ncbi.nlm.nih.gov/protein/$$" ncbiGene="https://www.ncbi.nlm.nih.gov/gene/$$" entrezGene="https://www.ncbi.nlm.nih.gov/gene/$$" ensGene="https://www.ensembl.org/Gene/Summary?g=$$"\ visibility pack\ refGene RefSeq Genes genePred refPep refMrna RefSeq Genes 1 2 12 12 120 133 133 187 0 0 0

Description

The RefSeq Genes track shows known brown kiwi protein-coding and non-protein-coding genes taken from the NCBI RNA reference sequences collection (RefSeq). The data underlying this track are updated weekly.

Please visit the Feedback for Gene and Reference Sequences (RefSeq) page to make suggestions, submit additions and corrections, or ask for help concerning RefSeq records.

For more information on the different gene tracks, see our Genes FAQ.

\ \

Display Conventions and Configuration

This track follows the display conventions for gene prediction tracks. The color shading indicates the level of review the RefSeq record has undergone: predicted (light), provisional (medium), reviewed (dark).

The item labels and display colors of features within this track can be configured through the controls at the top of the track description page.

Methods

RefSeq RNAs were aligned against the brown kiwi genome using BLAT. RNAs with less than 15% of their sequence aligned were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept.
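The filtering rules in this paragraph can be sketched as a small function. This is only an illustration, not the UCSC pipeline: the tuple layout, the function name `filter_alignments`, and the reading of "within 0.1%" as 0.1 percentage points of identity are all assumptions.

```python
from collections import defaultdict


def filter_alignments(alignments, min_aligned=15.0, near_best=0.1,
                      min_identity=96.0):
    """Keep near-best, high-identity alignments for each RNA.

    alignments: iterable of (rna_id, aligned_pct, identity_pct, locus).
    """
    by_rna = defaultdict(list)
    for aln in alignments:
        # Discard alignments covering less than 15% of the RNA.
        if aln[1] >= min_aligned:
            by_rna[aln[0]].append(aln)

    kept = []
    for alns in by_rna.values():
        best = max(a[2] for a in alns)   # highest base identity for this RNA
        for a in alns:
            # Keep only alignments close to the best AND above the floor.
            if a[2] >= best - near_best and a[2] >= min_identity:
                kept.append(a)
    return kept


if __name__ == "__main__":
    example = [
        ("NM_1", 90.0, 99.5, "locusA"),
        ("NM_1", 80.0, 99.45, "locusB"),
        ("NM_1", 95.0, 98.0, "locusC"),
    ]
    print(filter_alignments(example))
```

Grouping by RNA first means the "best" identity is evaluated per transcript, so one well-aligned RNA does not affect the threshold for another.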

\ \

Credits

This track was produced at UCSC from RNA sequence data generated by scientists worldwide and curated by the NCBI RefSeq project.

\ \

References

Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518

Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63. PMID: 24259432; PMC: PMC3965018

Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. PMID: 15608248; PMC: PMC539979

\ genes 1 baseColorUseCds given\ color 12,12,120\ group genes\ idXref hgFixed.refLink mrnaAcc name\ longLabel RefSeq Genes\ priority 2\ shortLabel RefSeq Genes\ track refGene\ type genePred refPep refMrna\ visibility dense\ unipAliTrembl TrEMBL Aln. bigPsl UCSC alignment of TrEMBL proteins to genome 0 2 0 0 0 127 127 127 0 0 0 genes 1 baseColorDefault genomicCodons\ baseColorTickColor contrastingColor\ baseColorUseCds given\ bigDataUrl /gbdb/aptMan1/uniprot/unipAliTrembl.bb\ indelDoubleInsert on\ indelQueryInsert on\ itemRgb on\ labelFields name,acc,uniprotName,geneName,hgncSym,refSeq,refSeqProt,ensProt\ longLabel UCSC alignment of TrEMBL proteins to genome\ mouseOverField protFullNames\ parent uniprot off\ priority 2\ searchIndex name,acc\ shortLabel TrEMBL Aln.\ showDiffBasesAllScales on\ skipFields isMain\ track unipAliTrembl\ type bigPsl\ urls acc="https://www.uniprot.org/uniprot/$$" hgncId="https://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=$$" refseq="https://www.ncbi.nlm.nih.gov/nuccore/$$" refSeqProt="https://www.ncbi.nlm.nih.gov/protein/$$" ncbiGene="https://www.ncbi.nlm.nih.gov/gene/$$" entrezGene="https://www.ncbi.nlm.nih.gov/gene/$$" ensGene="https://www.ensembl.org/Gene/Summary?g=$$"\ visibility hide\ cpgIslandExtUnmasked Unmasked CpG bed 4 + CpG Islands on All Sequence (Islands < 300 Bases are Light Green) 0 2 0 100 0 128 228 128 0 0 0

Description

CpG islands are associated with genes, particularly housekeeping genes, in vertebrates. CpG islands are typically common near transcription start sites and may be associated with promoter regions. Normally a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the Cs in such an arrangement tend to be methylated. This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over evolutionary time, methylated Cs tend to turn into Ts because of spontaneous deamination. The result is that CpGs are relatively rare unless there is selective pressure to keep them or a region is not methylated for some other reason, perhaps having to do with the regulation of gene expression. CpG islands are regions where CpGs are present at significantly higher levels than is typical for the genome as a whole.

The unmasked version of the track displays potential CpG islands that exist in repeat regions and would otherwise not be visible in the repeat-masked version.

By default, only the masked version of the track is displayed. To view the unmasked version, change the visibility settings in the track controls at the top of this page.

\ \

Methods

CpG islands were predicted by searching the sequence one base at a time, scoring each dinucleotide (+17 for CG and -1 for others) and identifying maximally scoring segments. Each segment was then evaluated for the following criteria:

\

\

The entire genome sequence, including masked regions, was used to construct the Unmasked CpG track. The CpG Islands track is constructed from the sequence after all masked sequence is removed.

The CpG count is the number of CG dinucleotides in the island. The Percentage CpG is the ratio of CpG nucleotide bases (twice the CpG count) to the length. The ratio of observed to expected CpG is calculated according to the formula (cited in Gardiner-Garden and Frommer (1987)):

    Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)

where N = length of sequence.

The calculation of the track data is performed by the following command sequence:

    twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \
      | cpg_lh /dev/stdin 2> cpg_lh.err \
        | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\t%d\t%s\t%s %s\t%s\t%s\t%0.0f\t%0.1f\t%s\t%s\n", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \
         | sort -k1,1 -k2,2n > cpgIsland.bed

The unmasked track data is constructed by passing the -noMask option to the twoBitToFa command.

\ \

Data access

The CpG islands track and its associated tables can be explored interactively using the REST API, the Table Browser or the Data Integrator. All the tables can also be queried directly from our public MySQL servers, with more information available on our help page as well as on our blog.

The source for the cpg_lh program can be obtained from src/utils/cpgIslandExt/. The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file")

\ \

Credits

This track was generated using a modification of a program developed by G. Micklem and L. Hillier (unpublished).

\ \

References

Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J Mol Biol. 1987 Jul 20;196(2):261-82. PMID: 3656447

\ regulation 1 html cpgIslandSuper\ longLabel CpG Islands on All Sequence (Islands < 300 Bases are Light Green)\ parent cpgIslandSuper hide\ priority 2\ shortLabel Unmasked CpG\ track cpgIslandExtUnmasked\ unipLocSignal Signal Peptide bigBed 12 + UniProt Signal Peptides 1 3 255 0 150 255 127 202 0 0 0 genes 1 bigDataUrl /gbdb/aptMan1/uniprot/unipLocSignal.bb\ color 255,0,150\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ itemRgb off\ longLabel UniProt Signal Peptides\ parent uniprot\ priority 3\ shortLabel Signal Peptide\ track unipLocSignal\ type bigBed 12 +\ visibility dense\ unipLocExtra Extracellular bigBed 12 + UniProt Extracellular Domain 1 4 0 150 255 127 202 255 0 0 0 genes 1 bigDataUrl /gbdb/aptMan1/uniprot/unipLocExtra.bb\ color 0,150,255\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ itemRgb off\ longLabel UniProt Extracellular Domain\ parent uniprot\ priority 4\ shortLabel Extracellular\ track unipLocExtra\ type bigBed 12 +\ visibility dense\ unipInterest Interest bigBed 12 + UniProt Regions of Interest 1 4 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/aptMan1/uniprot/unipInterest.bb\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ itemRgb off\ longLabel UniProt Regions of Interest\ parent uniprot\ priority 4\ shortLabel Interest\ track unipInterest\ type bigBed 12 +\ visibility dense\ unipLocTransMemb Transmembrane bigBed 12 + UniProt Transmembrane Domains 1 5 0 150 0 127 202 127 0 0 0 genes 1 bigDataUrl /gbdb/aptMan1/uniprot/unipLocTransMemb.bb\ color 0,150,0\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ itemRgb off\ longLabel UniProt Transmembrane Domains\ parent uniprot\ priority 5\ shortLabel Transmembrane\ track unipLocTransMemb\ type bigBed 12 +\ visibility dense\ unipLocCytopl Cytoplasmic bigBed 12 + UniProt Cytoplasmic Domains 1 6 255 150 0 255 202 127 0 0 0 genes 1 bigDataUrl /gbdb/aptMan1/uniprot/unipLocCytopl.bb\ color 255,150,0\ 
filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ itemRgb off\ longLabel UniProt Cytoplasmic Domains\ parent uniprot\ priority 6\ shortLabel Cytoplasmic\ track unipLocCytopl\ type bigBed 12 +\ visibility dense\ unipChain Chains bigBed 12 + UniProt Mature Protein Products (Polypeptide Chains) 1 7 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/aptMan1/uniprot/unipChain.bb\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ longLabel UniProt Mature Protein Products (Polypeptide Chains)\ parent uniprot\ priority 7\ shortLabel Chains\ track unipChain\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#ptm_processing" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility dense\ unipDisulfBond Disulf. Bonds bigBed 12 + UniProt Disulfide Bonds 1 8 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/aptMan1/uniprot/unipDisulfBond.bb\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ longLabel UniProt Disulfide Bonds\ parent uniprot\ priority 8\ shortLabel Disulf. 
Bonds\ track unipDisulfBond\ type bigBed 12 +\ visibility dense\ unipDomain Domains bigBed 12 + UniProt Domains 1 8 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/aptMan1/uniprot/unipDomain.bb\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ longLabel UniProt Domains\ parent uniprot\ priority 8\ shortLabel Domains\ track unipDomain\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#family_and_domains" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility dense\ unipModif AA Modifications bigBed 12 + UniProt Amino Acid Modifications 1 9 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/aptMan1/uniprot/unipModif.bb\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ longLabel UniProt Amino Acid Modifications\ parent uniprot\ priority 9\ shortLabel AA Modifications\ track unipModif\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#aaMod_section" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility dense\ unipMut Mutations bigBed 12 + UniProt Amino Acid Mutations 1 10 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/aptMan1/uniprot/unipMut.bb\ longLabel UniProt Amino Acid Mutations\ parent uniprot\ priority 10\ shortLabel Mutations\ track unipMut\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#pathology_and_biotech" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$" variationId="http://www.uniprot.org/uniprot/$$"\ visibility dense\ transMapEnsemblV5 TransMap Ensembl bigPsl TransMap Ensembl and GENCODE Mappings Version 5 3 10.001 0 100 0 127 177 127 0 0 0

Description

This track contains GENCODE or Ensembl alignments produced by the TransMap cross-species alignment algorithm from other vertebrate species in the UCSC Genome Browser. GENCODE is the Ensembl annotation for human and mouse; for other Ensembl sources, only those with full gene builds are used. Ensembl projection gene annotations are not used as sources. For closer evolutionary distances, the alignments are created using syntenically filtered BLASTZ alignment chains, resulting in a prediction of the orthologous genes in brown kiwi.

Display Conventions and Configuration

This track follows the display conventions for PSL alignment tracks.

This track may also be configured to display codon coloring, a feature that allows the user to quickly compare cDNAs against the genomic sequence. For more information about this option, click here. Several types of alignment gap may also be colored; for more information, click here.

Methods

  1. Source transcript alignments were obtained from vertebrate organisms in the UCSC Genome Browser Database. BLAT alignments of RefSeq Genes, GenBank mRNAs, and GenBank Spliced ESTs to the cognate genome, along with UCSC Genes, were used as available.
  2. For all vertebrate assemblies that had BLASTZ alignment chains and nets to the brown kiwi (aptMan1) genome, a subset of the alignment chains was selected as follows:
  3. The pslMap program was used to do a base-level projection of the source transcript alignments via the selected chains to the brown kiwi genome, resulting in pairwise alignments of the source transcripts to the genome.
  4. The resulting alignments were filtered with pslCDnaFilter with a global near-best criterion of 0.5% in finished genomes (human and mouse) and 1.0% in other genomes. Alignments where less than 20% of the transcript mapped were discarded.

To ensure unique identifiers for each alignment, cDNA and gene accessions were made unique by appending a suffix for each location in the source genome and again for each mapped location in the destination genome. The format is:

   accession.version-srcUniq.destUniq

where srcUniq is a number added to make each source alignment unique, and destUniq is added to give the subsequent TransMap alignments unique identifiers.

For example, in the cow genome, there are two alignments of mRNA BC149621.1. These are assigned the identifiers BC149621.1-1 and BC149621.1-2. When these are mapped to the human genome, BC149621.1-1 maps to a single location and is given the identifier BC149621.1-1.1. However, BC149621.1-2 maps to two locations, resulting in BC149621.1-2.1 and BC149621.1-2.2. Note that multiple TransMap mappings are usually the result of tandem duplications, where both chains are identified as syntenic.
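The identifier format described above is easy to split back into its parts. The sketch below assumes the accession itself contains no dot and that the version and both suffixes are plain integers; the function name `parse_transmap_id` is hypothetical.

```python
import re

# accession.version-srcUniq.destUniq, e.g. BC149621.1-2.1
_TRANSMAP_ID = re.compile(
    r"^(?P<accession>[^.]+)\.(?P<version>\d+)"
    r"-(?P<src_uniq>\d+)\.(?P<dest_uniq>\d+)$"
)


def parse_transmap_id(identifier):
    """Split a TransMap identifier into (accession, version, srcUniq, destUniq)."""
    m = _TRANSMAP_ID.match(identifier)
    if m is None:
        raise ValueError("not a TransMap identifier: %r" % identifier)
    return (
        m.group("accession"),
        int(m.group("version")),
        int(m.group("src_uniq")),
        int(m.group("dest_uniq")),
    )


if __name__ == "__main__":
    print(parse_transmap_id("BC149621.1-2.2"))
```

Identifiers that carry only the source suffix, such as BC149621.1-1, are rejected by this pattern because they describe a source alignment rather than a mapped one.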

\ \

Data Access

The raw data for these tracks can be accessed interactively through the Table Browser or the Data Integrator. For automated analysis, the annotations are stored in bigPsl files (containing a number of extra columns) and can be downloaded from our download server, or queried using our API. For more information on accessing track data see our Track Data Access FAQ. The files are associated with these tracks in the following way:

Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, for example:

    bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/aptMan1/transMap/V4/aptMan1.refseq.transMapV4.bigPsl -chrom=chr6 -start=0 -end=1000000 stdout

Credits

This track was produced by Mark Diekhans at UCSC from cDNA and EST sequence data submitted to the international public sequence databases by scientists worldwide and annotations produced by the RefSeq, Ensembl, and GENCODE annotation projects.

\ \

References

Siepel A, Diekhans M, Brejová B, Langton L, Stevens M, Comstock CL, Davis C, Ewing B, Oommen S, Lau C et al. Targeted discovery of novel human exons by comparative genomics. Genome Res. 2007 Dec;17(12):1763-73. PMID: 17989246; PMC: PMC2099585

Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008 Mar 1;24(5):637-44. PMID: 18218656

Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D. Comparative genomics search for losses of long-established genes on the human lineage. PLoS Comput Biol. 2007 Dec;3(12):e247. PMID: 18085818; PMC: PMC2134963

\ \ genes 1 baseColorDefault diffCodons\ baseColorUseCds given\ baseColorUseSequence lfExtra\ bigDataUrl /gbdb/aptMan1/transMap/V5/aptMan1.ensembl.transMapV5.bigPsl\ canPack on\ color 0,100,0\ defaultLabelFields orgAbbrev,geneName\ group genes\ html transMapEnsembl\ indelDoubleInsert on\ indelQueryInsert on\ labelFields commonName,orgAbbrev,srcDb,srcTransId,name,geneName,geneId,geneType,transcriptType\ labelSeparator " "\ longLabel TransMap Ensembl and GENCODE Mappings Version 5\ priority 10.001\ searchIndex name,srcTransId,geneName,geneId\ shortLabel TransMap Ensembl\ showCdsAllScales .\ showCdsMaxZoom 10000.0\ showDiffBasesAllScales .\ showDiffBasesMaxZoom 10000.0\ superTrack transMapV5 pack\ track transMapEnsemblV5\ transMapSrcSet ensembl\ type bigPsl\ visibility pack\ transMapRefSeqV5 TransMap RefGene bigPsl TransMap RefSeq Gene Mappings Version 5 3 10.003 0 100 0 127 177 127 0 0 0

Description

This track contains RefSeq Gene alignments produced by the TransMap cross-species alignment algorithm from other vertebrate species in the UCSC Genome Browser. For closer evolutionary distances, the alignments are created using syntenically filtered BLASTZ alignment chains, resulting in a prediction of the orthologous genes in brown kiwi.

Display Conventions and Configuration

This track follows the display conventions for PSL alignment tracks.

This track may also be configured to display codon coloring, a feature that allows the user to quickly compare cDNAs against the genomic sequence. For more information about this option, click here. Several types of alignment gap may also be colored; for more information, click here.

Methods

  1. Source transcript alignments were obtained from vertebrate organisms in the UCSC Genome Browser Database. BLAT alignments of RefSeq Genes, GenBank mRNAs, and GenBank Spliced ESTs to the cognate genome, along with UCSC Genes, were used as available.
  2. For all vertebrate assemblies that had BLASTZ alignment chains and nets to the brown kiwi (aptMan1) genome, a subset of the alignment chains was selected as follows:
  3. The pslMap program was used to do a base-level projection of the source transcript alignments via the selected chains to the brown kiwi genome, resulting in pairwise alignments of the source transcripts to the genome.
  4. The resulting alignments were filtered with pslCDnaFilter with a global near-best criterion of 0.5% in finished genomes (human and mouse) and 1.0% in other genomes. Alignments where less than 20% of the transcript mapped were discarded.

To ensure unique identifiers for each alignment, cDNA and gene accessions were made unique by appending a suffix for each location in the source genome and again for each mapped location in the destination genome. The format is:

   accession.version-srcUniq.destUniq

where srcUniq is a number added to make each source alignment unique, and destUniq is added to give the subsequent TransMap alignments unique identifiers.

For example, in the cow genome, there are two alignments of mRNA BC149621.1. These are assigned the identifiers BC149621.1-1 and BC149621.1-2. When these are mapped to the human genome, BC149621.1-1 maps to a single location and is given the identifier BC149621.1-1.1. However, BC149621.1-2 maps to two locations, resulting in BC149621.1-2.1 and BC149621.1-2.2. Note that multiple TransMap mappings are usually the result of tandem duplications, where both chains are identified as syntenic.

\ \

Data Access

The raw data for these tracks can be accessed interactively through the Table Browser or the Data Integrator. For automated analysis, the annotations are stored in bigPsl files (containing a number of extra columns) and can be downloaded from our download server, or queried using our API. For more information on accessing track data see our Track Data Access FAQ. The files are associated with these tracks in the following way:

Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, for example:

    bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/aptMan1/transMap/V4/aptMan1.refseq.transMapV4.bigPsl -chrom=chr6 -start=0 -end=1000000 stdout

Credits

This track was produced by Mark Diekhans at UCSC from cDNA and EST sequence data submitted to the international public sequence databases by scientists worldwide and annotations produced by the RefSeq, Ensembl, and GENCODE annotation projects.

\ \

References

Siepel A, Diekhans M, Brejová B, Langton L, Stevens M, Comstock CL, Davis C, Ewing B, Oommen S, Lau C et al. Targeted discovery of novel human exons by comparative genomics. Genome Res. 2007 Dec;17(12):1763-73. PMID: 17989246; PMC: PMC2099585

Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008 Mar 1;24(5):637-44. PMID: 18218656

Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D. Comparative genomics search for losses of long-established genes on the human lineage. PLoS Comput Biol. 2007 Dec;3(12):e247. PMID: 18085818; PMC: PMC2134963

\ \ genes 1 baseColorDefault diffCodons\ baseColorUseCds given\ baseColorUseSequence lfExtra\ bigDataUrl /gbdb/aptMan1/transMap/V5/aptMan1.refseq.transMapV5.bigPsl\ canPack on\ color 0,100,0\ defaultLabelFields orgAbbrev,geneName\ group genes\ html transMapRefSeq\ indelDoubleInsert on\ indelQueryInsert on\ labelFields commonName,orgAbbrev,srcDb,srcTransId,name,geneName,geneId\ labelSeparator " "\ longLabel TransMap RefSeq Gene Mappings Version 5\ priority 10.003\ searchIndex name,srcTransId,geneName,geneId\ shortLabel TransMap RefGene\ showCdsAllScales .\ showCdsMaxZoom 10000.0\ showDiffBasesAllScales .\ showDiffBasesMaxZoom 10000.0\ superTrack transMapV5 pack\ track transMapRefSeqV5\ transMapSrcSet refseq\ type bigPsl\ visibility pack\ transMapRnaV5 TransMap RNA bigPsl TransMap GenBank RNA Mappings Version 5 0 10.004 0 100 0 127 177 127 0 0 0

Description

This track contains GenBank mRNA alignments produced by the TransMap cross-species alignment algorithm from other vertebrate species in the UCSC Genome Browser. For closer evolutionary distances, the alignments are created using syntenically filtered BLASTZ alignment chains, resulting in a prediction of the orthologous genes in brown kiwi.

Display Conventions and Configuration

This track follows the display conventions for PSL alignment tracks.

This track may also be configured to display codon coloring, a feature that allows the user to quickly compare cDNAs against the genomic sequence. For more information about this option, click here. Several types of alignment gap may also be colored; for more information, click here.

Methods

\ \

\

    \
  1. Source transcript alignments were obtained from vertebrate organisms\ in the UCSC Genome Browser Database. BLAT alignments of RefSeq Genes, GenBank \ mRNAs, and GenBank Spliced ESTs to the cognate genome, along with UCSC Genes,\ were used as available.\
  2. For all vertebrate assemblies that had BLASTZ alignment chains and\ nets to the brown kiwi (aptMan1) genome, a subset of the alignment chains were\ selected as follows:\ \
  3. The pslMap program was used to do a base-level projection of\ the source transcript alignments via the selected chains\ to the brown kiwi genome, resulting in pairwise alignments of the source transcripts to\ the genome.\
  4. The resulting alignments were filtered with pslCDnaFilter\ with a global near-best criterion of 0.5% in finished genomes\ (human and mouse) and 1.0% in other genomes. Alignments\ where less than 20% of the transcript mapped were discarded.\
\

\ \

\ To ensure unique identifiers for each alignment, cDNA and gene accessions were\ made unique by appending a suffix for each location in the source genome and\ again for each mapped location in the destination genome. The format is:\

\
   accession.version-srcUniq.destUniq\
\ \ Where srcUniq is a number added to make each source alignment unique, and\ destUniq is added to give the subsequent TransMap alignments unique\ identifiers.\

\

\ For example, in the cow genome, there are two alignments of mRNA BC149621.1.\ These are assigned the identifiers BC149621.1-1 and BC149621.1-2.\ When these are mapped to the human genome, BC149621.1-1 maps to a single\ location and is given the identifier BC149621.1-1.1. However, BC149621.1-2\ maps to two locations, resulting in BC149621.1-2.1 and BC149621.1-2.2. Note\ that multiple TransMap mappings are usually the result of tandem duplications, where both\ chains are identified as syntenic.\
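The identifier format described above can be split back into its parts with a short parser. This is a minimal sketch, not part of the TransMap software: the names TransMapId and parse_transmap_id are hypothetical, and it assumes that the accession itself contains no dot and that destUniq may be absent for a source alignment that has not yet been mapped.

```python
import re
from typing import NamedTuple, Optional


class TransMapId(NamedTuple):
    """Parts of an identifier of the form accession.version-srcUniq.destUniq."""
    accession: str
    version: int
    src_uniq: int
    dest_uniq: Optional[int]  # None for a source alignment such as BC149621.1-2


# Assumption: accessions contain no "." or whitespace.
_ID_RE = re.compile(
    r"^(?P<acc>[^.\s]+)\.(?P<ver>\d+)-(?P<src>\d+)(?:\.(?P<dest>\d+))?$"
)


def parse_transmap_id(identifier: str) -> TransMapId:
    """Split a TransMap-style identifier into its components."""
    match = _ID_RE.match(identifier)
    if match is None:
        raise ValueError(f"not a TransMap-style identifier: {identifier!r}")
    dest = match.group("dest")
    return TransMapId(
        accession=match.group("acc"),
        version=int(match.group("ver")),
        src_uniq=int(match.group("src")),
        dest_uniq=int(dest) if dest is not None else None,
    )
```

For the cow example, parse_transmap_id("BC149621.1-2.2") yields accession BC149621, version 1, srcUniq 2 and destUniq 2.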

\ \

Data Access

\ \

\ The raw data for these tracks can be accessed interactively through the\ Table Browser or the\ Data Integrator.\ For automated analysis, the annotations are stored in\ bigPsl files (containing a\ number of extra columns) and can be downloaded from our\ download server, \ or queried using our API. For more \ information on accessing track data see our \ Track Data Access FAQ.\ The files are associated with these tracks in the following way:\

\ Individual regions or the whole genome annotation can be obtained using our tool\ bigBedToBed which can be compiled from the source code or downloaded as\ a precompiled binary for your system. Instructions for downloading source code and\ binaries can be found\ here.\ The tool can also be used to obtain only features within a given range, for example:\

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/aptMan1/transMap/V5/aptMan1.refseq.transMapV5.bigPsl\ -chrom=chr6 -start=0 -end=1000000 stdout\ \ \
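For scripted use, the same command line can be assembled programmatically. The following is a sketch only: big_bed_to_bed_cmd is a hypothetical helper, it uses just the -chrom, -start and -end options shown in the command above, and it assumes that bigBedToBed is installed and on the PATH before the list is passed to subprocess.

```python
from typing import List, Optional


def big_bed_to_bed_cmd(
    url: str,
    chrom: Optional[str] = None,
    start: Optional[int] = None,
    end: Optional[int] = None,
    out: str = "stdout",
) -> List[str]:
    """Build an argument list for bigBedToBed, optionally limited to a range."""
    cmd = ["bigBedToBed", url]
    if chrom is not None:
        cmd.append(f"-chrom={chrom}")
    if start is not None:
        cmd.append(f"-start={start}")
    if end is not None:
        cmd.append(f"-end={end}")
    cmd.append(out)  # output file name, or "stdout" to print the BED lines
    return cmd


# Possible usage (requires the bigBedToBed binary):
#   import subprocess
#   result = subprocess.run(big_bed_to_bed_cmd(url, "chr6", 0, 1000000),
#                           check=True, capture_output=True, text=True)
#   bed_lines = result.stdout.splitlines()
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with URLs and sequence names.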

Credits

\ \

\ This track was produced by Mark Diekhans at UCSC from cDNA and EST sequence data\ submitted to the international public sequence databases by \ scientists worldwide and annotations produced by the RefSeq,\ Ensembl, and GENCODE annotations projects.

\ \

References

\

\ Siepel A, Diekhans M, Brejová B, Langton L, Stevens M, Comstock CL, Davis C, Ewing B, Oommen S,\ Lau C et al.\ \ Targeted discovery of novel human exons by comparative genomics.\ Genome Res. 2007 Dec;17(12):1763-73.\ PMID: 17989246; PMC: PMC2099585\

\ \

\ Stanke M, Diekhans M, Baertsch R, Haussler D.\ \ Using native and syntenically mapped cDNA alignments to improve de novo gene finding.\ Bioinformatics. 2008 Mar 1;24(5):637-44.\ PMID: 18218656\

\ \

\ Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D.\ \ Comparative genomics search for losses of long-established genes on the human lineage.\ PLoS Comput Biol. 2007 Dec;3(12):e247.\ PMID: 18085818; PMC: PMC2134963\

\ \ genes 1 baseColorDefault diffCodons\ baseColorUseCds given\ baseColorUseSequence lfExtra\ bigDataUrl /gbdb/aptMan1/transMap/V5/aptMan1.rna.transMapV5.bigPsl\ canPack on\ color 0,100,0\ defaultLabelFields orgAbbrev,srcTransId\ group genes\ html transMapRna\ indelDoubleInsert on\ indelQueryInsert on\ labelFields commonName,orgAbbrev,srcDb,srcTransId,name,geneName\ labelSeparator " "\ longLabel TransMap GenBank RNA Mappings Version 5\ priority 10.004\ searchIndex name,srcTransId,geneName\ shortLabel TransMap RNA\ showCdsAllScales .\ showCdsMaxZoom 10000.0\ showDiffBasesAllScales .\ showDiffBasesMaxZoom 10000.0\ superTrack transMapV5 hide\ track transMapRnaV5\ transMapSrcSet rna\ type bigPsl\ visibility hide\ transMapEstV5 TransMap ESTs bigPsl TransMap EST Mappings Version 5 0 10.005 0 100 0 127 177 127 0 0 0

Description

\ \

\ This track contains GenBank spliced EST alignments produced by\ the TransMap cross-species alignment algorithm\ from other vertebrate species in the UCSC Genome Browser.\ For closer evolutionary distances, the alignments are created using\ syntenically filtered BLASTZ alignment chains, resulting in a prediction of the\ orthologous genes in brown kiwi.\

\ \ \

Display Conventions and Configuration

\ \

\ This track follows the display conventions for \ PSL alignment tracks.

\

\ This track may also be configured to display codon coloring, a feature that\ allows the user to quickly compare cDNAs against the genomic sequence. For more \ information about this option, click \ here.\ Several types of alignment gap may also be colored; \ for more information, click \ here.\ \

Methods

\ \

\

    \
  1. Source transcript alignments were obtained from vertebrate organisms\ in the UCSC Genome Browser Database. BLAT alignments of RefSeq Genes, GenBank \ mRNAs, and GenBank Spliced ESTs to the cognate genome, along with UCSC Genes,\ were used as available.\
  2. For all vertebrate assemblies that had BLASTZ alignment chains and\ nets to the brown kiwi (aptMan1) genome, a subset of the alignment chains were\ selected as follows:\ \
  3. The pslMap program was used to do a base-level projection of\ the source transcript alignments via the selected chains\ to the brown kiwi genome, resulting in pairwise alignments of the source transcripts to\ the genome.\
  4. The resulting alignments were filtered with pslCDnaFilter\ with a global near-best criterion of 0.5% in finished genomes\ (human and mouse) and 1.0% in other genomes. Alignments\ where less than 20% of the transcript mapped were discarded.\
\

\ \

\ To ensure unique identifiers for each alignment, cDNA and gene accessions were\ made unique by appending a suffix for each location in the source genome and\ again for each mapped location in the destination genome. The format is:\

\
   accession.version-srcUniq.destUniq\
\ \ Where srcUniq is a number added to make each source alignment unique, and\ destUniq is added to give the subsequent TransMap alignments unique\ identifiers.\

\

\ For example, in the cow genome, there are two alignments of mRNA BC149621.1.\ These are assigned the identifiers BC149621.1-1 and BC149621.1-2.\ When these are mapped to the human genome, BC149621.1-1 maps to a single\ location and is given the identifier BC149621.1-1.1. However, BC149621.1-2\ maps to two locations, resulting in BC149621.1-2.1 and BC149621.1-2.2. Note\ that multiple TransMap mappings are usually the result of tandem duplications, where both\ chains are identified as syntenic.\

\ \

Data Access

\ \

\ The raw data for these tracks can be accessed interactively through the\ Table Browser or the\ Data Integrator.\ For automated analysis, the annotations are stored in\ bigPsl files (containing a\ number of extra columns) and can be downloaded from our\ download server, \ or queried using our API. For more \ information on accessing track data see our \ Track Data Access FAQ.\ The files are associated with these tracks in the following way:\

\ Individual regions or the whole genome annotation can be obtained using our tool\ bigBedToBed which can be compiled from the source code or downloaded as\ a precompiled binary for your system. Instructions for downloading source code and\ binaries can be found\ here.\ The tool can also be used to obtain only features within a given range, for example:\

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/aptMan1/transMap/V5/aptMan1.refseq.transMapV5.bigPsl\ -chrom=chr6 -start=0 -end=1000000 stdout\ \ \

Credits

\ \

\ This track was produced by Mark Diekhans at UCSC from cDNA and EST sequence data\ submitted to the international public sequence databases by \ scientists worldwide and annotations produced by the RefSeq,\ Ensembl, and GENCODE annotations projects.

\ \

References

\

\ Siepel A, Diekhans M, Brejová B, Langton L, Stevens M, Comstock CL, Davis C, Ewing B, Oommen S,\ Lau C et al.\ \ Targeted discovery of novel human exons by comparative genomics.\ Genome Res. 2007 Dec;17(12):1763-73.\ PMID: 17989246; PMC: PMC2099585\

\ \

\ Stanke M, Diekhans M, Baertsch R, Haussler D.\ \ Using native and syntenically mapped cDNA alignments to improve de novo gene finding.\ Bioinformatics. 2008 Mar 1;24(5):637-44.\ PMID: 18218656\

\ \

\ Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D.\ \ Comparative genomics search for losses of long-established genes on the human lineage.\ PLoS Comput Biol. 2007 Dec;3(12):e247.\ PMID: 18085818; PMC: PMC2134963\

\ \ genes 1 baseColorDefault none\ baseColorUseSequence lfExtra\ bigDataUrl /gbdb/aptMan1/transMap/V5/aptMan1.est.transMapV5.bigPsl\ canPack on\ color 0,100,0\ defaultLabelFields orgAbbrev,srcTransId\ group genes\ html transMapEst\ indelDoubleInsert on\ indelQueryInsert on\ labelFields commonName,orgAbbrev,srcDb,srcTransId,name\ labelSeparator " "\ longLabel TransMap EST Mappings Version 5\ priority 10.005\ searchIndex name,srcTransId\ shortLabel TransMap ESTs\ showDiffBasesAllScales .\ showDiffBasesMaxZoom 10000.0\ superTrack transMapV5 hide\ track transMapEstV5\ transMapSrcSet est\ type bigPsl\ visibility hide\ unipOther Other Annot. bigBed 12 + UniProt Other Annotations 1 11 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/aptMan1/uniprot/unipOther.bb\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ longLabel UniProt Other Annotations\ parent uniprot\ priority 11\ shortLabel Other Annot.\ track unipOther\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#family_and_domains" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility dense\ unipStruct Structure bigBed 12 + UniProt Protein Primary/Secondary Structure Annotations 0 11 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/aptMan1/uniprot/unipStruct.bb\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ group genes\ longLabel UniProt Protein Primary/Secondary Structure Annotations\ parent uniprot\ priority 11\ shortLabel Structure\ track unipStruct\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#structure" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility hide\ unipRepeat Repeats bigBed 12 + UniProt Repeats 1 12 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/aptMan1/uniprot/unipRepeat.bb\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ longLabel UniProt Repeats\ parent uniprot\ priority 12\ shortLabel Repeats\ track unipRepeat\ type bigBed 12 +\ urls 
uniProtId="http://www.uniprot.org/uniprot/$$#family_and_domains" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility dense\ unipConflict Seq. Conflicts bigBed 12 + UniProt Sequence Conflicts 1 13 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/aptMan1/uniprot/unipConflict.bb\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ longLabel UniProt Sequence Conflicts\ parent uniprot off\ priority 13\ shortLabel Seq. Conflicts\ track unipConflict\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#Sequence_conflict_section" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility dense\ mrna Brown kiwi mRNAs psl . Brown kiwi mRNAs from GenBank 3 100 0 0 0 127 127 127 0 0 0

Description

\ \

\ The mRNA track shows alignments between brown kiwi mRNAs\ in \ GenBank and the genome.

\ \

Display Conventions and Configuration

\ \

\ This track follows the display conventions for\ \ PSL alignment tracks. In dense display mode, the items that\ are more darkly shaded indicate matches of better quality.\

\ \

\ The description page for this track has a filter that can be used to change\ the display mode, alter the color, and include/exclude a subset of items\ within the track. This may be helpful when many items are shown in the track\ display, especially when only some are relevant to the current task.\

\ \

\ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the mRNA\ display. For example, to apply the filter to all mRNAs expressed in a specific\ organ, type the name of the organ in the tissue box. To view the list of\ valid terms for each text box, consult the table in the Table Browser that\ corresponds to the factor on which you wish to filter. For example, the\ "tissue" table contains all the types of tissues that can be\ entered into the tissue text box. Multiple terms may be entered at once,\ separated by a space. Wildcards may also be used in the filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only mRNAs that match all filter\ criteria will be highlighted. If "or" is selected, mRNAs that\ match any one of the filter criteria will be highlighted.\
  3. Choose the color or display characteristic that should be used to\ highlight or include/exclude the filtered items. If "exclude" is\ chosen, the browser will not display mRNAs that match the filter criteria.\ If "include" is selected, the browser will display only those\ mRNAs that match the filter criteria.\
\

\ \

\ This track may also be configured to display codon coloring, a feature that\ allows the user to quickly compare mRNAs against the genomic sequence. For more\ information about this option, go to the\ \ Codon and Base Coloring for Alignment Tracks page.\ Several types of alignment gap may also be colored;\ for more information, go to the\ \ Alignment Insertion/Deletion Display Options page.\

\ \

Methods

\ \

\ GenBank brown kiwi mRNAs were aligned against the genome using the\ blat program. When a single mRNA aligned in multiple places,\ the alignment having the highest base identity was found.\ Only alignments having a base identity level within 0.5% of\ the best and at least 96% base identity with the genomic sequence were kept.\
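The filtering rule above can be illustrated with a small sketch. It is not the code used to build the track: the Alignment record and near_best_filter function are hypothetical, and "within 0.5% of the best" is read here as a difference of at most 0.5 percentage points in base identity.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical, minimal view of one mRNA-to-genome alignment."""
    query: str       # mRNA accession
    target: str      # scaffold or chromosome name
    start: int       # start of the alignment on the target
    identity: float  # base identity in percent


def near_best_filter(
    alignments: Iterable[Alignment],
    min_identity: float = 96.0,
    near_best: float = 0.5,
) -> List[Alignment]:
    """Keep alignments near the best for their mRNA and above the identity floor."""
    by_query: Dict[str, List[Alignment]] = defaultdict(list)
    for aln in alignments:
        by_query[aln.query].append(aln)

    kept: List[Alignment] = []
    for group in by_query.values():
        best = max(aln.identity for aln in group)
        kept.extend(
            aln
            for aln in group
            if aln.identity >= min_identity and best - aln.identity <= near_best
        )
    return kept
```

Under these assumptions, an mRNA whose best alignment is below 96% identity contributes nothing to the track, while an mRNA with two placements at 99.0% and 98.7% keeps both.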

\ \

Credits

\ \

\ The mRNA track was produced at UCSC from mRNA sequence data\ submitted to the international public sequence databases by\ scientists worldwide.\

\ \

References

\

\ Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW.\ \ GenBank.\ Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42.\ PMID: 23193287; PMC: PMC3531190\

\ \

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL.\ GenBank: update.\ Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6.\ PMID: 14681350; PMC: PMC308779\

\ \

\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.\ PMID: 11932250; PMC: PMC187518\

\ rna 1 baseColorDefault diffCodons\ baseColorUseCds genbank\ baseColorUseSequence genbank\ group rna\ indelDoubleInsert on\ indelPolyA on\ indelQueryInsert on\ longLabel Brown kiwi mRNAs from GenBank\ shortLabel Brown kiwi mRNAs\ showDiffBasesAllScales .\ table all_mrna\ track mrna\ type psl .\ visibility pack\ gold Assembly bed 3 + Assembly from Fragments 0 100 150 100 30 230 170 40 0 0 0

Description

\

\ This track shows the sequences used in the Jun. 2015 brown kiwi genome assembly.\

\

\ Genome assembly procedures are covered in the NCBI\ assembly documentation.
\ NCBI also provides\ specific information about this assembly.\

\

\ The definition of this assembly is from the\ AGP file delivered with the sequence. The NCBI document\ AGP Specification describes the format of the AGP file.\

\

\ In dense mode, this track depicts the contigs that make up the \ currently viewed scaffold. \ Contig boundaries are distinguished by the use of alternating gold and brown \ coloration. Where gaps\ exist between contigs, spaces are shown between the gold and brown\ blocks. The relative order and orientation of the contigs\ within a scaffold is always known; therefore, a line is drawn in the graphical\ display to bridge the blocks.

\

\ Component types found in this track (with counts of that type in parentheses):\

\ map 1 altColor 230,170,40\ color 150,100,30\ group map\ html gold\ longLabel Assembly from Fragments\ shortLabel Assembly\ track gold\ type bed 3 +\ visibility hide\ augustusGene AUGUSTUS genePred AUGUSTUS ab initio gene predictions v3.1 0 100 12 105 0 133 180 127 0 0 0

Description

\ \

\ This track shows ab initio predictions from the program\ AUGUSTUS (version 3.1).\ The predictions are based on the genome sequence alone.\

\ \

\ For more information on the different gene tracks, see our Genes FAQ.

\ \

Methods

\ \

\ Statistical signal models were built for splice sites, branch-point\ patterns, translation start sites, and the poly-A signal.\ Furthermore, models were built for the sequence content of\ protein-coding and non-coding regions as well as for the length distributions\ of different exon and intron types. Detailed descriptions of most of these different models\ can be found in Mario Stanke's\ dissertation.\ This track shows the most likely gene structure according to a\ Semi-Markov Conditional Random Field model.\ Alternative splicing transcripts were obtained with\ a sampling algorithm (--alternatives-from-sampling=true --sample=100 --minexonintronprob=0.2\ --minmeanexonintronprob=0.5 --maxtracks=3 --temperature=2).\

\ \

\ The different models used by Augustus were trained on a number of different species-specific\ gene sets, which included 1000-2000 training gene structures. The --species option allows\ one to choose the species used for training the models. Different training species were used\ for the --species option when generating these predictions for different groups of\ assemblies.\
Assembly Group                    Training Species
Fish                              zebrafish
Birds                             chicken
Human and all other vertebrates   human
Nematodes                         caenorhabditis
Drosophila                        fly
A. mellifera                      honeybee1
A. gambiae                        culex
S. cerevisiae                     saccharomyces
\

\ This table describes which training species was used for a particular group of assemblies.\ When available, the closest related training species was used.\

\ \

Credits

\ \ Thanks to the\ Stanke lab\ for providing the AUGUSTUS program. The training for the chicken version was\ done by Stefanie König and the training for the\ human and zebrafish versions was done by Mario Stanke.\ \

References

\ \

\ Stanke M, Diekhans M, Baertsch R, Haussler D.\ \ Using native and syntenically mapped cDNA alignments to improve de novo gene finding.\ Bioinformatics. 2008 Mar 1;24(5):637-44.\ PMID: 18218656\

\ \

\ Stanke M, Waack S.\ \ Gene prediction with a hidden Markov model and a new intron submodel.\ Bioinformatics. 2003 Oct;19 Suppl 2:ii215-25.\ PMID: 14534192\

\ genes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ color 12,105,0\ group genes\ longLabel AUGUSTUS ab initio gene predictions v3.1\ shortLabel AUGUSTUS\ track augustusGene\ type genePred\ visibility hide\ cytoBandIdeo Chromosome Band (Ideogram) bed 4 + Ideogram for Orientation 1 100 0 0 0 127 127 127 0 0 0 map 1 group map\ longLabel Ideogram for Orientation\ shortLabel Chromosome Band (Ideogram)\ track cytoBandIdeo\ type bed 4 +\ visibility dense\ cpgIslandSuper CpG Islands bed 4 + CpG Islands (Islands < 300 Bases are Light Green) 0 100 0 100 0 128 228 128 0 0 0

Description

\ \

CpG islands are associated with genes, particularly housekeeping\ genes, in vertebrates. CpG islands are typically common near\ transcription start sites and may be associated with promoter\ regions. Normally a C (cytosine) base followed immediately by a \ G (guanine) base (a CpG) is rare in\ vertebrate DNA because the Cs in such an arrangement tend to be\ methylated. This methylation helps distinguish the newly synthesized\ DNA strand from the parent strand, which aids in the final stages of\ DNA proofreading after duplication. However, over evolutionary time,\ methylated Cs tend to turn into Ts because of spontaneous\ deamination. The result is that CpGs are relatively rare unless\ there is selective pressure to keep them or a region is not methylated\ for some other reason, perhaps having to do with the regulation of gene\ expression. CpG islands are regions where CpGs are present at\ significantly higher levels than is typical for the genome as a whole.

\ \

\ The unmasked version of the track displays potential CpG islands\ that exist in repeat regions and would otherwise not be visible\ in the repeat masked version.\

\ \

\ By default, only the masked version of the track is displayed. To view the\ unmasked version, change the visibility settings in the track controls at\ the top of this page.\

\ \

Methods

\ \

CpG islands were predicted by searching the sequence one base at a\ time, scoring each dinucleotide (+17 for CG and -1 for others) and\ identifying maximally scoring segments. Each segment was then\ evaluated for the following criteria:\ \

\

\

\ The entire genome sequence, masking areas included, was\ used for the construction of the track Unmasked CpG.\ The track CpG Islands is constructed on the sequence after\ all masked sequence is removed.\
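The dinucleotide scoring described in the Methods can be sketched as a single left-to-right scan. This is a simplified illustration, not the cpg_lh implementation: the function name score_segments and the rule of closing a candidate segment when its running score falls to zero are assumptions, and the follow-up criteria applied to each segment are not included.

```python
from typing import List, Optional, Tuple


def score_segments(
    seq: str, cg_score: int = 17, other_score: int = -1
) -> List[Tuple[int, int, int]]:
    """Return (start, end, score) for candidate segments, end exclusive.

    Each dinucleotide adds cg_score if it is CG and other_score otherwise.
    A candidate opens at a CG, is trimmed to the position where its running
    score peaked, and is closed once the running score drops to zero.
    """
    seq = seq.upper()
    segments: List[Tuple[int, int, int]] = []
    start: Optional[int] = None
    score = best_score = best_end = 0

    for i in range(len(seq) - 1):
        is_cg = seq[i] == "C" and seq[i + 1] == "G"
        if start is None:
            if not is_cg:
                continue  # nothing to extend and no CG to open a candidate
            start, score, best_score = i, 0, 0
        score += cg_score if is_cg else other_score
        if score > best_score:
            best_score, best_end = score, i + 2
        if score <= 0:
            segments.append((start, best_end, best_score))
            start = None

    if start is not None:
        segments.append((start, best_end, best_score))
    return segments
```

With the default scores, a single CG can carry a candidate across up to 16 non-CG dinucleotides before the next CG must appear, which is why sparse CpGs produce separate short segments rather than one long one.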

\ \

The CpG count is the number of CG dinucleotides in the island. \ The Percentage CpG is the ratio of CpG nucleotide bases\ (twice the CpG count) to the length. The ratio of observed to expected \ CpG is calculated according to the formula (cited in \ Gardiner-Garden et al. (1987)):\ \

    Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
\ \ where N = length of sequence.
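As a worked example of these definitions, the three quantities can be computed directly from a sequence. This is a minimal sketch with a hypothetical function name (cpg_stats). It returns 0.0 for the observed/expected ratio when the sequence has no C or no G, which is a choice made here and not something stated in the track documentation.

```python
from typing import Dict


def cpg_stats(seq: str) -> Dict[str, float]:
    """Compute CpG count, Percentage CpG, and Obs/Exp CpG for one sequence."""
    seq = seq.upper()
    n = len(seq)
    num_c = seq.count("C")
    num_g = seq.count("G")
    # "CG" cannot overlap itself, so str.count gives the dinucleotide count.
    num_cpg = seq.count("CG")

    # Percentage CpG: CpG bases (twice the CpG count) relative to the length.
    per_cpg = 100.0 * 2 * num_cpg / n if n else 0.0
    # Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
    obs_exp = num_cpg * n / (num_c * num_g) if num_c and num_g else 0.0

    return {"cpg_count": num_cpg, "per_cpg": per_cpg, "obs_exp": obs_exp}
```

For the six-base sequence CGCGAT there are 2 CpGs, 2 Cs and 2 Gs, so Percentage CpG is 100 * 4 / 6 (about 66.7) and Obs/Exp CpG is 2 * 6 / (2 * 2) = 3.0.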

\

\ The calculation of the track data is performed by the following command sequence:\

\
twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \
  | cpg_lh /dev/stdin 2> cpg_lh.err \
    | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\t%d\t%s\t%s %s\t%s\t%s\t%0.0f\t%0.1f\t%s\t%s\n", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \
     | sort -k1,1 -k2,2n > cpgIsland.bed
\ The unmasked track data is constructed from\ twoBitToFa -noMask output for the twoBitToFa command.\

\ \

Data access

\

\ The CpG Islands track and its associated tables can be explored interactively using the\ REST API, the\ Table Browser or the\ Data Integrator.\ All the tables can also be queried directly from our public MySQL\ servers, with more information available on our\ help page as well as on\ our blog.

\

\ The source for the cpg_lh program can be obtained from\ src/utils/cpgIslandExt/.\ The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file")\

\ \

Credits

\ \

This track was generated using a modification of a program developed by G. Micklem and L. Hillier \ (unpublished).

\ \

References

\ \

\ Gardiner-Garden M, Frommer M.\ \ CpG islands in vertebrate genomes.\ J Mol Biol. 1987 Jul 20;196(2):261-82.\ PMID: 3656447\

\ regulation 1 altColor 128,228,128\ color 0,100,0\ group regulation\ html cpgIslandSuper\ longLabel CpG Islands (Islands < 300 Bases are Light Green)\ shortLabel CpG Islands\ superTrack on\ track cpgIslandSuper\ type bed 4 +\ gap Gap bed 3 + Gap Locations 1 100 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows the gaps in the Jun. 2015 brown kiwi genome assembly.\

\

\ Genome assembly procedures are covered in the NCBI\ assembly documentation.
\ NCBI also provides\ specific information about this assembly.\

\

\ The definition of the gaps in this assembly is from the\ AGP file delivered with the sequence. The NCBI document\ AGP Specification describes the format of the AGP file.\

\

\ Gaps are represented as black boxes in this track.\ If the relative order and orientation of the contigs on either side\ of the gap is supported by read pair data, \ it is a bridged gap and a white line is drawn \ through the black box representing the gap. \

\

This assembly contains the following principal types of gaps:\

\ map 1 group map\ html gap\ longLabel Gap Locations\ shortLabel Gap\ track gap\ type bed 3 +\ visibility dense\ gc5BaseBw GC Percent bigWig 0 100 GC Percent in 5-Base Windows 0 100 0 0 0 128 128 128 0 0 0

Description

\

\ The GC percent track shows the percentage of G (guanine) and C (cytosine) bases\ in 5-base windows. High GC content is typically associated with\ gene-rich areas.\
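The quantity plotted in this track can be sketched as follows. This is an illustration only, not the procedure used to build the bigWig file: the function name gc_percent_windows is hypothetical, and it assumes non-overlapping windows with any trailing partial window dropped.

```python
from typing import List, Tuple


def gc_percent_windows(seq: str, window: int = 5) -> List[Tuple[int, int, float]]:
    """Return (start, end, percent GC) for consecutive fixed-size windows."""
    seq = seq.upper()
    results: List[Tuple[int, int, float]] = []
    # Step by the window size so windows do not overlap; a final window
    # shorter than `window` bases is skipped.
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, start + window, 100.0 * gc / window))
    return results
```

With 5-base windows each value is one of 0, 20, 40, 60, 80 or 100 percent.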

\

\ This track may be configured in a variety of ways to highlight different\ aspects of the displayed information. Click the\ "Graph configuration help"\ link for an explanation of the configuration options.\ \

Credits

\

The data and presentation of this graph were prepared by\ Hiram Clawson.\

\ \ map 0 altColor 128,128,128\ autoScale Off\ color 0,0,0\ graphTypeDefault Bar\ gridDefault OFF\ group map\ html gc5Base\ longLabel GC Percent in 5-Base Windows\ maxHeightPixels 128:36:16\ shortLabel GC Percent\ track gc5BaseBw\ type bigWig 0 100\ viewLimits 30:70\ visibility hide\ windowingFunction Mean\ genscan Genscan Genes genePred genscanPep Genscan Gene Predictions 0 100 170 100 0 212 177 127 0 0 0

Description

\ \

\ This track shows predictions from the\ Genscan program\ written by Chris Burge.\ The predictions are based on transcriptional, translational and donor/acceptor\ splicing signals as well as the length and compositional distributions of exons,\ introns and intergenic regions.\

\ \

\ For more information on the different gene tracks, see our Genes FAQ.

\ \

Display Conventions and Configuration

\ \

\ This track follows the display conventions for\ gene prediction\ tracks.\

\ \

\ The track description page offers the following filter and configuration\ options:\

\

\ \

Methods

\ \

\ For a description of the Genscan program and the model that underlies it,\ refer to Burge and Karlin (1997) in the References section below.\ The splice site models used are described in more detail in Burge (1998)\ below.\

\ \

Credits

\ \ Thanks to Chris Burge for providing the Genscan program.\ \

References

\ \

\ Burge C.\ Modeling Dependencies in Pre-mRNA Splicing Signals.\ In: Salzberg S, Searls D, Kasif S, editors.\ Computational Methods in Molecular Biology.\ Amsterdam: Elsevier Science; 1998. p. 127-163.\

\ \

\ Burge C, Karlin S.\ \ Prediction of complete gene structures in human genomic DNA.\ J. Mol. Biol. 1997 Apr 25;268(1):78-94.\ PMID: 9149143\

\ genes 1 color 170,100,0\ group genes\ longLabel Genscan Gene Predictions\ shortLabel Genscan Genes\ track genscan\ type genePred genscanPep\ visibility hide\ microsat Microsatellite bed 4 Microsatellites - Di-nucleotide and Tri-nucleotide Repeats 0 100 0 0 0 127 127 127 0 0 0

Description

\

\ This track displays regions that are likely to be useful as microsatellite\ markers. These are sequences of at least 15 perfect di-nucleotide and \ tri-nucleotide repeats and tend to be highly polymorphic in the\ population.\

\ \

Methods

\

\ The data shown in this track are a subset of the Simple Repeats track, \ selecting only those \ repeats of period 2 and 3, with 100% identity and no indels and with\ at least 15 copies of the repeat. The Simple Repeats track is\ created using the \ Tandem Repeats Finder. For more information about this \ program, see Benson (1999).
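The selection rule above amounts to a simple filter over Simple Repeats records. The sketch below is illustrative: the SimpleRepeat record and its field names are hypothetical, loosely modeled on Tandem Repeats Finder output, and are not a definitive description of the underlying table.

```python
from typing import Iterable, List, NamedTuple


class SimpleRepeat(NamedTuple):
    """Hypothetical, minimal view of one Simple Repeats record."""
    chrom: str
    start: int
    end: int
    period: int         # length of the repeat unit in bases
    copies: float       # number of copies of the repeat unit
    percent_match: int  # percent identity between adjacent copies
    percent_indel: int  # percent of indels between adjacent copies


def is_microsatellite(rep: SimpleRepeat, min_copies: float = 15) -> bool:
    """Period 2 or 3, 100% identity, no indels, and at least min_copies copies."""
    return (
        rep.period in (2, 3)
        and rep.percent_match == 100
        and rep.percent_indel == 0
        and rep.copies >= min_copies
    )


def select_microsatellites(repeats: Iterable[SimpleRepeat]) -> List[SimpleRepeat]:
    """Return the subset of repeats that satisfy the microsatellite rule."""
    return [rep for rep in repeats if is_microsatellite(rep)]
```

A perfect 20-copy dinucleotide repeat passes this filter, while a 20-copy repeat at 98% identity or a perfect 10-copy repeat does not.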

\ \

Credits

\

\ Tandem Repeats Finder was written by \ Gary Benson.

\ \

References

\ \

\ Benson G.\ \ Tandem repeats finder: a program to analyze DNA sequences.\ Nucleic Acids Res. 1999 Jan 15;27(2):573-80.\ PMID: 9862982; PMC: PMC148217\

\ varRep 1 group varRep\ longLabel Microsatellites - Di-nucleotide and Tri-nucleotide Repeats\ shortLabel Microsatellite\ track microsat\ type bed 4\ visibility hide\ xenoRefGene Other RefSeq genePred xenoRefPep xenoRefMrna Non-Brown kiwi RefSeq Genes 1 100 12 12 120 133 133 187 0 0 0

Description

\

\ This track shows known protein-coding and non-protein-coding genes \ for organisms other than brown kiwi, taken from the NCBI RNA reference \ sequences collection (RefSeq). The data underlying this track are \ updated weekly.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ gene prediction \ tracks.\ The color shading indicates the level of review the RefSeq record has \ undergone: predicted (light), provisional (medium), reviewed (dark).

\

\ The item labels and display colors of features within this track can be\ configured through the controls at the top of the track description page. \

\ \

Methods

\

\ The RNAs were aligned against the brown kiwi genome using blat; those\ with an alignment of less than 15% were discarded. When a single RNA aligned \ in multiple places, the alignment having the highest base identity was \ identified. Only alignments having a base identity level within 0.5% of \ the best and at least 25% base identity with the genomic sequence were kept.\

\ \

Credits

\

\ This track was produced at UCSC from RNA sequence data\ generated by scientists worldwide and curated by the \ NCBI RefSeq project.

\ \

References

\

\ Kent WJ.\ \ BLAT--the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.\ PMID: 11932250; PMC: PMC187518\

\ \

\ Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J,\ Landrum MJ, McGarvey KM et al.\ \ RefSeq: an update on mammalian reference sequences.\ Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63.\ PMID: 24259432; PMC: PMC3965018\

\ \

\ Pruitt KD, Tatusova T, Maglott DR.\ \ NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.\ Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4.\ PMID: 15608248; PMC: PMC539979\

\ genes 1 color 12,12,120\ group genes\ longLabel Non-Brown kiwi RefSeq Genes\ shortLabel Other RefSeq\ track xenoRefGene\ type genePred xenoRefPep xenoRefMrna\ visibility dense\ simpleRepeat Simple Repeats bed 4 + Simple Tandem Repeats by TRF 0 100 0 0 0 127 127 127 0 0 0

Description

\

\ This track displays simple tandem repeats (possibly imperfect repeats) located\ by Tandem Repeats\ Finder (TRF) which is specialized for this purpose. These repeats can\ occur within coding regions of genes and may be quite\ polymorphic. Repeat expansions are sometimes associated with specific\ diseases.

\ \

Methods

\

\ For more information about the TRF program, see Benson (1999).\

\ \

Credits

\

\ TRF was written by \ Gary Benson.

\ \

References

\ \

\ Benson G.\ \ Tandem repeats finder: a program to analyze DNA sequences.\ Nucleic Acids Res. 1999 Jan 15;27(2):573-80.\ PMID: 9862982; PMC: PMC148217\

\ varRep 1 group varRep\ longLabel Simple Tandem Repeats by TRF\ shortLabel Simple Repeats\ track simpleRepeat\ type bed 4 +\ visibility hide\ HLTOGAannotvGalGal6v1 TOGA vs. galGal6 bigBed 12 TOGA annotations using chicken/galGal6 as reference 0 100 0 0 0 127 127 127 0 0 0

Description

\

\ TOGA\ (Tool to infer Orthologs from Genome Alignments)\ is a homology-based method that integrates gene annotation, inferring\ orthologs and classifying genes as intact or lost.\

\ \

Methods

\

\ As input, TOGA uses a gene annotation of a reference species\ (human/hg38 for mammals, chicken/galGal6 for birds) and\ a whole genome alignment between the reference and query genome.\

\

\ TOGA implements a novel paradigm that relies on alignments of intronic\ and intergenic regions and uses machine learning to accurately distinguish\ orthologs from paralogs or processed pseudogenes.\

\

\ To annotate genes,\ CESAR 2.0\ is used to determine the positions and boundaries of coding exons of a\ reference transcript in the orthologous genomic locus in the query species.\

\ \

Display Conventions and Configuration

\

\ Each annotated transcript is shown in a color-coded classification as\

\

\

\ Clicking on a transcript provides additional information about the orthology\ classification, inactivating mutations, the protein sequence and protein/exon\ alignments.\

\ \

Credits

\

\ This data was prepared by the Michael Hiller Lab.\

\ \

References

\

\ The TOGA software is available from\ github.com/hillerlab/TOGA\

\ \

\ Kirilenko BM, Munegowda C, Osipova E, Jebb D, Sharma V, Blumer M, Morales AE, Ahmed AW, Kontopoulos\ DG, Hilgers L et al.\ \ Integrating gene annotation with orthology inference at scale.\ Science. 2023 Apr 28;380(6643):eabn3107.\ PMID: 37104600; PMC: PMC10193443\

\ genes 1 bigDataUrl /gbdb/aptMan1/TOGAvGalGal6v1/HLTOGAannotVsgalGal6v1.bb\ group genes\ html TOGAannotation\ itemRgb on\ longLabel TOGA annotations using chicken/galGal6 as reference\ searchIndex name\ searchTrix /gbdb/aptMan1/TOGAvGalGal6v1/HLTOGAannotVsgalGal6v1.ix\ shortLabel TOGA vs. galGal6\ track HLTOGAannotvGalGal6v1\ type bigBed 12\ visibility hide\ transMapV5 TransMap V5 TransMap Alignments Version 5 0 100 0 0 0 127 127 127 0 0 0

Description

\ \

\ These tracks contain cDNA and gene alignments produced by\ the TransMap cross-species alignment algorithm\ from other vertebrate species in the UCSC Genome Browser.\ For closer evolutionary distances, the alignments are created using\ syntenically filtered LASTZ or BLASTZ alignment chains, resulting\ in a prediction of the orthologous genes in brown kiwi. For more distant\ organisms, reciprocal best alignments are used.\

\ \ TransMap maps genes and related annotations in one species to another\ using synteny-filtered pairwise genome alignments (chains and nets) to\ determine the most likely orthologs. For example, for the mRNA TransMap track\ on the human assembly, more than 400,000 mRNAs from 25 vertebrate species were\ aligned at high stringency to the native assembly using BLAT. The alignments\ were then mapped to the human assembly using the chain and net alignments\ produced using BLASTZ, which has higher sensitivity than BLAT for diverged\ organisms.\

\ Compared to translated BLAT, TransMap finds fewer paralogs and aligns more UTR\ bases.\

\ \

Display Conventions and Configuration

\ \

\ This track follows the display conventions for \ PSL alignment tracks.

\

\ This track may also be configured to display codon coloring, a feature that\ allows the user to quickly compare cDNAs against the genomic sequence. For more \ information about this option, click \ here.\ Several types of alignment gap may also be colored; \ for more information, click \ here.\ \

Methods

\ \

\

    \
  1. Source transcript alignments were obtained from vertebrate organisms\ in the UCSC Genome Browser Database. BLAT alignments of RefSeq Genes, GenBank \ mRNAs, and GenBank Spliced ESTs to the cognate genome, along with UCSC Genes,\ were used as available.\
  2. For all vertebrate assemblies that had BLASTZ alignment chains and\ nets to the brown kiwi (aptMan1) genome, a subset of the alignment chains were\ selected as follows:\ \
  3. The pslMap program was used to do a base-level projection of\ the source transcript alignments via the selected chains\ to the brown kiwi genome, resulting in pairwise alignments of the source transcripts to\ the genome.\
  4. The resulting alignments were filtered with pslCDnaFilter\ with a global near-best criterion of 0.5% in finished genomes\ (human and mouse) and 1.0% in other genomes. Alignments\ where less than 20% of the transcript mapped were discarded.\
\
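The two filtering rules in step 4 can be illustrated with a minimal Python sketch. It is not the pslCDnaFilter implementation: the `Mapped` record, the `filter_mapped` function, the use of a single numeric score, and the order of the two filters (coverage first, then near-best) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Mapped:
    """Hypothetical summary of one mapped alignment (not a PSL record)."""
    transcript: str
    aligned_bases: int
    transcript_length: int
    score: float


def filter_mapped(alignments: Iterable[Mapped],
                  near_best: float = 0.01,
                  min_cover: float = 0.20) -> List[Mapped]:
    """Keep alignments that cover enough of the transcript and score near the best.

    near_best=0.01 corresponds to the 1.0% setting described for non-finished
    genomes; 0.005 would correspond to the 0.5% setting.
    """
    # Assumed step 1: discard alignments where less than min_cover of the
    # transcript mapped.
    covered = [a for a in alignments
               if a.aligned_bases / a.transcript_length >= min_cover]

    # Assumed step 2: find the best remaining score for each transcript.
    best: Dict[str, float] = {}
    for a in covered:
        best[a.transcript] = max(best.get(a.transcript, a.score), a.score)

    # Keep alignments whose score is within the near-best fraction of the top.
    return [a for a in covered
            if a.score >= best[a.transcript] * (1.0 - near_best)]
```

With `near_best=0.01`, an alignment scoring 995 survives next to a best score of 1000, while one scoring 900 is dropped.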

\ \

\ To ensure unique identifiers for each alignment, cDNA and gene accessions were\ made unique by appending a suffix for each location in the source genome and\ again for each mapped location in the destination genome. The format is:\

\
   accession.version-srcUniq.destUniq\
\ \ Where srcUniq is a number added to make each source alignment unique, and\ destUniq is added to give the subsequent TransMap alignments unique\ identifiers.\

\

\ For example, in the cow genome, there are two alignments of mRNA BC149621.1.\ These are assigned the identifiers BC149621.1-1 and BC149621.1-2.\ When these are mapped to the human genome, BC149621.1-1 maps to a single\ location and is given the identifier BC149621.1-1.1. However, BC149621.1-2\ maps to two locations, resulting in BC149621.1-2.1 and BC149621.1-2.2. Note\ that multiple TransMap mappings are usually the result of tandem duplications, where both\ chains are identified as syntenic.\
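The identifier scheme can be unpacked mechanically. The following is a minimal Python sketch that is not part of the TransMap pipeline; the name `parse_transmap_id` and the regular expression are assumptions derived only from the accession.version-srcUniq.destUniq format described above.

```python
import re
from typing import NamedTuple


class TransMapId(NamedTuple):
    accession: str
    version: int
    src_uniq: int
    dest_uniq: int


# Assumed pattern for accession.version-srcUniq.destUniq
_ID_RE = re.compile(
    r"^(?P<acc>[^.\s]+)\.(?P<ver>\d+)-(?P<src>\d+)\.(?P<dest>\d+)$"
)


def parse_transmap_id(identifier: str) -> TransMapId:
    """Split a fully mapped TransMap identifier into its four parts."""
    match = _ID_RE.match(identifier)
    if match is None:
        raise ValueError(f"not a mapped TransMap identifier: {identifier!r}")
    return TransMapId(
        accession=match["acc"],
        version=int(match["ver"]),
        src_uniq=int(match["src"]),
        dest_uniq=int(match["dest"]),
    )
```

For the example above, `parse_transmap_id("BC149621.1-2.2")` yields accession `BC149621`, version 1, source alignment 2, and destination mapping 2. Identifiers that lack the destination suffix, such as `BC149621.1-2`, are rejected by this sketch.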

\ \

Data Access

\ \

\ The raw data for these tracks can be accessed interactively through the\ Table Browser or the\ Data Integrator.\ For automated analysis, the annotations are stored in\ bigPsl files (containing a\ number of extra columns) and can be downloaded from our\ download server, \ or queried using our API. For more \ information on accessing track data see our \ Track Data Access FAQ.\ The files are associated with these tracks in the following way:\

\ Individual regions or the whole genome annotation can be obtained using our tool\ bigBedToBed, which can be compiled from the source code or downloaded as\ a precompiled binary for your system. Instructions for downloading source code and\ binaries can be found\ here.\ The tool can also be used to obtain only features within a given range, for example:\

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/aptMan1/transMap/V5/aptMan1.refseq.transMapV5.bigPsl\ -chrom=chr6 -start=0 -end=1000000 stdout\ \ \

Credits

\ \

\ This track was produced by Mark Diekhans at UCSC from cDNA and EST sequence data\ submitted to the international public sequence databases by \ scientists worldwide and annotations produced by the RefSeq,\ Ensembl, and GENCODE annotation projects.

\ \

References

\

\ Siepel A, Diekhans M, Brejová B, Langton L, Stevens M, Comstock CL, Davis C, Ewing B, Oommen S,\ Lau C et al.\ \ Targeted discovery of novel human exons by comparative genomics.\ Genome Res. 2007 Dec;17(12):1763-73.\ PMID: 17989246; PMC: PMC2099585\

\ \

\ Stanke M, Diekhans M, Baertsch R, Haussler D.\ \ Using native and syntenically mapped cDNA alignments to improve de novo gene finding.\ Bioinformatics. 2008 Mar 1;24(5):637-44.\ PMID: 18218656\

\ \

\ Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D.\ \ Comparative genomics search for losses of long-established genes on the human lineage.\ PLoS Comput Biol. 2007 Dec;3(12):e247.\ PMID: 18085818; PMC: PMC2134963\

\ \ genes 0 group genes\ html transMapV5\ longLabel TransMap Alignments Version 5\ shortLabel TransMap V5\ superTrack on\ track transMapV5\ uniprot UniProt bigBed 12 + UniProt SwissProt/TrEMBL Protein Annotations 0 100 0 0 0 127 127 127 0 0 0

Description

\ \

\ This track shows protein sequences and annotations on them from the UniProt/SwissProt database,\ mapped to genomic coordinates. \

\

\ UniProt/SwissProt data has been curated from scientific publications by the UniProt staff;\ UniProt/TrEMBL data has been predicted by various computational algorithms.\ The annotations are divided into multiple subtracks, based on their "feature type" in UniProt.\ The first two subtracks below - one for SwissProt, one for TrEMBL - show the\ alignments of protein sequences to the genome; all other tracks below are the protein annotations\ mapped through these alignments to the genome.\

\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
Track Name | Description
UCSC Alignment, SwissProt = curated protein sequences | Protein sequences from SwissProt mapped to the genome. All other\ tracks are (start,end) SwissProt annotations on these sequences mapped\ through this alignment. Even protein sequences without a single curated \ annotation (splice isoforms) are visible in this track. Each UniProt protein \ has one main isoform, which is colored in dark. Alternative isoforms are \ sequences that do not have annotations on them and are colored in light-blue. \ They can be hidden with the TrEMBL/Isoform filter (see below).
UCSC Alignment, TrEMBL = predicted protein sequences | Protein sequences from TrEMBL mapped to the genome. All other tracks\ below are (start,end) TrEMBL annotations mapped to the genome using\ this track. This track is hidden by default. To show it, click its\ checkbox on the track configuration page.
UniProt Signal Peptides | Regions found in proteins destined to be secreted, generally cleaved from mature protein.
UniProt Extracellular Domains | Protein domains with the comment "Extracellular".
UniProt Transmembrane Domains | Protein domains of the type "Transmembrane".
UniProt Cytoplasmic Domains | Protein domains with the comment "Cytoplasmic".
UniProt Polypeptide Chains | Polypeptide chain in mature protein after post-processing.
UniProt Regions of Interest | Regions that have been experimentally defined, such as the role of a region in mediating protein-protein interactions or some other biological process.
UniProt Domains | Protein domains, zinc finger regions and topological domains.
UniProt Disulfide Bonds | Disulfide bonds.
UniProt Amino Acid Modifications | Glycosylation sites, modified residues and lipid moiety-binding regions.
UniProt Amino Acid Mutations | Mutagenesis sites and sequence variants.
UniProt Protein Primary/Secondary Structure Annotations | Beta strands, helices, coiled-coil regions and turns.
UniProt Sequence Conflicts | Differences between GenBank sequences and the UniProt sequence.
UniProt Repeats | Regions of repeated sequence motifs or repeated domains.
UniProt Other Annotations | All other annotations, e.g. compositional bias.
\

\ For consistency and convenience for users of mutation-related tracks,\ the subtrack "UniProt/SwissProt Variants" is a copy of the track\ "UniProt Variants" in the track group "Phenotype and Literature", or \ "Variation and Repeats", depending on the assembly.\

\ \

Display Conventions and Configuration

\ \

\ Genomic locations of UniProt/SwissProt annotations are labeled with a short name for\ the type of annotation (e.g. "glyco", "disulf bond", "Signal peptide"\ etc.). A click on them shows the full annotation and provides a link to the UniProt/SwissProt\ record for more details. TrEMBL annotations are always shown in \ light blue, except in the Signal Peptides,\ Extracellular Domains, Transmembrane Domains, and Cytoplasmic Domains subtracks.

\ \

\ Mouse over a feature to see the full UniProt annotation comment. For variants, the mouse over will\ show the full name of the UniProt disease acronym.\

\ \

\ The subtracks for domains related to subcellular location are sorted from outside to inside of \ the cell: Signal peptide, \ extracellular, \ transmembrane, and cytoplasmic.\

\ \

\ In the "UniProt Modifications" track, lipidation sites are highlighted in \ dark blue, glycosylation sites in \ dark green, and phosphorylation sites in \ light green.

\ \

\ Duplicate annotations are removed as far as possible: if a TrEMBL annotation\ has the same genome position and same feature type, comment, disease and\ mutated amino acids as a SwissProt annotation, it is not shown again. Two\ annotations mapped through different protein sequence alignments but with the same genome\ coordinates are only shown once.
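The duplicate-removal rule can be sketched in Python as follows. This is an illustration only, not the code used to build the track: the `Annotation` record, its field names, and the `remove_duplicates` function are hypothetical, and the comparison key is simply the list of properties named in the paragraph above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record for one annotation mapped to the genome."""
    chrom: str
    start: int
    end: int
    feature_type: str
    comment: str
    disease: str
    mutation: str
    source: str  # "SwissProt" or "TrEMBL"


def remove_duplicates(annotations: Iterable[Annotation]) -> List[Annotation]:
    """Drop annotations that repeat an already seen position and content.

    SwissProt records are visited first, so a TrEMBL record with the same
    key as a SwissProt record is the one that is dropped.
    """
    seen: Set[Tuple] = set()
    kept: List[Annotation] = []
    # sorted() is stable: SwissProt (key False) comes before TrEMBL (key True).
    for ann in sorted(annotations, key=lambda a: a.source != "SwissProt"):
        key = (ann.chrom, ann.start, ann.end, ann.feature_type,
               ann.comment, ann.disease, ann.mutation)
        if key in seen:
            continue
        seen.add(key)
        kept.append(ann)
    return kept
```

Because the key ignores which protein alignment produced the annotation, two identical annotations that reach the same genome coordinates through different alignments also collapse to one entry in this sketch.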

\ \

On the configuration page of this track, you can choose to hide any TrEMBL annotations.\ This filter will also hide the UniProt alternative isoform protein sequences because\ both types of information are less relevant to most users. Please contact us if you\ want more detailed filtering features.

\ \

Note that for the human hg38 assembly and SwissProt annotations, there\ also is a public\ track hub prepared by UniProt itself, with \ genome annotations maintained by UniProt using their own mapping\ method based on those Gencode/Ensembl gene models that are annotated in UniProt\ for a given protein. For proteins that differ from the genome, UniProt's mapping method\ will, in most cases, map a protein and its annotations to an unexpected location\ (see below for details on UCSC's mapping method).

\ \

Methods

\ \

\ Briefly, UniProt protein sequences were aligned to the transcripts associated\ with the protein, the top-scoring alignments were retained, and the result was\ projected to the genome through a transcript-to-genome alignment.\ Depending on the genome, the transcript-genome alignment was either\ provided by the source database (NCBI RefSeq), created at UCSC (UCSC RefSeq) or\ derived from the transcripts (Ensembl/Augustus). The transcript set is NCBI\ RefSeq for hg38 and UCSC RefSeq for hg19 (due to alt/fix haplotype misplacements \ in the NCBI RefSeq set on hg19). For other genomes, RefSeq, Ensembl and Augustus \ are tried, in this order. The resulting protein-genome alignments of this process \ are available in the file formats for liftOver or pslMap from our data archive\ (see "Data Access" section below).\

\ \

An important step of the mapping process protein -> transcript ->\ genome is filtering the alignment from protein to transcript. Due to\ differences between the UniProt proteins and the transcripts (proteins were\ made many years before the transcripts were made, and human genomes have\ variants), the transcript with the highest BLAST score when aligning the\ protein to all transcripts is not always the correct transcript for a protein\ sequence. Therefore, the protein sequence is aligned to only a very short list\ of one or sometimes more transcripts, selected by a three-step procedure:\

    \
  1. Use transcripts directly annotated by UniProt: for organisms that have a RefSeq transcript track,\ proteins are aligned to the RefSeq transcripts that are annotated\ by UniProt for this particular protein.\
  2. Use transcripts for NCBI Gene ID annotated by UniProt: If no transcripts are annotated on the\ protein, or the annotated ones have been deprecated by NCBI, but a NCBI Gene ID is\ annotated, the RefSeq transcripts for this Gene ID are used. This can result in multiple matching transcripts for a protein.\
  3. Use best matching transcript: If no NCBI Gene is\ annotated, then BLAST scores are used to pick the transcripts. There can be multiple transcripts for one\ protein, as their coding sequences can be identical. All transcripts within 1% of the highest observed BLAST score are used.\
\

\ \

\ For strategies 2 and 3, many of the transcripts found do not differ in coding\ sequence, so the resulting alignments on the genome will be identical.\ Therefore, any identical alignments are removed in a final filtering step. The\ details page of these alignments will contain a list of all transcripts that\ result in the same protein-genome alignment. On hg38, only a handful of edge\ cases (pseudogenes, very recently added proteins) remain in 2023 where strategy\ 3 has to be used.

\ \

In other words, when an NCBI or UCSC RefSeq track is used for the mapping and to align a\ protein sequence to the correct transcript, we use a three-stage process:\

    \
  1. If UniProt has annotated a given RefSeq transcript for a given protein\ sequence, the protein is aligned to this transcript. Any difference in the\ version suffix is tolerated in this comparison. \
  2. If no transcript is annotated or the transcript cannot be found in the\ NCBI/UCSC RefSeq track, the UniProt-annotated NCBI Gene ID is resolved to a\ set of NCBI RefSeq transcript IDs via the most current version of NCBI\ genes tables. Only the top match of the resulting alignments and all\ others within 1% of its score are used for the mapping.\
  3. If no transcript can be found after step (2), the protein is aligned to all transcripts;\ the top match and all others within 1% of its score are used.\
\ \
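The three-stage selection can be summarized as a short Python sketch. It is an illustration under stated assumptions, not the UCSC build script: the function names, the argument shapes, and the reading of "within 1% of its score" as "at least 99% of the top score" are all hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def strip_version(accession: str) -> str:
    """Drop the version suffix, e.g. 'NM_000001.3' becomes 'NM_000001'."""
    return accession.split(".", 1)[0]


def candidate_transcripts(
    annotated: Iterable[str],
    gene_id: Optional[str],
    track_transcripts: List[str],
    gene_to_transcripts: Dict[str, List[str]],
) -> Tuple[int, List[str]]:
    """Return (stage, transcripts) to align one protein against."""
    by_base = {strip_version(t): t for t in track_transcripts}

    # Stage 1: transcripts annotated by UniProt, ignoring version differences.
    hits = [by_base[strip_version(a)] for a in annotated
            if strip_version(a) in by_base]
    if hits:
        return 1, hits

    # Stage 2: transcripts of the UniProt-annotated NCBI Gene ID.
    if gene_id is not None:
        hits = [by_base[strip_version(t)]
                for t in gene_to_transcripts.get(gene_id, [])
                if strip_version(t) in by_base]
        if hits:
            return 2, hits

    # Stage 3: fall back to aligning against every transcript in the track.
    return 3, list(track_transcripts)


def within_top_fraction(scores: Dict[str, float],
                        fraction: float = 0.01) -> List[str]:
    """Keep the top-scoring transcript and all others close to its score."""
    top = max(scores.values())
    return [t for t, s in scores.items() if s >= top * (1.0 - fraction)]
```

In stages 2 and 3, the alignment scores for the returned candidates would then be passed through `within_top_fraction` so that only the top match and near-ties are used for the mapping.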

This system was designed to resolve the problem of incorrect mappings of\ proteins, mostly on hg38, due to differences between the SwissProt\ sequences and the genome reference sequence, which has changed since the\ proteins were defined. The problem is most pronounced for gene families\ composed of either very repetitive or very similar proteins. To make sure that\ the alignments always go to the best chromosome location, all _alt and _fix\ reference patch sequences are ignored for the alignment, so the patches are\ entirely free of UniProt annotations. Please contact us if you have feedback on\ this process or example edge cases. We are not aware of a way to evaluate the\ results completely and in an automated manner.

\

\ Proteins were aligned to transcripts with TBLASTN, converted to PSL, filtered\ with pslReps (93% query coverage, keep alignments within top 1% score), lifted to genome\ positions with pslMap and filtered again with pslReps. UniProt annotations were\ obtained from the UniProt XML file. The UniProt annotations were then mapped to the\ genome through the alignment described above using the pslMap program. This approach\ draws heavily on the LS-SNP pipeline by Mark Diekhans.\ Like all Genome Browser source code, the main script used to build this track\ can be found on Github.\

\ \

Older releases

\

\ This track is automatically updated on an ongoing basis, every 2-3 months.\ The current version name is always shown on the track details page; it includes the\ release of UniProt, the version of the transcript set and a unique MD5 that is\ based on the protein sequences, the transcript sequences, the mapping file\ between both and the transcript-genome alignment. The exact transcript\ that was used for the alignment is shown when clicking a protein alignment\ in one of the two alignment tracks.\

\ \

\ For reproducibility of older analysis results and for manual inspection, previous versions of this track\ are available for browsing in the form of the UCSC UniProt Archive Track Hub (click this link to connect the hub now). The underlying data of\ all releases of this track (past and current) can be obtained from our downloads server, including the UniProt\ protein-to-genome alignment.

\ \

Data Access

\ \

\ The raw data of the current track can be explored interactively with the\ Table Browser, or the\ Data Integrator.\ For automated analysis, the genome annotation is stored in a bigBed file that \ can be downloaded from the\ download server.\ The exact filenames can be found in the \ track configuration file. \ Annotations can be converted to ASCII text by our tool bigBedToBed\ which can be compiled from the source code or downloaded as a precompiled\ binary for your system. Instructions for downloading source code and binaries can be found\ here.\ The tool can also be used to obtain only features within a given range, for example:\

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/aptMan1/uniprot/unipStruct.bb -chrom=chr6 -start=0 -end=1000000 stdout \

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information. \

\ \

\ \

Lifting from UniProt to genome coordinates in pipelines

\

To facilitate mapping protein coordinates to the genome, we provide the\ alignment files in formats that are suitable for our command line tools. Our\ command line programs liftOver or pslMap can be used to map\ coordinates on protein sequences to genome coordinates. The filenames are\ unipToGenome.over.chain.gz (liftOver) and unipToGenomeLift.psl.gz (pslMap).

\ \

Example commands:\

\
wget -q https://hgdownload.soe.ucsc.edu/goldenPath/archive/hg38/uniprot/2022_03/unipToGenome.over.chain.gz\
wget -q https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/liftOver\
chmod a+x liftOver\
echo 'Q99697 1 10 annotationOnProtein' > prot.bed\
liftOver prot.bed unipToGenome.over.chain.gz genome.bed\
cat genome.bed\
\

\ \

Credits

\ \

\ This track was created by Maximilian Haeussler at UCSC, with a lot of input from Chris\ Lee, Mark Diekhans and Brian Raney, feedback from the UniProt staff, Alejo\ Mujica, Regeneron Pharmaceuticals and Pia Riestra, GeneDx. Thanks to UniProt for making all data\ available for download.\

\ \

References

\ \

\ UniProt Consortium.\ \ Reorganizing the protein space at the Universal Protein Resource (UniProt).\ Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5.\ PMID: 22102590; PMC: PMC3245120\

\ \

\ Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A.\ \ The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure\ information on human protein variants.\ Hum Mutat. 2004 May;23(5):464-70.\ PMID: 15108278\

\ genes 1 allButtonPair on\ compositeTrack on\ dataVersion /gbdb/$D/uniprot/version.txt\ exonNumbers off\ group genes\ hideEmptySubtracks on\ itemRgb on\ longLabel UniProt SwissProt/TrEMBL Protein Annotations\ mouseOverField comments\ shortLabel UniProt\ track uniprot\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#section_features" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility hide\ windowmaskerSdust WM + SDust bed 3 Genomic Intervals Masked by WindowMasker + SDust 0 100 0 0 0 127 127 127 0 0 0

Description

\ \

\ This track depicts masked sequence as determined by\ WindowMasker. The\ WindowMasker tool is included in the NCBI C++ toolkit. The source code\ for the entire toolkit is available from the NCBI\ \ FTP site.\

\ \

Methods

\ \

\ To create this track, WindowMasker was run with the following parameters:\

\
windowmasker -mk_counts true -input aptMan1.fa -output wm_counts\
windowmasker -ustat wm_counts -sdust true -input aptMan1.fa -output repeats.bed\
\ The repeats.bed (BED3) file was loaded into the "windowmaskerSdust" table for\ this track.\
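As a quick sanity check on a BED3 file such as repeats.bed, the masked sequence can be totalled per chromosome. The following is a minimal Python sketch; the function name `masked_bases_per_chrom` is hypothetical, and it assumes standard BED coordinates (zero-based start, exclusive end) and intervals that do not overlap one another.

```python
from collections import defaultdict
from typing import Dict, Iterable


def masked_bases_per_chrom(bed_lines: Iterable[str]) -> Dict[str, int]:
    """Sum interval lengths per chromosome from BED3 lines.

    Assumes non-overlapping intervals; overlapping intervals would be
    counted more than once.
    """
    totals: Dict[str, int] = defaultdict(int)
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and optional comment or header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        totals[chrom] += int(end) - int(start)
    return dict(totals)
```

The function accepts any iterable of lines, so it can be applied to an open file handle, for example `with open("repeats.bed") as fh: totals = masked_bases_per_chrom(fh)`.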

\ \

References

\ \

\ Morgulis A, Gertz EM, Schäffer AA, Agarwala R.\ WindowMasker: window-based masker for sequenced genomes.\ Bioinformatics. 2006 Jan 15;22(2):134-41.\ PMID: 16287941\

\ varRep 1 group varRep\ longLabel Genomic Intervals Masked by WindowMasker + SDust\ shortLabel WM + SDust\ track windowmaskerSdust\ type bed 3\ visibility hide\