cytoBand Chromosome Band bed 4 + Chromosome Bands Based On ISCN Lengths 1 1 0 0 0 150 50 50 0 0 0
The chromosome band track represents the approximate \ location of bands seen on Giemsa-stained chromosomes\ under conditions where 400 bands are visible across the entire\ genome.
\Data are derived from the Mouse400.dat file downloaded from the NCBI ftp \ site ftp://ftp.ncbi.nih.gov/genomes/M_musculus/maps/mapview/. Band lengths\ are estimated based on relative sizes as defined by the International System for \ Cytogenetic Nomenclature (ISCN).\ \
We would like to thank NCBI for providing this information.\ map 1 mapGenethon STS Markers bed 5 + Various STS Markers 0 2 0 0 0 127 127 127 0 0 0 map 1 stsMapMouse STS Markers bed 5 + STS Markers on Genetic Maps 1 5 0 0 0 128 128 255 0 0 0
This track shows locations of Sequence-Tagged Site (STS) markers along \ the mouse draft assembly. These markers appear on the Mouse Genome Informatics (MGI) consensus mouse genetic \ map. Information about the genetic map and STS marker primer sequences are \ provided by the Mouse Genome Informatics database group at The Jackson \ Laboratory.
\ map 1 gold Assembly bed 3 + Assembly from Fragments 0 10 150 100 30 230 170 40 0 0 0This track shows the draft assembly of the $organism genome.\ This assembly merges contigs from overlapping drafts and\ finished clones into longer sequence contigs. The sequence\ contigs are ordered and oriented when possible by mRNA, EST,\ paired plasmid reads (from the SNP Consortium) and BAC end\ sequence pairs.
\In dense mode, this track depicts the path through the draft and \ finished clones (aka the golden path) used to create the assembled sequence. \ Clone boundaries are distinguished by the use of alternating gold and brown \ coloration. Where gaps\ exist in the path, spaces are shown between the gold and brown\ blocks. If the relative order and orientation of the contigs\ between the two blocks is known, a line is drawn to bridge the\ blocks.
\\ Clone Type Key:\
\ Gaps are represented as black boxes in this track.\ If the relative order and orientation of the contigs on either side\ of the gap is known, it is a bridged gap and a white line is drawn \ through the black box representing the gap. \
\There are four principal types of gaps:\
Bacterial artificial chromosomes (BACs) are a key part of many large\ scale sequencing projects. A BAC typically consists of 50-300kb of\ DNA. During the early phase of a sequencing project, it is common\ to sequence a single read (approximately 500 bases) off each end of\ a large number of BACs. Later on in the project, these BAC end reads\ can be mapped to the genome sequence. \
\This track shows these mappings\ in cases where both ends could be mapped. These BAC end pairs can\ be useful for validating the assembly over relatively long ranges. In some\ cases, the BACs are useful biological reagents. This track can also be\ used for determining which BAC contains a given gene, useful information\ for certain wet lab experiments.\ \
A valid pair of BAC end sequences must be\ at least 50Kb but no more than 600Kb away from each other. \ The orientation of the first BAC end sequence must be "+" and\ the orientation of the second BAC end sequence must be "-".
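The pairing rule above can be sketched as a small check. This is an illustration only, not the browser's actual code: the `EndAlignment` type and function name are hypothetical, and the text does not say how the distance is measured, so the sketch assumes it runs from the start of the first end to the end of the second.

```python
from dataclasses import dataclass

MIN_SPAN = 50_000    # "at least 50Kb"
MAX_SPAN = 600_000   # "no more than 600Kb"


@dataclass
class EndAlignment:
    """Hypothetical record for one aligned BAC end sequence."""
    start: int   # 0-based start on the chromosome
    end: int     # end position (exclusive)
    strand: str  # "+" or "-"


def is_valid_bac_end_pair(first: EndAlignment, second: EndAlignment) -> bool:
    """Return True if two BAC end alignments satisfy the pairing rule."""
    # The first end must be on "+" and the second on "-".
    if first.strand != "+" or second.strand != "-":
        return False
    # Assumption: distance is measured across the whole implied clone.
    span = second.end - first.start
    return MIN_SPAN <= span <= MAX_SPAN
```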
\ \BAC end sequences are placed on the assembled sequence using\ Jim Kent's \ blat \ program.
\ \Additional information about the clone, including how it\ can be obtained, may be found at the \ NCBI Clone Registry.\ To view the registry entry for a specific clone, open the details page for the clone and click on its name at the top of the page.\
\ map 1 jaxQTL QTL bed 6 + Quantitative Trait Loci From Jax/MGI 0 21.1 200 100 0 227 177 127 0 0 0 http://www.informatics.jax.org/searches/accession_report.cgi?id=$$\ Approximate positions of quantitative\ trait loci based on reported peak LOD scores taken\ from\ Jackson Lab's\ \ Mouse Genome Informatics Database (MGI).\
\\ Thanks to Carol Bult for providing the QTL data. \
\ map 1 gcPercent GC Percent bed 4 + Percentage GC in 20,000 Base Windows 0 23 0 0 0 127 127 127 1 0 0\ The GC percent track shows the percentage of G (guanine) and C (cytosine) bases\ in a 20,000 base window. Windows with high GC content are drawn more darkly \ than windows with low GC content. High GC content is typically associated with \ gene-rich areas.\
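A minimal sketch of the windowed GC computation follows. It assumes consecutive, non-overlapping windows and counts every character (including N) in the denominator. Neither detail is stated in the description, and the function name is hypothetical.

```python
def gc_percent_windows(seq: str, window: int = 20_000):
    """Yield (start, end, gc_percent) for consecutive non-overlapping windows.

    Assumption: windows do not overlap, and the final window may be shorter.
    """
    seq = seq.upper()
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        yield start, start + len(chunk), 100.0 * gc / len(chunk)
```

For example, `list(gc_percent_windows(sequence))` gives one tuple per 20,000-base window, which could then be mapped to a gray level for display.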
\\ This track was generated at UCSC.\ map 1 knownGene Known Genes genePred refPep refMrna Known Genes Based on SWISS-PROT, TrEMBL, mRNA, and RefSeq 3 34 12 12 120 133 133 187 0 0 0
\ The Known Genes track shows known protein coding genes based on \ proteins from SWISS-PROT, TrEMBL, and TrEMBL-NEW and their\ corresponding mRNAs from GenBank.\ Coding exons are displayed \ taller than 5' and 3' untranslated regions (UTR). Connecting introns \ are one-pixel lines with hatch marks indicating direction of transcription.\ Entries which have corresponding entries in PDB are colored black.\ Entries which either have corresponding proteins in SWISS-PROT or mRNAs that are \ NCBI Reference Sequences with a "Reviewed" status are colored dark blue.\ Entries which have mRNAs that are \ NCBI Reference Sequences with a "Provisional" status are colored lighter blue.\ Everything else is colored with lightest blue.
\ \\ All mRNAs of a species are aligned against the genome using the blat\ program. When a single mRNA aligns in multiple places, only\ the best alignments are kept. The alignments must also have \ at least 98% sequence identity to be kept. \ This set of mRNA alignments is further reduced by keeping only those mRNAs that \ are referenced by a protein in SWISS-PROT, TrEMBL, or TrEMBL-NEW.
\\ Among multiple mRNAs referenced by a single protein, the best mRNA is chosen based on \ a quality score, which depends on its length, how well its translation matches \ the protein sequence, and its release date.\ The list of mRNA and protein pairs is further cleaned up by removing \ short invalid entries and consolidating entries with identical CDS regions.
\\ Finally, RefSeq entries which are derived from DNA sequences instead of \ mRNA sequences are added. Disease annotations are from SWISS-PROT.
\ \\ The Known Genes track is produced at UCSC based primarily on cross-references \ between proteins from \ SWISS-PROT \ (also including TrEMBL and TrEMBL-NEW) and mRNAs from GenBank\ generated by scientists worldwide. Part of \ NCBI RefSeq \ data are also included in this track.
\ \\ The SWISS-PROT entries in this annotation track are copyrighted. They are \ produced through a collaboration \ between the Swiss Institute of Bioinformatics and the EMBL Outstation - the \ European Bioinformatics Institute. There are no restrictions on their use by \ non-profit institutions as long as their content is in no way modified and this \ statement is not removed. Usage by and for commercial entities requires a \ license agreement (see \ http://www.isb-sib.ch/announce/ or send an email to \ license@isb-sib.ch).
\ \ genes 1 hgGene on\ refGene RefSeq Genes genePred refPep refMrna RefSeq Genes 0 35 12 12 120 133 133 187 0 0 0\ The RefSeq Genes track shows known protein coding genes taken from mRNA \ reference sequences compiled at LocusLink. Coding exons are represented by \ blocks connected by horizontal lines representing introns. The 5' and 3' \ untranslated regions (UTRs) are displayed as shorter blocks on the leading \ and trailing ends of the aligning regions. In full display mode, arrowheads \ on the connecting intron lines indicate the direction of transcription.\
\\ Non-coding RNA genes have their own track in some assemblies.\
\\ RefSeq mRNAs are aligned against the genome using the blat\ program. When a single mRNA aligns in multiple places, only\ the best alignments that also have at least 98% sequence identity are kept.\
\The track filter can be used to configure the labeling of the features within\ the track. By default, items are labeled by gene name. Click the \ appropriate Label option to display the accession name instead of the gene\ name, show both the gene and accession names, or turn off the label completely.\ After you have made your selection, click Submit to return to the tracks display\ page.\
\ The RefSeq Genes track is produced at UCSC from mRNA sequence data\ generated by scientists worldwide and curated by the \ NCBI RefSeq project. \
\ genes 1 genieAlt AltGenie genePred genieAltPep Genie Gene Predictions from Affymetrix 1 39 125 0 150 190 127 202 0 0 0Genie predictions are based on \ Affymetrix's \ Genie gene finding software. Genie is a generalized HMM \ which accepts constraints based on mRNA and EST data.
\ genes 1 ensGene Ensembl Genes genePred ensPep Ensembl Gene Predictions 1 40 150 0 0 202 127 127 0 0 0 http://www.ensembl.org/Mus_musculus/transview?transcript=$$For a description of the methods used in Ensembl gene prediction, refer to \ \ The Ensembl genome database project, Nucleic Acids Research, \ 2002, 30(1) 38-41.
\ \\ The Twinscan program predicts genes in a manner similar to Genscan, except that\ Twinscan takes advantage of genome comparison to improve gene prediction\ accuracy. More information and a web server can be found at\ \ http://genes.cs.wustl.edu/.\
\\ The Twinscan algorithm is described in Korf I, Flicek P, Duan D, and Brent MR \ (2001), "Integrating genomic homology into gene structure prediction", \ Bioinformatics 17:S140-148.\
\\ Thanks to Michael Brent's Computational Genomics Group at Washington University St. Louis for providing these data.\ genes 1 slamHuman Slam Human genePred Slam Gene Predictions Using Human/Mouse Homology 0 45.5 100 50 0 175 150 128 0 0 0
\ Slam \ predicts coding exons and conserved noncoding regions in a pair of homologous \ DNA sequences, incorporating both statistical sequence properties and degree of \ conservation in making the predictions. This particular annotation uses the Nov. 2002 (hg13) assembly of the human genome. The model is symmetric and the same gene structure (with possibly different exon lengths) is predicted in both sequences. \
\\ The symmetry of the model gives it a higher degree of accuracy for regions where the true underlying gene structures contain the same number of coding exons. Where this is not true, or when one of the sequences is of lower quality and contains in-frame stop codons, the resulting predictions tend to have lower accuracy.\
\\ More information on the accuracy of the predictions can be found at http://bio.math.berkeley.edu/slam/mouse. A web server for individual requests is available at http://bio.math.berkeley.edu/slam.\
\\
\
M. Alexandersson, S. Cawley, L. Pachter (2003). SLAM - Cross-species Gene Finding and Alignment with a Generalized Pair Hidden Markov Model. Genome Research 13(3):496-502.\
L. Pachter, M. Alexandersson, S. Cawley (2001). \
Applications of Generalized Pair Hidden Markov Models to Alignment and Gene Finding Problems, \
Proceedings of the Fifth Annual International Conference on Computational Molecular Biology (RECOMB 2001).\
L. Pachter , M. Alexandersson, S. Cawley (2002). \
Applications of Generalized Pair Hidden Markov Models to Alignment and Gene Finding Problems, \
Journal of Computational Biology 9(2):389-400.
\ This track shows gene predictions from the SGP program, which is being developed at \ the Grup de Recerca en\ Informàtica Biomèdica (GRIB) at Institut Municipal d'Investigació Mèdica (IMIM) in \ Barcelona. To predict genes in a genomic\ query, SGP combines geneid predictions with tblastx comparisons of the genomic query against other genomic sequences.\
\\ Thanks to GRIB for providing these gene predictions.\
\ \ \ \ genes 1 softberryGene Fgenesh++ Genes genePred softberryPep Fgenesh++ Gene Predictions 0 48 0 100 0 127 177 127 0 0 0Fgenesh++ predictions are based on Softberry's gene finding software.
\ \The Fgenesh++ gene predictions were produced by \ Softberry Inc. \ Commercial use of these predictions is restricted to viewing in \ this browser. Please contact Softberry Inc. to make arrangements for further commercial access.\ \ genes 1 geneid Geneid Genes genePred geneidPep Geneid Gene Predictions 0 49 0 90 100 127 172 177 0 0 0
\ This track shows gene predictions from the geneid program developed at the \ Grup de Recerca en\ Informàtica Biomèdica (GRIB) at Institut Municipal d'Investigació Mèdica (IMIM) in \ Barcelona. \
\\ Geneid is a program to predict genes in anonymous genomic sequences designed \ with a hierarchical structure. In the first step, splice sites, start and stop \ codons are predicted and scored along the sequence using Position Weight Arrays \ (PWAs). Next, exons are built from the sites. Exons are scored as the sum of the \ scores of the defining sites, plus the log-likelihood ratio of a \ Markov Model for coding DNA. Finally, from the set of predicted exons, the gene \ structure is assembled, maximizing the sum of the scores of the assembled exons. \
\\ Thanks to GRIB for providing these data.\
\ genes 1 genscan Genscan Genes genePred genscanPep Genscan Gene Predictions 1 50 170 100 0 212 177 127 0 0 0This track shows predictions from the \ Genscan program written by Chris Burge.\
\\ This track shows the location of non-protein coding RNA genes and\ pseudogenes. \
\ Feature types include:\
\ NOTE: The RNA Genes annotations appear only on chromosome 7.\ \
\
Eddy-tRNAscanSE (tRNA genes, Sean Eddy):
\
tRNAscan-SE 1.23 with default parameters.\
Score field contains tRNAscan-SE bit score; >20 is good, >50 is great.
\
Eddy-BLAST-tRNAlib (tRNA pseudogenes, Sean Eddy):
\
Wublast 2.0, with options "-kap wordmask=seg B=50000 W=8 cpus=1".\
Score field contains % identity in blast-aligned region.\
Used each of 602 tRNAs and pseudogenes predicted by tRNAscan-SE\
in the human oo27 assembly as queries. Kept all nonoverlapping\
regions that hit one or more of these with P <= 0.001.
\
Eddy-BLAST-snornalib (known snoRNAs and snoRNA pseudogenes, Steve Johnson):
\
Wublastn 2.0, with options "-V=25 -hspmax=5000 -kap wordmask=seg \
B=5000 W=8 cpus=1".\
Score field contains blast score.\
Used each of 104 unique snoRNAs in snorna.lib as a query.\
Any hit >=95% full length and >=90% identity is annotated as a\
"true gene".\
Any other hit with P <= 0.001 is annotated as a "related sequence" \
and interpreted as a putative pseudogene.
\
Eddy-BLAST-otherrnalib \
(non-tRNA, non-snoRNA noncoding RNAs with GenBank entries\
for the human gene.):
\
Wublastn 2.0 [15 Apr 2002]\
with options: "-kap -cpus=1 -wordmask=seg -W=8 -E=0.01 -hspmax=0\
-B=50000 -Z=3000000000". Exceptions to this are:\
\ The score field contains the blastn score. \ Used 41 unique miRNAs, and 29 other ncRNAs as queries.\ Any hit >=95% full length and >=95% identity is annotated as a \ "true gene".\ Any other hit with P <= 0.001 and >= 65% identity is annotated\ as a "related sequence". An exception to this is: all miRNAs consist \ \ of 16-26 bp sequences in GenBank \ and are only annotated if 100% full length and 100% identity. \ miRNAs consist of Let-7 from Pasquinelli et al., \ Nature (2000) 408:86; 40 from Mourelatos et al., Genes & Dev (2002) \ 16:720.
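The annotation thresholds for the snoRNA and other-ncRNA BLAST searches can be summarized in a small classifier. This is a sketch under stated assumptions, not the pipeline's own code: the function name, the `library` labels, and the argument names are hypothetical, and it assumes that a miRNA meeting the 100%/100% rule is labeled a "true gene".

```python
from typing import Optional


def classify_hit(library: str, frac_length: float, identity: float,
                 p_value: float, is_mirna: bool = False) -> Optional[str]:
    """Classify one BLAST hit using the thresholds quoted in the text.

    frac_length and identity are fractions in [0, 1].
    library is "snorna" or "otherrna" (hypothetical labels).
    """
    if library == "snorna":
        if frac_length >= 0.95 and identity >= 0.90:
            return "true gene"
        if p_value <= 0.001:
            return "related sequence"
        return None
    if library == "otherrna":
        if is_mirna:
            # miRNAs are annotated only when full length and identical.
            # Assumption: such hits are labeled "true gene".
            if frac_length == 1.0 and identity == 1.0:
                return "true gene"
            return None
        if frac_length >= 0.95 and identity >= 0.95:
            return "true gene"
        if p_value <= 0.001 and identity >= 0.65:
            return "related sequence"
        return None
    raise ValueError(f"unknown library: {library}")
```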
\\ These data were kindly provided by Sean Eddy at Washington University.
\ genes 1 superfamily Superfamily bed 4 + Superfamily/SCOP: Proteins Having Homologs with Known Structure/Function 0 53 150 0 0 202 127 127 0 0 0 http://supfam.org/SUPERFAMILY/cgi-bin/gene.cgi?seqid=$$\ The \ Superfamily \ track shows proteins having homologs with known structures or functions.
\\ Each entry on the track shows the coding region of a gene (based on Ensembl gene predictions).\ In full display mode, the label for an entry consists of the names of \ all known protein domains coded by this gene. This \ usually contains structural and/or functional descriptions that give a quick indication of the biological significance of the gene.
\\ Data are downloaded from the Superfamily server.\ Using the cross-reference between Superfamily entries and Ensembl gene prediction entries\ and their alignment to the appropriate genome, the associated data are processed to generate \ a simple BED format track.
\\ Superfamily is developed by\ Julian\ Gough at the MRC Laboratory\ of Molecular Biology, Cambridge.
\\ Gough, J., Karplus, K., Hughey, R. and\ Chothia, C. (2001). "Assignment of Homology to Genome Sequences using a\ Library of Hidden Markov Models that Represent all Proteins of Known Structure". \ J. Mol. Biol., 313(4), 903-919.
\ \ genes 1 mrna $Organism mRNAs psl . $Organism mRNAs from GenBank 3 54 0 0 0 127 127 127 1 0 0\ The $Organism mRNA track shows alignments between $organism mRNAs\ in GenBank and the genome. Aligning regions (usually exons)\ are shown as black boxes connected by lines for gaps (spliced\ out introns, usually). In full display, arrows on the introns\ indicate the direction of transcription.
\\ GenBank $organism mRNAs are aligned against the genome using the \ blat\ program. When a single mRNA aligns in multiple places, \ the alignment having the highest base identity is found. \ Only alignments that have a base identity level within 1% of\ the best are kept. Alignments must also have at least 95%\ base identity to be kept.
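The near-best filter described above might look like the following sketch. It assumes that "within 1%" means one percentage point of base identity below the best alignment for the same mRNA. The function name and its inputs are hypothetical.

```python
from typing import List


def filter_alignments(identities: List[float],
                      min_identity: float = 95.0,
                      near_best: float = 1.0) -> List[float]:
    """Keep alignments of one mRNA that are near the best and above a floor.

    identities: percent base identity of each alignment of the same mRNA.
    Assumption: "within 1%" is read as one percentage point.
    """
    if not identities:
        return []
    best = max(identities)
    return [x for x in identities
            if x >= min_identity and x >= best - near_best]
```

The same function would cover the EST tracks by passing `min_identity=93.0`.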
\ \The track filter can be used to change the color or include/exclude a subset of individual \ items within a track. This is helpful when many items are shown in the track\ display, especially when only some are relevant to the current task. To use the\ filter:\
\ When you have finished configuring the filter, click the Submit button.
\ \\ The $Organism mRNA track is produced at UCSC from mRNA sequence data\ submitted to the international public sequence databases by \ scientists worldwide.
\ rna 1 intronEst Spliced ESTs psl est $Organism ESTs That Have Been Spliced 1 56 0 0 0 127 127 127 1 0 0The Spliced EST track displays Expressed Sequence Tags \ (ESTs) from GenBank that show signs of splicing when\ aligned against the genome. By requiring splicing, the level \ of contamination in the EST databases is drastically reduced\ at the expense of eliminating many genuine 3' ESTs.\ For a display of all ESTs (including unspliced), see the \ $Organism EST track.
\ \Expressed sequence tags are single-read (typically\ approximately 500 base) sequences which usually\ represent fragments of transcribed genes. Aligning \ regions (usually exons) are shown as black boxes \ connected by lines for gaps (usually spliced out introns). \ In full display mode, arrows on the introns\ indicate the direction of transcription. In the\ December 2001 assembly and later, this direction is\ taken by looking at the splice sites. In previous\ assemblies, the direction of transcription was taken from \ the GenBank annotations, which frequently were inaccurate.
\ \Strand information provided for ESTs (+/-) indicates the\ direction of the match between the EST and the matching\ genomic sequence. It bears no relationship to the direction\ of transcription of the RNA with which it might be associated.\ \
To make an EST, RNA is isolated from cells and reverse\ transcribed into cDNA. Typically, the cDNA is cloned\ into a plasmid vector, and a read taken from the 5'\ and/or 3' primer. For most - but not all - ESTs, the\ reverse transcription is primed by an oligo-dT, which\ hybridizes with the poly-A tail of mature mRNA. The\ reverse transcriptase may or may not make it to the 5'\ end of the mRNA, which may or may not be degraded.
\ \In general, the 3' ESTs mark the end of transcription\ reasonably well, but the 5' ESTs may end at any point\ within the transcript. Some of the newer cap-selected\ libraries are starting to hit transcription start\ reasonably well. Before the cap-selection techniques\ emerged, some projects used random rather than poly-A\ priming in an attempt to get sequence distant from the\ 3' end. These projects were successful at this, but as\ a side effect also deposited sequences from unprocessed\ mRNA and perhaps even genomic sequences into the EST databases.\ (Even outside of the random-primed projects, there is a\ degree of non-mRNA contamination.) Because of this, a\ single unspliced EST should be viewed with considerable\ skepticism. However, because the $organism 3' UTRs are quite\ long, the splicing requirement does eliminate many genuine 3'\ ESTs.
\ \To generate this track, $organism ESTs from GenBank are aligned \ against the genome using the \ blat \ program. Note that the maximum intron length\ allowed by blat is 500,000 bases, which may eliminate some ESTs with very \ long introns that might otherwise align. When a single \ EST aligns in multiple places, the alignment having the \ highest base identity is found. Only alignments that have \ a base identity level within 1% of the best are kept. \ Alignments must also have at least 93% base identity to be kept.
\ \The track filter can be used to change the color or include/exclude a subset of \ individual items within a track. This is helpful when many items are shown in the \ track display, especially when only some are relevant to the current task. To use the\ filter:\
\ When you have finished configuring the filter, click the Submit button.
Credits\\ The Spliced EST track is produced at UCSC from EST sequence data\ submitted to the international public sequence databases by \ scientists worldwide.
\ rna 1 est $Organism ESTs psl est $Organism ESTs Including Unspliced 0 57 0 0 0 127 127 127 1 0 0\ This track shows alignments between $organism Expressed\ Sequence Tags (ESTs) in GenBank and the genome.
\ \Expressed sequence tags are single-read (typically\ approximately 500 base) sequences which usually\ represent fragments of transcribed genes. Aligning \ regions (usually exons) are shown as black boxes \ connected by lines for gaps (usually spliced out introns). \ In full display mode, arrows on the introns\ indicate the direction of transcription. In the\ December 2001 assembly and later, this direction is\ taken by looking at the splice sites. In previous\ assemblies, the direction of transcription was taken from \ the GenBank annotations, which frequently were inaccurate.\
\ \Strand information provided for ESTs (+/-) indicates the\ direction of the match between the EST and the matching\ genomic sequence. It bears no relationship to the direction\ of transcription of the RNA with which it might be associated.\ \
To make an EST, RNA is isolated from cells and reverse\ transcribed into cDNA. Typically, the cDNA is cloned\ into a plasmid vector, and a read taken from the 5'\ and/or 3' primer. For most - but not all - ESTs, the\ reverse transcription is primed by an oligo-dT, which\ hybridizes with the poly-A tail of mature mRNA. The\ reverse transcriptase may or may not make it to the 5'\ end of the mRNA, which may or may not be degraded.
\ \In general, the 3' ESTs mark the end of transcription\ reasonably well, but the 5' ESTs may end at any point\ within the transcript. Some of the newer cap-selected\ libraries are starting to hit transcription start\ reasonably well. Before the cap-selection techniques\ emerged, some projects used random rather than poly-A\ priming in an attempt to get sequence distant from the\ 3' end. These projects were successful at this, but as\ a side effect also deposited sequences from unprocessed\ mRNA and perhaps even genomic sequences into the EST databases.\ (Even outside of the random-primed projects, there is a\ degree of non-mRNA contamination.) Because of this, a\ single unspliced EST should be viewed with considerable\ skepticism. However, because the $organism 3' UTRs are quite\ long, the splicing requirement does eliminate many genuine 3'\ ESTs.
\ \To generate this track, $organism ESTs from GenBank are aligned \ against the genome using the \ blat \ program. Note that the maximum intron length\ allowed by blat is 500,000 bases, which may eliminate some ESTs with very \ long introns that might otherwise align. When a single \ EST aligns in multiple places, the alignment having the \ highest base identity is found. Only alignments that have \ a base identity level within 1% of the best are kept. \ Alignments must also have at least 93% base identity to be kept.
\ \The track filter can be used to change the color or include/exclude a subset of \ individual items within a track. This is helpful when many items are shown in the \ track display, especially when only some are relevant to the current task. To use the\ filter:\
\ When you have finished configuring the filter, click the Submit button.
\ \\ The $Organism EST track is produced at UCSC from EST sequence data\ submitted to the international public sequence databases by \ scientists worldwide.
\ rna 1 xenoMrna Non$Organism mRNAs psl xeno Non$Organism mRNAs from GenBank 1 63 0 0 0 127 127 127 1 0 0\ This track displays translated \ blat\ alignments of\ non-$organism vertebrate and invertebrate mRNA from GenBank.
\ \The strand information (+/-) for this track is in two parts. The\ first + indicates the orientation of the query sequence whose\ translated protein produced the match (here always 5' to 3', hence +).\ The second + or - indicates the orientation of the matching \ translated genomic sequence (+ or -).\ \ \
\ The alignments were passed through a near-best-in-genome filter.
\ \The track filter can be used to color, include, or exclude a subset of individual \ items within a track. This is helpful when many items are shown in the track\ display, especially when only some are relevant to the current task. To use the\ filter:\
When you have finished configuring the filter, click the Submit button.
\ rna 1 tigrGeneIndex TIGR Gene Index genePred Alignment of TIGR Gene Index TCs Against the $Organism Genome 0 68 100 0 0 177 127 127 0 0 0 http://www.tigr.org/tigr-scripts/tgi/tc_report.pl?$$This track displays alignments of the TIGR Gene Index (TGI)\ against the $organism genome. The TIGR Gene Index is based\ largely on assemblies of EST sequences in the public databases.\ See \ www.tigr.org for more information about TIGR and the Gene Index.
\Thanks to Foo Cheung and Razvan Sultana of The Institute for Genomic Research for converting these data into a track for the browser.
\ rna 1 cpgIsland CpG Islands bed 4 + CpG Islands (Islands < 300 Bases are Light Green) 0 76 0 100 0 128 228 128 0 0 0\ CpG islands are associated with genes, particularly housekeeping\ genes, in vertebrates. CpG islands are typically common near\ transcription start sites, and may be associated with promoter\ regions. Normally a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in\ vertebrate DNA because the C's in such an arrangement tend to be\ methylated. This methylation helps distinguish the newly synthesized\ DNA strand from the parent strand, which aids in the final stages of\ DNA proofreading after duplication. However, over evolutionary time\ methylated C's tend to turn into T's because of spontaneous\ deamination. The result is that CpG's are relatively rare unless\ there is selective pressure to keep them or a region is not methylated\ for some reason, perhaps having to do with the regulation of gene\ expression. CpG islands are regions where CpG's are present at\ significantly higher levels than is typical for the genome as a whole.\
\ \\ CpG islands are predicted by searching the sequence one base at a\ time, scoring each dinucleotide (+17 for CG and -1 for others) and\ identifying maximally scoring segments. Each segment is then\ evaluated to determine GC content (>=50%), length (>200), and ratio of\ observed proportion of CG dinucleotides to the expected proportion on\ the basis of the GC content of the segment (>0.6). \
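The scoring scan and the three acceptance criteria can be sketched as below. This is not the program used to build the track. It finds only the single best-scoring segment (a Kadane-style scan) rather than all maximal segments, and the observed/expected formula (CpG count × length / (C count × G count)) is one commonly used form that is assumed here, since the text does not spell it out. Function names are hypothetical.

```python
def best_scoring_segment(seq: str):
    """Return (start, end, score) of the best-scoring run of dinucleotides.

    Each dinucleotide scores +17 if it is CG and -1 otherwise.
    end is exclusive and measured in bases.
    """
    seq = seq.upper()
    best = (0, 0, 0)
    run_start, run_score = 0, 0
    for i in range(len(seq) - 1):
        score = 17 if seq[i:i + 2] == "CG" else -1
        if run_score <= 0:
            # Restart the run when the running total is not positive.
            run_start, run_score = i, 0
        run_score += score
        if run_score > best[2]:
            best = (run_start, i + 2, run_score)
    return best


def passes_island_criteria(segment: str) -> bool:
    """Check GC content >= 50%, length > 200, and obs/exp CpG > 0.6."""
    seg = segment.upper()
    n = len(seg)
    c, g = seg.count("C"), seg.count("G")
    if n <= 200 or c == 0 or g == 0:
        return False
    gc_fraction = (c + g) / n
    # Assumed observed/expected form; the source only describes it in words.
    obs_exp = seg.count("CG") * n / (c * g)
    return gc_fraction >= 0.5 and obs_exp > 0.6
```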
\ \\ This track was generated \ using a\ modification of a program developed by G. Miklem and L. Hillier. \
\ \ regulation 1 blatFish Tetraodon Blat psl xeno Tetraodon nigroviridis Translated Blat Alignments 1 112 0 60 120 200 220 255 1 0 0This track displays translated alignments of 728 million bases of Tetraodon nigroviridis \ whole genome shotgun reads vs. the draft $organism genome. Areas painted by\ this track are quite likely to be coding regions.
\The alignments were done \ with \ blat \ in translated protein mode requiring two nearby 4-mer matches\ to trigger a detailed alignment. The human\ genome was masked with \ \ RepeatMasker and Tandem Repeat Finder before \ running blat.
\Many thanks to Genoscope for \ providing the Tetraodon sequence.
\ compGeno 1 blatFugu Fugu Blat psl xeno Takifugu rubripes Translated Blat Alignments 1 113 0 60 120 200 220 255 1 0 0\ The Fugu v.3.0 (Aug. 2002) whole genome shotgun assembly was provided by the\ US \ DOE Joint Genome Institute (JGI). The assembly was constructed with the JGI\ assembler, JAZZ, from paired end sequencing reads produced at JGI, Myriad \ Genetics, and Celera Genomics, resulting in a sequence coverage of 5.7X. All reads are\ plasmid, cosmid, or BAC end sequences, with the predominant coverage\ derived from 2 Kb insert plasmids. This assembly contains 20,379\ scaffolds totaling 319 million base pairs. The largest 679 scaffolds\ total 160 million base pairs.\ \
The alignments were done \ with \ blat \ in translated protein mode requiring two nearby 4-mer matches\ to trigger a detailed alignment. The human\ genome was masked with \ RepeatMasker and Tandem Repeat Finder before \ running blat.
\ \The 3.0 draft from the\ \ JGI Fugu rubripes website was used in the\ UCSC Genome Browser Fugu blat alignments. These data have been provided freely by the JGI\ for use in this publication only.
\ compGeno 1 musHumL Human Cons sample 0 8 Mouse/Human Evolutionary Conservation Score 2 119 100 50 0 175 150 128 0 0 0\ This track displays the conservation between the mouse and human genomes for \ 50 bp windows in the mouse genome that have at least 15 bp aligned to\ human. The score for a window reflects the probability that the\ level of observed conservation in that 50 bp region would occur by\ chance under neutral evolution. It is given on a logarithmic scale,\ and thus it is called the "L-score". An L-score of 1 means there is a\ 1/10 probability that the observed conservation level would occur by\ chance, an L-score of 2 means a 1/100 probability, an L-score of 3\ means a 1/1000 probability, etc. The L-scores display as\ "mountain ranges". Clicking on a mountain range displays a detail page\ from which you can access the base level alignments, both\ for the whole region and for the individual 50 bp windows.\ \
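The relationship between the L-score and the chance probability described above is a negative base-10 logarithm. A minimal sketch (function names are hypothetical):

```python
import math


def l_score(p: float) -> float:
    """L-score for a chance probability p under neutral evolution."""
    return -math.log10(p)


def probability_from_l_score(score: float) -> float:
    """Inverse mapping: chance probability implied by an L-score."""
    return 10.0 ** (-score)
```

So a probability of 1/1000 corresponds to an L-score of about 3, matching the examples in the text.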
\ \\
Genome-wide alignments between mouse and human were produced by\
blastz. A set of 50 bp windows in the mouse genome were determined\
by scanning the sequence, sliding 5 bases at a time, and only those\
windows with at least 15 aligned bases were kept. For each window,\
a conservation score defined by\
\
\
 L-score   Frequentist probability      Bayesian probability\
           of this L-score or greater   that window with this\
           given neutral evolution      L-score is under\
                                        selection\
 ------------------------------------------------------------------\
    1      0.1                          0.32\
    2      0.01                         0.75\
    3      0.001                        0.94\
    4      0.0001                       0.97\
    5      0.00001                      0.98\
    6      0.000001                     0.99\
    7      0.0000001                    >0.99\
    8      0.00000001                   >0.99\
\
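The window selection step described above (50 bp windows, sliding 5 bases at a time, keeping those with at least 15 aligned bases) can be sketched as follows. The input representation, one boolean per mouse base marking whether it is aligned to human, is an assumption for illustration, and the scoring of each window is not shown.

```python
from typing import List, Sequence, Tuple


def conserved_windows(aligned: Sequence[bool], size: int = 50,
                      step: int = 5,
                      min_aligned: int = 15) -> List[Tuple[int, int, int]]:
    """Return (start, end, aligned_count) for windows that pass the filter.

    aligned: hypothetical per-base flags, True where mouse aligns to human.
    """
    kept = []
    for start in range(0, len(aligned) - size + 1, step):
        count = sum(aligned[start:start + size])
        if count >= min_aligned:
            kept.append((start, start + size, count))
    return kept
```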
The track filter can be used to configure some of the display characteristics\ of the track. \
\ \ Thanks to Webb Miller and Scott Schwartz for creating the blastz\ alignments, Jim Kent for post-processing them, and \ Mark Diekhans for scoring the windows and selecting out the ancestral repeats. \ Krishna Roskin created S-scores for these windows. Ryan Weber computed the CDF \ for these S-scores, \ and created the remaining track display functions. Thanks to the Mouse\ Genome Sequencing Consortium for providing the mouse sequence data.\ \
\ \ compGeno 0 slamNonCodingHuman Slam Non Coding Human bed 5 Slam Predictions of Human/Mouse Conserved Non-Coding Regions 0 120 30 130 210 200 220 255 1 0 0\ Slam \ predicts coding exons and conserved noncoding regions in a pair of\ homologous DNA sequences, incorporating both statistical sequence properties\ and degree of conservation into predictions. This particular annotation uses the Nov. 2002 (hg13) assembly of the human genome. The model is symmetric and the\ same structure (with possibly different lengths) is predicted in both\ sequences.\ \ \
\ The CNS (conserved non-coding sequence) predictions are ab-initio\ predictions of conserved regions that do not fit in with a gene structure.\ Thus, slam is not simply trying to predict conserved regions to be coding,\ but is classifying such regions according to an overall probabilistic model\ of gene structure. The set of slam CNS predictions is therefore highly\ enriched for conserved non-coding regions.\ \
\ More information and a web server can be found at http://baboon.math.berkeley.edu/~syntenic/slam.html.\ \
\
This track displays alignments of the Dec. 2001 human genome vs.\ the mouse genome.
\The alignments were done with \ blat \ in translated protein mode\ using the parameters -q=dnax -t=dnax. The default\ settings were used otherwise. Both genomes were masked with \ RepeatMasker and Tandem Repeats \ Finder before running blat. Regions where more than 250 alignments stacked up\ at the same location were filtered out. Beware of alignments of greater than \ 97% identity: these may reflect mouse contamination in the human genome or\ human contamination in the mouse genome.
\This track is produced at UCSC. Mouse sequence data are provided by the \ Mouse Genome Sequencing Consortium. \ compGeno 1 colorChromDefault off\ otherDb hg10\ syntenyHuman Human Synteny bed 4 + Human/Mouse Synteny Using Blastz Single Coverage (100k window) 0 127 0 100 0 255 240 200 0 0 0
\ This track shows syntenic (corresponding) regions between human and mouse chromosomes. \
\ We passed a 100 kb non-overlapping window over the genome and, using the blastz best-in-mouse \ genome alignments, looked for high-scoring regions with at least 40% of the bases aligning \ with the same region in mouse. The 100 kb segments were joined together if they agreed in direction and\ were within 500 kb of each other in the human genome and within 4 Mb of each other in the mouse. \ Gaps were joined between syntenic anchors if the bases between two flanking regions agreed with \ synteny (direction and mouse location). Finally, we extended the syntenic block to include those \ areas.
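The joining rule described above can be illustrated with a short Python sketch. It is not the code used to build the track: the `Segment` fields, the function names, and the choice to treat "within" as "less than or equal to" are all assumptions made for the example. Only the two distance thresholds and the same-direction requirement come from the description.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List


@dataclass
class Segment:
    """One window that passed the 40% alignment threshold (hypothetical layout)."""
    human_chrom: str
    human_start: int
    human_end: int
    mouse_chrom: str
    mouse_start: int
    mouse_end: int
    strand: str  # "+" or "-", the direction of the alignment


def interval_gap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Distance between two intervals; 0 if they touch or overlap."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def join_segments(
    segments: Iterable[Segment],
    max_human_gap: int = 500_000,
    max_mouse_gap: int = 4_000_000,
) -> List[Segment]:
    """Merge neighbouring segments that agree in direction and lie close together."""
    ordered = sorted(segments, key=lambda s: (s.human_chrom, s.human_start))
    blocks: List[Segment] = []
    for seg in ordered:
        last = blocks[-1] if blocks else None
        if (
            last is not None
            and last.human_chrom == seg.human_chrom
            and last.mouse_chrom == seg.mouse_chrom
            and last.strand == seg.strand
            and interval_gap(last.human_start, last.human_end,
                             seg.human_start, seg.human_end) <= max_human_gap
            and interval_gap(last.mouse_start, last.mouse_end,
                             seg.mouse_start, seg.mouse_end) <= max_mouse_gap
        ):
            # Extend the current block to cover the new segment.
            last.human_end = max(last.human_end, seg.human_end)
            last.mouse_start = min(last.mouse_start, seg.mouse_start)
            last.mouse_end = max(last.mouse_end, seg.mouse_end)
        else:
            blocks.append(replace(seg))  # copy, so the input is not modified
    return blocks
```

For example, two adjacent same-strand segments on the same chromosomes merge into one block, while a third segment 700 kb further along the human chromosome starts a new block.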
\
\ Contact Robert Baertsch at UCSC for more information about this track.\ Thanks to the Mouse Genome Sequencing Consortium for providing the mouse sequence data. \ compGeno 1 snpNih Overlap SNPs bed 4 . Simple Nucleotide Polymorphisms (SNPs) from Clone Overlaps 0 144 0 0 0 127 127 127 0 0 0
This track shows locations of Simple Nucleotide Polymorphisms\ detected primarily by looking at overlaps between clones that cover\ the same region of the genome.
\ \The SNPs in this track include all of the polymorphisms that can be\ mapped against the current assembly. These include known point\ mutations (Single Nucleotide Polymorphisms), insertions, deletions,\ and segmental mutations from the current build of \ dbSNP. \
\\ There are three major cases that are not mapped and/or annotated:\
The heuristics for the non-SNP variations (i.e. named elements and\ STRs) are quite conservative; therefore, some of these are probably lost. This\ approach was chosen to avoid false annotation of variation in\ inappropriate locations.
\ \Thanks to the SNP\ Consortium and NIH's dbSNP for providing these data.
\ varRep 1 rmsk RepeatMasker rmsk Repeating Elements by RepeatMasker 1 147 0 0 0 127 127 127 1 0 0\ This track was created by using Arian Smit's RepeatMasker program, which screens DNA sequences \ for interspersed repeats and low complexity DNA sequences. The program\ outputs a detailed annotation of the repeats that are present in the \ query sequence, as well as a modified version of the query sequence \ in which all the annotated repeats have been masked. RepeatMasker uses \ the RepBase library of repeats from the \ Genetic \ Information Research Institute (GIRI). RepBase is described in \ Jurka, J. "Repbase Update: a database and an electronic journal of \ repetitive elements". Trends Genet. 16(9):418-420 (2000).\
\ In full display mode, this track displays nine different classes of repeats:\
\ UCSC has used the most current versions of the RepeatMasker software \ and repeat libraries available to generate these data. Note that these \ versions may be newer than those that are publicly available on the Internet. \
\ Data are generated using the RepeatMasker -s flag. Additional flags\ may be used for certain organisms. Repeats are soft-masked. Alignments may \ extend through repeats, but are not permitted to initiate in them. \ See the \ FAQ for \ more information. \ \
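Soft-masking conventionally means that repeat bases are written in lowercase while the rest of the sequence stays in uppercase. Assuming that convention, the masked intervals can be recovered from a sequence string with a few lines of Python. The function name is illustrative, and the sketch treats every lowercase run as masked, including lowercase "n".

```python
import re
from typing import List, Tuple


def soft_masked_intervals(sequence: str) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of lowercase runs."""
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", sequence)]


if __name__ == "__main__":
    # Bases 4..7 and 12..13 are lowercase, i.e. soft-masked.
    print(soft_masked_intervals("ACGTacgtTTGGaaC"))
```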
\ This track displays simple tandem repeats (possibly imperfect) located\ by Tandem Repeats\ Finder, which is specialized to this purpose. These repeats can\ occur within coding regions of genes and may be quite\ polymorphic. Repeat expansions are sometimes associated with specific\ diseases.
\ \\ For more information about the Tandem Repeats Finder, see G. Benson, "Tandem repeats finder: a program to analyze DNA sequences", Nucleic Acids \ Research, 1999, 27(2) 573-580.
\ \\ Tandem Repeats Finder was written by Gary Benson.
\ \ varRep 1