mapGenethon STS Markers bed 5 + Various STS Markers 1 2 0 0 0 127 127 127 0 0 0 map 1 stsMarker STS Markers bed 5 + STS Markers on Genetic (blue), FISH (green) and RH (black) Maps 1 3 0 0 0 128 128 255 0 0 0

Description

This track shows the locations of Sequence Tagged Sites (STSs) along the draft assembly.

Method

These STSs have been mapped using genetic (Genethon and Marshfield maps), radiation hybrid (the Stanford, Whitehead RH, and GeneMap99 maps), or YAC mapping (the Whitehead YAC map) techniques. Prior to August 2001, this track also showed the approximate positions of FISH-mapped clones. Starting in August 2001, the FISH clones have been separated into a new track of their own.

Using the Filter

The track filter can be used to change the color of a map data set within the track, or to include or exclude it. This is helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter:

  1. In the pulldown menu, select the map whose data you would like to highlight, include, or exclude in the display. By default, the "All Genetic" option is selected.
  2. Choose the color or display characteristic that will be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display data from the map selected in the pulldown list. If "include" is selected, the browser will display only data from the selected map.
  3. When you have finished configuring the filter, click the Submit button.

Credits

Many thanks to the researchers who worked on these maps, and to Greg Schuler, Arek Kasprzyk, Wonhee Jang, Terry Furey, and Sanja Rogic for helping process the data. Additional data on the individual maps can be found at the following links:

map 1 stsMap STS Markers bed 5 + STS Markers on Genetic (blue) and Radiation Hybrid (black) Maps 1 4 0 0 0 128 128 255 0 0 0

Description

This track shows the locations of Sequence Tagged Sites (STSs) along the draft assembly. These STSs have been mapped using genetic (Genethon, Marshfield, and deCODE maps), radiation hybrid (the Stanford, Whitehead RH, and GeneMap99 maps), or YAC mapping (the Whitehead YAC map) techniques. Prior to August 2001, this track also showed the approximate positions of FISH-mapped clones. Starting in August 2001, the FISH clones have been separated into a new track of their own.

Genetic map markers are shown in blue, and radiation hybrid map markers are shown in black. When a marker maps to multiple positions in the genome, it is shown in a lighter color.

Using the Filter

The track filter can be used to change the color of a set of map data within the track, or to include or exclude it. This is helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter:

  1. In the pulldown menu, select the map whose data you would like to highlight, include, or exclude in the display. By default, the "All Genetic" option is selected.
  2. Choose the color or display characteristic that will be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display data from the map selected in the pulldown list. If "include" is selected, the browser will display only data from the selected map.
  3. When you have finished configuring the filter, click the Submit button.

Credits

Many thanks to the researchers who worked on these maps, and to Greg Schuler, Arek Kasprzyk, Wonhee Jang, Terry Furey, and Sanja Rogic for helping process the data. Additional data on the individual maps can be found at the following links:

map 1 stsMapMouse STS Markers bed 5 + STS Markers on Genetic Maps 1 5 0 0 0 128 128 255 0 0 0

This track shows the locations of STS (Sequence-Tagged Site) markers along the draft assembly. These STSs appear on the MGI consensus mouse genetic map. Information about the genetic map and the STS marker primer sequences is provided by the Mouse Genome Informatics database group at The Jackson Laboratory.

map 1 fishClones FISH Clones bed 5 + Clones placed on Cytogenetic Map using FISH 0 6 0 150 0 127 202 127 0 0 0

Description

This track shows the locations of FISH-mapped clones along the draft assembly sequence. The locations of these clones were contributed by The BAC Resource Consortium ("Integration of cytogenetic landmarks into the draft sequence of the human genome", Nature 409:953-958, Feb. 2001).

More information about the BAC clones, including how they can be obtained, can be found at the Human BAC Resource and the Clone Registry web sites hosted by NCBI. On the details page for a specific clone, you can view the clone's entry in the Clone Registry by clicking on the clone name at the top of the page.

Using the Filter

The track filter can be used to change the color of a dataset from an individual lab within the track, or to include or exclude it. This is helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter:

  1. In the pulldown menu, select the lab whose data you would like to highlight, include, or exclude in the display.
  2. Choose the color or display characteristic that will be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display clones from the lab selected in the pulldown list. If "include" is selected, the browser will display only clones from the selected lab.
  3. When you have finished configuring the filter, click the Submit button.

Credits

We would like to thank all of the labs that have contributed to this resource:

map 1 genMapDb GenMapDB Clones bed 6 + GenMapDB BAC Clones 0 7 0 0 0 127 127 127 0 0 0

BAC clones from the GenMapDB database are placed on the draft sequence using BAC end sequence information and confirmed using STS markers by Vivian Cheung's lab at the Department of Pediatrics, University of Pennsylvania. Further information about each clone can be obtained by clicking on the clone name on the track detail page.

map 1 recombRate Recomb Rate bed 4 + Recombination Rate based on deCODE, Marshfield, or Genethon map (default - deCODE) 0 8 0 0 0 127 127 127 0 0 0

Description

The recombination rate track represents calculated sex-averaged rates of recombination based on the deCODE, Marshfield, or Genethon genetic maps. By default, the deCODE map rates are displayed. Female- and male-specific recombination rates, as well as rates from the Marshfield and Genethon maps, can also be displayed by choosing the appropriate option on the track user interface page.

Methods

The deCODE genetic map was created at deCODE Genetics and is based on 5,136 microsatellite markers for 146 families with a total of 1,257 meiotic events. For more information on this map, please see "A high-resolution recombination map of the human genome", Nature Genetics, 31(3), pages 241-247 (2002).

The Marshfield genetic map was created at the Center for Medical Genetics and is based on 8,325 short tandem repeat polymorphisms (STRPs) for 8 CEPH families consisting of 134 individuals with 186 meioses. For more information on this map, please see KW Broman, JC Murray, VC Sheffield, RL White and JL Weber, "Comprehensive human genetic maps: Individual and sex-specific variation in recombination", American Journal of Human Genetics 63:861-869 (1998).

The Genethon genetic map was created at Genethon and is based on 5,264 microsatellites for 8 CEPH families consisting of 134 individuals with 186 meioses. For more information on this map, please see Dib et al., "A comprehensive genetic map of the human genome based on 5,264 microsatellites", Nature, 380, pages 152-154 (1996).

Each base is assigned the recombination rate calculated by assuming a linear genetic distance across the immediately flanking genetic markers. The recombination rate assigned to each 1 Mb window is the average recombination rate of the bases contained within the window.
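A minimal Python sketch of the windowed calculation described above, assuming markers are given as (physical position in bp, genetic position in cM) pairs along one chromosome and that bases outside the outermost markers receive no rate. The function name `window_recomb_rates` and the cM/Mb output unit are illustrative choices, not the UCSC implementation. Because genetic distance is linear between adjacent markers, every base in an interval shares one rate, so the per-window average can be computed from interval overlaps instead of base by base.

```python
def window_recomb_rates(markers, chrom_len, window=1_000_000):
    """Return (start, end, mean rate in cM/Mb) for each window of a chromosome.

    markers: (physical_position_bp, genetic_position_cM) pairs (hypothetical input).
    Windows with no base between two markers get a rate of None.
    """
    markers = sorted(markers)
    # Linear genetic distance between adjacent markers gives every base in
    # [p1, p2) the same rate; scale by 1e6 to express it in cM per Mb.
    intervals = [
        (p1, p2, (g2 - g1) * 1e6 / (p2 - p1))
        for (p1, g1), (p2, g2) in zip(markers, markers[1:])
        if p2 > p1
    ]
    results = []
    for start in range(0, chrom_len, window):
        end = min(start + window, chrom_len)
        weighted_sum, covered = 0.0, 0
        for s, e, rate in intervals:
            overlap = min(e, end) - max(s, start)
            if overlap > 0:
                # Each overlapping base contributes its interval's rate once.
                weighted_sum += rate * overlap
                covered += overlap
        results.append((start, end, weighted_sum / covered if covered else None))
    return results


# Two markers 2 Mb and 2 cM apart: 1 cM/Mb in both 1 Mb windows.
print(window_recomb_rates([(0, 0.0), (2_000_000, 2.0)], 2_000_000))
```

Averaging over covered bases only (rather than all bases in the window) is one reasonable reading of "the bases contained within the window"; the description does not say how uncovered bases are treated.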

Credits

This track is produced at UCSC and uses data that is freely available for the Genethon, Marshfield, and deCODE genetic maps (see above links). Thanks to all who have played a part in the creation of these maps.

map 1 ctgPos Map Contigs Physical Map Contigs 0 9 150 0 0 202 127 127 0 0 0

Description

The map contigs track shows the locations of contigs of clones on the physical map.

Method

In assembly versions before the August 6, 2001 freeze, this track was based on the Washington University accession map, which in turn was based on a fingerprint contig (FPC) map described in 'A physical map of the human genome' in Nature volume 409, pages 934-941. Starting with the August 6, 2001 freeze, this track is based on tiling path (TPF) maps curated by the sequencing centers responsible for each chromosome. Imre Vastrik at the European Bioinformatics Institute merges the TPF maps with the FPC map, favoring the TPF map where there are conflicts. This step increases the clone coverage substantially over that in the TPF maps. This merged map is then used as the basis for the UCSC assemblies. The clone contigs in this merged map are shown in this track.

map 0 gold Assembly bed 3 + Assembly from Fragments 0 10 150 100 30 230 170 40 0 0 0

Description

This track shows the draft assembly of the $organism genome. This assembly merges contigs from overlapping draft and finished clones into longer sequence contigs. The sequence contigs are ordered and oriented when possible by mRNA, EST, paired plasmid reads (from the SNP Consortium), and BAC end sequence pairs.

In dense mode, this track depicts in alternating gold and brown the path through the draft and finished clones (also known as the golden path) used to create the assembled sequence. Where gaps exist in the path, spaces are shown between the gold and brown blocks. If the relative order and orientation of the contigs between the two blocks is known, a line is drawn to bridge the blocks.

map 1 gap Gap bed 3 + Gap Locations 1 11 0 0 0 127 127 127 0 0 0

Description

This track depicts gaps in the assembly. These gaps, with the exception of intractable heterochromatic gaps, will be closed during the finishing process.

Gaps are represented as black boxes in this track. If the relative order and orientation of the contigs on either side of the gap is known, it is a 'bridged' gap and a white line is drawn through the black box representing the gap. There are four principal types of gaps:

map 1 partMrnas Partially Found mRNAs psl . Partially Found RefSeq and MGC mRNAs 0 12 0 0 0 127 127 127 0 0 0 map 1 missingHg Missing Human psl . unplaced human reqseq genes blatted against mouse translated 1 13 0 100 0 255 240 200 0 0 0 map 1 clonePos Coverage Clone Coverage/Fragment Position 0 14 0 0 0 180 180 180 0 0 0

Description

In dense mode, this track shows the coverage level of the genome. Finished regions are shown in black. Draft regions are shown in various shades of gray that correspond to the coverage.

In full mode, this track shows the position of each contig inside each draft or finished clone ('fragment') in the assembly. For some assemblies, clones in the sequencing center tiling path are shown with blue rather than gray backgrounds.

map 0 bacEndPairs BAC End Pairs bed 6 + BAC End Pairs 0 15 0 0 0 127 127 127 0 0 0

Description

A valid pair of BAC end sequences must be at least 50 kb but no more than 600 kb apart. The orientation of the first BAC end sequence must be "+" and the orientation of the second BAC end sequence must be "-".
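The pairing rule above can be expressed as a small check. This is a sketch under stated assumptions: the `EndHit` record and `is_valid_pair` function are hypothetical, coordinates are 0-based and half-open, and the distance is measured from the start of the "+" end to the end of the "-" end. The span limits are parameters, so the same check also covers the 20-100 kb rule given for the Fosmid End Pairs track.

```python
from dataclasses import dataclass


@dataclass
class EndHit:
    """One placed end sequence (hypothetical record; 0-based, half-open)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def is_valid_pair(first, second, min_span=50_000, max_span=600_000):
    """Same chromosome, '+' then '-', and a span within [min_span, max_span]."""
    if first.chrom != second.chrom:
        return False
    if first.strand != "+" or second.strand != "-":
        return False
    # Assumed span: from the start of the '+' end to the end of the '-' end.
    span = second.end - first.start
    return min_span <= span <= max_span


left = EndHit("chr1", 1_000, 1_500, "+")
right = EndHit("chr1", 150_000, 150_500, "-")
print(is_valid_pair(left, right))                   # True: span is 149,500 bp
print(is_valid_pair(left, right, 20_000, 100_000))  # False under fosmid limits
```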

Methods

BAC end sequences are placed on the assembled sequence using Jim Kent's BLAT program.

Credits

Additional information about the clone, including how it can be obtained, may be found at the NCBI Clone Registry. You can view the entry for this clone in the registry by clicking on the clone name at the top of the page.

map 1 bacEndPairsBad Orphan BAC End Pairs bed 6 + Orphan and Incorrectly Oriented BAC End Pairs 0 16 0 0 0 127 127 127 0 0 0 map 1 bacEndPairsLong Long BAC End Pairs bed 6 + Long BAC End Pairs 0 17 0 0 0 127 127 127 0 0 0 map 1 fosEndPairs Fosmid End Pairs bed 6 + Fosmid End Pairs 0 18 0 0 0 127 127 127 0 0 0

Description

A valid pair of fosmid end sequences must be at least 20 kb but no more than 100 kb apart. The orientation of the first fosmid end sequence must be "+" and the orientation of the second fosmid end sequence must be "-".

Methods

End sequences were trimmed based on quality scores such that the resulting sequence is the longest contiguous stretch of bases that all have quality scores of 19 or above. Resulting sequences must be at least 15 base pairs long. Trimmed fosmid end sequences are placed on the assembled sequence using Jim Kent's BLAT program.
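The trimming rule can be sketched as a single scan for the longest run of bases whose quality is at least 19, discarding results shorter than 15 bp. The function name is hypothetical, the sequence and quality list are assumed to be the same length, and ties between equally long runs are resolved in favor of the first one (an assumption; the description does not say).

```python
def trim_by_quality(seq, quals, min_q=19, min_len=15):
    """Return the longest run of seq whose qualities are all >= min_q.

    Returns None if that run is shorter than min_len bases.
    """
    best_start, best_len = 0, 0
    run_start = None
    for i, q in enumerate(quals):
        if q >= min_q:
            if run_start is None:
                run_start = i  # a new high-quality run begins here
            run_len = i - run_start + 1
            if run_len > best_len:  # strict '>' keeps the first of equal runs
                best_start, best_len = run_start, run_len
        else:
            run_start = None  # a low-quality base ends the current run
    if best_len < min_len:
        return None
    return seq[best_start:best_start + best_len]


read = "A" * 5 + "C" * 20 + "G" * 3
print(trim_by_quality(read, [10] * 5 + [30] * 20 + [5] * 3))  # the 20 C's
```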

Credits

Sequencing of the fosmid ends was done at the Whitehead Institute. Sequences and quality scores are available in the trace repository at NCBI.

map 1 fosEndPairsBad Orphan Fosmid End Pairs bed 6 + Orphan and Incorrectly Oriented Fosmid End Pairs 0 19 0 0 0 127 127 127 0 0 0 map 1 fosEndPairsLong Long Fosmid End Pairs bed 6 + Long Fosmid End Pairs 0 20 0 0 0 127 127 127 0 0 0 map 1 chr18deletions Chr18 Deletions bed 6 + Chromosome 18 Deletions 0 21 0 0 0 127 127 127 0 0 0 map 1 jaxQTL2 QTL bed 8 + Quantitative Trait Loci From Jax/MGI 0 21.1 200 100 0 227 177 127 0 0 0 http://www.informatics.jax.org/searches/accession_report.cgi?id=$$

Description

Approximate positions of quantitative trait loci based on reported peak LOD scores taken from Jackson Lab's Mouse Genome Informatics Database (MGI).

Credits

Thanks to Carol Bult for providing the QTL data.

map 1 jaxQTL QTL bed 6 + Quantitative Trait Loci From Jax/MGI 0 21.1 200 100 0 227 177 127 0 0 0 http://www.informatics.jax.org/searches/accession_report.cgi?id=$$

Description

Approximate positions of quantitative trait loci based on reported peak LOD scores taken from Jackson Lab's Mouse Genome Informatics Database (MGI).

Credits

Thanks to Carol Bult for providing the QTL data.

map 1 isochores Isochores bed 4 + GC-rich (dark) and AT-rich (light) Isochores 0 22 0 0 0 127 127 127 1 0 0

What's an Isochore?

An isochore is a region of a chromosome where the GC content is either higher or lower than the whole-genome average (42%). A GC-rich isochore is given a dark color, while a GC-poor isochore is given a light color.

Isochores were determined by first calculating the GC content of 100,000 bp windows across the genome. Each window was labeled H or L depending on whether its GC content was higher or lower than average. A two-state HMM was created in which one state represented GC-rich regions and the other GC-poor regions. It was trained using the first 12 chromosomes. The trained HMM was then used to generate traces over all chromosomes. These traces define the boundaries of the isochores and their type (GC-rich or AT-rich).
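A compact sketch of the procedure above: compute the GC fraction of fixed windows, label each window H or L against the 42% genome average, then decode the label sequence with a two-state Viterbi pass. The transition and emission probabilities (`p_stay`, `p_match`) are hypothetical fixed values standing in for the parameters that were trained on the first 12 chromosomes, and treating windows with no A/C/G/T bases as L is likewise an assumption.

```python
import math

STATES = ("GC-rich", "AT-rich")


def gc_fraction(seq):
    """Fraction of A/C/G/T bases that are G or C; None if there are none."""
    acgt = [b for b in seq.upper() if b in "ACGT"]
    if not acgt:
        return None
    return sum(b in "GC" for b in acgt) / len(acgt)


def label_windows(seq, window=100_000, genome_gc=0.42):
    """Label each window 'H' (above the genome-average GC) or 'L'."""
    labels = []
    for start in range(0, len(seq), window):
        frac = gc_fraction(seq[start:start + window])
        # Assumption: windows with no A/C/G/T bases are labeled 'L'.
        labels.append("H" if frac is not None and frac > genome_gc else "L")
    return labels


def viterbi(obs, p_stay=0.99, p_match=0.8):
    """Most likely GC-rich/AT-rich state path for a list of 'H'/'L' labels.

    p_stay and p_match are hypothetical fixed parameters, not trained values.
    """
    if not obs:
        return []
    log = math.log

    def trans(a, b):
        return log(p_stay if a == b else 1.0 - p_stay)

    def emit(state, label):
        agrees = (state == "GC-rich") == (label == "H")
        return log(p_match if agrees else 1.0 - p_match)

    score = {s: log(0.5) + emit(s, obs[0]) for s in STATES}
    backptrs = []
    for label in obs[1:]:
        ptr, new_score = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + trans(p, s))
            ptr[s] = prev
            new_score[s] = score[prev] + trans(prev, s) + emit(s, label)
        backptrs.append(ptr)
        score = new_score
    state = max(STATES, key=score.get)
    path = [state]
    for ptr in reversed(backptrs):  # trace back from the last window
        state = ptr[state]
        path.append(state)
    return path[::-1]


def segments(path):
    """Collapse a per-window path into (first_window, last_window, state)."""
    segs = []
    for i, state in enumerate(path):
        if segs and segs[-1][2] == state:
            segs[-1] = (segs[-1][0], i, state)
        else:
            segs.append((i, i, state))
    return segs


labels = ["H"] * 5 + ["L"] + ["H"] * 5 + ["L"] * 10
print(segments(viterbi(labels)))
```

The HMM smooths the raw labels: a single L window inside a long GC-rich stretch does not open a new isochore, because two state switches cost more than one mismatched emission.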

map 1 gcPercent GC Percent bed 4 + Percentage GC in 20,000 Base Windows 0 23 0 0 0 127 127 127 1 0 0

Description

The GC percent track shows the percentage of bases that are G or C in a 20,000-base window. Windows with high GC content are drawn more darkly than windows with low GC content. High GC content is associated with gene-rich areas.

map 1 gcPercentSmall GC % 100b bed 4 + Percentage GC in 100 Base Windows 0 24 0 0 0 127 127 127 1 0 0 map 1 GCwiggle GC Samples sample GC Percent Sample Track (every 20,000 bases) 0 25 0 0 0 127 127 127 0 0 1 chr22, map 0 pGC GC Samples sample GC Percent Sample Track 0 26 0 0 0 127 127 127 0 0 0 map 0 genomicSuperDups filtered WGAC bed 6 Duplications of >1000 Bases of non-repeatMasked Sequence 0 27 0 0 0 127 127 127 0 0 0

Description

This region was detected as a putative genomic duplication within the golden path. Orange, yellow, and dark-to-light gray represent similarities of >99%, 98-99%, and 90-98%, respectively. Duplications of greater than 98% similarity that lack sufficient SDD evidence (likely missed overlaps) are shown in red. Cutoff values were at least 1 kb of total sequence aligned (containing at least 500 bp of non-RepeatMasked sequence) and at least 90% sequence identity. For a description of the 'fuguization' detection method, see Bailey et al. (2001) Genome Res 11:1005-17. The data were provided by Jeff Bailey and Evan Eichler.

map 1 humanParalog Human Paralog bed 5+ Human Paralogs using Fgenesh++ Gene Predictions 0 28 0 100 0 255 240 200 1 0 0 map 1 celeraCoverage WSSD Coverage bed 6 + Regions Assayed for SDD 0 29 0 0 0 127 127 127 0 0 0

Description

This track represents the coverage of clones that were assayed for duplications with Celera reads. Regions absent from this track were not assessed by this version of the SDD. For a description of the 'fuguization' detection method, see Bailey et al. (2001) Genome Res 11:1005-17. The data were provided by Jeff Bailey and Evan Eichler.

map 1 celeraDupPositive WSSD Duplication bed 6 + Sequence identified as duplicate by high depth Celera Reads 0 30 0 0 0 127 127 127 0 0 0

Description

This region shows a similarity of >90% and >250 bp of RepeatMasked sequence to sequences in the Segmental Duplication Database (SDD). For a description of the 'fuguization' detection method, see Bailey et al. (2001) Genome Res 11:1005-17. The data were provided by Jeff Bailey and Evan Eichler.

map 1 genomicDups Duplications bed 6 + Duplications of >1000 Bases Sequence 0 31 170 0 0 160 150 0 0 0 0

This region was detected as a genomic duplication within the golden path. Duplications of 99% or greater similarity, which are likely missed overlaps, are shown in red. Duplications of 98-99% similarity are shown in yellow. Duplications of 90-98% similarity are shown as shades of gray. Cutoff values were at least 1 kb of total sequence aligned (containing at least 500 bp of non-RepeatMasked sequence) and at least 90% sequence identity. For a description of the 'fuguization' detection method, see Bailey et al. (2001) Genome Res 11:1005-17. The data were provided by Jeff Bailey and Evan Eichler.
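The color scheme and cutoffs for this track amount to a small decision function. This is a sketch: the function and argument names are hypothetical, the single "gray" value stands in for the range of gray shades, and identity of exactly 98% or 99% is assigned to the higher class, which the description leaves ambiguous.

```python
def duplication_color(identity_pct, aligned_bp, non_repeat_bp):
    """Map one duplication to the display color described for this track."""
    # Cutoffs: >= 1 kb aligned, >= 500 bp non-RepeatMasked, >= 90% identity.
    if aligned_bp < 1000 or non_repeat_bp < 500 or identity_pct < 90.0:
        return None  # below the cutoffs: not reported
    if identity_pct >= 99.0:
        return "red"  # likely a missed overlap
    if identity_pct >= 98.0:
        return "yellow"
    return "gray"  # 90-98%, drawn as shades of gray


for pct in (99.4, 98.5, 93.0):
    print(pct, duplication_color(pct, aligned_bp=5_000, non_repeat_bp=2_000))
```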
map 1 dupes Duplications bed 6 . Duplications of >98% Identity >1kb 1 32 0 0 0 127 127 127 0 0 0 map 1 genieKnown Known Genes genePred Known Genes (from full length mRNAs) 3 33 20 20 170 137 137 212 0 0 0 genes 1 knownGene Known Genes genePred refPep refMrna Known Genes based on SWISS-PROT, TrEMBL, and mRNA 3 34 12 12 120 133 133 187 0 0 0

Description

The Known Genes track shows known protein-coding genes based on proteins from SWISS-PROT, TrEMBL, and TrEMBL-NEW and their corresponding mRNAs from Genbank. Coding exons are displayed taller than 5' and 3' untranslated regions (UTRs). Connecting introns are one-pixel lines with hatch marks indicating the direction of transcription. Entries that have corresponding entries in PDB are colored black. Entries that either have corresponding proteins in SWISS-PROT or have mRNAs that are NCBI Reference Sequences with a "Reviewed" status are colored dark blue. Entries that have mRNAs that are NCBI Reference Sequences with a "Provisional" status are colored a lighter blue. Everything else is colored the lightest blue.
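The coloring rules can be read as a precedence list, checked in the order given above. The function and argument names below are hypothetical; the sketch only mirrors the stated rules and is not the browser's code.

```python
def known_gene_color(has_pdb, in_swissprot, refseq_status=None):
    """Pick a display color using the precedence given in the description."""
    if has_pdb:
        return "black"
    if in_swissprot or refseq_status == "Reviewed":
        return "dark blue"
    if refseq_status == "Provisional":
        return "lighter blue"
    return "lightest blue"


print(known_gene_color(has_pdb=False, in_swissprot=False, refseq_status="Provisional"))
```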

Method

First, all mRNAs of a species are aligned against the genome using the BLAT program. When a single mRNA aligns in multiple places, only the best alignments are kept. The alignments must also have at least 98% sequence identity to be kept. This set of mRNA alignments is further reduced by keeping only those mRNAs that are referenced by a protein in SWISS-PROT, TrEMBL, or TrEMBL-NEW.

Among multiple mRNAs referenced by a single protein, the best mRNA is chosen based on a quality score, which depends on its length, how well its translation matches the protein sequence, and its release date.

Finally, the list of mRNA and protein pairs is further cleaned up by removing short invalid entries and consolidating entries with identical CDS regions.

Credits

The Known Genes track is produced at UCSC based on cross-references between proteins from SWISS-PROT (also including TrEMBL and TrEMBL-NEW) and mRNAs from Genbank generated by scientists worldwide.

genes 1 refGene RefSeq Genes genePred refPep refMrna RefSeq Genes 3 35 12 12 120 133 133 187 0 0 0

Description

The RefSeq Genes track shows known protein-coding genes taken from mRNA reference sequences compiled at LocusLink. Coding exons are displayed taller than 5' and 3' untranslated regions (UTRs). Connecting introns are one-pixel lines with hatch marks indicating the direction of transcription. Non-coding RNA genes have their own track in some assemblies.

Method

RefSeq mRNAs are aligned against the genome using the BLAT program. When a single mRNA aligns in multiple places, only the best alignments are kept. The alignments must also have at least 98% sequence identity to be kept.

Credits

The RefSeq Genes track is produced at UCSC from mRNA sequence data generated by scientists worldwide and curated by the RefSeq project at NCBI.

genes 1 genieAlt AltGenie genePred genieAltPep Genie Gene Predictions from Affymetrix 1 39 125 0 150 190 127 202 0 0 0

Description

Genie predictions are based on Affymetrix's Genie gene-finding software. Genie is a generalized HMM that accepts constraints based on mRNA and EST data.

genes 1 ensGene Ensembl Genes genePred ensPep Ensembl Gene Predictions 1 40 150 0 0 202 127 127 0 0 0 http://www.ensembl.org/Mus_musculus/transview?transcript=$$

Description

These gene predictions are from Project Ensembl.

Methods

For a description of the methods used in Ensembl gene prediction, refer to "The Ensembl genome database project", Nucleic Acids Research, 2002, 30(1):38-41.

Credits

Thanks to Project Ensembl for providing this annotation.

genes 1 acembly Acembly Genes genePred acemblyPep acemblyMrna Acembly Gene Predictions With Alt-splicing 1 41 155 0 125 205 127 190 0 0 0 http://www.ncbi.nlm.nih.gov/AceView/av.cgi?db=human&l=$$

Description

This track shows gene models reconstructed solely from\ mRNA and EST evidence by Danielle and Jean Thierry-Mieg\ and Vahan Simonyan using the Acembly program.

Methods

Acembly attempts to find the best alignment of each mRNA against the genome, and considers alternative splice models. If more than one gene model with statistical significance is produced, all of these models are displayed.

Credits

Thanks to Jean Thierry-Mieg at NIH for providing this track.

genes 1 ensEst Ensembl ESTs genePred ensEstPep $Organism ESTs From Ensembl 1 42 0 0 0 127 127 127 0 0 0

Description

ESTs from Project Ensembl.

Methods

For a description of the methods used, refer to "The Ensembl genome database project", Nucleic Acids Research, 2002, 30(1):38-41.

Credits

Thanks to Project Ensembl for providing this annotation.

genes 1 ncbiGenes NCBI Gene Models genePred ncbiPep $Organism Gene Models from NCBI 1 43 0 0 0 127 127 127 0 0 0

Description & Credits

Gene predictions from NCBI. See the human build release notes for a description of the build.

genes 1 npredGene NCBI Prediction genePred npredPep NCBI Gene Predictions 0 44 170 100 0 212 177 127 0 0 0 http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=$$

Description

This track shows predictions from NCBI Genome Assembly/Annotation Projects.

Methods

Method details go here.

Credits

Thanks to NCBI.

genes 1 twinscan Twinscan genePred twinscanPep Twinscan Gene Predictions Using Mouse/Human Homology 1 45 0 100 100 0 50 50 0 0 0

Description & Credits

Twinscan predicts genes in a manner similar to Genscan, except that Twinscan takes advantage of genome comparison to improve gene prediction accuracy. More information and a web server can be found at http://genes.cs.wustl.edu/.

The Twinscan algorithm is described in Korf, I., P. Flicek, D. Duan, and M.R. Brent. 2001. Integrating genomic homology into gene structure prediction. Bioinformatics 17:S140-148.

genes 1 genomeScan NCBI GenomeScan genePred genomeScanPep $Organism GenomeScan Models from NCBI 1 46 0 0 0 127 127 127 0 0 0

Description & Credits

Pure GenomeScan gene predictions from NCBI. See the human build release notes for a description of the build.

genes 1 sgpGene SGP Genes genePred sgpPep SGP Gene Predictions Using Mouse/Human Homology 1 47 0 90 100 127 172 177 0 0 0

Description

This track shows gene predictions from the SGP program, which is being developed at the Grup de Recerca en Informàtica Biomèdica (GRIB) at the Institut Municipal d'Investigació Mèdica (IMIM) in Barcelona. To predict genes in a genomic query, SGP combines geneid predictions with tblastx comparisons of the genomic query against other genomic sequences.

genes 1 softberryGene Fgenesh++ Genes genePred softberryPep Fgenesh++ Gene Predictions 0 48 0 100 0 127 177 127 0 0 0

Description

Fgenesh++ predictions are based on Softberry's gene finding software.

Methods

Fgenesh++ uses both HMMs and protein similarity to find genes in a completely automated manner. For more information, see Solovyev V.V. (2001) "Statistical approaches in Eukaryotic gene prediction" in the Handbook of Statistical Genetics (ed. Balding D. et al.), John Wiley & Sons, Ltd., p. 83-127.

Credits

The Fgenesh++ gene predictions were produced by Softberry Inc. Commercial use of these predictions is restricted to viewing in this browser. Please contact Softberry Inc. to make arrangements for further commercial access.

genes 1 geneid Geneid Genes genePred geneidPep Geneid Gene Predictions 0 49 0 90 100 127 172 177 0 0 0

This track shows gene predictions from the geneid program, which is being developed at the Grup de Recerca en Informàtica Biomèdica at the Institut Municipal d'Investigació Mèdica (IMIM) in Barcelona. Geneid uses information from sequence signals involved in gene specification as well as coding statistics to define exons and genes.

genes 1 genscan Genscan Genes genePred genscanPep Genscan Gene Predictions 1 50 170 100 0 212 177 127 0 0 0

Description

This track shows predictions from Chris Burge's program Genscan.

Methods

For a description of the Genscan program and the model that underlies it, refer to Burge, C. and Karlin, S. (1997) Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78-94. The splice site models used are described in more detail in Burge, C. B. (1998) Modeling dependencies in pre-mRNA splicing signals. In Salzberg, S., Searls, D. and Kasif, S., eds. Computational Methods in Molecular Biology, Elsevier Science, Amsterdam, 127-163.

Credits

Thanks to Chris Burge for providing this data.

genes 1 genscanExtra Genscan Extra bed 6 Genscan Extra (Suboptimal) Exon Predictions 0 51 180 90 0 217 172 127 0 0 1 chr22, genes 1 rnaGene RNA Genes bed 6 + Non-coding RNA Genes (dark) and Pseudogenes (light) 0 52 170 80 0 230 180 130 0 0 0

Description

This track shows the locations of non-protein-coding RNA genes and pseudogenes. This data was kindly provided by Sean Eddy at Washington University.

Feature types include:

Methods

Eddy-tRNAscanSE (tRNA genes, Sean Eddy):

tRNAscan-SE 1.23 with default parameters. The score field contains the tRNAscan-SE bit score; >20 is good, >50 is great.

Eddy-BLAST-tRNAlib (tRNA pseudogenes, Sean Eddy):

WUBLAST 2.0, with options "-kap wordmask=seg B=50000 W=8 cpus=1". The score field contains the percent identity in the BLAST-aligned region. Each of the 602 tRNAs and pseudogenes predicted by tRNAscan-SE in the human oo27 assembly was used as a query. All nonoverlapping regions that hit one or more of these with P <= 0.001 were kept.

Eddy-BLAST-snornalib (known snoRNAs and snoRNA pseudogenes, Steve Johnson):

WUBLASTN 2.0, with options "-V=25 -hspmax=5000 -kap wordmask=seg B=5000 W=8 cpus=1". The score field contains the BLAST score. Each of the 104 unique snoRNAs in snorna.lib was used as a query. Any hit >=95% full length and >=90% identity is annotated as a "true gene". Any other hit with P <= 0.001 is annotated as a "related sequence" and interpreted as a putative pseudogene.

Eddy-BLAST-otherrnalib (non-tRNA, non-snoRNA noncoding RNAs with Genbank entries for the human gene):

WUBLASTN 2.0 [15 Apr 2002] with options: "-kap -cpus=1 -wordmask=seg -W=8 -E=0.01 -hspmax=0 -B=50000 -Z=3000000000". Exceptions to this are: The score field contains the BLASTN score. 41 unique miRNAs and 29 other ncRNAs were used as queries. Any hit >=95% full length and >=95% identity is annotated as a "true gene". Any other hit with P <= 0.001 and >=65% identity is annotated as a "related sequence". An exception to this: all miRNAs, which are 16-26 bp sequences in Genbank, are only annotated if the hit is 100% full length with 100% identity. The miRNAs consist of Let-7 from Pasquinelli et al., Nature (2000) 408:86, and 40 from Mourelatos et al., Genes & Dev (2002) 16:720.
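The annotation thresholds for the other-RNA library can be summarized as a classifier over per-hit statistics (percent of the query length covered, percent identity, and BLAST P-value). This is a sketch with hypothetical names, not the pipeline that produced the track; the identity thresholds are parameters so that the snoRNA rules described earlier (>=90% identity for a "true gene", no identity floor for a "related sequence") can be expressed with the same function.

```python
def classify_hit(pct_length, pct_identity, p_value, is_mirna=False,
                 true_identity=95.0, related_min_identity=65.0):
    """Return 'true gene', 'related sequence', or None for one BLAST hit."""
    if is_mirna:
        # miRNAs are annotated only on a 100% length, 100% identity match.
        return "true gene" if pct_length == 100 and pct_identity == 100 else None
    if pct_length >= 95 and pct_identity >= true_identity:
        return "true gene"
    if p_value <= 0.001 and pct_identity >= related_min_identity:
        return "related sequence"
    return None


print(classify_hit(98, 97, 1e-30))  # other-ncRNA rules
print(classify_hit(96, 91, 1e-20, true_identity=90.0, related_min_identity=0.0))  # snoRNA rules
```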

Credits

This data was kindly provided by Sean Eddy at Washington University.

genes 1 superfamily Superfamily bed 4 + Superfamily/SCOP: Proteins having homologs with known structure/function 1 53 150 0 0 202 127 127 0 0 0 http://supfam.org/SUPERFAMILY/cgi-bin/gene.cgi?seqid=$$

Description

The Superfamily track shows proteins having homologs with known structures or functions.

Each entry on this track shows the coding region of a gene (based on Ensembl gene predictions). The label, shown in full mode at the left-hand side of each entry, consists of the names of all known protein domains coded by this gene. These names usually include structural and/or functional descriptions, which give users a quick grasp of the biological significance of each gene.

Method

Data are downloaded from the Superfamily server. Using the cross-reference between Superfamily entries and Ensembl gene prediction entries, and their alignment to the appropriate genome, the associated data are processed to generate a simple BED-format UCSC Genome Browser track.

Credits

Superfamily is developed by Julian Gough at the MRC Laboratory of Molecular Biology, Cambridge.

Gough, J., Karplus, K., Hughey, R. and Chothia, C. (2001). "Assignment of Homology to Genome Sequences using a Library of Hidden Markov Models that Represent all Proteins of Known Structure." J. Mol. Biol., 313(4), 903-919.

genes 1 mrna $Organism mRNAs psl . $Organism mRNAs from Genbank 3 54 0 0 0 127 127 127 1 0 0

Description

The $Organism mRNA track shows alignments between $organism mRNAs in Genbank and the genome. Aligning regions (usually exons) are shown as black boxes connected by lines for gaps (usually spliced-out introns). In full display mode, arrows on the introns indicate the direction of transcription.

Method

Genbank $organism mRNAs are aligned against the genome using the BLAT program. When a single mRNA aligns in multiple places, the alignment having the highest base identity is found. Only alignments that have a base identity level within 1% of the best are kept. Alignments must also have at least 95% base identity to be kept.
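The near-best filter described above can be sketched as grouping alignments by mRNA accession, finding the best identity per mRNA, and keeping hits that are both within one percentage point of that best and at least 95% identical. Interpreting "within 1%" as an absolute difference in percent identity is an assumption, and the `Alignment` record is hypothetical (real BLAT output is PSL, from which an identity value must first be computed).

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    query: str       # mRNA accession
    target: str      # placement, e.g. a chromosome name
    identity: float  # percent base identity of the alignment


def near_best(alignments, min_identity=95.0, max_drop=1.0):
    """Keep each mRNA's alignments within max_drop points of its best identity."""
    by_query = defaultdict(list)
    for aln in alignments:
        by_query[aln.query].append(aln)
    kept = []
    for hits in by_query.values():
        best = max(h.identity for h in hits)
        kept.extend(h for h in hits
                    if h.identity >= min_identity and best - h.identity <= max_drop)
    return kept


alns = [Alignment("m1", "chr1", 99.5), Alignment("m1", "chr2", 98.8),
        Alignment("m1", "chr3", 97.0), Alignment("m2", "chr4", 94.0)]
print([(a.query, a.target) for a in near_best(alns)])
```

Here the third `m1` placement is dropped for being 2.5 points below the best, and `m2` is dropped entirely because its only alignment falls under 95%.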

Using the Filter

The track filter can be used to change the color of a subset of individual items within a track, or to include or exclude them. This is helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter:

    \
  1. Enter a value in one or more of the text boxes to filter the mRNA display. For\ example, to apply the filter to all liver mRNAs, type "liver" in the \ tissue box. For a list of permissible filter values, consult the non-positional table in\ the Table Browser that corresponds to the factor on which you wish to filter. For\ example, the non-positional table "tissue" contains all of the types of tissues\ that can be entered into the tissue text box. Wildcards can also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only mRNAs that match all of the filter criteria will\ be highlighted. If "or" is selected, mRNAs that match any one of the filter criteria\ will be highlighted.\
  3. Choose the color or display characteristic that will be used to highlight or\ include/exclude the filtered items. If "exclude" is chosen, the browser will not \ display mRNAs that match the filter criteria. If "include" is selected, the browser \ will display only those mRNAs that match the filter criteria.\
  4. When you have finished configuring the filter, click the Submit button.\

\ \
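The filter steps above amount to wildcard matching combined with "and"/"or" logic, followed by an include, exclude, or highlight action. The Python sketch below is a hypothetical model of that behavior, not the browser's implementation: items are plain dicts whose keys (such as `tissue`) stand in for the filter text boxes, matching is case-insensitive with `fnmatch` wildcards (`*`, `?`), and blank boxes are ignored.

```python
from fnmatch import fnmatchcase

def matches(item, criteria, logic="and"):
    """True if the item satisfies the wildcard criteria under 'and'/'or' logic."""
    tests = [
        fnmatchcase(item.get(field, "").lower(), pattern.lower())
        for field, pattern in criteria.items()
    ]
    return all(tests) if logic == "and" else any(tests)

def apply_filter(items, criteria, logic="and", action="include"):
    """'include'/'exclude' return the kept items; 'color' returns (item, highlighted) pairs."""
    active = {f: p for f, p in criteria.items() if p}   # blank boxes are ignored
    hits = [bool(active) and matches(i, active, logic) for i in items]
    if action == "color":
        return list(zip(items, hits))
    if not active:
        return list(items)                              # no criteria: nothing filtered
    if action == "include":
        return [i for i, h in zip(items, hits) if h]
    if action == "exclude":
        return [i for i, h in zip(items, hits) if not h]
    raise ValueError(f"unknown action: {action!r}")

if __name__ == "__main__":
    mrnas = [
        {"acc": "M1", "tissue": "liver", "library": "L1"},
        {"acc": "M2", "tissue": "brain", "library": "L1"},
        {"acc": "M3", "tissue": "fetal liver", "library": "L2"},
    ]
    print([m["acc"] for m in apply_filter(mrnas, {"tissue": "*liver"})])
```

Here the pattern `*liver` matches both "liver" and "fetal liver"; an exact value such as `liver` would match only the first.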

Credits

\ The $Organism mRNA track is produced at UCSC from mRNA sequence data\ submitted to the international public sequence databases by \ scientists worldwide.\ rna 1 tightMrna Tight mRNAS psl . Tightly Filtered $Organism mRNAs from Genbank 0 55 0 0 0 127 127 127 1 0 0 rna 1 intronEst Spliced ESTs psl est $Organism ESTs That Have Been Spliced 1 56 0 0 0 127 127 127 1 0 0

Description

\

The Spliced EST track displays Expressed Sequence Tags \ (ESTs) from Genbank that show signs of splicing when\ aligned against the genome. By requiring splicing, the level \ of contamination in the EST databases is drastically reduced\ at the expense of eliminating many genuine 3' ESTs.\ For a display of all ESTs (including unspliced), see the \ $Organism EST track.

\ \

Expressed sequence tags are single-read sequences (typically\ about 500 bases long) that usually\ represent fragments of transcribed genes. Aligning \ regions (usually exons) are shown as black boxes \ connected by lines for gaps (usually spliced-out introns). \ In full display mode, arrows on the introns\ indicate the direction of transcription. In the\ December 2001 assembly and later, this direction is\ inferred from the splice sites. In previous\ assemblies, the direction of transcription was taken from \ the Genbank annotations, which were frequently inaccurate.\ \
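Inferring the direction of transcription from splice sites can be sketched by checking each intron for the canonical GT...AG dinucleotides on either strand. The Python below is an illustrative assumption, not the browser's code: it ignores non-canonical introns (such as GC...AG) and resolves the strand by a simple majority vote.

```python
def infer_strand(genome, introns):
    """Guess the transcription strand from canonical splice-site dinucleotides.

    genome: plus-strand sequence; introns: (start, end) half-open intervals on it.
    """
    plus = minus = 0
    for start, end in introns:
        donor = genome[start:start + 2].upper()
        acceptor = genome[end - 2:end].upper()
        if donor == "GT" and acceptor == "AG":
            plus += 1      # GT...AG read directly on the plus strand
        elif donor == "CT" and acceptor == "AC":
            minus += 1     # reverse complement of GT...AG: transcript is on minus
    if plus > minus:
        return "+"
    if minus > plus:
        return "-"
    return "?"             # no canonical sites, or a tie

if __name__ == "__main__":
    print(infer_strand("AAAA" + "GTAAAAAG" + "AAAA", [(4, 12)]))
```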

Method

\

To make an EST, RNA is isolated from cells and reverse\ transcribed into cDNA. Typically, the cDNA is cloned\ into a plasmid vector, and a read is taken from the 5'\ and/or 3' primer. For most, but not all, ESTs, the\ reverse transcription is primed by an oligo-dT, which\ hybridizes with the poly-A tail of mature mRNA. The\ reverse transcriptase may or may not reach the 5'\ end of the mRNA, which may or may not be degraded.

\ \

In general, the 3' ESTs mark the end of transcription\ reasonably well, but the 5' ESTs may end at any point\ within the transcript. Some of the newer cap-selected\ libraries are starting to hit transcription start\ reasonably well. Before the cap-selection techniques\ emerged, some projects used random rather than poly-A\ priming in an attempt to get sequence distant from the\ 3' end. These projects were successful at this, but as\ a side effect also deposited sequences from unprocessed\ mRNA and perhaps even genomic sequences into the EST databases.\ (Even outside of the random-primed projects, there is a\ degree of non-mRNA contamination.) Because of this, a\ single unspliced EST should be viewed with considerable\ skepticism. However, because the $organism 3' UTRs are quite\ long, the splicing requirement does eliminate many genuine 3'\ ESTs.

\ \

To generate this track, $organism ESTs from Genbank are aligned \ against the genome using the BLAT program. Note that the maximum intron length\ allowed by BLAT is 500,000 bases, which may eliminate some ESTs with very \ long introns that might otherwise align. When a single \ EST aligns in multiple places, the alignment having the \ highest base identity is found. Only alignments that have \ a base identity level within 1% of the best are kept. \ Alignments must also have at least 93% base identity to be kept.
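A BLAT alignment in PSL format stores its aligned blocks as comma-separated `blockSizes` and `tStarts` columns, so the intron-length limit mentioned above can be checked from the gaps between consecutive blocks on the genome. The following is a minimal sketch with hypothetical function names; it treats every gap as a candidate intron, even though short gaps may simply be indels.

```python
MAX_INTRON = 500_000  # maximum intron length quoted in the text

def parse_psl_list(field):
    """Parse a PSL comma-separated integer column such as '120,85,'."""
    return [int(x) for x in field.split(",") if x]

def target_gaps(block_sizes, t_starts):
    """Gap lengths on the genome between consecutive aligned blocks."""
    return [
        t_starts[i + 1] - (t_starts[i] + block_sizes[i])
        for i in range(len(block_sizes) - 1)
    ]

def within_max_intron(block_sizes, t_starts, max_intron=MAX_INTRON):
    """True if no gap between blocks exceeds the maximum intron length."""
    return all(gap <= max_intron for gap in target_gaps(block_sizes, t_starts))

if __name__ == "__main__":
    sizes = parse_psl_list("100,50,")
    starts = parse_psl_list("1000,2100,")
    print(target_gaps(sizes, starts), within_max_intron(sizes, starts))
```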

\ \

Using the Filter

\

The track filter can be used to change the color or include/exclude a subset of \ individual items within a track. This is helpful when many items are shown in the \ track display, especially when only some are relevant to the current task. To use the\ filter:\

    \
  1. Enter a value in one or more of the text boxes to filter the EST display. For\ example, to apply the filter to all ESTs expressed in the liver, type "liver" in the \ tissue box. For a list of permissible filter values, consult the non-positional table in\ the Table Browser that corresponds to the factor on which you wish to filter. For\ example, the non-positional table "tissue" contains all of the types of tissues\ that can be entered into the tissue text box. Wildcards can also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only ESTs that match all of the filter criteria will\ be highlighted. If "or" is selected, ESTs that match any one of the filter criteria\ will be highlighted.\
  3. Choose the color or display characteristic that should be used to highlight or\ include/exclude the filtered items. If "exclude" is chosen, the browser will not \ display ESTs that match the filter criteria. If "include" is selected, the browser \ will display only those ESTs that match the filter criteria.\
  4. When you have finished configuring the filter, click the Submit button.\

\ \

Credits

\ The Spliced EST track is produced at UCSC from EST sequence data\ submitted to the international public sequence databases by \ scientists worldwide.\ rna 1 est $Organism ESTs psl est $Organism ESTs Including Unspliced 0 57 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows alignments between $organism Expressed\ Sequence Tags (ESTs) in Genbank and the genome.

\ \

Expressed sequence tags are single-read sequences (typically\ about 500 bases long) that usually\ represent fragments of transcribed genes. Aligning \ regions (usually exons) are shown as black boxes \ connected by lines for gaps (usually spliced-out introns). \ In full display mode, arrows on the introns\ indicate the direction of transcription. In the\ December 2001 assembly and later, this direction is\ inferred from the splice sites. In previous\ assemblies, the direction of transcription was taken from \ the Genbank annotations, which were frequently inaccurate.\ \

Method

\

To make an EST, RNA is isolated from cells and reverse\ transcribed into cDNA. Typically, the cDNA is cloned\ into a plasmid vector, and a read is taken from the 5'\ and/or 3' primer. For most, but not all, ESTs, the\ reverse transcription is primed by an oligo-dT, which\ hybridizes with the poly-A tail of mature mRNA. The\ reverse transcriptase may or may not reach the 5'\ end of the mRNA, which may or may not be degraded.

\ \

In general, the 3' ESTs mark the end of transcription\ reasonably well, but the 5' ESTs may end at any point\ within the transcript. Some of the newer cap-selected\ libraries are starting to hit transcription start\ reasonably well. Before the cap-selection techniques\ emerged, some projects used random rather than poly-A\ priming in an attempt to get sequence distant from the\ 3' end. These projects were successful at this, but as\ a side effect also deposited sequences from unprocessed\ mRNA and perhaps even genomic sequences into the EST databases.\ (Even outside of the random-primed projects, there is a\ degree of non-mRNA contamination.) Because of this, a\ single unspliced EST should be viewed with considerable\ skepticism. However, because the $organism 3' UTRs are quite\ long, the splicing requirement does eliminate many genuine 3'\ ESTs.

\ \

To generate this track, $organism ESTs from Genbank are aligned \ against the genome using the BLAT program. Note that the maximum intron length\ allowed by BLAT is 500,000 bases, which may eliminate some ESTs with very \ long introns that might otherwise align. When a single \ EST aligns in multiple places, the alignment having the \ highest base identity is found. Only alignments that have \ a base identity level within 1% of the best are kept. \ Alignments must also have at least 93% base identity to be kept.

\ \

Using the Filter

\

The track filter can be used to change the color or include/exclude a subset of \ individual items within a track. This is helpful when many items are shown in the \ track display, especially when only some are relevant to the current task. To use the\ filter:\

    \
  1. Enter a value in one or more of the text boxes to filter the EST display. For\ example, to apply the filter to all ESTs expressed in the liver, type "liver" in the \ tissue box. For a list of permissible filter values, consult the non-positional table in\ the Table Browser that corresponds to the factor on which you wish to filter. For\ example, the non-positional table "tissue" contains all of the types of tissues\ that can be entered into the tissue text box. Wildcards can also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only ESTs that match all of the filter criteria will\ be highlighted. If "or" is selected, ESTs that match any one of the filter criteria\ will be highlighted.\
  3. Choose the color or display characteristic that should be used to highlight or\ include/exclude the filtered items. If "exclude" is chosen, the browser will not \ display ESTs that match the filter criteria. If "include" is selected, the browser \ will display only those ESTs that match the filter criteria.\
  4. When you have finished configuring the filter, click the Submit button.\

\ \

Credits

\ The $Organism EST track is produced at UCSC from EST sequence data\ submitted to the international public sequence databases by \ scientists worldwide.\ rna 1 tightEst Tight ESTs psl est Tightly Filtered $Organism ESTs Including Unspliced 0 58 0 0 0 127 127 127 1 0 0 rna 1 mgc_mrna MGC mRNAs psl . Mammalian Gene Collection mRNAs 0 59 0 0 0 127 127 127 1 0 0 rna 1 mgcNcbiPicks NCBI MGC Picks psl est NCBI Clone Picks for the Mammalian Gene Collection 0 60 0 0 0 127 127 127 1 1 0

Description

\

\ This track shows Mammalian Gene Collection clones identified as having a complete coding sequence (CDS)\ by Lukas Wagner at NCBI. There are 382,137 candidates representing 21,594\ genes.\

\ rna 1 mgcNcbiSplicedPicks NCBI Spliced Picks psl est NCBI Clone Picks for the Mammalian Gene Collection That Have Been Spliced 0 61 0 0 0 127 127 127 1 1 0

Description

\

\ This track shows Mammalian Gene Collection clones identified as having a complete coding sequence (CDS)\ by Lukas Wagner at NCBI that UCSC has identified as spliced.\ There are 382,137 candidates representing 21,594 genes.\

\ rna 1 mgcUcscPicks UCSC MGC Picks psl est UCSC Clone Picks for the Mammalian Gene Collection 0 62 0 0 0 127 127 127 1 1 0

Description

\

UCSC clone picks for the Mammalian Gene Collection.

\ \

Method

\

Initial clustering of MGC ESTs was done by Jim Kent. An MGC EST is picked if it has at least 80% \ overall identity and aligns within 30 bases of the 5' end of its cluster. No pick is \ made if there is already an MGC RNA within the cluster.
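The picking rule can be sketched for a single cluster as follows. The `Member` record, its `kind` labels, and the choice of the highest-identity candidate when several qualify are hypothetical; the text does not say how ties are broken. The cluster's 5' end is taken as its leftmost genomic base on the plus strand and its rightmost base on the minus strand.

```python
from dataclasses import dataclass

@dataclass
class Member:
    acc: str         # accession (assumed unique within a cluster)
    kind: str        # hypothetical labels: "mgc_est", "mgc_rna", or "other"
    start: int       # genomic start, 0-based
    end: int         # genomic end, half-open
    identity: float  # overall percent identity, 0-100

def pick_clone(members, strand, min_identity=80.0, max_offset=30):
    """Return the accession of the picked MGC EST for one cluster, or None."""
    if any(m.kind == "mgc_rna" for m in members):
        return None                            # cluster already has an MGC RNA
    if strand == "+":
        five_prime = min(m.start for m in members)
        offsets = {m.acc: m.start - five_prime for m in members}
    else:
        five_prime = max(m.end for m in members)
        offsets = {m.acc: five_prime - m.end for m in members}
    candidates = [
        m for m in members
        if m.kind == "mgc_est"
        and m.identity >= min_identity
        and offsets[m.acc] <= max_offset
    ]
    if not candidates:
        return None
    # Assumption: prefer the highest-identity candidate.
    return max(candidates, key=lambda m: m.identity).acc
```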

\ rna 1 xenoMrna Non$organism mRNA psl xeno Non$organism mRNAs from Genbank 1 63 0 0 0 127 127 127 1 0 0

Description

\ This track displays translated BLAT alignments of\ non-$organism vertebrate mRNAs from Genbank. \ \

Method

\ The alignments were passed through a near-best-in-genome filter.\ \

Using the Filter

\

The track filter can be used to color, include, or exclude a subset of individual \ items within a track. This is helpful when many items are shown in the track\ display, especially when only some are relevant to the current task. To use the\ filter:\

    \
  1. Enter a value in one or more of the text boxes to filter the mRNA display. For\ example, to apply the filter to all brain mRNAs, type "brain" in the \ tissue box. For a list of permissible filter values, consult the non-positional table in\ the Table Browser that corresponds to the factor on which you wish to filter. For\ example, the non-positional table "tissue" contains all of the types of tissues\ that can be entered into the tissue text box. Wildcards can also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only mRNAs that match all of the filter criteria will\ be highlighted. If "or" is selected, mRNAs that match any one of the filter criteria\ will be highlighted.\
  3. Choose the color or display characteristic that will be used to highlight or\ include/exclude the filtered items. If "exclude" is chosen, the browser will not \ display mRNAs that match the filter criteria. If "include" is selected, the browser \ will display only those mRNAs that match the filter criteria.\
  4. Click the Submit button.\

\ rna 1 xenoBestMrna Other Best mRNAs psl xeno Non$organism mRNAs from Genbank Best in Genome Alignments 0 64 0 0 0 127 127 127 1 0 0

Description

\ This track displays translated BLAT alignments of\ non-$organism vertebrate mRNAs from Genbank. \ \

Method

\ The alignments were passed through a near-best-in-genome filter.\ \

Using the Filter

\

The track filter can be used to color, include, or exclude a subset of individual \ items within a track. This is helpful when many items are shown in the track\ display, especially when only some are relevant to the current task. To use the\ filter:\

    \
  1. Enter a value in one or more of the text boxes to filter the mRNA display. For\ example, to apply the filter to all brain mRNAs, type "brain" in the \ tissue box. For a list of permissible filter values, consult the non-positional table in\ the Table Browser that corresponds to the factor on which you wish to filter. For\ example, the non-positional table "tissue" contains all of the types of tissues\ that can be entered into the tissue text box. Wildcards can also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only mRNAs that match all of the filter criteria will\ be highlighted. If "or" is selected, mRNAs that match any one of the filter criteria\ will be highlighted.\
  3. Choose the color or display characteristic that will be used to highlight or\ include/exclude the filtered items. If "exclude" is chosen, the browser will not \ display mRNAs that match the filter criteria. If "include" is selected, the browser \ will display only those mRNAs that match the filter criteria.\
  4. Click the Submit button.\

\ rna 1 xenoEst Non$organism EST psl xeno Non$organism ESTs from Genbank 0 65 0 0 0 127 127 127 1 0 0 http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$

Description

\ This track displays translated BLAT alignments of\ non-$organism vertebrate ESTs from Genbank. \ \

Method

\ To generate this track, the ESTs are aligned against the genome using the BLAT\ program. The alignments are passed through a piecewise near-best-in-genome\ filter.\ \

Using the Filter

\

The track filter can be used to change the color or include/exclude a subset of \ individual items within a track. This is helpful when many items are shown in the \ track display, especially when only some are relevant to the current task. To use the\ filter:\

    \
  1. Enter a value in one or more of the text boxes to filter the EST display. For\ example, to apply the filter to all ESTs expressed in the liver, type "liver" in the \ tissue box. For a list of permissible filter values, consult the non-positional table in\ the Table Browser that corresponds to the factor on which you wish to filter. For\ example, the non-positional table "tissue" contains all of the types of tissues\ that can be entered into the tissue text box. Wildcards can also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only ESTs that match all of the filter criteria will\ be highlighted. If "or" is selected, ESTs that match any one of the filter criteria\ will be highlighted.\
  3. Choose the color or display characteristic that should be used to highlight or\ include/exclude the filtered items. If "exclude" is chosen, the browser will not \ display ESTs that match the filter criteria. If "include" is selected, the browser \ will display only those ESTs that match the filter criteria.\
  4. When you have finished configuring the filter, click the Submit button.\

\ \

Credits

\ This track is produced at UCSC from EST sequence data submitted to the\ international public sequence databases by scientists worldwide.\ rna 1 anyCovBed mRNA/EST/pseud bed 3 Blastz Alignments of GenBank mRNA including Pseudogenes and all ESTs 0 66 170 128 128 212 191 191 0 0 0 rna 1 anyMrnaCov mRNA/pseud bed 3 Blastz Alignments of GenBank mRNA including Pseudogenes 0 67 170 128 128 212 191 191 0 0 0 rna 1 tigrGeneIndex TIGR Gene Index genePred Alignment of TIGR Gene Index TCs Against the $Organism Genome 1 68 100 0 0 177 127 127 0 0 0 http://www.tigr.org/tigr-scripts/nhgi_scripts/tc_report.pl?$$

Description

\

This track displays alignments of the TIGR Gene Index (TGI)\ against the $organism genome. The TIGR Gene Index is based\ largely on assemblies of EST sequences in the public databases.\ See \ www.tigr.org for more information about TIGR and the Gene Index.

\

Credits

\

Thanks to Foo Cheung for converting this data into a track\ for the browser.

\ rna 1 uniGene_2 UniGene bed 12 UniGene Alignments and SAGE Info 0 69 0 0 0 127 127 127 1 0 0

Description: Serial Analysis of Gene Expression (SAGE)\ is a quantitative measurement of gene expression. Data are presented for every cluster contained \ in the browser window, and the selected cluster name is highlighted in red. All data are from \ the repository at the SageMap \ project, downloaded Jul 26, 2002. Selecting the UniGene cluster name will display SageMap's page for that cluster.\

Brief Methodology: SAGE counts are produced \ by sequencing small "tags" of DNA believed to be associated with a \ gene. These tags are produced by attaching poly-A RNA to oligo-dT \ beads. After synthesis of double-stranded cDNA, the transcripts are \ cleaved by an anchoring enzyme (usually NlaIII). Small tags are then \ produced by ligation with a linker containing a type IIS restriction \ enzyme site and cleavage with the tagging enzyme (usually BsmFI). The \ tags are then concatenated together and sequenced. The frequency of each \ tag is counted and used to infer the expression level of transcripts that can \ be matched to that tag. All SAGE data presented here were mapped to UniGene \ transcripts by the SageMap \ project at NCBI.
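The tag-extraction and counting steps can be illustrated in silico. In classic SAGE the tag is the short stretch immediately 3' of the 3'-most NlaIII site (CATG); the sketch below assumes a 10-base tag and sense-strand cDNA strings, and it only approximates the biochemistry (no linkers, ditags, or sequencing errors). It is not how SageMap assigned tags to UniGene clusters.

```python
from collections import Counter

ANCHOR = "CATG"   # NlaIII recognition site (anchoring enzyme)
TAG_LEN = 10      # assumed tag length for classic SAGE

def sage_tag(cdna, anchor=ANCHOR, tag_len=TAG_LEN):
    """Return the tag just 3' of the 3'-most anchor site, or None if there is none."""
    seq = cdna.upper()
    pos = seq.rfind(anchor)
    if pos == -1:
        return None
    start = pos + len(anchor)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None

def count_tags(transcripts):
    """Count tags across transcripts; the counts stand in for expression levels."""
    return Counter(tag for tag in map(sage_tag, transcripts) if tag)

if __name__ == "__main__":
    reads = ["GGGCATGAAAAACCCCCTTTTTTT", "CATG" + "T" * 10 + "CATGAAAAACCCCCGG"]
    print(count_tags(reads))
```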



\ \ rna 1 uniGene UniGene psl . UniGene Alignments and SAGE Info 0 70 0 0 0 127 127 127 0 0 0 rna 1 rnaCluster Gene Bounds bed 12 Gene Boundaries as Defined by RNA and Spliced EST Clusters 0 71 200 0 50 227 127 152 0 0 0

Description

\ This track shows the boundaries of genes and the direction of\ transcription as deduced from clustering spliced ESTs and mRNAs\ against the genome. When there are many spliced variants\ of the same gene, this track shows the variant that\ spans the greatest distance in the genome. \ \

Method

\ ESTs and mRNAs from Genbank are aligned against the genome\ with the BLAT program, and filtered to keep only those alignments\ that have at least 97.5% base identity within the \ aligning blocks. When multiple alignments occur, only the\ alignments with a percentage identity within 0.2% of the\ best alignment are kept. ESTs that align without any\ introns are discarded. Blocks that are less than 130 bases\ and are not next to an intron are discarded. Blocks smaller\ than 10 bases are discarded. The orientations of the \ ESTs and mRNAs are deduced from the GT/AG splice sites\ at the introns, and ESTs and mRNAs with overlapping blocks\ on the same strand are merged into clusters. Only the\ extent and orientation of the clusters are shown here.\ \
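The final merging step, in which alignments with overlapping blocks on the same strand are joined into clusters, can be sketched with a union-find structure and a sweep over sorted blocks. This Python sketch assumes orientation has already been assigned, applies only the no-intron and 10-base block rules described above, and omits the identity and 130-base filters; it is an illustration, not Jim Kent's code.

```python
def cluster_bounds(alignments, min_block=10):
    """Merge same-strand alignments with overlapping blocks.

    alignments: list of (strand, [(start, end), ...]) with half-open blocks.
    Returns a sorted list of (strand, start, end) cluster extents.
    """
    kept = []
    for strand, blocks in alignments:
        if len(blocks) < 2:                    # no intron: discard the alignment
            continue
        blocks = [(s, e) for s, e in blocks if e - s >= min_block]
        if blocks:
            kept.append((strand, blocks))

    parent = list(range(len(kept)))            # union-find over alignments

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for strand in "+-":
        blocks = sorted(
            (s, e, i)
            for i, (st, blks) in enumerate(kept) if st == strand
            for s, e in blks
        )
        run_end, run_idx = None, None
        for s, e, i in blocks:
            if run_end is not None and s < run_end:   # overlaps the current run
                parent[find(i)] = find(run_idx)
                run_end = max(run_end, e)
            else:                                     # start a new run
                run_end, run_idx = e, i

    bounds = {}
    for i, (strand, blks) in enumerate(kept):
        root = find(i)
        lo, hi = min(s for s, _ in blks), max(e for _, e in blks)
        if root in bounds:
            _, old_lo, old_hi = bounds[root]
            lo, hi = min(lo, old_lo), max(hi, old_hi)
        bounds[root] = (strand, lo, hi)
    return sorted(bounds.values())
```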

Credits

\ This track was generated at UCSC by Jim Kent using data\ submitted to Genbank by scientists worldwide.\ rna 1 genieBounds Clone Bounds bed 9 Clone Boundaries from EST Mate Pairs 0 72 178 34 34 216 144 144 0 0 0

Description & Credits

\ \

These clone bounds are based on EST mate pairs from \ Affymetrix's \ Genie gene finding software. \

\ rna 1 altGraph altGraph psl altGraph 0 73 0 0 0 127 127 127 0 0 0 rna 1 altGraphX altGraphX bed 3 altGraphX 0 74 0 0 0 127 127 127 0 0 0 rna 1 refFullAli DBTSS mRNA psl . RefSeq mRNA extended to the 5' end from DBTSS 0 75 0 0 100 127 127 177 0 0 0 rna 1 cpgIsland CpG Islands bed 4 + CpG Islands (Islands < 300 Bases are Light Green) 0 76 0 100 0 128 228 128 0 0 0

Description

\

\ CpG islands are associated with genes, particularly housekeeping\ genes, in vertebrates. CpG islands are particularly common near\ transcription start sites, and may be associated with promoter\ regions. Normally a C followed immediately by a G (a CpG) is rare in\ vertebrate DNA because the Cs in such an arrangement tend to be\ methylated. This methylation helps distinguish the newly synthesized\ DNA strand from the parent strand, which aids in the final stages of\ DNA proofreading after replication. However, over evolutionary time\ methylated Cs tend to turn into Ts because of spontaneous\ deamination. The result is that CpGs are relatively rare unless\ there is selective pressure to keep them or a region is not methylated\ for some reason, perhaps having to do with the regulation of gene\ expression. CpG islands are regions where CpGs are present at\ significantly higher levels than is typical for the genome as a whole.\

\ \

Method

\

\ CpG islands were predicted by searching the sequence one base at a\ time, scoring each dinucleotide (+17 for CG and -1 for others) and\ identifying maximally scoring segments. Each segment was then\ evaluated to determine its GC content (>=50%), length (>200 bases), and the ratio of the\ observed proportion of CG dinucleotides to the expected proportion on\ the basis of the GC content of the segment (>0.6). \
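The scoring and filtering criteria above can be sketched in Python. The segment finder is a simplified one-pass scan, not necessarily the modified Miklem/Hillier program: it keeps a running score and emits the best-scoring stretch whenever the score falls back to zero. The observed/expected ratio uses the commonly used form CpG count × length / (C count × G count), which is an assumption about how the expected proportion was computed.

```python
def max_scoring_segments(seq, cg_score=17, other_score=-1):
    """Return half-open (start, end) base intervals of high-scoring segments."""
    seq = seq.upper()
    segments = []
    running = best = 0
    run_start = best_start = best_end = 0
    for i in range(len(seq) - 1):
        score = cg_score if seq[i:i + 2] == "CG" else other_score
        if running + score <= 0:
            if best > 0:                       # close the current run
                segments.append((best_start, best_end))
            running = best = 0
            run_start = i + 1
            continue
        running += score
        if running > best:                     # dinucleotide i covers bases i, i+1
            best, best_start, best_end = running, run_start, i + 2
    if best > 0:
        segments.append((best_start, best_end))
    return segments

def is_cpg_island(seq, min_gc=0.5, min_length=200, min_obs_exp=0.6):
    """Apply the GC-content, length and observed/expected CpG criteria."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    if n <= min_length or c == 0 or g == 0:
        return False
    cpg = seq.count("CG")          # "CG" cannot overlap itself, so count() is exact
    obs_exp = cpg * n / (c * g)    # (CpG/n) / ((C/n) * (G/n))
    return (c + g) / n >= min_gc and obs_exp > min_obs_exp

def find_cpg_islands(seq):
    return [(s, e) for s, e in max_scoring_segments(seq) if is_cpg_island(seq[s:e])]

if __name__ == "__main__":
    print(find_cpg_islands("A" * 100 + "CG" * 150 + "A" * 100))
```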

\ \

Credits

\

\ This track was generated \ using a\ modification of a program developed by G. Miklem and L. Hillier. \

\ \ regulation 1 transfacHit Transfac Hits bed 6 Transfac Transcription Factor Binding Sites Near Transcription Start 0 77 0 0 0 127 127 127 1 0 0 regulation 1 triangleSelf Golden Triangle bed 6 Golden Triangle Possible Transcription Factor Binding Sites 0 78 0 0 0 127 127 127 1 0 0 regulation 1 triangle Golden Extra bed 6 Golden Triangle Motif Matching Sites Near Transcription Start 0 79 0 0 0 127 127 127 1 0 0 regulation 1 transfac Transfac Hits genePred refPep refMrna Transfac Hits 0 80 12 12 120 133 133 187 0 0 0 regulation 1 transfacRatios Transfac Ratios bed 6 Transfac Likelihood Ratios 0 81 12 12 120 133 133 187 0 0 0 regulation 1 psuReg Known Regulatory bed 4 . Functional Regulatory Elements compiled by Penn State 0 82 30 130 210 142 192 232 0 0 0

Regulatory Elements


\
\ This list of functional regions contains the names and coordinates of the regulatory regions relative to the December version of the Human Genome Browser. \
\ Note that these regions have not been trimmed to show the smallest possible functional element with maximum activity. They range in size from 300 to 4,000 bp. \
\

\

\ Details on source of Regulatory Region data
\ \

\ \ Please direct comments or questions to Laura Elnitski at elnitski@bio.cse.psu.edu\

\ April 16, 2002\

\ Data made available by Laura Elnitski, Webb Miller, Ross Hardison, Scott Schwartz, Emmanouil Dermitzakis, Andrew Clark, William Krivan and Wyeth Wasserman\ regulation 1 softPromoter TSSW Promoters bed 5 + TSSW Promoter Predictions 1 83 0 100 0 127 177 127 0 0 0 regulation 1 nci60 NCI60 bed 15 + Microarray Experiments for NCI 60 Cell Lines 1 84 0 0 0 127 127 127 0 0 0 \

Description

\ \

Expression data from "\ Systematic variation in gene expression patterns in human cancer cell\ lines"[pubmed],\ Ross et al. Nat Genet 2000 Mar;24(3):227-35. cDNA microarrays were\ used to explore the variation in expression of approximately 8,000\ unique genes among the 60 cell lines used in the National Cancer\ Institute's screen for anti-cancer drugs. The authors have provided a\ web supplement \ where more data and experimental descriptions can be obtained. cDNA\ probes were placed on the draft human genome using Genbank sequences\ referenced by the IMAGE clone IDs. \ \

The data are shown in a tabular format in which each column of\ colored boxes represents the variation in transcript levels for a\ given cDNA across all of the array experiments, and each row\ represents the measured transcript levels for all genes in a single\ sample. The variation in transcript levels for each gene is\ represented by a color scale, in which red indicates an increase in\ transcript levels, and green indicates a decrease in transcript\ levels, relative to the reference sample. The saturation of the color\ corresponds to the magnitude of transcript variation. A black color\ indicates an undetectable change in expression, while a gray box\ indicates missing data.\ \

Display Options

\ This track has filter options to customize tissue types presented and\ the color of the display.\ \

Cell Line: This option is only valid when the track is \ displayed in full. It determines how the experiments are displayed. The\ options are:\

    \
  • Tissue Averages: Displays, for each tissue type, the average of the log ratio scores of all cell lines \ of that type.
  • All Cell Lines: Displays the log ratio score for all cell line experiments.\
  • Specific Tissues: Displays the log ratio score for all cell lines belonging\ to a given tissue type.
  • Duplicates and Unknown: Same as for a Specific Tissue except these experiments \ were duplicates of others or the tissue type was not specified.
\ Color Scheme: \ Data are presented using a two-color false-color display. By default\ the Brown/Botstein colors of red -> positive log ratio, green -> negative log ratio are used.\ However, blue can be substituted for green for users who are color blind.\ \
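The two display ideas above, averaging log ratios by tissue and mapping a ratio to a red/green (or red/blue) color whose saturation reflects its magnitude, can be sketched as follows. The saturation cap `max_abs=2.0`, the gray value used for missing data, and the input layout are hypothetical choices, not the browser's actual parameters.

```python
from statistics import mean

def tissue_averages(scores):
    """scores: {cell_line: (tissue, log_ratio or None)} -> {tissue: mean log ratio}."""
    by_tissue = {}
    for tissue, ratio in scores.values():
        if ratio is not None:                  # missing values are skipped
            by_tissue.setdefault(tissue, []).append(ratio)
    return {t: mean(v) for t, v in by_tissue.items()}

def ratio_color(log_ratio, max_abs=2.0, color_blind=False):
    """Map a log ratio to an (R, G, B) tuple; 0 gives black, None gives gray."""
    if log_ratio is None:
        return (128, 128, 128)                 # gray: missing data
    # Saturation grows with |log ratio| and is capped at max_abs.
    level = int(min(abs(log_ratio) / max_abs, 1.0) * 255)
    if log_ratio > 0:
        return (level, 0, 0)                   # red: positive log ratio
    return (0, 0, level) if color_blind else (0, level, 0)
```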

Details Page

\ On the details page, the probes presented\ correspond to those contained in the window range displayed on the Genome\ Browser. The exon probe and experiment selected are highlighted in\ blue.\ regulation 1 affy GNF bed 15 + GNF Gene Expression Atlas using Affymetrix GeneChips 0 85 0 0 0 127 127 127 0 0 0

GNF Gene Expression Atlas Experiments using Affymetrix GeneChips

\ \

This track shows a series of experiments on different normal tissues using Affymetrix GeneChips, performed\ by GNF (The Genomics Institute of the\ Novartis Research Foundation). Alignments displayed on the\ track correspond to the target sequences used by Affymetrix from\ which to choose probes. Color denotes signal intensity on a\ log base 2 scale, with darker colors corresponding to lower signal and\ lighter colors corresponding to higher signal. Please note that this\ track is under construction and will not be official until GNF\ publishes their results.\ \

Track options include the ability to group results by chip type,\ tissue average, and individual chip identification numbers.\ regulation 1 affyRatio GNF Ratio bed 15 + GNF Gene Expression Atlas Ratios using Affymetrix GeneChips 0 86 0 0 0 127 127 127 0 0 0

Description

\

This track shows expression data from GNF (The Genomics Institute of the Novartis Research Foundation)\ using Affymetrix GeneChips.

\ \

Methods

\

For detailed information about the experiments, see Su et al., \ "Large-scale analysis of the human and mouse transcriptomes.", \ PNAS, Mar 19, 2002. Alignments displayed on the track\ correspond to the target sequences used by Affymetrix from which to\ choose probes.

\

In dense mode, the track color denotes the average signal over all\ experiments on a log base 2 scale. Lighter colors correspond to lower signals \ and darker colors correspond to higher signals. In full\ mode, the color of each item represents the log base 2 ratio of the signal of\ that particular experiment to the median signal of all experiments for that probe.\
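The two color rules for this track can be written out for a single probe. This sketch assumes that "average signal" means the arithmetic mean of the raw signals, converted afterwards to log base 2; the browser may average differently. Signals must be positive for the logarithms to be defined.

```python
import math
from statistics import mean, median

def dense_value(signals):
    """Dense mode (assumed): log2 of the average signal across all experiments."""
    return math.log2(mean(signals))

def full_mode_ratios(signals):
    """Full mode: log2(signal / median signal) for each experiment of one probe."""
    mid = median(signals)
    return [math.log2(s / mid) for s in signals]

if __name__ == "__main__":
    probe = [100, 200, 400]
    print(dense_value(probe), full_mode_ratios(probe))
```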

More information about individual probes and probe sets is available at\ Affymetrix's netaffx.com website. \ \

Using the Filter

\

The track filter can be used to change the display mode, group the displayed\ results, and change the display colors. \

    \
  • To change the display mode for the track, select the desired display setting\ from the Display Mode dropdown list, then click the Submit button.\
  • To group the displayed results by chip type, tissue averages, or individual \ chip identification numbers, select the corresponding item from the Experiment\ Display dropdown list, then click the Submit button. \
  • To change the display colors of the track, click the button in front of \ the desired color scheme, then click the Submit button.

\ \

Credits

\

Thanks to GNF for providing this data. Please note that this track is under construction and will not be\ official until GNF publishes their results.

\ regulation 1 cghNci60 CGH NCI60 bed 15 + Comparative Genomic Hybridization Experiments for NCI 60 Cell Lines 0 87 0 0 0 127 127 127 0 0 0 \

Description

\ \

The data are shown in a tabular format in which each column of\ colored boxes represents the variation in genomic DNA levels from a \ normal cell line for a\ given clone across all of the NCI60 cell lines, and each row\ represents the measured genomic DNA levels for clones in a single\ sample. The variation in genomic DNA levels for each clone is\ represented by a color scale, in which green indicates an increase in\ genomic DNA levels, and red indicates a decrease in genomic DNA\ levels, relative to the reference sample. The saturation of the color\ corresponds to the magnitude of the genomic DNA variation. A black color\ indicates an undetectable change in genomic DNA, while a gray box\ indicates missing data.\ \

Display Options

\ This track has options to customize tissue types presented and\ the color of the display.\ \
Cell Line: This option is only valid when the track is \ displayed in full. It determines how the experiments are displayed. The\ options are:\
    \
  • Tissue Averages: Displays, for each tissue type, the average of the log ratio scores of all cell lines \ of that type.
  • All Cell Lines: Displays the log ratio score for all cell line experiments.\
  • Specific Tissues: Displays the log ratio score for all cell lines belonging\ to a given tissue type.
\ Color Scheme: \ Data are presented using a two-color false-color display. By default\ the colors green -> positive log ratio, red -> negative log ratio are\ used.\ However, blue can be substituted for green for users who are color blind.\ \

Details Page

\ On the details page, the probes presented\ correspond to those contained in the window range displayed on the Genome\ Browser. The exon probe and experiment selected are highlighted in\ blue.\ regulation 1 rosetta Rosetta bed 15 + Rosetta Experimental Confirmation of Chr22 Exons 1 88 0 0 0 127 127 127 0 0 1 chr22,

Description

\

Expression data from Rosetta Inpharmatics.\ See the paper "Experimental Annotation of the Human Genome Using Microarray Technology"\ Nature Feb. 2001, vol 409 pp 922-7 for more\ information. Briefly, Rosetta created DNA probes for each exon as\ described by the Sanger Centre for the October 2000 draft of the\ genome and used them to explore expression levels over 69 different\ experiments. As in the original paper, exons are labeled according to\ contig name, relative position in the contig, and whether they were\ predicted (pe) or confirmed (te, "true exon") exons at the time of\ publication. For example, AC000097_256_te is the 256th exon on\ AC000097 predicted by Genscan, which was confirmed\ independently. Hybridization names refer to the sources of the two\ mRNA populations used for the experiment.\ Please note: in the browser window the hybridization names\ are too long to fit and have been abbreviated. Also, the ratios\ were inverted as of Feb 12, 2002 to conform with standard microarray\ conventions of having the experimental sample in the red (cy5) channel\ and the reference sample in the green (cy3) channel.\ \
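The exon-label convention described above (contig, position in the contig, and `pe`/`te` status) is regular enough to parse mechanically. The sketch below is hypothetical helper code, not part of the browser; it also shows that inverting a ratio, as was done on Feb 12, 2002, simply flips the sign of its logarithm.

```python
import re
from typing import NamedTuple

class ExonLabel(NamedTuple):
    contig: str
    index: int        # position of the exon within the contig
    confirmed: bool   # True for "te" (true exon), False for "pe" (predicted exon)

_LABEL = re.compile(r"^(?P<contig>.+)_(?P<index>\d+)_(?P<status>te|pe)$")

def parse_exon_label(label):
    """Split a label such as 'AC000097_256_te' into its three parts."""
    m = _LABEL.match(label)
    if m is None:
        raise ValueError(f"unrecognized exon label: {label!r}")
    return ExonLabel(m["contig"], int(m["index"]), m["status"] == "te")

def invert_log_ratio(log_ratio):
    """Swap numerator and denominator samples: log(b/a) = -log(a/b)."""
    return -log_ratio

if __name__ == "__main__":
    print(parse_exon_label("AC000097_256_te"))
```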

Display Options

\ The track can be configured with a few different options:\ \
Reference Sample: This option is only valid when the track is displayed in\ full. It determines how the 69 different experiments are displayed. The\ options are:\
    \
  • All Experiments: Displays the data for each experiment, one\ row per experiment.
  • \
  • Common Reference and Other: Displays a summary value for all the\ experiments done with a common pooled reference sample, and another summary value for \ the rest of the experiments.
  • \
  • Common Pool: Only displays experiments which were performed using a \ common pooled reference sample.
  • \
  • Other: Only displays experiments which were not performed using a common\ pooled reference sample.
  • \
\ \ Exons Shown: Probes on the microarrays correspond to gene\ predictions on chromosome 22, some of which were confirmed by known\ genes while others remain predictions. This option determines whether data are\ shown for probes corresponding to confirmed, predicted, or all exons.\ \
Color Scheme: Data are presented using a two-color false-color\ display. By default, the Brown/Botstein colors of red -> positive log\ ratio, green -> negative log ratio are used. However, blue can be\ substituted for green for those who are color blind. Gray values\ indicate missing data. Please note that, due to technical limitations,\ the details page can show many more color shades than\ the browser image, so the two may not match exactly.\ \
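The color scheme above can be sketched as a small mapping from a log ratio to an RGB color. This is a minimal illustration, not the browser's code: the linear scaling, the saturation point `max_abs`, black at a log ratio of zero, and the exact gray used for missing data are all assumptions.

```python
from typing import Optional, Tuple

RGB = Tuple[int, int, int]
GRAY: RGB = (128, 128, 128)  # assumed shade for missing data


def log_ratio_color(log_ratio: Optional[float], max_abs: float = 1.0,
                    color_blind: bool = False) -> RGB:
    """Map a log ratio to red (positive) or green/blue (negative)."""
    if log_ratio is None:
        return GRAY  # missing data
    # Intensity grows linearly with |log ratio| and saturates at max_abs (assumed).
    intensity = min(abs(log_ratio) / max_abs, 1.0)
    level = round(255 * intensity)
    if log_ratio > 0:
        return (level, 0, 0)  # red channel for positive ratios
    if color_blind:
        return (0, 0, level)  # blue substituted for green
    return (0, level, 0)      # green channel for negative ratios
```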

Details Page

\ On the details page, the probes presented correspond to those contained\ in the window range shown in the Genome Browser, and the selected exon probe is highlighted\ in blue. Each value in the details table is an average of many data\ points. To see the full data for each experiment\ graphically, select the check-boxes for the experiments of interest\ and click the submit button.\ regulation 1 affyTranscriptome Transcriptome sample Affymetrix Experimentally Derived Transcriptome 0 89 100 50 0 0 0 255 0 0 2 chr22,chr21, \

Description

\ \

Transcriptome data for chromosomes 21 and 22 from Affymetrix, as described in \ "Large-Scale Transcriptional Activity in Chromosomes 21 and 22",\ Kapranov, P., Cawley, S. E., Drenkow, J., Bekiranov, S., Strausberg,\ R. L., Fodor, S. P. A. and Gingeras, T. R. In general, the data presented\ are the perfect match (PM) minus mismatch (MM) value. Different experiments were\ normalized by setting the average value to be the same for each\ chip. Replicates for each cell type were averaged together to\ produce the data seen in "full" mode for that cell type. In dense\ mode, or at the top of the track in full mode, "Transcriptome" displays\ the maximum value over all experiments for that probe, the idea being\ to paint as many transcribed regions as possible. \ \
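The processing steps described above (PM minus MM, per-chip normalization to a common average, replicate averaging, and the dense-mode maximum) can be sketched as follows. This is a hypothetical illustration: the function names are made up, and normalizing by multiplicative scaling (rather than, for example, an additive shift) is an assumption, since the text only says the average value was made the same for each chip.

```python
from statistics import mean
from typing import Dict, List


def pm_minus_mm(pm: List[float], mm: List[float]) -> List[float]:
    """Per-probe signal: perfect-match intensity minus mismatch intensity."""
    return [p - m for p, m in zip(pm, mm)]


def normalize_chip(values: List[float], target_mean: float) -> List[float]:
    """Rescale one chip so its average equals target_mean (scaling is assumed)."""
    chip_mean = mean(values)
    if chip_mean == 0:
        raise ValueError("cannot rescale a chip whose average is zero")
    factor = target_mean / chip_mean
    return [v * factor for v in values]


def average_replicates(replicates: List[List[float]]) -> List[float]:
    """Average the replicate chips of one cell type, probe by probe."""
    return [mean(column) for column in zip(*replicates)]


def transcriptome_max(experiments: Dict[str, List[float]]) -> List[float]:
    """Maximum value per probe over all experiments (dense-mode summary)."""
    return [max(column) for column in zip(*experiments.values())]
```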

To present a more\ interpretable display when zoomed out, averages have been precalculated\ over the chromosome at two different resolutions in addition to the\ raw data. For example, when zoomed out, there may appear to be a peak at\ the center of a gene rather than a signal at every exon. Zooming in\ will reveal the "raw" data for that region.\ \
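Precomputing averages at several resolutions can be sketched as fixed-size binning along the chromosome. The bin sizes and helper names below are hypothetical; the text states only that averages were precalculated at two resolutions in addition to the raw data.

```python
from typing import Dict, List, Sequence, Tuple


def bin_averages(positions: Sequence[int], values: Sequence[float],
                 bin_size: int) -> List[Tuple[int, float]]:
    """Average probe values into fixed-size bins; returns (bin_start, mean) pairs."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for pos, val in zip(positions, values):
        start = (pos // bin_size) * bin_size
        sums[start] = sums.get(start, 0.0) + val
        counts[start] = counts.get(start, 0) + 1
    return [(start, sums[start] / counts[start]) for start in sorted(sums)]


def precompute_levels(positions: Sequence[int], values: Sequence[float],
                      bin_sizes: Sequence[int]) -> Dict[int, List[Tuple[int, float]]]:
    """Raw data (key 0) plus one averaged version per hypothetical bin size."""
    levels: Dict[int, List[Tuple[int, float]]] = {0: list(zip(positions, values))}
    for size in bin_sizes:
        levels[size] = bin_averages(positions, values, size)
    return levels
```

When zoomed out, a display would read from a coarse level; zooming in switches back to key `0`, the raw data.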

Questions/Comments? Email sugnet@cse.ucsc.edu.\ regulation 0 exoFish Exofish Ecores bed 5 . Exofish Tetraodon/Human Evolutionarily Conserved Regions 1 111 0 60 120 200 220 255 1 0 0 http://www.genoscope.cns.fr/proxy/cgi-bin/exofish.cgi?Id=$$&Mode=Id

Description

\

The Exofish track shows regions of homology with the \ pufferfish Tetraodon nigroviridis. The following paper describes \ Exofish: 'Estimate of human gene number provided by \ genome-wide analysis using Tetraodon nigroviridis \ DNA sequence', Nature Genetics volume 25, page 235, \ June 2000.

\

Credits

\ This information \ was provided by Olivier Jaillon and Hugues Roest Crollius at Genoscope. \ For further information and other Exofish tools please visit the \ \ Genoscope Exofish web site, or \ email \ exofish@genoscope.cns.fr. \ compGeno 1 blatFish Tetraodon Blat psl xeno Tetraodon nigriviridis Translated Blat Alignments 1 112 0 60 120 200 220 255 1 0 0

Description

\

This track displays translated alignments of 728 million bases of Tetraodon nigroviridis \ whole genome shotgun reads vs. the draft $organism genome. Areas painted by\ this track are quite likely to be coding regions.

\

Methods

\

The alignments were done \ with BLAT in translated protein mode requiring 2 nearby 4-mer matches\ to trigger a detailed alignment. The human\ genome was masked with RepeatMasker and Tandem Repeat Finder before \ running BLAT.
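The "two nearby 4-mer matches" trigger can be illustrated with a small two-hit filter over already-translated protein sequences. This is a simplified sketch, not BLAT itself: it assumes "nearby" means two non-overlapping hits on the same diagonal within a hypothetical `max_gap` residues, and it skips the six-frame translation and the detailed alignment that follows the trigger.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def kmer_index(seq: str, k: int = 4) -> Dict[str, List[int]]:
    """Map every k-mer of seq to the positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def two_hit_triggers(query: str, target: str, k: int = 4,
                     max_gap: int = 20) -> List[Tuple[int, int]]:
    """Return (query_pos, target_pos) of second hits that would trigger an alignment."""
    index = kmer_index(target, k)
    last_hit_on_diagonal: Dict[int, int] = {}
    triggers: List[Tuple[int, int]] = []
    for q in range(len(query) - k + 1):
        for t in index.get(query[q:q + k], []):
            diag = t - q
            prev = last_hit_on_diagonal.get(diag)
            if prev is not None and q - prev < k:
                continue  # overlaps the previous hit: part of the same seed
            if prev is not None and q - prev <= max_gap:
                triggers.append((q, t))  # second nearby hit on this diagonal
            last_hit_on_diagonal[diag] = q
    return triggers
```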

\

Credits

\

Many thanks to Genoscope for \ providing the Tetraodon sequence.

\ compGeno 1 blatFugu Fugu Blat psl xeno Takifugu rubripes Translated Blat Alignments 1 113 0 60 120 200 220 255 1 0 0

Description

\ \

\ The second draft assembly of the Fugu genome consists of 12,403 contigs,\ or scaffolds, with individual sizes ranging from 2 to 650 kbp.\ The available draft sequence of the Fugu genome covers 320 Mb of DNA\ sequence. \ We estimate that this covers approximately 90% of the non-repetitive fraction\ of the genome (which has a total size, including repeats, of around\ 365 Mb).\

\ \

The 3.0 draft from http://genome.jgi-psf.org/fugu6/fugu6.info.html was used in the UCSC fugu BLAT alignments.

\ \

Methods

\

The alignments were done \ with BLAT in translated protein mode requiring 2 nearby 4-mer matches\ to trigger a detailed alignment. The human\ genome was masked with RepeatMasker and Tandem Repeat Finder before \ running BLAT.

\

Credits

\ This data has been provided freely by the Fugu Genome Consortium for use in this publication/correspondence only. \

Many thanks to the Fugu Genomics Project and the Fugu Genome Project for providing the Fugu sequence.

\ compGeno 1 blatTetra Tetra Blat psl xeno Tetraodon nigriviridis Translated Blat Alignments 1 114 0 60 120 200 220 255 1 0 0

Description

\

This track displays translated alignments of 728 million bases of Tetraodon nigroviridis \ whole genome shotgun reads vs. the draft $organism genome. Areas painted by\ this track are quite likely to be coding regions.

\

Methods

\

The alignments were done \ with BLAT in translated protein mode requiring 2 nearby 4-mer matches\ to trigger a detailed alignment. The human\ genome was masked with RepeatMasker and Tandem Repeat Finder before \ running BLAT.

\

Credits

\

Many thanks to Genoscope for \ providing the Tetraodon sequence.

\ compGeno 1 tet_waba Tetraodon Tetraodon nigroviridis Homologies 1 115 50 100 200 85 170 225 0 0 0 compGeno 0 blatChicken Chicken Blat psl xeno Chicken Translated Blat Alignments 0 116 100 50 0 255 240 200 1 0 0 compGeno 1 musHumL Human Cons sample 0 8 Mouse/Human Evolutionary Conservation Score 2 119 100 50 0 175 150 128 0 0 0

Description

\

\ This track displays the conservation between the mouse and human genomes for \ 50bp windows in the mouse genome that have at least 15 bp aligned to\ human. The score for a window reflects the probability that the\ level of observed conservation in that 50bp region would occur by\ chance under neutral evolution. It is given on a logarithmic scale\ and is therefore called the "L-score". An L-score of 1 means there is a\ 1/10 probability that the observed conservation level would occur by\ chance, an L-score of 2 means a 1/100 probability, an L-score of 3\ means a 1/1000 probability, etc. The L-scores are displayed as\ "mountain ranges". Clicking on a mountain range displays a detail page\ from which you can access the base-level alignments, both\ for the whole region and for the individual 50bp windows.\ \

\ \

Methods

\

\ Genome-wide alignments between mouse and human were produced by\ Blastz. A set of 50bp windows in the mouse genome was determined\ by scanning the sequence, sliding 5 bases at a time, and only those\ windows with at least 15 aligned bases were kept. For each window,\ a conservation score defined by\

\

\ $$S = \sqrt{\frac{n}{m(1-m)}}\,(p - m)$$\
\
\ was calculated, where n is the number of aligning bases in the\ window, p is the identity between mouse and human for these\ aligning bases (expressed as a fraction between 0 and 1), and m is the average identity (also a fraction) for aligned\ neutrally evolving bases in a larger region surrounding the 50bp\ window being scored. Neutral bases were taken from ancestral repeat\ sequences, which are relics of transposons that were inserted before\ the human-mouse split. To transform S into an L-score, the empirical\ cumulative distribution function CDF(S) = P(x < S)\ is computed from the scores of all windows genome-wide, and\ the L-score is defined as\

\
\ $$L = -\log_{10}\bigl(1 - \mathrm{CDF}(S)\bigr).$$\
\
\
\ The L-score\ provides a frequentist confidence assessment. A Bayesian\ calculation of the probability that a window is under\ selection can also be made using a mixture decomposition of\ the empirical density of the scores for all windows\ genome-wide into a neutral and a selected component. Details\ are given in a manuscript in preparation. The results are\ summarized in the table below.\

\
L-score   Frequentist probability     Bayesian probability that
          of this L-score or          a window with this L-score
          greater, given neutral      is under selection
          evolution
-----------------------------------------------------------------
   1      0.1                         0.32
   2      0.01                        0.75
   3      0.001                       0.94
   4      0.0001                      0.97
   5      0.00001                     0.98
   6      0.000001                    0.99
   7      0.0000001                   >0.99
   8      0.00000001                  >0.99
\
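The window scan, the S score, and the L-score defined above can be sketched in a few lines. This is an illustration under stated assumptions: per-base `aligned`/`identical` flags stand in for the Blastz alignment, p and m are fractions between 0 and 1, and the neutral identity m is passed in rather than estimated from ancestral repeats.

```python
import bisect
import math
from typing import Iterator, List, Sequence, Tuple


def window_stats(aligned: Sequence[bool], identical: Sequence[bool],
                 size: int = 50, step: int = 5,
                 min_aligned: int = 15) -> Iterator[Tuple[int, int, float]]:
    """Yield (start, n, p) for each window with at least min_aligned aligned bases."""
    for start in range(0, len(aligned) - size + 1, step):
        n = sum(aligned[start:start + size])
        if n < min_aligned:
            continue
        same = sum(1 for i in range(start, start + size) if aligned[i] and identical[i])
        yield start, n, same / n


def s_score(n: int, p: float, m: float) -> float:
    """S = sqrt(n / (m (1 - m))) * (p - m)."""
    return math.sqrt(n / (m * (1.0 - m))) * (p - m)


def l_score(s: float, all_scores_sorted: List[float]) -> float:
    """L = -log10(1 - CDF(S)), using the empirical CDF(S) = P(x < S)."""
    total = len(all_scores_sorted)
    cdf = bisect.bisect_left(all_scores_sorted, s) / total  # fraction of scores < s
    tail = max(1.0 - cdf, 1.0 / total)  # avoid log10(0) for s above every score
    return -math.log10(tail)
```

Under this definition an L-score of L corresponds to a neutral tail probability of 10^-L, which is the frequentist column of the table.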

\ \ \

\

Credits

\

\ \ Thanks to Webb Miller and Scott Schwartz for creating the Blastz\ alignments, Jim Kent for post-processing them, and \ Mark Diekhans for scoring the windows and selecting out the ancestral repeats. \ Krishna Roskin created S-scores for these windows. Ryan Weber computed the CDF \ for these S-scores, \ and created the remaining track display functions. Thanks to the Mouse\ Genome Sequencing Consortium for providing the mouse sequence data.\ \

\ \ compGeno 0 humanChain Human Chain chain hg13 Chained Blastz Mouse/Human Alignments 0 132 100 50 0 255 240 200 1 0 0 compGeno 1 mouseChain Mouse Chain chain mm2 Chained Blastz Mouse/Human Alignments 0 132 100 50 0 255 240 200 1 0 0 compGeno 1 mouseChain2 Mouse Chain2 chain mm2 Chained Blastz Mouse/Human Alignments2 0 133 100 50 0 255 240 200 1 0 0 compGeno 1 humanNet Human Net netAlign hg13 humanChain Mouse/Human Alignment Net 0 134 0 0 0 127 127 127 1 0 0 compGeno 0 mouseNet Mouse Net netAlign mm2 mouseChain Mouse/Human Alignment Net 0 134 0 0 0 127 127 127 1 0 0 compGeno 0 humanNet2 Human Net2 netAlign hg13 humanChain2 Mouse/Human Alignment Net2 0 135 0 0 0 127 127 127 1 0 0 compGeno 0 mouseNet2 Mouse Net2 netAlign mm2 mouseChain2 Mouse/Human Alignment Net2 0 135 0 0 0 127 127 127 1 0 0 compGeno 0 mouseSynNet Syntenic Net netAlign mm2 Mouse/Human Syntenic Alignment Net 0 136 0 0 0 127 127 127 1 0 0 compGeno 0 mouseSyn NCBI Synteny bed 4+ Corresponding Chromosome in Mouse (NCBI) 0 137 120 70 30 187 162 142 0 0 0

Description

\

This track shows syntenic (corresponding) regions between human and mouse\ chromosomes.

\

Method

\

This track was created by looking for homology to known mouse genes in the draft \ assembly. The mouse data is provided at the chromosome level (not cytoband).

\

Credits

\

The data for this track was kindly provided by Deanna Church at NCBI. Refer to the \ NCBI Homology site for more\ details.

\ \


\

This track is produced from mouse sequence data provided by the \ Mouse Genome Sequencing Consortium. \ compGeno 1 syntenyRat Rat Synteny bed 4 . Human/Rat Synteny using blastz single coverage with 100,000 base window 0 138 0 100 0 255 240 200 0 0 0 compGeno 1 synteny100000 UCSC Synteny bed 4 . Human/Mouse Synteny using blastz single coverage with 100,000 base window 0 138 0 100 0 255 240 200 0 0 0

Description

\

\ This track shows syntenic (corresponding) regions between human and mouse chromosomes. \

Methods

\

\ We passed a non-overlapping 100 kb window over the genome and, using the blastz best-in-mouse \ genome alignments, looked for high-scoring regions with at least 40% of the bases aligning \ with the same region in mouse. 100 kb segments were joined together if they agreed in direction and\ were within 500 kb of each other in the human genome and within 4 Mb of each other in the mouse. \ Gaps between syntenic anchors were filled if the bases between two flanking regions agreed with \ the synteny (direction and mouse location). Finally, we extended the syntenic block to include those \ areas.
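The anchor-and-join step can be sketched as below, using the thresholds stated above (100 kb windows, 40% aligned, 500 kb in human, 4 Mb in mouse). The data model is hypothetical: each window is reduced to one representative mouse position, and the final gap-filling and block-extension step is omitted.

```python
from dataclasses import dataclass
from typing import List

WINDOW = 100_000           # human window size
MIN_FRACTION = 0.40        # fraction of window bases aligning to the same mouse region
MAX_HUMAN_GAP = 500_000    # max distance between joined segments in human
MAX_MOUSE_GAP = 4_000_000  # max distance between joined segments in mouse


@dataclass
class Window:
    human_start: int
    mouse_chrom: str
    mouse_pos: int          # hypothetical: one representative mouse coordinate
    strand: str
    aligned_fraction: float


@dataclass
class Block:
    human_start: int
    human_end: int
    mouse_chrom: str
    mouse_start: int
    mouse_end: int
    strand: str


def mouse_distance(block: Block, pos: int) -> int:
    """Distance from pos to the block's mouse interval (0 if inside)."""
    return max(block.mouse_start - pos, pos - block.mouse_end, 0)


def join_windows(windows: List[Window]) -> List[Block]:
    blocks: List[Block] = []
    for w in sorted(windows, key=lambda win: win.human_start):
        if w.aligned_fraction < MIN_FRACTION:
            continue  # not a syntenic anchor
        last = blocks[-1] if blocks else None
        if (last is not None
                and last.mouse_chrom == w.mouse_chrom
                and last.strand == w.strand  # must agree in direction
                and w.human_start - last.human_end <= MAX_HUMAN_GAP
                and mouse_distance(last, w.mouse_pos) <= MAX_MOUSE_GAP):
            last.human_end = w.human_start + WINDOW
            last.mouse_start = min(last.mouse_start, w.mouse_pos)
            last.mouse_end = max(last.mouse_end, w.mouse_pos)
        else:
            blocks.append(Block(w.human_start, w.human_start + WINDOW,
                                w.mouse_chrom, w.mouse_pos, w.mouse_pos, w.strand))
    return blocks
```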

\

Credits

\

\ Contact Robert Baertsch at UCSC for more information about this track.\ Thanks to the Mouse Genome Sequencing Consortium for providing the mouse sequence data. \ compGeno 1 mouseSynWhd Mouse Synteny bed 6+ Whitehead Corresponding Chromosome in Mouse (300k window) 0 139 120 70 30 187 162 142 0 0 0

Description

\

\ This track shows orthologous (syntenic) regions between mouse and human\ chromosomes.\

\ See \ \ http://www-genome.wi.mit.edu/mouse/synteny/index.html \ for genomic dotplots and additional information or the following site for\ an alternative synteny map based on orthologous genes:\ \ http://www.ncbi.nlm.nih.gov/Homology/ .\ \

Credits

\

\ The data for this track is kindly provided by \ Michael Kamal \ at the \ \ Whitehead Institute. \ Mouse sequence data is provided by the \ Mouse Genome Sequencing Consortium. \ \ compGeno 1 syntenySanger Sanger Synteny bed 4+ Sanger Corresponding Chromosome in Mouse (100k window) 0 140 120 70 30 187 162 142 0 0 0 compGeno 1 blatChimpWashu Chimp Blat - WashU psl xeno Chimp Blat Alignments - WashU 0 141 100 50 0 255 240 200 1 0 0 compGeno 1 chimp Chimp sample Chimp Sample Track 0 142 100 50 0 0 0 255 0 0 1 chr7, compGeno 0 lineageMutations LineageMutations sample Lineage Specific Mutations 0 143 0 0 0 0 160 0 0 0 0 varRep 0 snpNih Overlap SNPs bed 4 . Single Nucleotide Polymorphisms (SNPs) from Clone Overlaps 1 144 0 0 0 127 127 127 0 0 0

Description

\

This track shows locations of Single Nucleotide Polymorphisms\ detected primarily by looking at overlaps between clones that\ cover the same region of the genome.

\

Credits

\

Thanks to the SNP\ Consortium and NIH's dbSNP\ for providing this data.

\ \

========================================================================

\ \

Filtering of dbSNP data for UCSC SNP tracks.

\ \

The SNPs in this track include all of the polymorphisms that can be\ mapped against the current assembly. This includes known point\ mutations (Single Nucleotide Polymorphisms), insertions, deletions,\ and segmental mutations from the current build of dbSNP. There are\ three major cases that are not mapped and/or annotated:

\ \

a. Submissions that are completely masked as repetitive\ elements. These are dropped from any further computations. These\ refSNPs are placed in the chromosome "rs_chMasked" on the dbSNP ftp site.

\ \

b. Submissions that are defined in a cDNA context with extensive\ splicing. These SNPs are typically annotated on RefSeq mRNAs through a\ separate annotation process. Efforts are being made to map these\ variations back to contig coordinates, but this has not yet been\ implemented. For now, you can find this set of variations in\ "rs_chNotOn" on the dbSNP ftp site.

\ \

c. Submissions with excessive hits to the genome. Variations with\ 3+ hits to the genome are not included in the tracks, but are available in\ "rs_chMulti" on the dbSNP ftp site.

\ \

The heuristics for the non-SNP variations (i.e. named elements and\ STRs) are quite conservative so some of these are probably lost. This\ approach was chosen to avoid false annotation of variation in\ inappropriate locations.
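The three exclusion cases can be summarized as a classification function. The parameter names and the order in which the cases are checked are assumptions made for illustration; only the bin names and the three-hit cutoff come from the text.

```python
from enum import Enum


class SnpBin(Enum):
    MAPPED = "track"        # shown in the UCSC SNP track
    MASKED = "rs_chMasked"  # case a: completely masked as repetitive
    NOT_ON = "rs_chNotOn"   # case b: defined in a spliced cDNA context
    MULTI = "rs_chMulti"    # case c: three or more hits to the genome


def classify_refsnp(fully_masked: bool, cdna_context_only: bool,
                    genome_hits: int) -> SnpBin:
    """Assign a refSNP to the track or to one of the dbSNP ftp bins (check order assumed)."""
    if fully_masked:
        return SnpBin.MASKED  # dropped before any further computation
    if cdna_context_only:
        return SnpBin.NOT_ON
    if genome_hits >= 3:
        return SnpBin.MULTI
    if genome_hits < 1:
        raise ValueError("submission has no genome hit and fits none of the listed cases")
    return SnpBin.MAPPED
```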

\ \

The dbSNP ftp site can be found at ftp://ftp.ncbi.nih.gov/snp/.

\ \

========================================================================

\ varRep 1 snpTsc Random SNPs bed 4 . Single Nucleotide Polymorphisms (SNPs) from Random Reads 1 145 0 0 0 127 127 127 0 0 0

Description

\ \

This track shows locations of Single Nucleotide Polymorphisms\ detected by aligning reads from random genomic clones from\ a diverse pool of human DNA against the genome.

\ \

Credits

\ \

Thanks to the SNP\ Consortium and NIH for providing this data.

\ \

========================================================================

\ \

Filtering of dbSNP data for UCSC SNP tracks.

\ \

The SNPs in this track include all of the polymorphisms that can be\ mapped against the current assembly. This includes known point\ mutations (Single Nucleotide Polymorphisms), insertions, deletions,\ and segmental mutations from the current build of dbSNP. There are\ three major cases that are not mapped and/or annotated:

\ \

a. Submissions that are completely masked as repetitive\ elements. These are dropped from any further computations. These\ refSNPs are placed in the chromosome "rs_chMasked" on the dbSNP ftp site.

\ \

b. Submissions that are defined in a cDNA context with extensive\ splicing. These SNPs are typically annotated on RefSeq mRNAs through a\ separate annotation process. Efforts are being made to map these\ variations back to contig coordinates, but this has not yet been\ implemented. For now, you can find this set of variations in\ "rs_chNotOn" on the dbSNP ftp site.

\ \

c. Submissions with excessive hits to the genome. Variations with\ 3+ hits to the genome are not included in the tracks, but are available in\ "rs_chMulti" on the dbSNP ftp site.

\ \

The heuristics for the non-SNP variations (i.e. named elements and\ STRs) are quite conservative so some of these are probably lost. This\ approach was chosen to avoid false annotation of variation in\ inappropriate locations.

\ \

The dbSNP ftp site can be found at ftp://ftp.ncbi.nih.gov/snp/.

\ \

========================================================================

\ varRep 1 perlegen Haplotype Blocks bed 12 Perlegen Common High-Resolution Haplotype Blocks. 1 146 0 0 0 127 127 127 1 0 1 chr21, http://www.perlegen.com/haplotype/blk/$$.html

Track Description:

\ \ Haplotype blocks derived from common SNPs on Chromosome 21 by\ Perlegen Sciences as described\ in "Common High-Resolution Haplotypes." Patil,\ N. et al. Science 294:1719-1723 (2001) \ [Science]. The location of each haplotype block is represented by\ a blue horizontal line with tall vertical blue bars at the first and\ last SNPs of the block. Blocks are displayed as starting at the first\ SNP and ending at the last SNP of the block; this differs slightly\ from the Perlegen web site, where blocks are stretched until\ they abut each other. The shade of blue indicates the minimum\ number of SNPs required to discriminate between the haplotype patterns\ that account for at least 80% of genotyped chromosomes; darker colors\ indicate that fewer SNPs are necessary. Individual SNPs are denoted by\ smaller black vertical bars. At multi-megabase resolution in dense\ mode, clusters of tall blue bars may indicate recombination\ hotspots. More information is available for the individual block\ selected at the "Outside Link" above, and general information on the\ blocks is available at Perlegen's main haplotype map page\ http://www.perlegen.com/haplotype/\ \ \ \ varRep 1 rmsk RepeatMasker rmsk Repeating Elements by RepeatMasker 1 147 0 0 0 127 127 127 1 0 0 This track was created by Arian Smit's RepeatMasker program, which uses the RepBase library of repeats from the Genetic \ Information Research Institute. RepBase is described in \ J. Jurka, RepBase Update, Trends in Genetics 16:418-420, 2000.\ varRep 0 simpleRepeat Simple Repeats bed 4 + Simple Tandem Repeats by TRF 0 148 0 0 0 127 127 127 0 0 0

Description

\ This track displays simple tandem repeats (possibly imperfect) located\ by Tandem Repeats\ Finder, which is specialized to this purpose. These repeats can\ occur within coding regions of genes and may be quite\ polymorphic. Repeat expansions are sometimes associated with specific\ diseases.\ \

Methods

\ For more information about the Tandem Repeats Finder, see G. Benson, \ Tandem repeats finder: a program to analyze DNA sequences, Nucleic Acids \ Research, 1999, 27(2) 573-580.\ \

Credits

\ Tandem Repeats Finder was written by Dr. Gary Benson. \ \ varRep 1 gpcr Gpcr genePred Gpcr from softberry and rachel's hmm 1 149 0 0 0 127 127 127 0 0 0 x 1 gpcrKnown Gpcr Known genePred Gpcr from gpcrdb and genewise 1 150 0 0 0 127 127 127 0 0 0 x 1 loweProbes Lowe's Probes bed 6 Candidate oligos for Stanford microarray 0 151 0 0 0 127 127 127 0 0 1 chr22, \

Candidate Oligos for every Stanford Oligo Chip track

\ \ Oligos were chosen for every Sanger22 annotation on chr22 as\ well as about 2000 other genes. Two oligos were chosen with\ a 3' bias, two with a 5' bias, and two with no bias. For this\ purpose exons are defined to include 3' and 5' UTRs.\ \

The strategy

\ \ These oligo selections are based on the following ideas:\
    \
  • Oligos should have minimum secondary structure as\ they must be available for hybridization.
  • \
  • Oligos should be unique in the genome if possible: no\ repeats, and they should not match other places in the genome by BLAT or BLAST.
  • \
  • If oligo-dT is used for RT priming, oligos should be in the 3' end\ of the gene transcript (including the UTR).
  • \
  • Oligos should have a uniform hybridization temperature if\ possible. All oligos must be hybridized at the same temperature,\ so the goal is to minimize cross-hybridization while maximizing signal.\
\ \ Currently we don't have data to identify which parameters\ are more important than others. Also, some of these scores\ overlap (e.g., if Tm is constrained, high secondary\ structure is less likely). See below for histograms of these\ criteria.\ \
\ \

The Details:

\ \

The Algorithm

\
    \
  • Step through each exon at a step size proportional to the\ size of the exon examining possible oligos, excluding areas that\ are RepeatMasked.
  • \
  • Score each oligo for: Tm difference, distance from 3' end,\ secondary structure, and an Affymetrix heuristic.
  • \
  • Look through the candidate probes, recording the maximum\ value of each score.
  • \
  • Each score is then normalized by dividing it by its maximum;\ the normalized scores are averaged into a combined score, and oligos\ are sorted to find the best overall score.
  • \
  • Oligos with the best combined normalized scores are BLATed \ until one is found that has a BLAT score below a given \ threshold.
  • \
  • As oligos are chosen, candidate oligos that overlap those\ already chosen are discarded.
  • \
  • If no oligo passes the BLAT threshold, or not enough oligos have been\ chosen, pick the remaining oligos by best combined score alone.
  • \
\ \
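The normalize, average, and BLAT-filter steps can be sketched as follows. This is a hypothetical illustration, not the Stanford pipeline: it assumes every score has already been oriented so that larger is better and non-negative, uses half-open coordinates, and takes the BLAT step as a caller-supplied `blat_score` function. Stepping through exons to generate candidates is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Candidate:
    start: int                # half-open interval [start, end) on the target
    end: int
    scores: Dict[str, float]  # each score oriented so that higher is better (assumed)


def combined_scores(cands: List[Candidate]) -> List[float]:
    """Normalize each score by its maximum over all candidates, then average."""
    keys = list(cands[0].scores)
    maxima = {k: max(c.scores[k] for c in cands) for k in keys}
    return [sum(c.scores[k] / maxima[k] if maxima[k] else 0.0 for k in keys) / len(keys)
            for c in cands]


def overlaps(a: Candidate, b: Candidate) -> bool:
    return a.start < b.end and b.start < a.end


def pick_oligos(cands: List[Candidate], blat_score: Callable[[Candidate], float],
                blat_threshold: float, n_wanted: int) -> List[Candidate]:
    combined = combined_scores(cands)
    order = sorted(range(len(cands)), key=lambda i: combined[i], reverse=True)
    chosen: List[Candidate] = []
    # Pass 1: best combined scores first, accepting only low BLAT scores.
    for i in order:
        if len(chosen) == n_wanted:
            break
        c = cands[i]
        if any(overlaps(c, o) for o in chosen):
            continue  # discard candidates overlapping an already chosen oligo
        if blat_score(c) < blat_threshold:
            chosen.append(c)
    # Pass 2 (fallback): fill any remaining slots by combined score alone.
    for i in order:
        if len(chosen) == n_wanted:
            break
        c = cands[i]
        if not any(overlaps(c, o) for o in chosen):
            chosen.append(c)
    return chosen
```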

About the scores:

\
    \
  • Tm: Formulas for calculating Tm taken from: "A unified\ view of polymer, dumbbell, and oligonucleotide DNA\ nearest-neighbor thermodynamics" John SantaLucia, Jr. PNAS, Vol\ 95, pp 1460-1465 February 1998.
  • A web version called \ Hyther exists.\
  • Secondary Structure: Calculates the Gibbs free energy of the\ best secondary structure using libraries from the RNAstructure program.
  • \
  • Affy Heuristic: 1 if the oligo satisfies heuristics derived from those published by Affymetrix \ (Nature Biotechnology, vol. 14, Dec. 1996), 0 otherwise. The heuristic\ is as follows:\
    \
       no more than 9 A's in a window of 20\
       no more than 9 T's in a window of 20\
       no more than 8 C's in a window of 20\
       no more than 8 G's in a window of 20\
       \
       no more than 6 A's in a window of 8\
       no more than 6 T's in a window of 8\
       no more than 5 C's in a window of 8\
       no more than 5 G's in a window of 8\
    
    \
  • \
  • 3' Dist: Distance from end of oligo to 3' end of target\ sequence.
  • \
  • Blat Score: BLAT score of the second most homologous region\ in the genome. If the alignment has no insertions, this is approximately the number of\ matching base pairs.
  • \
\ \ \
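The Affymetrix heuristic above is simple enough to state directly as code. A minimal sketch, assuming the limits apply to every window of 20 and of 8 bases along the oligo, case-insensitively (an oligo shorter than a window is checked as a whole):

```python
from typing import Dict

# Maximum count of each base allowed in any window of the given length.
LIMITS: Dict[int, Dict[str, int]] = {
    20: {"A": 9, "T": 9, "C": 8, "G": 8},
    8: {"A": 6, "T": 6, "C": 5, "G": 5},
}


def affy_heuristic(oligo: str) -> int:
    """Return 1 if no window exceeds its per-base limit, 0 otherwise."""
    seq = oligo.upper()
    for window, max_counts in LIMITS.items():
        for start in range(max(len(seq) - window + 1, 1)):
            chunk = seq[start:start + window]
            for base, limit in max_counts.items():
                if chunk.count(base) > limit:
                    return 0
    return 1
```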

Histograms of Scores

\ Histograms are from the Stanford picked gene set.

\ Secondary structure, measured as Gibbs free energy; higher scores are better.

\ BLAT (similar to BLAST) histogram; lower scores are better.

\ Melting temperatures; the algorithm does produce scores over 100C.

\ Percentage GC, not used in algorithm but presented anyway.
\ \

Please note that all coordinates are relative to the '+' strand,\ while all oligo sequences are written 5'->3'. This means that all sequences\ displayed are part of the sense strand. So if the oligo is represented\ in the database as being on the '-' strand and starts at 1 and ends at\ 5 of 'atgcatgc', the '+' sequence of the probe would be 'tgcat', but\ that is 3'->5' on the '-' strand, so the displayed sequence would\ be its reverse complement, 'atgca'. \ x 1 exoMouse Exonerate Mouse bed 6 + Mouse/Human Evolutionarily Conserved Regions (Exonerate) 0 152 100 50 0 255 240 200 1 0 0

The Exonerate Mouse track shows regions of homology with the\ mouse based on Exonerate alignments of mouse random reads\ against the human genome. The data for this track was kindly provided by\ Guy Slater, Michele Clamp, and Ewan Birney at\ Ensembl.

\ x 1 phMouse PH Mouse Ecores bed 6 Pattern Hunter Mouse/Human Evolutionarily Conserved Regions 1 153 100 50 0 255 240 200 1 0 0

PATTERNHUNTER is a highly sensitive and efficient program for\ finding homologies within one, or between two DNA sequences.\ On very long sequences it runs faster than MegaBlast while\ being more sensitive than Blastn (at its default settings).

\

This track was produced by having PATTERNHUNTER index human chromosome \ 22 and process all mouse reads in turn, looking\ for double hits of weight 11 (using spaced seeds of length 18 with 7 don't-care positions, following the\ pattern 111010011001010111) from which to extend and form gapped alignments.
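The spaced seed works by comparing only the 11 positions marked `1` in the 18-position pattern, so mismatches at the 7 don't-care positions still produce a hit. The sketch below shows that seed-matching step only; it is not PatternHunter's implementation, and it omits the double-hit requirement and the gapped extension.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SEED = "111010011001010111"  # weight 11, length 18 (pattern from the track description)


def seed_key(seq: str, pos: int, seed: str = SEED) -> str:
    """Concatenate the bases at the seed's '1' positions starting at pos."""
    return "".join(seq[pos + i] for i, c in enumerate(seed) if c == "1")


def spaced_seed_hits(query: str, target: str, seed: str = SEED) -> List[Tuple[int, int]]:
    """Return (query_pos, target_pos) pairs whose seed keys match."""
    span = len(seed)
    index: Dict[str, List[int]] = defaultdict(list)
    for t in range(len(target) - span + 1):
        index[seed_key(target, t, seed)].append(t)
    hits: List[Tuple[int, int]] = []
    for q in range(len(query) - span + 1):
        for t in index.get(seed_key(query, q, seed), []):
            hits.append((q, t))
    return hits
```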

\

More information about PATTERNHUNTER is available at \ Bioinformatics\ Solutions Inc. See also PatternHunter: Faster and More Sensitive Homology Search; M. Li, B. Ma, and J. Tromp; Bioinformatics in press.

\ \ x 1 fiberMouse Fiberglass Mouse bed 4 . Mouse/Human Evolutionarily Conserved Regions (Fiberglass) 0 154 100 50 0 177 152 127 0 0 0 x 1 mouseOrtho Mouse Ortholog bed 5+ Mouse Orthology using Fgenesh++ Gene Predictions (top 4 reciprocal best) 0 156 0 100 0 255 240 200 1 0 0 x 1 mouseOrthoSeed Tight Ortholog bed 5+ Tight Mouse Orthology using Fgenesh++ Gene Predictions (only reciprocal best) 0 158 0 100 0 255 240 200 1 0 0 x 1 twinscanMgc TwinScan MGC psl . TwinScan MGC candidates 0 159 0 0 0 127 127 127 0 0 0

Description

\

\ TwinScan MGC candidates.\

\ x 1 mgcUcscEst UCSC MGC ESTs psl . UCSC MGC Candidates 0 160 0 0 0 127 127 127 0 0 0 x 1 mouseRefHCons HighCon MouseRef bed 5 + Highly Conserved Reference Mouse Alignments 1 160 100 50 0 255 240 200 1 0 1 chr22, x 1 mouseRefMDup ModDup MouseRef bed 5 + Moderately Duplicated Reference Mouse Alignments 1 161 100 50 0 255 240 200 1 0 1 chr22, x 1 phMouseAli PH Mouse Align bed 5 + Pattern Hunter Human/Mouse Alignments 0 162 100 50 0 255 240 200 1 0 1 chr22,

This track displays the mouse reference alignment produced by\ Pattern Hunter.\ x 1 ancientR Ancient Repeats bed 12 Human/Mouse Ancient Repeats 0 163 0 0 0 127 127 127 1 0 0

Display

\

This track displays alignments of the current mouse assembly (phusion.3)\ against regions of the human genome contained in ancient copies of\ transposable elements. In this case, "ancient" means that RepeatMasker's\ annotation indicates that the copy was fixed as an interspersed repeat in\ a common ancestor of human and mouse. These regions are of interest\ because they, more likely than any other region, have not been under\ functional constraint.\ Each block in the alignment is displayed as a colored block on the track,\ with a line connecting all the blocks. The color of each alignment\ indicates the percent identity of aligned residues over all blocks of the\ alignment: alignments at 50% identity and below are lightly colored, and the color gets\ linearly darker as the percent identity approaches 100%.\ In the alignments, lowercase letters indicate bases that RepeatMasker annotated\ as part of an interspersed repeat. Because of the high substitution rate in\ the mouse lineage, the element was often recognized only in the human\ genome. The original alignments are often much longer, but only the region\ within the repeat is displayed.\ \
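The coloring rule can be sketched as two small functions: percent identity over all blocks, then a darkness value that stays light at or below 50% and rises linearly to 100%. The gap character `-`, upper-casing to ignore repeat masking, and the `min_shade` floor are assumptions for illustration.

```python
from typing import List, Tuple


def percent_identity(blocks: List[Tuple[str, str]]) -> float:
    """Fraction of identical residues over all aligned (non-gap) columns."""
    matches = aligned = 0
    for human, mouse in blocks:
        # Upper-case so lowercase repeat-masked bases compare normally.
        for a, b in zip(human.upper(), mouse.upper()):
            if a == "-" or b == "-":
                continue  # gap columns are not aligned residues
            aligned += 1
            matches += a == b
    return matches / aligned if aligned else 0.0


def identity_shade(identity: float, min_shade: float = 0.1) -> float:
    """Darkness in [min_shade, 1]: light at <= 50% identity, linearly darker up to 100%."""
    if identity <= 0.5:
        return min_shade
    frac = (min(identity, 1.0) - 0.5) / 0.5
    return min_shade + (1.0 - min_shade) * frac
```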

Methods

\

The sequences were aligned with blastz (discontiguous exact seeds,\ ungapped extension, local alignments via dynamic programming) and\ postprocessed for single coverage.\ \

Data

\

Human sequence from:

\ \ http://genome-test.cse.ucsc.edu/gs.8/oo.33/chromFa.zip\

Mouse sequence from phusion.3:

\ \ ftp://ftp.ncbi.nlm.nih.gov/pub/TraceDB/mus_musculus/ClipReads/Assemblies/Sanger_Oct15/\

Repeats from:

\ \ http://genome.ucsc.edu/goldenPath/06aug2001/database/
\ (chrN_rmsk.txt.gz for chromosome N)\
\

Credits

\ Alignments contributed by Scott Schwartz. See \ http://bio.cse.psu.edu/genome/hummus/2001-12-16/aar/README.\ x 1