gold Scaffolds bed 3 + Assembly Scaffolds (Supercontigs) 0 9 150 100 30 230 170 40 0 0 0

Description

\

This track shows the Zebrafish Zv3 (Nov. 2003) assembly \ provided by \ The Wellcome Trust Sanger Institute. \ The assembly has a sequence coverage of about 5.7X and contains 58,339 scaffolds \ (supercontigs) totaling 1.5 billion base pairs.\

\

In dense mode, this track depicts the path through the draft and\ finished clones (aka the golden path) used to create the assembled sequence.\ Clone boundaries are distinguished by the use of alternating gold and brown\ coloration. Where gaps\ exist in the path, spaces are shown between the gold and brown\ blocks. If the relative order and orientation of the contigs\ between the two blocks is known, a line is drawn to bridge the\ blocks.

\

\ The Genome Browser depicts the zebrafish genome as 25 chromosomes consisting of\ whole genome shotgun (WGS) supercontigs that were mapped to a fingerprinted\ contig (FPC) and were from a known chromosome. There are also 3 unordered virtual \ chromosomes: chrFinished, chrUn and chrNA.\

\ The virtual chromosomes contain 1000 bp \ scaffold gaps that are shown in the Gap track annotation.\

\ All components within this track are of fragment type "W"\ (WGS contig) except for those on chrFinished and chrM, which are of \ type "F" (Finished).\

\

Methods

\

\ This assembly was constructed using the Phusion assembler to cluster reads. \ Phrap was then used for cluster assembly and consensus generation. \ For the clone-based mapping and finishing, clones from different libraries were\ fingerprinted by digestion with the HindIII restriction enzyme. From the \ resulting fingerprints, overlapping clones were linked into \ fingerprinted contigs (FPCs). Next, clones from a tiling path through the \ FPCs were selected for high-quality sequencing. The resulting \ sequence was submitted to EMBL/GenBank. \

\ The supercontigs tied to the FPC map create the assembly shown in this track. \ A total of 1.083 gigabases, or 74% of the sequence, \ could be tied to the FPC map. The finished clone sequence was then \ analyzed via a pipeline that included repeat masking, ab initio gene \ prediction and BLAST searches against all protein, EST and cDNA sequences that \ were available. Results from this analysis were used to manually annotate clones \ with gene structures, descriptions and poly-A features. At this point, each annotated clone \ was resubmitted to EMBL/GenBank; these clones can be browsed in \ Vega.\

\

Credits

\

\ The Zv3 Zebrafish assembly was produced by The Wellcome Trust Sanger Institute,\ in collaboration with the Max Planck Institute for Developmental Biology, \ the Netherlands Institute for Developmental Biology (Hubrecht Laboratory),\ and Yi Zhou, Anthony DiBiase and Leonard Zon from the Boston Children's \ Hospital. \ map 1 ctgPos2 Contigs Assembly Contigs 0 10 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows the whole genome shotgun (WGS) contigs of the \ November 2003 $organism Zv3 assembly from The Wellcome Trust Sanger Institute.\

\ Following the assembly of sequence into WGS contigs, the contigs were mapped to \ scaffolds (supercontigs). A fingerprinted contig (FPC) map was produced by \ restriction enzyme digestion with HindIII. Clones were selected from this map for \ high-quality sequencing. To create the Zv3 assembly, the supercontigs were tied \ to this FPC map. \

\

\ The Zv3 assembly is composed of 62,895 contigs, of which 3,326 are stitched \ contigs. The total stitched contig length is 1.083 gigabases. The assembly \ contains 58,339 supercontigs (scaffolds) having an N50 length of 433.7 kb. Of \ these, a total of 56,849 supercontigs are in the unordered chromosomes:\ 54,798 in chrFinished, 1,842 in chrUn, and 209 in chrNA.\
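
The N50 figure quoted above can be illustrated with a minimal sketch. It assumes the common definition of N50 (the length L such that sequences of length L or longer hold at least half of all bases); the text does not say which variant was used for Zv3. The toy lengths are hypothetical, while the three supercontig counts are the ones quoted in the paragraph.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumed definition: the length L such that sequences of length >= L
    together contain at least half of the total bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy scaffold lengths (hypothetical values, not Zv3 data).
toy_lengths = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_lengths)

# Supercontig counts on the unordered chromosomes, as quoted in the text.
unordered_counts = {"chrFinished": 54798, "chrUn": 1842, "chrNA": 209}
total_unordered = sum(unordered_counts.values())
```

With real data, `lengths` would be the 58,339 supercontig lengths; `total_unordered` reproduces the 56,849 total stated above.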

\

Credits

\ The Zv3 Zebrafish assembly was produced by The Wellcome Trust Sanger Institute,\ in collaboration with the Max Planck Institute for Developmental Biology, \ the Netherlands Institute for Developmental Biology (Hubrecht Laboratory),\ and Yi Zhou, Anthony DiBiase and Leonard Zon from the Boston Children's \ Hospital. \ \ \ map 0 gap Gap bed 3 + Gap Locations 1 11 0 0 0 127 127 127 0 0 0

Description

\ This track depicts gaps in the $organism assembly. These gaps - with the\ exception of intractable heterochromatic gaps - will be closed during the\ finishing process.\

\ Gaps are represented as black boxes in this track.\ If the relative order and orientation of the contigs on either side\ of the gap is known, it is a bridged gap and a white line is drawn \ through the black box representing the gap. \

\

This assembly contains only one type of gap: contig gaps. These are gaps of \ 5000 bp between the scaffolds ("super-contigs") constructed from the \ whole genome shotgun (WGS) contigs. \ On the virtual chromosomes containing scaffolds that could not be \ confidently placed on a chromosome (e.g. chrUn, chrFinished and chrNA), 1000 bp gaps have been inserted between scaffolds.\
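
As a rough illustration of how scaffolds and fixed-size gaps could alternate on a virtual chromosome, the sketch below lays out scaffolds end to end with a 1000 bp gap between neighbours. The function name, the toy scaffold lengths and the 0-based half-open coordinates are assumptions made for the example; only the 1000 bp gap size comes from the text.

```python
GAP_SIZE = 1000  # gap inserted between scaffolds on virtual chromosomes (from the text)


def layout_scaffolds(scaffold_lengths, gap_size=GAP_SIZE):
    """Return (scaffold_spans, gap_spans) as 0-based, half-open intervals.

    A gap is placed between consecutive scaffolds only, so there is no
    gap before the first scaffold or after the last one (an assumption).
    """
    scaffolds, gaps = [], []
    position = 0
    for index, length in enumerate(scaffold_lengths):
        if index > 0:
            gaps.append((position, position + gap_size))
            position += gap_size
        scaffolds.append((position, position + length))
        position += length
    return scaffolds, gaps


# Hypothetical scaffold lengths in bases.
scaffold_spans, gap_spans = layout_scaffolds([2500, 4000, 1200])
```

Each tuple in `gap_spans` corresponds to one item that a gap track of this kind would display.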

\ map 1 rhMap Radiation Hybrid Map psl . Alignments of Sequences for Radiation Hybrid Map (RH map) 0 14 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows alignments between $organism radiation hybrid (RH) map \ sequences (consisting of ESTs and other genetic markers) and the genome. Many \ of these markers are not genetically polymorphic and therefore cannot be \ placed on genetic maps. The markers with RH map positions are useful for \ locating genes lying in genomic regions with mutations and in the \ positional cloning of zebrafish genetic mutants that exhibit interesting \ phenotypes.

\

\ 8707 RH map sequences were aligned, consisting of:\

\

\ The majority of these sequences are single-read sequences \ in the range of about 40 - 1300 bp. The ESTs usually\ represent fragments of transcribed genes. \ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks. \ The strand information (+/-) indicates the\ direction of the match between the EST and the matching\ genomic sequence. It bears no relationship to the direction\ of transcription of the RNA with which it might be associated.

\ \

Methods

\

\ To generate this track, $organism RH map sequences were aligned against the \ genome using blat. Only alignments with a base identity level within 0.1% of \ the best and at least 96% base identity with the genomic sequence were kept. \ The following pslReps parameters were used: -nearTop=0.0001 -minAli=0.96\ -minCover=0.40.
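
The thresholds above can be pictured with a simplified filter. This is not how pslReps scores alignments internally; it only applies the three listed values to per-alignment identity and coverage fractions. The dict keys and the marker names are hypothetical, and the defaults mirror the listed parameter values, expressed as fractions.

```python
from collections import defaultdict


def filter_alignments(alignments, near_top=0.0001, min_ali=0.96, min_cover=0.40):
    """Keep alignments that pass identity, coverage and near-best thresholds.

    Each alignment is a dict with hypothetical keys:
    'query', 'identity' (0-1) and 'coverage' (0-1).
    """
    by_query = defaultdict(list)
    for aln in alignments:
        # Absolute thresholds: minimum identity and minimum query coverage.
        if aln["identity"] >= min_ali and aln["coverage"] >= min_cover:
            by_query[aln["query"]].append(aln)

    kept = []
    for hits in by_query.values():
        # Relative threshold: stay within near_top of the best hit per query.
        best = max(h["identity"] for h in hits)
        kept.extend(h for h in hits if h["identity"] >= best - near_top)
    return kept


example = [
    {"query": "markerA", "identity": 0.990, "coverage": 0.90},
    {"query": "markerA", "identity": 0.980, "coverage": 0.90},
    {"query": "markerA", "identity": 0.950, "coverage": 0.90},
    {"query": "markerB", "identity": 0.970, "coverage": 0.30},
]
kept = filter_alignments(example)
```

In this toy input only the best markerA alignment survives: the second is too far below the best, the third fails the identity floor, and markerB fails the coverage floor.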

\ \

Credits

\

\ The $Organism RH Map track was produced by UCSC in collaboration with the \ Zebrafish Genome Initiative at Children's Hospital, Boston, from sequence data\ obtained from the individuals and institutions mentioned in the Description \ section.

\ map 1 bacEndPairs BAC End Pairs bed 6 + BAC End Pairs 0 15 0 0 0 127 127 127 0 0 0

Description

\

\ Bacterial artificial chromosomes (BACs) are a key part of many large-scale \ sequencing projects. A BAC typically consists of 50 - 300 kb of DNA. \ During the early phase of a sequencing project, it is common to sequence a \ single read (approximately 500 bases) off each end of a large number of BACs. \ Later in the project, these BAC end reads are mapped to the genome sequence. \ A valid pair of BAC end sequences must be at least 2 kb but no more than \ 800 kb away from each other. The orientation of the first BAC end sequence \ must be "+"; that of the second BAC end sequence must be \ "-". \
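
The validity rule for a pair can be written as a small predicate. This is a sketch under stated assumptions: each end is a hypothetical `(chrom, start, end, strand)` tuple, both ends must lie on the same sequence, and the distance is measured as the outer span of the two reads. The text gives only the 2 kb and 800 kb limits and the required strands.

```python
MIN_SPAN = 2_000     # at least 2 kb apart (from the text)
MAX_SPAN = 800_000   # no more than 800 kb apart (from the text)


def is_valid_pair(first_end, second_end):
    """Return True if two BAC end alignments form a valid pair.

    Each end is a hypothetical (chrom, start, end, strand) tuple.
    """
    chrom1, start1, end1, strand1 = first_end
    chrom2, start2, end2, strand2 = second_end
    if chrom1 != chrom2:
        return False  # assumption: both ends must map to the same sequence
    if strand1 != "+" or strand2 != "-":
        return False  # first end must be "+", second end must be "-"
    # Assumption: distance is the outer span covered by the two reads.
    span = max(end1, end2) - min(start1, start2)
    return MIN_SPAN <= span <= MAX_SPAN


good = is_valid_pair(("chr1", 1000, 1500, "+"), ("chr1", 150000, 150500, "-"))
same_strand = is_valid_pair(("chr1", 1000, 1500, "+"), ("chr1", 150000, 150500, "+"))
too_close = is_valid_pair(("chr1", 1000, 1500, "+"), ("chr1", 2000, 2500, "-"))
```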

\ These BAC end pairs can be useful for validating the assembly over relatively \ long ranges. In some cases, the BACs are useful biological reagents. This \ annotation can also be used to determine which BAC contains a given gene, \ useful information for certain wet lab experiments.

\

\ This track shows mappings in cases where both ends could be mapped. \ For the $organism assembly, the BACs are approximately 150-200 kb in size and \ are used for both the fingerprint contig (FPC) and radiation hybrid (RH) maps.\ After alignment of $organism BAC end sequences to this assembly, the distances\ between pairs of BAC ends were in the range of 2 - 650 kb with an average \ distance of approximately 170 kb. Some BAC end sequences have replicate \ reads; in these cases, the read names are similar and the BAC clone name is \ the same. If reads for the same BAC clone have identical alignments, they \ are shown on the same description page.

\

\ The scoring scheme used for this annotation assigns 1000 to an alignment \ when the BAC end pair aligns to only one location in the genome (after \ filtering). When a BAC end pair or clone aligns to multiple locations, the \ score is calculated as 1500/(number of alignments).
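
The scoring rule can be stated in a few lines. The function name is hypothetical, and whether the stored score is rounded or truncated to an integer is not stated in the text, so the sketch returns the plain quotient.

```python
def bac_end_score(num_alignments):
    """Score a BAC end pair by the number of genomic locations it aligns to.

    1000 for a unique placement, otherwise 1500 / (number of alignments).
    """
    if num_alignments < 1:
        raise ValueError("num_alignments must be at least 1")
    if num_alignments == 1:
        return 1000
    return 1500 / num_alignments


scores = {n: bac_end_score(n) for n in (1, 2, 3, 5)}
```

Under this rule a pair placed at two locations scores 750, so every multiply placed pair scores below a uniquely placed one.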

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks. On the track description page, the \ display may be configured to show only those items with an unnormalized score\ that equals or exceeds a user-specified minimum.

\

\ To view the registry entry for a specific clone, open the details page for the\ clone and click on its name at the top of the page. Not all $organism BAC \ clones have been submitted to NCBI Clone Registry as of July 2004; therefore, \ some of the clone links to NCBI may not yet be active. \ Information about the libraries may be found on the Sanger Institute \ Zebrafish Library Details page. Additional information about\ some of the clones, including how they can be obtained, may be found at the\ NCBI Clone Registry.\

\ Information about STS markers associated with a BAC clone is displayed when\ available. Aliases include those for the BAC clone and those for associated\ STS markers. The BAC ends tracks may be searched using the clone name, the \ Sanger internal BAC clone name, the Sanger STS name or any of these aliases. \ The UniSTS ID(s) shown are those associated with the STS aliases for each \ Sanger STS name.

\

\ NOTE: The primer sequences shown may differ from those associated with the\ UniSTS IDs in UniSTS. In cases where more than one UniSTS ID exists, primer \ sequences may be the same as those in UniSTS for one of the UniSTS IDs.\ The value in the relationship field indicates the method used to find the \ STS markers associated with a particular BAC clone:\

\ \

Methods

\

BAC end sequences were placed on the assembled sequence using blat, \ followed by pslReps using the parameters -nearTop=0.02 -minCover=0.40 \ -minAli=0.85 -noIntrons. This ensured that only alignments with at \ least 85% identity, a minimum sequence coverage of 40% and a base identity \ level within 2% of the best were kept. No penalty was imposed for sequences \ that lacked introns. Furthermore, a base identity of at least 91% was required \ of at least one BAC end of the pair.

\ \

Credits

\

\ The $Organism BAC End Pairs track was produced at UCSC from \ data obtained from the following sources:\

\

\ This track was produced in collaboration with the \ Zebrafish Genome Initiative at Children's Hospital, Boston.\

\ \

References

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ map 1 exonArrows off\ bacEndSingles BAC End Singles bed 6 + BAC End Singles 0 15 0 0 0 127 127 127 0 0 0

Description

\

\ Bacterial artificial chromosomes (BACs) are a key part of many large-scale \ sequencing projects. A BAC typically consists of 50 - 300 kb of DNA. \ During the early phase of a sequencing project, it is common to sequence a \ single read (approximately 500 bases) off each end of a large number of BACs. \ Later in the project, these BAC end reads are mapped to the genome sequence. \ A valid pair of BAC end sequences must be at least 2 kb but no more than \ 800 kb away from each other. The orientation of the first BAC end sequence \ must be "+"; that of the second BAC end sequence must be \ "-". \

\ These BAC end pairs can be useful for validating the assembly over relatively \ long ranges. In some cases, the BACs are useful biological reagents. This \ annotation can also be used to determine which BAC contains a given gene, \ useful information for certain wet lab experiments.

\

\ This track shows mappings in cases where only one BAC end was mapped, either\ because only one end sequence exists or only one end sequence\ could be mapped to the genome with an alignment that met the criteria \ described in the Methods section.\ For the $organism assembly, the BACs are approximately 150-200 kb in size and \ are used for both the fingerprint contig (FPC) and radiation hybrid (RH) maps.\ Some BAC end sequences have replicate reads; in these cases, the read names are similar and the BAC clone name is the same. If reads for the same BAC clone \ have identical alignments, they are shown on the same description page.

\

\ The scoring scheme used for this annotation assigns 1000 to an alignment \ when a single BAC end aligns to only one location in the genome (after \ filtering). When a single BAC end or clone aligns to multiple locations, the \ score is calculated as 1500/(number of alignments).

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks. On the track description page, the \ display may be configured to show only those items with an unnormalized score\ that equals or exceeds a user-specified minimum.

\

\ To view the registry entry for a specific clone, open the details page for the\ clone and click on its name at the top of the page. Not all $organism BAC \ clones have been submitted to NCBI Clone Registry as of July 2004; therefore, \ some of the clone links to NCBI may not yet be active. \ Information about the libraries may be found on the Sanger Institute \ Zebrafish Library Details page. Additional information about\ some of the clones, including how they can be obtained, may be found at the\ NCBI Clone Registry.\

\ Information about STS markers associated with a BAC clone is displayed when\ available. Aliases include those for the BAC clone and those for associated\ STS markers. The BAC ends tracks may be searched using the clone name, the \ Sanger internal BAC clone name, the Sanger STS name or any of these aliases. \ The UniSTS ID(s) shown are those associated with the STS aliases for each \ Sanger STS name.

\

\ NOTE: The primer sequences shown may differ from those associated with the\ UniSTS IDs in UniSTS. In cases where more than one UniSTS ID exists, primer \ sequences may be the same as those in UniSTS for one of the UniSTS IDs.\ The value in the relationship field indicates the method used to find the \ STS markers associated with a particular BAC clone:\

\ \

Methods

\

BAC end sequences were placed on the assembled sequence using blat, \ followed by pslReps using the parameters -nearTop=0.02 -minCover=0.40 \ -minAli=0.85 -noIntrons. This ensured that only alignments with at \ least 85% identity, a minimum sequence coverage of 40% and a base identity \ level within 2% of the best were kept. No penalty was imposed for sequences \ that lacked introns. Furthermore, a base identity of at least 91% was required \ of at least one BAC end of the pair.

\ \

Credits

\

\ The $Organism BAC End Singles track was produced at UCSC from \ data obtained from the following sources:\

\

\ This track was produced in collaboration with the \ Zebrafish Genome Initiative at Children's Hospital, Boston.\

\ \

References

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ map 1 exonArrows off\ gc5Base GC Percent wig 0 100 GC Percent in 5-Base Windows 0 23.5 0 0 0 128 128 128 0 0 0

Description

\ The GC percent track shows the percentage of G (guanine) and C (cytosine) bases\ in 5-base windows. High GC content is typically associated with\ gene-rich areas.\
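
A minimal sketch of the calculation follows. It assumes non-overlapping 5-base windows, drops any trailing partial window, and counts only unambiguous G and C bases; none of these choices is stated in the text. The function name and the toy sequence are hypothetical.

```python
def gc_percent_windows(sequence, window=5):
    """Return the GC percentage of each consecutive, non-overlapping window.

    A trailing partial window is ignored (an assumption for this sketch).
    """
    seq = sequence.upper()
    percents = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc_count = chunk.count("G") + chunk.count("C")
        percents.append(100.0 * gc_count / window)
    return percents


# Toy sequence (hypothetical), three full windows: GCGCA, ATTAT, GGGCC.
toy_gc = gc_percent_windows("GCGCAATTATGGGCC")
```

With a window of 5, each value can only be 0, 20, 40, 60, 80 or 100 percent.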

\

\ This track may be configured in a variety of ways to highlight different aspects \ of the displayed information. Click the "Graph configuration help" link\ for an explanation of the configuration options.\ \

Credits

\

The data and presentation of this graph were prepared by\ Hiram Clawson.\ \ map 0 autoScaleDefault Off\ defaultViewLimits 30:70\ graphTypeDefault Bar\ gridDefault OFF\ maxHeightPixels 128:36:16\ spanList 5\ windowingFunction Mean\ refGene RefSeq Genes genePred refPep refMrna RefSeq Genes 1 35 12 12 120 133 133 187 0 0 0

Description

\

\ The RefSeq Genes track shows known protein-coding genes taken from \ the NCBI mRNA reference sequences collection (RefSeq). On assemblies in \ which incremental GenBank downloads are supported, the data underlying this \ track are updated nightly.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ gene prediction \ tracks.\ The color shading indicates the level of review the RefSeq record has \ undergone: predicted (light), provisional (medium), reviewed (dark). \ In some assemblies, non-coding RNA genes are shown in a separate track.

\

\ The item labels and display colors of features within this track can be\ configured through the controls at the top of the track description page. \ This page is accessed via the small button to the left of the track's \ graphical display or through the link on the track's control menu. \

\ \

Methods

\

\ RefSeq mRNAs were aligned against the $organism genome using blat; those\ with an alignment of less than 15% were discarded. When a single mRNA \ aligned in multiple places, the alignment having the highest base identity \ was identified. Only alignments having a base identity level within 0.1% of \ the best and at least 96% base identity with the genomic sequence were kept.\

\ \ \

Credits

\

\ This track was produced at UCSC from mRNA sequence data\ generated by scientists worldwide and curated by the \ NCBI RefSeq project.

\ \

References

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ \

Pruitt, K.D., Tatusova, T., Maglott, D.R. \ NCBI Reference Sequence (RefSeq): a curated non-redundant \ sequence database of genomes, transcripts and proteins. Nucleic Acids \ Res. 33(1), D501-D504 (2005).\

\ genes 1 baseColorUseCds given\ mgcGenes ZGC Genes genePred Zebrafish Gene Collection Full ORF mRNAs 3 36 34 139 34 144 197 144 0 0 0

Description

\

\ This track shows genomic alignments of $organism mRNAs from the\ Zebrafish Gene Collection \ (ZGC) that have full-length open reading frames (ORFs). The ZGC is a \ subproject of the Mammalian Gene Collection (MGC) project.

\ \

Display Conventions and Configuration

\

\ The track follows the display conventions for \ gene prediction \ tracks.

\

\ An optional codon coloring feature is available for quick\ validation and comparison of gene predictions.\ To display codon colors, select the genomic codons option from the\ Color track by codons pull-down menu. Click \ here for more \ information about this feature.

\ \

Methods

\

\ GenBank $organism ZGC mRNAs identified as having full-length ORFs \ were aligned against the genome using blat. When a single mRNA \ aligned in multiple places, the alignment having the highest base identity was\ found. Only alignments having a base identity level within 1% of\ the best and at least 95% base identity with the genomic sequence \ were kept.

\ \

Credits

\

\ The $organism ZGC full-length mRNA track was produced at UCSC from \ mRNA sequence data submitted to \ GenBank by the \ Zebrafish Gene Collection \ project.

\ \

References

\

\ Mammalian Gene Collection project references.

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ genes 1 ensGene Ensembl Genes genePred ensPep Ensembl Gene Predictions 3 40 150 0 0 202 127 127 0 0 0 http://www.ensembl.org/perl/transview?transcript=$$

Description

\

\ These gene predictions were generated by Ensembl.

\ \

Methods

\

\ For a description of the methods used in Ensembl gene prediction, refer to \ Hubbard, T. et al. (2002) in the References section below.

\ \

Credits

\

\ Thanks to Ensembl for providing this annotation.

\ \

References

\

\ Hubbard, T. et al. \ The Ensembl genome database project.\ Nucleic Acids Research 30(1), 38-41 (2002).

\ \ genes 1 mrna $Organism mRNAs psl . $Organism mRNAs from GenBank 3 54 0 0 0 127 127 127 1 0 0

Description

\

\ The mRNA track shows alignments between $organism mRNAs\ in GenBank and the genome.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks. In dense display mode, the items that\ are more darkly shaded indicate matches of better quality.

\

\ The description page for this track has a filter that can be used to change \ the display mode, alter the color, and include/exclude a subset of items \ within the track. This may be helpful when many items are shown in the track \ display, especially when only some are relevant to the current task.

\

\ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the mRNA \ display. For example, to apply the filter to all mRNAs expressed in a specific\ organ, type the name of the organ in the tissue box. To view the list of \ valid terms for each text box, consult the table in the Table Browser that \ corresponds to the factor on which you wish to filter. For example, the \ "tissue" table contains all the types of tissues that can be \ entered into the tissue text box. Wildcards may also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only mRNAs that match all filter \ criteria will be highlighted. If "or" is selected, mRNAs that \ match any one of the filter criteria will be highlighted.\
  3. Choose the color or display characteristic that should be used to \ highlight or include/exclude the filtered items. If "exclude" is \ chosen, the browser will not display mRNAs that match the filter criteria. \ If "include" is selected, the browser will display only those \ mRNAs that match the filter criteria.\
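
The combination logic in the steps above can be sketched as follows. This is an illustration only, not the browser's implementation: the item fields, the item names and the use of shell-style wildcards via `fnmatchcase` are assumptions, and highlighting by color is not modelled.

```python
from fnmatch import fnmatchcase


def matches(item, criteria, logic="and"):
    """Test one item against {field: wildcard pattern} criteria.

    With "and" every criterion must match; with "or" any one is enough.
    Matching is case-insensitive here (an assumption).
    """
    tests = [
        fnmatchcase(str(item.get(field, "")).lower(), pattern.lower())
        for field, pattern in criteria.items()
    ]
    return all(tests) if logic == "and" else any(tests)


def apply_filter(items, criteria, logic="and", mode="include"):
    """Return the items left on display after an include or exclude filter."""
    if mode == "include":
        return [item for item in items if matches(item, criteria, logic)]
    if mode == "exclude":
        return [item for item in items if not matches(item, criteria, logic)]
    raise ValueError("mode must be 'include' or 'exclude'")


# Hypothetical mRNA records.
mrnas = [
    {"name": "mrna1", "tissue": "heart"},
    {"name": "mrna2", "tissue": "liver"},
    {"name": "mrna3", "tissue": "heart muscle"},
]
included = apply_filter(mrnas, {"tissue": "heart*"}, mode="include")
excluded = apply_filter(mrnas, {"tissue": "heart*"}, mode="exclude")
either = apply_filter(mrnas, {"tissue": "liver", "name": "mrna1"}, logic="or")
```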

\

\ This track may also be configured to display codon coloring, a feature that\ allows the user to quickly validate and compare mRNA. For more \ information about this option, click \ here.\

\ \

Methods

\

\ GenBank $organism mRNAs were aligned against the genome using the \ blat program. When a single mRNA aligned in multiple places, \ the alignment having the highest base identity was found. \ Only alignments having a base identity level within 0.5% of\ the best and at least 96% base identity with the genomic sequence were kept.\

\ \

Credits

\

\ The mRNA track was produced at UCSC from mRNA sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., and \ Wheeler, D.L. \ GenBank: update. Nucleic Acids Res. 32,\ D23-6 (2004).

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ \ rna 1 baseColorUseCds genbank\ baseColorUseSequence genbank\ cdsDrawOptions enabled\ showDiffBasesAllScales .\ intronEst Spliced ESTs psl est $Organism ESTs That Have Been Spliced 1 56 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows alignments between $organism expressed sequence tags \ (ESTs) in GenBank and the genome that show signs of splicing when\ aligned against the genome. ESTs are single-read sequences, typically about \ 500 bases in length, that usually represent fragments of transcribed genes.\

\

\ To be considered spliced, an EST must show \ evidence of at least one canonical intron, i.e. the genomic \ sequence between EST alignment blocks must be at least 32 bases in \ length and have GT/AG ends. By requiring splicing, the level \ of contamination in the EST databases is drastically reduced\ at the expense of eliminating many genuine 3' ESTs.\ For a display of all ESTs (including unspliced), see the \ $organism EST track.
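
The splicing criterion can be expressed as a short check over the gaps between alignment blocks. The sketch assumes blocks are sorted, 0-based, half-open genomic intervals and that the genome is available as a plain string. It tests only the forward-strand GT/AG pattern; how transcripts on the opposite strand are handled is not covered by the text, so it is left out.

```python
MIN_INTRON = 32  # minimum genomic gap between blocks, in bases (from the text)


def has_canonical_intron(blocks, genome):
    """Return True if any gap between consecutive blocks looks like an intron.

    blocks: sorted list of (start, end) genomic intervals, 0-based half-open.
    genome: genomic sequence as a string (forward strand only in this sketch).
    """
    for (_, left_end), (right_start, _) in zip(blocks, blocks[1:]):
        gap = genome[left_end:right_start].upper()
        if len(gap) >= MIN_INTRON and gap.startswith("GT") and gap.endswith("AG"):
            return True
    return False


# Toy genome (hypothetical): 5-base exon, 34-base GT...AG gap, 5-base exon.
spliced_genome = "AAAAA" + "GT" + "C" * 30 + "AG" + "TTTTT"
spliced = has_canonical_intron([(0, 5), (39, 44)], spliced_genome)

# Same layout with a 14-base gap, which is too short to count.
short_genome = "AAAAA" + "GT" + "C" * 10 + "AG" + "TTTTT"
unspliced = has_canonical_intron([(0, 5), (19, 24)], short_genome)
```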

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks. In dense display mode, the items that\ are more darkly shaded indicate matches of better quality.

\

\ The strand information (+/-) indicates the\ direction of the match between the EST and the matching\ genomic sequence. It bears no relationship to the direction\ of transcription of the RNA with which it might be associated.

\

\ The description page for this track has a filter that can be used to change \ the display mode, alter the color, and include/exclude a subset of items \ within the track. This may be helpful when many items are shown in the track \ display, especially when only some are relevant to the current task.

\

\ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the EST\ display. For example, to apply the filter to all ESTs expressed in a specific\ organ, type the name of the organ in the tissue box. To view the list of \ valid terms for each text box, consult the table in the Table Browser that \ corresponds to the factor on which you wish to filter. For example, the \ "tissue" table contains all the types of tissues that can be \ entered into the tissue text box. Wildcards may also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only ESTs that match all filter \ criteria will be highlighted. If "or" is selected, ESTs that \ match any one of the filter criteria will be highlighted.\
  3. Choose the color or display characteristic that should be used to \ highlight or include/exclude the filtered items. If "exclude" is \ chosen, the browser will not display ESTs that match the filter criteria. \ If "include" is selected, the browser will display only those \ ESTs that match the filter criteria.\

\

\ This track may also be configured to display base labeling, a feature that\ allows the user to display all bases in the aligning sequence or only those \ that differ from the genomic sequence. For more information about this option,\ click \ here.\

\ \

Methods

\

\ To make an EST, RNA is isolated from cells and reverse\ transcribed into cDNA. Typically, the cDNA is cloned\ into a plasmid vector and a read is taken from the 5'\ and/or 3' primer. For most — but not all — ESTs, the\ reverse transcription is primed by an oligo-dT, which\ hybridizes with the poly-A tail of mature mRNA. The\ reverse transcriptase may or may not make it to the 5'\ end of the mRNA, which may or may not be degraded.

\

\ In general, the 3' ESTs mark the end of transcription\ reasonably well, but the 5' ESTs may end at any point\ within the transcript. Some of the newer cap-selected\ libraries cover transcription start reasonably well. Before the \ cap-selection techniques\ emerged, some projects used random rather than poly-A\ priming in an attempt to retrieve sequence distant from the\ 3' end. These projects were successful at this, but as\ a side effect also deposited sequences from unprocessed\ mRNA and perhaps even genomic sequences into the EST databases.\ Even outside of the random-primed projects, there is a\ degree of non-mRNA contamination. Because of this, a\ single unspliced EST should be viewed with considerable\ skepticism.

\

\ To generate this track, $organism ESTs from GenBank were aligned \ against the genome using blat. Note that the maximum intron length\ allowed by blat is 750,000 bases, which may eliminate some ESTs with very \ long introns that might otherwise align. When a single \ EST aligned in multiple places, the alignment having the \ highest base identity was identified. Only alignments having\ a base identity level within 0.5% of the best and at least 96% base identity \ with the genomic sequence are displayed in this track.

\ \

Credits

\

\ This track was produced at UCSC from EST sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., and \ Wheeler, D.L. \ GenBank: update. Nucleic Acids Res. 32,\ D23-6 (2004).

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ \ rna 1 baseColorUseSequence genbank\ intronGap 30\ maxItems 300\ pslSequenceBases no\ showDiffBasesAllScales .\ est $Organism ESTs psl est $Organism ESTs Including Unspliced 0 57 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows alignments between $organism expressed sequence tags \ (ESTs) in GenBank and the genome. ESTs are single-read sequences, \ typically about 500 bases in length, that usually represent fragments of \ transcribed genes.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks. In dense display mode, the items that\ are more darkly shaded indicate matches of better quality.

\

\ The strand information (+/-) indicates the\ direction of the match between the EST and the matching\ genomic sequence. It bears no relationship to the direction\ of transcription of the RNA with which it might be associated.

\

\ The description page for this track has a filter that can be used to change \ the display mode, alter the color, and include/exclude a subset of items \ within the track. This may be helpful when many items are shown in the track \ display, especially when only some are relevant to the current task.

\

\ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the EST\ display. For example, to apply the filter to all ESTs expressed in a specific\ organ, type the name of the organ in the tissue box. To view the list of \ valid terms for each text box, consult the table in the Table Browser that \ corresponds to the factor on which you wish to filter. For example, the \ "tissue" table contains all the types of tissues that can be \ entered into the tissue text box. Wildcards may also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only ESTs that match all filter \ criteria will be highlighted. If "or" is selected, ESTs that \ match any one of the filter criteria will be highlighted.\
  3. Choose the color or display characteristic that should be used to \ highlight or include/exclude the filtered items. If "exclude" is \ chosen, the browser will not display ESTs that match the filter criteria. \ If "include" is selected, the browser will display only those \ ESTs that match the filter criteria.\

\

\ This track may also be configured to display base labeling, a feature that\ allows the user to display all bases in the aligning sequence or only those \ that differ from the genomic sequence. For more information about this option,\ click \ here.\

\ \

Methods

\

\ To make an EST, RNA is isolated from cells and reverse\ transcribed into cDNA. Typically, the cDNA is cloned\ into a plasmid vector and a read is taken from the 5'\ and/or 3' primer. For most — but not all — ESTs, the\ reverse transcription is primed by an oligo-dT, which\ hybridizes with the poly-A tail of mature mRNA. The\ reverse transcriptase may or may not make it to the 5'\ end of the mRNA, which may or may not be degraded.

\

\ In general, the 3' ESTs mark the end of transcription\ reasonably well, but the 5' ESTs may end at any point\ within the transcript. Some of the newer cap-selected\ libraries cover transcription start reasonably well. Before the \ cap-selection techniques\ emerged, some projects used random rather than poly-A\ priming in an attempt to retrieve sequence distant from the\ 3' end. These projects were successful at this, but as\ a side effect also deposited sequences from unprocessed\ mRNA and perhaps even genomic sequences into the EST databases.\ Even outside of the random-primed projects, there is a\ degree of non-mRNA contamination. Because of this, a\ single unspliced EST should be viewed with considerable\ skepticism.

\

\ To generate this track, $organism ESTs from GenBank were aligned \ against the genome using blat. Note that the maximum intron length\ allowed by blat is 750,000 bases, which may eliminate some ESTs with very \ long introns that might otherwise align. When a single \ EST aligned in multiple places, the alignment having the \ highest base identity was identified. Only alignments having\ a base identity level within 0.5% of the best and at least 96% base identity \ with the genomic sequence were kept.
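
The selection rule above can be sketched in a few lines of Python. This is an illustrative reading only, not the UCSC pipeline: it assumes that "within 0.5% of the best" means within 0.005 of the best identity for the same EST on an absolute scale, and the record fields (query, target, identity) and the function name are hypothetical.

```python
from collections import defaultdict

def filter_near_best(alignments, min_identity=0.96, near_best=0.005):
    """Keep, per query, the alignments that are within `near_best` of that
    query's best identity and that reach at least `min_identity`.

    Assumption: identities are fractions in [0, 1] and "within 0.5%" is an
    absolute difference of 0.005.
    """
    by_query = defaultdict(list)
    for aln in alignments:
        by_query[aln["query"]].append(aln)

    kept = []
    for alns in by_query.values():
        best = max(a["identity"] for a in alns)
        for a in alns:
            if a["identity"] >= min_identity and a["identity"] >= best - near_best:
                kept.append(a)
    return kept

# Hypothetical alignments for two ESTs.
alignments = [
    {"query": "EST1", "target": "chr1", "identity": 0.990},
    {"query": "EST1", "target": "chr7", "identity": 0.987},
    {"query": "EST1", "target": "chr9", "identity": 0.970},
    {"query": "EST2", "target": "chr3", "identity": 0.950},
]
kept = filter_near_best(alignments)
```

Under these assumptions, EST1 keeps its chr1 and chr7 alignments, its chr9 alignment is dropped for falling more than 0.005 below the best, and EST2 is dropped entirely for falling below 96% identity.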

\ \

Credits

\

\ This track was produced at UCSC from EST sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., and \ Wheeler, D.L. \ GenBank: update. Nucleic Acids Res. 32,\ D23-6 (2004).

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ \ rna 1 baseColorUseSequence genbank\ intronGap 30\ maxItems 300\ pslSequenceBases no\ wz_ests Zfish WZ EST Clusters psl est $Organism WZ EST Clusters from Washington University 0 58 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows alignments between $organism WZ expressed\ sequence tags (ESTs) from\ WashU-Zebrafish Genome Resources and the genome.\ ESTs are single-read sequences, typically about 500 bases in length, that \ usually represent fragments of transcribed genes. These WZ ESTs are \ compiled to produce longer reads by clustering together ESTs that originate \ from the same transcript.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks. In dense display mode, the items that\ are more darkly shaded indicate matches of better quality.

\

\ The strand information (+/-) indicates the\ direction of the match between the EST and the matching\ genomic sequence. It bears no relationship to the direction\ of transcription of the RNA with which it might be associated.

\

\ The description page for this track has a filter that can be used to change \ the display mode, alter the color, and include/exclude a subset of items \ within the track. This may be helpful when many items are shown in the track \ display, especially when only some are relevant to the current task.

\

\ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the EST\ display. For example, to apply the filter to all ESTs expressed in a specific\ organ, type the name of the organ in the tissue box. To view the list of \ valid terms for each text box, consult the table in the Table Browser that \ corresponds to the factor on which you wish to filter. For example, the \ "tissue" table contains all the types of tissues that can be \ entered into the tissue text box. Wildcards may also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only ESTs that match all filter \ criteria will be highlighted. If "or" is selected, ESTs that \ match any one of the filter criteria will be highlighted.\
  3. Choose the color or display characteristic that should be used to \ highlight or include/exclude the filtered items. If "exclude" is \ chosen, the browser will not display ESTs that match the filter criteria. \ If "include" is selected, the browser will display only those \ ESTs that match the filter criteria.\
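
The combination logic in the steps above can be illustrated with a small Python sketch. It is not the browser's implementation: the field names, the sample records, and the use of shell-style wildcards through fnmatch are assumptions made for the example.

```python
from fnmatch import fnmatchcase

def matches(item, criteria, logic="and"):
    """Return True if `item` satisfies the filter criteria.

    `criteria` maps a field name (for example "tissue") to a wildcard
    pattern. Matching is case-insensitive here, which is an assumption.
    """
    results = [
        fnmatchcase(item.get(field, "").lower(), pattern.lower())
        for field, pattern in criteria.items()
    ]
    return all(results) if logic == "and" else any(results)

def apply_filter(items, criteria, logic="and", mode="include"):
    """Show only the matching items ("include") or hide them ("exclude")."""
    if mode == "include":
        return [it for it in items if matches(it, criteria, logic)]
    if mode == "exclude":
        return [it for it in items if not matches(it, criteria, logic)]
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical EST records.
ests = [
    {"name": "est_a", "tissue": "liver", "author": "Smith"},
    {"name": "est_b", "tissue": "brain", "author": "Smith"},
    {"name": "est_c", "tissue": "liver", "author": "Jones"},
]
criteria = {"tissue": "liv*", "author": "smith"}
both = apply_filter(ests, criteria, logic="and", mode="include")
either = apply_filter(ests, criteria, logic="or", mode="include")
hidden = apply_filter(ests, criteria, logic="and", mode="exclude")
```

With "and", only est_a passes both criteria. With "or", all three records pass at least one. With "exclude", the record that matches under "and" is the one removed from the display.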

\

\ This track may also be configured to display base labeling, a feature that\ allows the user to display all bases in the aligning sequence or only those \ that differ from the genomic sequence. For more information about this option,\ click \ here.\

\ \

Methods

\

\ To make an EST, RNA is isolated from cells and reverse\ transcribed into cDNA. Typically, the cDNA is cloned\ into a plasmid vector and a read is taken from the 5'\ and/or 3' primer. For most — but not all — ESTs, the\ reverse transcription is primed by an oligo-dT, which\ hybridizes with the poly-A tail of mature mRNA. The\ reverse transcriptase may or may not make it to the 5'\ end of the mRNA, which may or may not be degraded.

\

\ In general, the 3' ESTs mark the end of transcription\ reasonably well, but the 5' ESTs may end at any point\ within the transcript. Some of the newer cap-selected\ libraries cover transcription start reasonably well. Before the \ cap-selection techniques\ emerged, some projects used random rather than poly-A\ priming in an attempt to retrieve sequence distant from the\ 3' end. These projects were successful at this, but as\ a side effect also deposited sequences from unprocessed\ mRNA and perhaps even genomic sequences into the EST databases.\ Even outside of the random-primed projects, there is a\ degree of non-mRNA contamination. Because of this, a\ single unspliced EST should be viewed with considerable\ skepticism.

\

\ To generate this track, $organism ESTs from Washington University were aligned\ against the genome using blat. Note that the maximum intron length\ allowed by blat is 750,000 bases, which may eliminate some ESTs with very \ long introns that might otherwise align. When a single \ EST aligned in multiple places, the alignment having the \ highest base identity was identified. Only alignments having\ a base identity level within 0.5% of the best and at least 96% base identity \ with the genomic sequence were kept.

\ \

Credits

\

\ This track was produced at UCSC from WZ EST sequence data\ from Washington University in collaboration with the \ Zebrafish Genome \ Initiative at Children's Hospital, Boston, MA.\ \

References

\

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ rna 1 xenoMrna Other mRNAs psl xeno Non-$Organism mRNAs from GenBank 0 63 0 0 0 127 127 127 1 0 0

Description

\

\ This track displays translated blat alignments of vertebrate and\ invertebrate mRNA in \ GenBank from organisms other than $organism.\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks. In dense display mode, the items that\ are more darkly shaded indicate matches of better quality.

\

\ The strand information (+/-) for this track is in two parts. The\ first + indicates the orientation of the query sequence whose\ translated protein produced the match (here always 5' to 3', hence +).\ The second + or - indicates the orientation of the matching \ translated genomic sequence. Because the two orientations of a DNA \ sequence give different predicted protein sequences, there are four \ combinations. ++ is not the same as --, nor is +- the same as -+.

\

\ The description page for this track has a filter that can be used to change \ the display mode, alter the color, and include/exclude a subset of items \ within the track. This may be helpful when many items are shown in the track \ display, especially when only some are relevant to the current task.

\

\ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the mRNA \ display. For example, to apply the filter to all mRNAs expressed in a specific\ organ, type the name of the organ in the tissue box. To view the list of \ valid terms for each text box, consult the table in the Table Browser that \ corresponds to the factor on which you wish to filter. For example, the \ "tissue" table contains all the types of tissues that can be \ entered into the tissue text box. Wildcards may also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only mRNAs that match all filter \ criteria will be highlighted. If "or" is selected, mRNAs that \ match any one of the filter criteria will be highlighted.\
  3. Choose the color or display characteristic that should be used to \ highlight or include/exclude the filtered items. If "exclude" is \ chosen, the browser will not display mRNAs that match the filter criteria. \ If "include" is selected, the browser will display only those \ mRNAs that match the filter criteria.\

\

\ This track may also be configured to display codon coloring, a feature that\ allows the user to quickly validate and compare mRNAs. For more \ information about this option, click \ here.\

\ \

Methods

\

\ The mRNAs were aligned against the $organism genome using translated blat. \ When a single mRNA aligned in multiple places, the alignment having the \ highest base identity was found. Only those alignments having a base \ identity level within 1% of the best and at least 25% base identity with the \ genomic sequence were kept.

\ \

Credits

\

\ The mRNA track was produced at UCSC from mRNA sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., and \ Wheeler, D.L. \ GenBank: update. Nucleic Acids Res. 32,\ D23-6 (2004).

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ rna 1 baseColorUseCds genbank\ baseColorUseSequence genbank\ cdsDrawOptions enabled\ showDiffBasesAllScales .\ uniGene_dr UniGene psl . $Organism UniGene Alignments 0 70 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unigene&term=$$[ClusterID]

Description

\

\ This track shows alignments of $organism \ UniGene\ sequences to the $organism genome. Sequences are from UniGene Build #78.

\ \

Methods

\

\ The $organism genome was first masked with RepeatMasker and Tandem Repeats\ Finder. The alignments were then made using blat with a minimum sequence \ identity of 95%. Only alignments that met the following criteria are shown \ in this track: a minimum of 20% coverage, at least 96.5% alignment ratio, \ and scores within 0.2% of best-in-genome.

\ rna 1 affyZebrafish Affy Zebrafish psl . Alignments of Affymetrix Consensus Sequences from Zebrafish Chip 0 89 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows the location of the consensus sequences used for the \ selection of probes on the Affymetrix Zebrafish genome chip.

\ \

Methods

\

\ Consensus sequences were downloaded from the\ Affymetrix Product Support web page.\ The sequences were mapped to the genome with blat followed by pslReps using \ the parameters -minAli=0.95 and -nearTop=0.005.

\ \

Credits

\

\ Thanks to Affymetrix for\ the data underlying this track.

\ regulation 1 tblastnHg16KGPep Human tBLASTn psl protein $o_Organism ($o_date/$o_db) KnownGenePep tBLASTn 0 121 0 0 0 127 127 127 0 0 0

Description

\

\ This track contains tBLASTn alignments of the peptides\ from the predicted and known genes identified in the hg16 Known Genes track as \ of 21 July 2004.\ \

Methods

\ The predicted proteins from the human knownGene track were aligned with \ the $organism genome using tblastn. The proteins were obtained by \ merging annotation data from the kgXref table with the sequence and name found in \ the knownGenePep table. This resulted in a set of 40,115 protein sequences. \ The tblastn results were filtered using a threshold E-value of 1e-100 and, for \ each known gene peptide query, the top 5 hits to the $organism genome were kept.\ \
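
The filtering step can be sketched as follows. The text does not say how the "top" hits were ranked, so this example assumes ranking by ascending E-value; the hit fields and the function name are hypothetical and are not part of the BLAST tool set.

```python
from collections import defaultdict

def top_hits(hits, max_evalue=1e-100, per_query=5):
    """Drop hits whose E-value exceeds `max_evalue`, then keep at most
    `per_query` hits for each query peptide.

    Assumption: "top" means lowest E-value.
    """
    by_query = defaultdict(list)
    for hit in hits:
        if hit["evalue"] <= max_evalue:
            by_query[hit["query"]].append(hit)
    return {
        query: sorted(qhits, key=lambda h: h["evalue"])[:per_query]
        for query, qhits in by_query.items()
    }
```

A query with no hit at or below the threshold is absent from the returned dictionary, which mirrors a peptide that contributes no items to the track.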

Credits

\

\ tBLASTn is part of the NCBI Blast tool set. For more information on Blast, see\ Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. \ Basic local alignment search tool. J. Mol. Biol. 215(3), 403-410 (1990).\

\ The remaining required utilities were written by Jim Kent, Brian Raney, Rachel \ Harte and Heather Trumbower. This track was produced in collaboration with the \ Zebrafish Genome Initiative at Children's Hospital, Boston.\ compGeno 1 colorChromDefault off\ otherDb hg16\ chainHg17 Human Chain chain hg17 $o_Organism ($o_date/$o_db) Chained Alignments 0 124.2 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.
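
The chaining step can be illustrated with a simplified dynamic program. This sketch is not axtChain: it replaces the kd-tree with a plain quadratic scan over earlier blocks and uses a hypothetical linear gap penalty, and the block tuple layout is an assumption made for the example.

```python
def linear_gap_cost(dq, dt):
    """Hypothetical gap penalty: one point per 10 gap bases, both sides."""
    return (dq + dt) // 10

def best_chain(blocks, gap_cost=linear_gap_cost):
    """Find the maximally scoring chain of gapless blocks.

    blocks: list of (q_start, t_start, length, score) with length > 0.
    A block may follow another only if it starts at or after the end of
    the earlier block in both the query and the target.
    Returns (total_score, chain_as_list_of_blocks).
    """
    if not blocks:
        return 0, []
    blocks = sorted(blocks)                # order by query start
    n = len(blocks)
    best = [b[3] for b in blocks]          # best chain score ending at block i
    prev = [None] * n                      # back-pointers for traceback

    for i, (qs, ts, _, score) in enumerate(blocks):
        for j in range(i):                 # axtChain narrows this with a kd-tree
            pq, pt, plen, _ = blocks[j]
            dq = qs - (pq + plen)
            dt = ts - (pt + plen)
            if dq < 0 or dt < 0:
                continue                   # overlapping or out of order
            candidate = best[j] + score - gap_cost(dq, dt)
            if candidate > best[i]:
                best[i] = candidate
                prev[i] = j

    end = max(range(n), key=best.__getitem__)
    total = best[end]
    chain = []
    while end is not None:
        chain.append(blocks[end])
        end = prev[end]
    return total, chain[::-1]

# Hypothetical blocks; the third one lies far off the diagonal.
blocks = [(0, 0, 100, 100), (120, 130, 50, 50), (60, 500, 40, 40), (200, 210, 80, 80)]
total, chain = best_chain(blocks)
```

In this example the off-diagonal block cannot be joined to the others without going backward in the target, so the best chain links the remaining three blocks and pays a small penalty for the two gaps between them.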

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 1 matrix 16 91,-90,-25,-100,-90,100,-100,-25,-25,-100,100,-90,-100,-25,-90,91\ otherDb hg17\ netHg17 Human Net netAlign hg17 chainHg17 $o_Organism ($o_date/$o_db) Alignment Net 1 124.3 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.
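
The placement step can be sketched as a greedy interval cover. This is a deliberately reduced model, not chainNet: each chain is treated as one solid interval on the target genome, so the gap-filling hierarchy of levels is not represented, and the tuple layout and function names are hypothetical.

```python
def subtract(interval, covered):
    """Return the parts of `interval` not overlapped by `covered`, a list of
    non-overlapping (start, end) pairs sorted by start (end-exclusive)."""
    start, end = interval
    pieces = []
    for c_start, c_end in covered:
        if c_end <= start:
            continue                       # entirely to the left
        if c_start >= end:
            break                          # entirely to the right
        if c_start > start:
            pieces.append((start, c_start))
        start = max(start, c_end)
        if start >= end:
            break
    if start < end:
        pieces.append((start, end))
    return pieces

def place_chains(chains):
    """Place chains highest score first, trimming each one to the sections
    not already covered by a higher-scoring chain.

    chains: list of (name, score, start, end).
    Returns a list of (name, start, end) in placement order.
    """
    covered = []
    placed = []
    for name, _, start, end in sorted(chains, key=lambda c: -c[1]):
        for piece in subtract((start, end), covered):
            placed.append((name, piece[0], piece[1]))
            covered.append(piece)
        covered.sort()
    return placed

# Hypothetical chains on one target sequence.
chains = [("A", 1000, 0, 100), ("B", 500, 50, 200), ("C", 100, 80, 120), ("D", 50, 150, 300)]
net = place_chains(chains)
```

Chain A is placed whole, B and D are trimmed to the parts that A and B leave uncovered, and C disappears because higher-scoring chains already cover its entire span.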

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California at\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his program \ RepeatMasker.

\

\ The browser display and database storage of the nets were made\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ \ compGeno 0 otherDb hg17\ chainMm5 Mouse Chain chain mm5 $o_Organism ($o_date/$o_db) Chained Alignments 0 126.2 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 1 matrix 16 91,-90,-25,-100,-90,100,-100,-25,-25,-100,100,-90,-100,-25,-90,91\ otherDb mm5\ netMm5 Mouse Net netAlign mm5 chainMm5 $o_Organism ($o_date/$o_db) Alignment Net 0 126.3 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California at\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his program \ RepeatMasker.

\

\ The browser display and database storage of the nets were made\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 0 otherDb mm5\ chainFr1 Fugu Chain chain fr1 $o_Organism ($o_date/$o_db) Chained Alignments 0 130.2 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_Organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_Organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_Organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_Organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single $o_Organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 1 otherDb fr1\ netFr1 Fugu Net netAlign fr1 chainFr1 $o_Organism ($o_date/$o_db) Alignment Net 0 130.3 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_Organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_Organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California at\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 0 otherDb fr1\ rmsk RepeatMasker rmsk Repeating Elements by RepeatMasker 1 149.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track was created by using Arian Smit's RepeatMasker program, which screens DNA sequences \ for interspersed repeats and low complexity DNA sequences. The program\ outputs a detailed annotation of the repeats that are present in the \ query sequence, as well as a modified version of the query sequence \ in which all the annotated repeats have been masked. RepeatMasker uses \ the RepBase library of repeats from the \ Genetic \ Information Research Institute (GIRI). \ RepBase is described in Jurka, J. (2000) in the References section below.

\ \

Display Conventions and Configuration

\

\ In full display mode, this track displays up to nine different classes of repeats:\

\

\ The level of color shading in the graphical display reflects the amount of \ base mismatch, base deletion, and base insertion associated with a repeat \ element. The higher the combined number of these, the lighter the shading.

\ \

Methods

\

UCSC has used the most current versions of the RepeatMasker software and repeat libraries available to generate these data. Note that these versions may be newer than those that are publicly available on the Internet.

\

Data are generated using the RepeatMasker -s flag. Additional flags may be used for certain organisms. Repeats are soft-masked. Alignments may extend through repeats, but are not permitted to initiate in them. See the FAQ for more information.
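Soft-masking means that repeat bases are written in lowercase while the rest of the sequence stays in uppercase, so no sequence information is lost. The sketch below shows the idea on a toy sequence, together with a simplified version of the rule that an alignment may not initiate in a repeat. The function names and the seed model (a fixed-length window that must be entirely unmasked) are assumptions for illustration and do not describe how any particular aligner is implemented.

```python
def soft_mask(sequence, repeats):
    """Lowercase the bases inside each half-open (start, end) repeat interval."""
    masked = list(sequence.upper())
    for start, end in repeats:
        for i in range(max(start, 0), min(end, len(masked))):
            masked[i] = masked[i].lower()
    return "".join(masked)


def seed_allowed(masked, start, length):
    """Allow a seed only if the whole window lies in unmasked (uppercase) bases."""
    window = masked[start:start + length]
    return len(window) == length and window.isupper()


masked = soft_mask("ACGTACGTAC", [(2, 5)])
```

With the repeat at positions 2 to 4, `masked` is `"ACgtaCGTAC"`; a two-base seed at position 0 is allowed, while a three-base seed at position 1 is rejected because it touches the masked region.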

\ \

Credits

\

Thanks to Arian Smit and GIRI for providing the tools and repeat libraries used to generate this track.

\ \

References

\

RepBase is described in Jurka, J. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 16(9), 418-420 (2000).

\

For a discussion of repeats in mammalian genomes, see:

Smit, A.F. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev. 9(6), 657-63 (1999).

\

Smit, A.F. The origin of interspersed repeats in the human genome. Curr Opin Genet Dev. 6(6), 743-8 (1996).

\ varRep 0 simpleRepeat Simple Repeats bed 4 + Simple Tandem Repeats by TRF 0 149.3 0 0 0 127 127 127 0 0 0

Description

\

This track displays simple tandem repeats (possibly imperfect) located by Tandem Repeats Finder (TRF), which is specialized for this purpose. These repeats can occur within coding regions of genes and may be quite polymorphic. Repeat expansions are sometimes associated with specific diseases.
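To make the notion of a tandem repeat concrete, the sketch below finds only perfect tandem repeats with a regular expression. It is not TRF and does not detect the imperfect repeats that TRF reports; the function name, the maximum unit length of 6, and the minimum of 3 copies are arbitrary choices for illustration.

```python
import re

# A short unit (1 to 6 bases) followed by at least two more exact copies.
PERFECT_TANDEM = re.compile(r"([ACGT]{1,6}?)\1{2,}")


def perfect_tandem_repeats(sequence):
    """Return (start, end, unit, copies) for each perfect tandem repeat found."""
    results = []
    for match in PERFECT_TANDEM.finditer(sequence.upper()):
        unit = match.group(1)
        copies = (match.end() - match.start()) // len(unit)
        results.append((match.start(), match.end(), unit, copies))
    return results


hits = perfect_tandem_repeats("TTACACACACGG")
```

Here the dinucleotide `AC` repeated four times is reported as one hit spanning positions 2 to 10 (half-open).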

\ \

Methods

\

For more information about the TRF program, see Benson (1999).

\ \

Credits

\

TRF was written by Gary Benson.

\ \

References

\

Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27(2), 573-80 (1999).

\ varRep 1 blastHg16KG $o_Organism Proteins psl protein $o_Organism ($o_date/$o_db) Proteins 3 202 0 0 0 127 127 127 0 0 0

Description

\

This track contains tBLASTn alignments of the peptides from the predicted and known genes identified in the hg16 Known Genes track as of 27 May 2004.

\ \

Methods

\

First, the predicted proteins from the human Known Genes track were aligned with the human genome using the blat program to discover exon boundaries. Next, the amino acid sequences that make up each exon were aligned with the $organism sequence using the tBLASTn program. Finally, the putative $organism exons were chained together using an organism-specific maximum gap size (27,000 bp) but no gap penalty. The single best exon chains extending over more than 60% of the query protein were included. All exon chains that matched at least 60% of the protein's amino acids were also included.
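The chaining and filtering step can be sketched as follows. Only the 27,000 bp maximum gap and the 60% threshold come from the text; the hit layout `(t_start, t_end, q_start, q_end)`, the function names, and the way coverage is measured (the span of query residues touched by the chain) are assumptions for illustration. Strand, query order, and the selection of a single best chain are ignored.

```python
MAX_GAP = 27_000      # bp between consecutive exon hits, from the text
MIN_COVERAGE = 0.60   # fraction of the query protein, from the text


def chain_exons(hits, max_gap=MAX_GAP):
    """Group exon hits into chains, with no penalty for gaps up to max_gap.

    hits: list of (t_start, t_end, q_start, q_end), where t_* are genome
    coordinates in bp and q_* are protein coordinates in amino acids.
    """
    chains = []
    for hit in sorted(hits):
        # Extend the current chain if this hit starts close enough to its end.
        if chains and hit[0] - chains[-1][-1][1] <= max_gap:
            chains[-1].append(hit)
        else:
            chains.append([hit])
    return chains


def coverage(chain, protein_length):
    """Fraction of the query protein spanned by the chain."""
    q_start = min(hit[2] for hit in chain)
    q_end = max(hit[3] for hit in chain)
    return (q_end - q_start) / protein_length


def keep_chains(hits, protein_length):
    """Keep chains that extend over more than 60% of the query protein."""
    return [c for c in chain_exons(hits)
            if coverage(c, protein_length) > MIN_COVERAGE]


exon_hits = [
    (1_000, 1_090, 0, 30),
    (5_000, 5_120, 30, 70),
    (40_000, 40_060, 70, 90),  # 34,880 bp past the previous hit: new chain
]
kept = keep_chains(exon_hits, protein_length=100)
```

In this example the first two hits form a chain covering 70% of a 100-residue protein and are kept, while the third hit starts a separate chain that covers only 20% and is discarded.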

\ \

Credits

\

tBLASTn is part of the NCBI Blast tool set. For more information on Blast, see Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 215(3), 403-410 (1990).

\

Blat was written by Jim Kent. The remaining utilities used to produce this track were written by Jim Kent or Brian Raney.

\ genes 1 blastRef hg16.blastKGRef00\ colorChromDefault off\ otherDb hg16\ pred hg16.blastKGPep00\