cartVersion cartVersion cartVersion cartVersion 0 0 0 0 0 0 0 0 0 0 0 cartVersion cartVersion cartVersion 0 cartVersion 0 pubsBlatPsl Indiv. Seq. Matches psl Individual Sequence Matches of One Selected Article from Sequences Track 0 1 0 115 70 127 185 162 0 0 0 pub 1 color 0,115,70\ configurable off\ configureByPopup off\ longLabel Individual Sequence Matches of One Selected Article from Sequences Track\ parent pubs off\ priority 1\ shortLabel Indiv. Seq. Matches\ track pubsBlatPsl\ type psl\ visibility hide\ pubsBlat Sequences bed 12 + Sequences in Articles: PubmedCentral and Elsevier 1 2 0 0 0 127 127 127 0 0 0 pub 1 configurable off\ configureByPopup off\ longLabel Sequences in Articles: PubmedCentral and Elsevier\ parent pubs on\ priority 2\ shortLabel Sequences\ track pubsBlat\ type bed 12 +\ visibility dense\ refGene RefSeq Genes genePred refPep refMrna RefSeq Genes 0 35 12 12 120 133 133 187 0 0 0

Description

\

\ The RefSeq Genes track shows known C. intestinalis protein-coding and \ non-protein-coding genes taken from the NCBI RNA reference sequences \ collection (RefSeq). The data underlying this track are updated weekly.

\ \

\ Please visit the Feedback for Gene and Reference Sequences (RefSeq) page to\ make suggestions, submit additions and corrections, or ask for help concerning\ RefSeq records.\

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ gene prediction \ tracks.\ The color shading indicates the level of review the RefSeq record has \ undergone: predicted (light), provisional (medium), reviewed (dark).

\

\ The item labels and display colors of features within this track can be\ configured through the controls at the top of the track description page. \ This page is accessed via the small button to the left of the track's \ graphical display or through the link on the track's control menu. \

\ \

Methods

\

\ RefSeq RNAs were aligned against the C. intestinalis genome using blat; \ those with an alignment of less than 15% were discarded. When a single RNA \ aligned in multiple places, the alignment having the highest base identity \ was identified. Only alignments having a base identity level within 0.1% of \ the best and at least 96% base identity with the genomic sequence were kept.\
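The near-best filtering described above can be sketched in a few lines of Python. This is only an illustration, not the code UCSC runs: the function name `filter_near_best`, the tuple layout, and the reading of "within 0.1% of the best" as a difference in percentage points are all assumptions. The 15% coverage cutoff is not modeled.

```python
from collections import defaultdict


def filter_near_best(alignments, near_best=0.1, min_identity=96.0):
    """Keep alignments close to the best hit of each query sequence.

    alignments: iterable of (query_name, identity_percent, location) tuples
                (a hypothetical layout chosen for this sketch).
    near_best:  allowed gap, in percentage points, below the best identity.
    min_identity: absolute identity floor, in percent.
    """
    alignments = list(alignments)

    # First pass: find the highest base identity seen for each query.
    best = defaultdict(float)
    for name, identity, _ in alignments:
        best[name] = max(best[name], identity)

    # Second pass: keep hits that clear both thresholds.
    return [
        hit for hit in alignments
        if hit[1] >= min_identity and best[hit[0]] - hit[1] <= near_best
    ]


if __name__ == "__main__":
    hits = [
        ("NM_1", 99.5, "chr1"),
        ("NM_1", 99.45, "chr2"),
        ("NM_1", 98.0, "chr3"),
        ("NM_2", 95.0, "chr4"),
    ]
    for hit in filter_near_best(hits):
        print(hit)
```

The same function would cover the EST and mRNA tracks by passing `near_best=0.5`.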

\ \ \

Credits

\

\ This track was produced at UCSC from RNA sequence data\ generated by scientists worldwide and curated by the \ NCBI RefSeq project.

\ \

References

\

\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.

\ \

Pruitt KD, Tatusova T, Maglott DR. \ NCBI Reference Sequence (RefSeq): a curated non-redundant \ sequence database of genomes, transcripts and proteins. Nucleic Acids \ Res. 2005 Jan 1;33(Database issue):D501-4.\

\ genes 1 baseColorUseCds given\ color 12,12,120\ group genes\ idXref hgFixed.refLink mrnaAcc name\ longLabel RefSeq Genes\ priority 35\ shortLabel RefSeq Genes\ track refGene\ type genePred refPep refMrna\ visibility hide\ intronEst Spliced ESTs psl est C. intestinalis ESTs That Have Been Spliced 1 56 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows alignments between C. intestinalis expressed sequence tags\ (ESTs) in GenBank and the genome that show signs of splicing when\ aligned against the genome. ESTs are single-read sequences, typically about \ 500 bases in length, that usually represent fragments of transcribed genes.\

\

\ To be considered spliced, an EST must show \ evidence of at least one canonical intron, i.e. one that is at least\ 32 bases in length and has GT/AG ends. By requiring splicing, the level \ of contamination in the EST databases is drastically reduced\ at the expense of eliminating many genuine 3' ESTs.\ For a display of all ESTs (including unspliced), see the \ C. intestinalis EST track.
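As a rough illustration of the splicing criterion, the following Python sketch tests whether any gap between consecutive alignment blocks looks like a canonical intron. It is a simplified assumption of how such a check could work, not the browser's implementation: the function name, the half-open block coordinates, and the restriction to forward-strand GT/AG ends are all choices made for this example.

```python
def has_canonical_intron(genome, blocks, min_len=32):
    """Return True if any gap between aligned blocks looks like an intron.

    genome: genomic sequence as a string (forward strand).
    blocks: sorted list of (start, end) half-open genomic coordinates of the
            aligned blocks of one EST (hypothetical input layout).
    min_len: minimum gap length, in bases, to count as an intron.
    """
    for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]):
        gap = genome[prev_end:next_start].upper()
        # Canonical intron: long enough, starts with GT and ends with AG.
        if len(gap) >= min_len and gap.startswith("GT") and gap.endswith("AG"):
            return True
    return False


if __name__ == "__main__":
    seq = "A" * 10 + "GT" + "C" * 30 + "AG" + "T" * 10
    print(has_canonical_intron(seq, [(0, 10), (44, 54)]))
```

An EST aligned to the reverse strand would need the gap reverse-complemented before the same test is applied.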

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks. In dense display mode, darker shading\ indicates a larger number of aligned ESTs.

\

\ The strand information (+/-) indicates the\ direction of the match between the EST and the matching\ genomic sequence. It bears no relationship to the direction\ of transcription of the RNA with which it might be associated.

\

\ The description page for this track has a filter that can be used to change \ the display mode, alter the color, and include/exclude a subset of items \ within the track. This may be helpful when many items are shown in the track \ display, especially when only some are relevant to the current task.

\

\ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the EST\ display. For example, to apply the filter to all ESTs expressed in a specific\ organ, type the name of the organ in the tissue box. To view the list of \ valid terms for each text box, consult the table in the Table Browser that \ corresponds to the factor on which you wish to filter. For example, the \ "tissue" table contains all the types of tissues that can be \ entered into the tissue text box. Wildcards may also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only ESTs that match all filter \ criteria will be highlighted. If "or" is selected, ESTs that \ match any one of the filter criteria will be highlighted.\
  3. Choose the color or display characteristic that should be used to \ highlight or include/exclude the filtered items. If "exclude" is \ chosen, the browser will not display ESTs that match the filter criteria. \ If "include" is selected, the browser will display only those \ ESTs that match the filter criteria.\

\

\ This track may also be configured to display base labeling, a feature that\ allows the user to display all bases in the aligning sequence or only those \ that differ from the genomic sequence. For more information about this option,\ click \ here.\

\ \

Methods

\

\ To make an EST, RNA is isolated from cells and reverse\ transcribed into cDNA. Typically, the cDNA is cloned\ into a plasmid vector and a read is taken from the 5'\ and/or 3' primer. For most — but not all — ESTs, the\ reverse transcription is primed by an oligo-dT, which\ hybridizes with the poly-A tail of mature mRNA. The\ reverse transcriptase may or may not make it to the 5'\ end of the mRNA, which may or may not be degraded.

\

\ In general, the 3' ESTs mark the end of transcription\ reasonably well, but the 5' ESTs may end at any point\ within the transcript. Some of the newer cap-selected\ libraries cover transcription start reasonably well. Before the \ cap-selection techniques\ emerged, some projects used random rather than poly-A\ priming in an attempt to retrieve sequence distant from the\ 3' end. These projects were successful at this, but as\ a side effect also deposited sequences from unprocessed\ mRNA and perhaps even genomic sequences into the EST databases.\ Even outside of the random-primed projects, there is a\ degree of non-mRNA contamination. Because of this, a\ single unspliced EST should be viewed with considerable\ skepticism.

\

\ To generate this track, C. intestinalis ESTs from GenBank were aligned \ against the genome using blat. Note that the maximum intron length\ allowed by blat is 750,000 bases, which may eliminate some ESTs with very \ long introns that might otherwise align. When a single \ EST aligned in multiple places, the alignment having the \ highest base identity was identified. Only alignments having\ a base identity level within 0.5% of the best and at least 96% base identity \ with the genomic sequence are displayed in this track.

\ \

Credits

\

\ This track was produced at UCSC from EST sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, \ Wheeler DL. \ GenBank: update. Nucleic Acids Res. \ 2004 Jan 1;32(Database issue):D23-6.

\

\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.

\ \ rna 1 baseColorUseSequence genbank\ group rna\ indelDoubleInsert on\ indelQueryInsert on\ intronGap 30\ longLabel C. intestinalis ESTs That Have Been Spliced\ priority 56\ shortLabel Spliced ESTs\ showDiffBasesAllScales .\ spectrum on\ track intronEst\ type psl est\ visibility dense\ est C. intestinalis ESTs psl est C. intestinalis ESTs Including Unspliced 0 100 0 0 0 127 127 127 1 0 0

Description

\ \

\ This track shows alignments between C. intestinalis expressed sequence tags\ (ESTs) in \ GenBank and the genome. ESTs are single-read sequences,\ typically about 500 bases in length, that usually represent fragments of\ transcribed genes.\

\ \

Display Conventions and Configuration

\ \

\ This track follows the display conventions for\ \ PSL alignment tracks. In dense display mode, the items that\ are more darkly shaded indicate matches of better quality.\

\ \

\ The strand information (+/-) indicates the\ direction of the match between the EST and the matching\ genomic sequence. It bears no relationship to the direction\ of transcription of the RNA with which it might be associated.\

\ \

\ The description page for this track has a filter that can be used to change\ the display mode, alter the color, and include/exclude a subset of items\ within the track. This may be helpful when many items are shown in the track\ display, especially when only some are relevant to the current task.\

\ \

\ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the EST display. For example, to apply the filter to all ESTs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter.
  2. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only ESTs that match all filter criteria will be highlighted. If "or" is selected, ESTs that match any one of the filter criteria will be highlighted.
  3. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display ESTs that match the filter criteria. If "include" is selected, the browser will display only those ESTs that match the filter criteria.
\

\ \

\ This track may also be configured to display base labeling, a feature that\ allows the user to display all bases in the aligning sequence or only those\ that differ from the genomic sequence. For more information about this option,\ go to the\ \ Base Coloring for Alignment Tracks page.\ Several types of alignment gap may also be colored;\ for more information, go to the\ \ Alignment Insertion/Deletion Display Options page.\

\ \

Methods

\ \

\ To make an EST, RNA is isolated from cells and reverse\ transcribed into cDNA. Typically, the cDNA is cloned\ into a plasmid vector and a read is taken from the 5'\ and/or 3' primer. For most — but not all — ESTs, the\ reverse transcription is primed by an oligo-dT, which\ hybridizes with the poly-A tail of mature mRNA. The\ reverse transcriptase may or may not make it to the 5'\ end of the mRNA, which may or may not be degraded.\

\ \

\ In general, the 3' ESTs mark the end of transcription\ reasonably well, but the 5' ESTs may end at any point\ within the transcript. Some of the newer cap-selected\ libraries cover transcription start reasonably well. Before the\ cap-selection techniques\ emerged, some projects used random rather than poly-A\ priming in an attempt to retrieve sequence distant from the\ 3' end. These projects were successful at this, but as\ a side effect also deposited sequences from unprocessed\ mRNA and perhaps even genomic sequences into the EST databases.\ Even outside of the random-primed projects, there is a\ degree of non-mRNA contamination. Because of this, a\ single unspliced EST should be viewed with considerable\ skepticism.\

\ \

\ To generate this track, C. intestinalis ESTs from GenBank were aligned\ against the genome using blat. Note that the maximum intron length\ allowed by blat is 750,000 bases, which may eliminate some ESTs with very\ long introns that might otherwise align. When a single\ EST aligned in multiple places, the alignment having the\ highest base identity was identified. Only alignments having\ a base identity level within 0.5% of the best and at least 96% base identity\ with the genomic sequence were kept.\

\ \

Credits

\ \

\ This track was produced at UCSC from EST sequence data\ submitted to the international public sequence databases by\ scientists worldwide.\

\ \

References

\

\ Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW.\ \ GenBank.\ Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42.\ PMID: 23193287; PMC: PMC3531190\

\ \

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL.\ GenBank: update.\ Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6.\ PMID: 14681350; PMC: PMC308779\

\ \

\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.\ PMID: 11932250; PMC: PMC187518\

\ rna 1 baseColorUseSequence genbank\ group rna\ indelDoubleInsert on\ indelQueryInsert on\ intronGap 30\ longLabel C. intestinalis ESTs Including Unspliced\ maxItems 300\ shortLabel C. intestinalis ESTs\ spectrum on\ table all_est\ track est\ type psl est\ visibility hide\ mrna C. intestinalis mRNAs psl . C. intestinalis mRNAs from GenBank 3 100 0 0 0 127 127 127 1 0 0

Description

\ \

\ The mRNA track shows alignments between C. intestinalis mRNAs\ in \ GenBank and the genome.

\ \

Display Conventions and Configuration

\ \

\ This track follows the display conventions for\ \ PSL alignment tracks. In dense display mode, the items that\ are more darkly shaded indicate matches of better quality.\

\ \

\ The description page for this track has a filter that can be used to change\ the display mode, alter the color, and include/exclude a subset of items\ within the track. This may be helpful when many items are shown in the track\ display, especially when only some are relevant to the current task.\

\ \

\ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the mRNA display. For example, to apply the filter to all mRNAs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter.
  2. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only mRNAs that match all filter criteria will be highlighted. If "or" is selected, mRNAs that match any one of the filter criteria will be highlighted.
  3. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display mRNAs that match the filter criteria. If "include" is selected, the browser will display only those mRNAs that match the filter criteria.
\

\ \

\ This track may also be configured to display codon coloring, a feature that\ allows the user to quickly compare mRNAs against the genomic sequence. For more\ information about this option, go to the\ \ Codon and Base Coloring for Alignment Tracks page.\ Several types of alignment gap may also be colored;\ for more information, go to the\ \ Alignment Insertion/Deletion Display Options page.\

\ \

Methods

\ \

\ GenBank C. intestinalis mRNAs were aligned against the genome using the\ blat program. When a single mRNA aligned in multiple places,\ the alignment having the highest base identity was found.\ Only alignments having a base identity level within 0.5% of\ the best and at least 96% base identity with the genomic sequence were kept.\

\ \

Credits

\ \

\ The mRNA track was produced at UCSC from mRNA sequence data\ submitted to the international public sequence databases by\ scientists worldwide.\

\ \

References

\

\ Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW.\ \ GenBank.\ Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42.\ PMID: 23193287; PMC: PMC3531190\

\ \

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL.\ GenBank: update.\ Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6.\ PMID: 14681350; PMC: PMC308779\

\ \

\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.\ PMID: 11932250; PMC: PMC187518\

\ rna 1 baseColorDefault diffCodons\ baseColorUseCds genbank\ baseColorUseSequence genbank\ group rna\ indelDoubleInsert on\ indelPolyA on\ indelQueryInsert on\ longLabel C. intestinalis mRNAs from GenBank\ shortLabel C. intestinalis mRNAs\ showDiffBasesAllScales .\ spectrum on\ table all_mrna\ track mrna\ type psl .\ visibility pack\ crispr CRISPR bed 3 CRISPR/Cas9 Sp. Pyog. target sites 0 100 0 0 0 127 127 127 0 0 0

Description

\ \

\ This track shows regions of the genome within 200 bp of transcribed regions and\ DNA sequences targetable by CRISPR RNA guides using the Cas9 enzyme\ from S. pyogenes (PAM: NGG).\ CRISPR target sites were annotated with predicted specificity\ (off-target effects) and predicted efficiency (on-target cleavage) by various\ algorithms through the tool CRISPOR.\

\ \

Display Conventions and Configuration

\ \

\ The track "CRISPR Regions" shows the regions of the genome where target\ sites were analyzed, i.e. within 200 bp of transcribed regions as annotated by\ Ensembl transcript models.

\ \

\ The track "CRISPR Targets" shows the target sites in these regions.\ The target sequence of the guide is shown with a thick (exon) bar. The PAM\ motif match (NGG) is shown with a thinner bar. Guides\ are colored to reflect both predicted specificity and efficiency. Specificity\ reflects the "uniqueness" of a 20mer sequence in the genome; the less unique a\ sequence is, the more likely it is to cleave other locations of the genome\ (off-target effects). Efficiency is the frequency of cleavage at the target\ site (on-target efficiency).

\ \

Shades of gray stand for sites that are hard to target specifically, as the\ 20mer is not very unique in the genome:

\ \ \ \ \
impossible to target: target site has at least one identical copy in the genome and was not scored
hard to target: so many similar sequences in the genome that the alignment was stopped; possibly a repeat
hard to target: target site was aligned but results in a low specificity score <= 50 (see below)
\ \

Colors highlight targets that are specific in the genome (MIT specificity > 50) but have different predicted efficiencies:

\ \ \ \ \ \
unable to calculate Doench/Fusi 2016 efficiency score
low predicted cleavage: Doench/Fusi 2016 Efficiency percentile <= 30
medium predicted cleavage: Doench/Fusi 2016 Efficiency percentile > 30 and < 55
high predicted cleavage: Doench/Fusi 2016 Efficiency percentile > 55
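The two legends above amount to a small decision rule, sketched here in Python. The function name and the returned labels are invented for illustration, and two points are assumptions: a missing specificity score stands in for both "not scored" gray cases, and a percentile of exactly 55, which the legend leaves open, is counted as high.

```python
def guide_color_class(mit_specificity, doench_percentile):
    """Map a guide's scores to the legend category described in the text.

    mit_specificity:   MIT specificity score (0-100), or None if not scored.
    doench_percentile: Doench/Fusi 2016 efficiency percentile, or None if
                       it could not be calculated.
    """
    # Gray shades: guides that are hard or impossible to target specifically.
    if mit_specificity is None:
        return "gray: not scored"
    if mit_specificity <= 50:
        return "gray: low specificity"

    # Colors: specific guides, split by predicted cleavage efficiency.
    if doench_percentile is None:
        return "efficiency unavailable"
    if doench_percentile <= 30:
        return "low predicted cleavage"
    if doench_percentile < 55:
        return "medium predicted cleavage"
    # Assumption: exactly 55 is treated as high; the legend does not say.
    return "high predicted cleavage"


if __name__ == "__main__":
    print(guide_color_class(80, 40))
```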

\ \

\ Mouse-over a target site to show predicted specificity and efficiency scores:
\

    \
  1. The MIT Specificity score summarizes all off-targets into a single number from 0-100. The higher the number, the fewer off-target effects are expected. We recommend guides with an MIT specificity > 50.
  2. The efficiency score tries to predict whether a guide leads to rather strong or weak cleavage. According to Haeussler et al. 2016, the Doench 2016 Efficiency score should be used to select the guide with the highest cleavage efficiency when expressing guides from RNA PolIII promoters such as U6. Scores are given as percentiles, e.g. "70%" means that 70% of mammalian guides have a score equal to or lower than this guide. The raw score number is also shown in parentheses after the percentile.
  3. The Moreno-Mateos 2015 Efficiency score should be used instead of the Doench 2016 score when transcribing the guide in vitro with a T7 promoter, e.g. for injections in mouse, zebrafish or Xenopus embryos. The Moreno-Mateos score is given as a percentile with the raw value in parentheses, see the note above.
\

\ \

Click on a feature to show all scores and predicted off-targets with up to four mismatches. The Out-of-Frame score by Bae et al. 2014 is correlated with the probability that mutations induced by the guide RNA will disrupt the open reading frame. The authors recommend out-of-frame scores > 66 to create knock-outs efficiently with a single guide.

\ \

Off-target sites are sorted by the CFD (Cutting Frequency Determination) score (Doench et al. 2016). The higher the CFD score, the more likely there is off-target cleavage at that site. Off-targets with a CFD score < 0.023 are not shown on this page, but are available when following the link to the external CRISPOR tool. When compared against experimentally validated off-targets by Haeussler et al. 2016, the large majority of predicted off-targets with CFD scores < 0.023 were false positives.

\ \

Methods

\ \

Relationship between predictions and experimental data

\ \

\ Like most algorithms, the MIT specificity score is not always a perfect\ predictor of off-target effects. Despite low scores, many tested guides \ caused few and/or weak off-target cleavage when tested with whole-genome assays\ (Figure 2 from Haeussler\ et al. 2016), as shown below, and the published data contains few data points\ with high specificity scores. Overall though, the assays showed that the higher\ the specificity score, the lower the off-target effects.

\ \ \ \

Similarly, efficiency scoring is not very accurate: guides with low\ scores can be efficient and vice versa. As a general rule, however, the higher\ the score, the less likely that a guide is very inefficient. The\ following histograms illustrate, for each type of score, how the share of\ inefficient guides drops with increasing efficiency scores:\

\ \ \ \

When reading this plot, keep in mind that both scores were evaluated on\ their own training data. Especially for the Moreno-Mateos score, the\ results are too optimistic, due to overfitting. When evaluated on independent\ datasets, the correlation of the prediction with other assays was around 25%\ lower, see Haeussler et al. 2016. At the time of\ writing, there is no independent dataset available yet to determine the\ Moreno-Mateos accuracy for each score percentile range.

\ \

Track methods

\

Exons predicted by Ensembl gene models were extended by 200 base pairs on each side and searched for the NGG motif. The 20mer guide sequences flanking each motif were aligned to the genome with BWA and scored with MIT Specificity scores using the command-line version of crispor.org. Non-unique guide sequences were skipped. Flanking sequences were extracted from the genome and used as input for Crispor efficiency scoring, available from the Crispor downloads page, which includes the Doench 2016, Moreno-Mateos 2015 and Bae 2014 algorithms, among others.
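The motif search step can be illustrated with a short Python sketch. It is not the CRISPOR code: the function name `find_guides` and its return layout are hypothetical, and only the forward strand is scanned. A fuller version would repeat the scan on the reverse complement of the region.

```python
import re


def find_guides(seq, guide_len=20):
    """List candidate guides: the 20mer directly 5' of each NGG motif.

    Returns (guide_start, guide_sequence, pam) tuples with 0-based
    coordinates relative to seq. Forward strand only in this sketch.
    """
    seq = seq.upper()
    guides = []
    # The lookahead makes overlapping NGG motifs each produce a match.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        # Skip motifs too close to the start to have a full-length guide.
        if pam_start >= guide_len:
            guide_start = pam_start - guide_len
            guides.append((guide_start, seq[guide_start:pam_start], match.group(1)))
    return guides


if __name__ == "__main__":
    region = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
    for guide in find_guides(region):
        print(guide)
```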

\ \

Data Access

\

\ The raw data can be explored interactively with the Table Browser.\ For automated analysis, the genome annotation is stored in a bigBed file that\ can be downloaded from\ our download server.\ The files for this track are called crispr.bb and crisprDetails.tab and are located in the /gbdb/ci2/crispr directory of our downloads server. Individual\ regions or the whole genome annotation can be obtained using our tool bigBedToBed,\ which can be compiled from the source code or downloaded as a precompiled\ binary for your system. Instructions for downloading source code and binaries can be found\ here. The tool\ can also be used to obtain only features within a given range, e.g. bigBedToBed\ http://hgdownload.soe.ucsc.edu/gbdb/hg19/crispr/crispr.bb -chrom=chr21\ -start=0 -end=10000000 stdout

\ \

The file crisprDetails.tab includes the details of the off-targets. The last column of the bigBed file is the byte offset of the respective line in crisprDetails.tab. E.g. if the last column is 14227033723, then the following command will extract the line with the corresponding off-target details: curl -s -r 14227033723-14227043723 http://hgdownload.soe.ucsc.edu/gbdb/hg19/crispr/crisprDetails.tab | head -n1. The off-target details cannot currently be joined to the track data in the Table Browser.
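For a copy of crisprDetails.tab that has already been downloaded, the same offset lookup can be done without curl. The sketch below is a minimal Python example under that assumption; the function name and the local path are hypothetical.

```python
def read_details_at(path, offset):
    """Return the line of a local crisprDetails.tab copy starting at offset.

    path:   location of the downloaded file (hypothetical local copy).
    offset: byte offset taken from the last column of the bigBed record.
    """
    # Binary mode so that seek() positions are plain byte offsets.
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.readline().decode("utf-8").rstrip("\n")


if __name__ == "__main__":
    import sys

    # Usage: python read_details.py crisprDetails.tab 14227033723
    print(read_details_at(sys.argv[1], int(sys.argv[2])))
```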

\ \

The file crisprDetails.tab is a tab-separated text file with two fields. The first field contains the number of off-targets for each mismatch count, e.g. "0,0,1,3,49" means 0 off-targets at zero mismatches, 0 at one mismatch, 1 at two mismatches, 3 at three and 49 off-targets at four mismatches. The second field is a pipe-separated list of semicolon-separated tuples with the genome coordinates and the CFD score. E.g. "chr10;123376795+;42|chr5;148353274-;39" describes two off-targets, with the first at chr10:123376795 on the positive strand and a CFD score of 0.42.
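A line in this format could be parsed along the following lines. This Python sketch is based only on the description and example above; the function name is made up, and dividing the stored integer by 100 to obtain the CFD score is an assumption inferred from the "42" to 0.42 example.

```python
def parse_details_line(line):
    """Split one crisprDetails.tab line into counts and off-target records.

    Returns (counts, offtargets): counts[i] is the number of off-targets
    with i mismatches, and each off-target is a
    (chrom, position, strand, cfd_score) tuple.
    """
    counts_field, offtargets_field = line.rstrip("\n").split("\t")
    counts = [int(value) for value in counts_field.split(",")]

    offtargets = []
    for entry in offtargets_field.split("|"):
        if not entry:
            continue  # tolerate an empty off-target list
        chrom, pos_strand, score = entry.split(";")
        # The strand is the last character of the position field.
        position, strand = int(pos_strand[:-1]), pos_strand[-1]
        # Assumption: the stored integer is the CFD score times 100.
        offtargets.append((chrom, position, strand, int(score) / 100))
    return counts, offtargets


if __name__ == "__main__":
    example = "0,0,1,3,49\tchr10;123376795+;42|chr5;148353274-;39"
    print(parse_details_line(example))
```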

\ \

Credits

\ \

\ Track created by Maximilian Haeussler and Hiram Clawson, with helpful input from Jean-Paul Concordet (MNHN Paris) and Alberto Stolfi (NYU).\

\ \

References

\ \

\ Haeussler M, Schönig K, Eckert H, Eschstruth A, Mianné J, Renaud JB, Schneider-Maunoury S,\ Shkumatava A, Teboul L, Kent J et al.\ Evaluation of off-target and on-target scoring algorithms and integration into the\ guide RNA selection tool CRISPOR.\ Genome Biol. 2016 Jul 5;17(1):148.\ PMID: 27380939; PMC: PMC4934014\

\ \

\ Bae S, Kweon J, Kim HS, Kim JS.\ \ Microhomology-based choice of Cas9 nuclease target sites.\ Nat Methods. 2014 Jul;11(7):705-6.\ PMID: 24972169\

\ \

\ Doench JG, Fusi N, Sullender M, Hegde M, Vaimberg EW, Donovan KF, Smith I, Tothova Z, Wilen C,\ Orchard R et al.\ \ Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9.\ Nat Biotechnol. 2016 Feb;34(2):184-91.\ PMID: 26780180; PMC: PMC4744125\

\ \

\ Hsu PD, Scott DA, Weinstein JA, Ran FA, Konermann S, Agarwala V, Li Y, Fine EJ, Wu X, Shalem O\ et al.\ \ DNA targeting specificity of RNA-guided Cas9 nucleases.\ Nat Biotechnol. 2013 Sep;31(9):827-32.\ PMID: 23873081; PMC: PMC3969858\

\ \

\ Moreno-Mateos MA, Vejnar CE, Beaudoin JD, Fernandez JP, Mis EK, Khokha MK, Giraldez AJ.\ \ CRISPRscan: designing highly efficient sgRNAs for CRISPR-Cas9 targeting in vivo.\ Nat Methods. 2015 Oct;12(10):982-8.\ PMID: 26322839; PMC: PMC4589495\

\ genes 1 group genes\ html crispr\ longLabel CRISPR/Cas9 Sp. Pyog. target sites\ shortLabel CRISPR\ superTrack on\ track crispr\ type bed 3\ visibility hide\ crisprRanges CRISPR Regions bed 3 Genome regions processed to find CRISPR/Cas9 target sites (exons +/- 200 bp) 1 100 110 110 110 182 182 182 0 0 0

Description

\ \

\ This track shows regions of the genome within 200 bp of transcribed regions and\ DNA sequences targetable by CRISPR RNA guides using the Cas9 enzyme\ from S. pyogenes (PAM: NGG).\ CRISPR target sites were annotated with predicted specificity\ (off-target effects) and predicted efficiency (on-target cleavage) by various\ algorithms through the tool CRISPOR.\

\ \

Display Conventions and Configuration

\ \

\ The track "CRISPR Regions" shows the regions of the genome where target\ sites were analyzed, i.e. within 200 bp of transcribed regions as annotated by\ Ensembl transcript models.

\ \

\ The track "CRISPR Targets" shows the target sites in these regions.\ The target sequence of the guide is shown with a thick (exon) bar. The PAM\ motif match (NGG) is shown with a thinner bar. Guides\ are colored to reflect both predicted specificity and efficiency. Specificity\ reflects the "uniqueness" of a 20mer sequence in the genome; the less unique a\ sequence is, the more likely it is to cleave other locations of the genome\ (off-target effects). Efficiency is the frequency of cleavage at the target\ site (on-target efficiency).

\ \

Shades of gray stand for sites that are hard to target specifically, as the\ 20mer is not very unique in the genome:

\ \ \ \ \
impossible to target: target site has at least one identical copy in the genome and was not scored
hard to target: so many similar sequences in the genome that the alignment was stopped; possibly a repeat
hard to target: target site was aligned but results in a low specificity score <= 50 (see below)
\ \

Colors highlight targets that are specific in the genome (MIT specificity > 50) but have different predicted efficiencies:

\ \ \ \ \ \
unable to calculate Doench/Fusi 2016 efficiency score
low predicted cleavage: Doench/Fusi 2016 Efficiency percentile <= 30
medium predicted cleavage: Doench/Fusi 2016 Efficiency percentile > 30 and < 55
high predicted cleavage: Doench/Fusi 2016 Efficiency percentile > 55

\ \

\ Mouse-over a target site to show predicted specificity and efficiency scores:
\

    \
  1. The MIT Specificity score summarizes all off-targets into a single number from 0-100. The higher the number, the fewer off-target effects are expected. We recommend guides with an MIT specificity > 50.
  2. The efficiency score tries to predict whether a guide leads to rather strong or weak cleavage. According to Haeussler et al. 2016, the Doench 2016 Efficiency score should be used to select the guide with the highest cleavage efficiency when expressing guides from RNA PolIII promoters such as U6. Scores are given as percentiles, e.g. "70%" means that 70% of mammalian guides have a score equal to or lower than this guide. The raw score number is also shown in parentheses after the percentile.
  3. The Moreno-Mateos 2015 Efficiency score should be used instead of the Doench 2016 score when transcribing the guide in vitro with a T7 promoter, e.g. for injections in mouse, zebrafish or Xenopus embryos. The Moreno-Mateos score is given as a percentile with the raw value in parentheses, see the note above.
\

\ \

Click on a feature to show all scores and predicted off-targets with up to four mismatches. The Out-of-Frame score by Bae et al. 2014 is correlated with the probability that mutations induced by the guide RNA will disrupt the open reading frame. The authors recommend out-of-frame scores > 66 to create knock-outs efficiently with a single guide.

\ \

Off-target sites are sorted by the CFD (Cutting Frequency Determination) score (Doench et al. 2016). The higher the CFD score, the more likely there is off-target cleavage at that site. Off-targets with a CFD score < 0.023 are not shown on this page, but are available when following the link to the external CRISPOR tool. When compared against experimentally validated off-targets by Haeussler et al. 2016, the large majority of predicted off-targets with CFD scores < 0.023 were false positives.

\ \

Methods

\ \

Relationship between predictions and experimental data

\ \

\ Like most algorithms, the MIT specificity score is not always a perfect\ predictor of off-target effects. Despite low scores, many tested guides \ caused few and/or weak off-target cleavage when tested with whole-genome assays\ (Figure 2 from Haeussler\ et al. 2016), as shown below, and the published data contains few data points\ with high specificity scores. Overall though, the assays showed that the higher\ the specificity score, the lower the off-target effects.

\ \ \ \

Similarly, efficiency scoring is not very accurate: guides with low\ scores can be efficient and vice versa. As a general rule, however, the higher\ the score, the less likely that a guide is very inefficient. The\ following histograms illustrate, for each type of score, how the share of\ inefficient guides drops with increasing efficiency scores:\

\ \ \ \

When reading this plot, keep in mind that both scores were evaluated on\ their own training data. Especially for the Moreno-Mateos score, the\ results are too optimistic, due to overfitting. When evaluated on independent\ datasets, the correlation of the prediction with other assays was around 25%\ lower, see Haeussler et al. 2016. At the time of\ writing, there is no independent dataset available yet to determine the\ Moreno-Mateos accuracy for each score percentile range.

\ \

Track methods

\

Exons as predicted by Ensembl Gene models were extended by 200 base pairs on each side and searched for the -NGG motif. The 20mer guide sequences flanking each motif were aligned to the genome with BWA and scored with MIT Specificity scores using the command-line version of crispor.org. Non-unique guide sequences were skipped. Flanking sequences were extracted from the genome and used as input for Crispor efficiency scoring, available from the Crispor downloads page, which includes the Doench 2016, Moreno-Mateos 2015 and Bae 2014 algorithms, among others.

\ \

Data Access

\

The raw data can be explored interactively with the Table Browser. For automated analysis, the genome annotation is stored in a bigBed file that can be downloaded from our download server. The files for this track are called crispr.bb and crisprDetails.tab and are located in the /gbdb/ci2/crispr directory of our downloads server. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range. The following example uses the human hg19 file, so adjust the URL and chromosome name for other assemblies: bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg19/crisprRanges/crispr.bb -chrom=chr21 -start=0 -end=10000000 stdout

\ \

The file crisprDetails.tab includes the details of the off-targets. The last column of the bigBed file is the byte offset of the respective line in crisprDetails.tab. For example, if the last column is 14227033723, then the following command will extract the line with the corresponding off-target details: curl -s -r 14227033723-14227043723 http://hgdownload.soe.ucsc.edu/gbdb/hg19/crispr/crisprDetails.tab | head -n1. The off-target details currently cannot be joined using the Table Browser.

\ \

The file crisprDetails.tab is a tab-separated text file with two fields. The first field contains the number of off-targets for each mismatch count, e.g. "0,0,1,3,49" means 0 off-targets at zero mismatches, 0 at one mismatch, 1 at two mismatches, 3 at three and 49 off-targets at four mismatches. The second field is a pipe-separated list of semicolon-separated tuples with the genome coordinates and the CFD score. E.g. "chr10;123376795+;42|chr5;148353274-;39" describes two off-targets, with the first at chr10:123376795 on the positive strand and a CFD score of 0.42.

\ \

Credits

\ \

\ Track created by Maximilian Haeussler and Hiram Clawson, with helpful input from Jean-Paul Concordet (MNHN Paris) and Alberto Stolfi (NYU).\

\ \

References

\ \

\ Haeussler M, Schönig K, Eckert H, Eschstruth A, Mianné J, Renaud JB, Schneider-Maunoury S,\ Shkumatava A, Teboul L, Kent J et al.\ Evaluation of off-target and on-target scoring algorithms and integration into the\ guide RNA selection tool CRISPOR.\ Genome Biol. 2016 Jul 5;17(1):148.\ PMID: 27380939; PMC: PMC4934014\

\ \

\ Bae S, Kweon J, Kim HS, Kim JS.\ \ Microhomology-based choice of Cas9 nuclease target sites.\ Nat Methods. 2014 Jul;11(7):705-6.\ PMID: 24972169\

\ \

\ Doench JG, Fusi N, Sullender M, Hegde M, Vaimberg EW, Donovan KF, Smith I, Tothova Z, Wilen C,\ Orchard R et al.\ \ Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9.\ Nat Biotechnol. 2016 Feb;34(2):184-91.\ PMID: 26780180; PMC: PMC4744125\

\ \

\ Hsu PD, Scott DA, Weinstein JA, Ran FA, Konermann S, Agarwala V, Li Y, Fine EJ, Wu X, Shalem O\ et al.\ \ DNA targeting specificity of RNA-guided Cas9 nucleases.\ Nat Biotechnol. 2013 Sep;31(9):827-32.\ PMID: 23873081; PMC: PMC3969858\

\ \

\ Moreno-Mateos MA, Vejnar CE, Beaudoin JD, Fernandez JP, Mis EK, Khokha MK, Giraldez AJ.\ \ CRISPRscan: designing highly efficient sgRNAs for CRISPR-Cas9 targeting in vivo.\ Nat Methods. 2015 Oct;12(10):982-8.\ PMID: 26322839; PMC: PMC4589495\

\ genes 1 color 110,110,110\ html crispr\ longLabel Genome regions processed to find CRISPR/Cas9 target sites (exons +/- 200 bp)\ parent crispr\ shortLabel CRISPR Regions\ track crisprRanges\ type bed 3\ visibility dense\ crisprTargets CRISPR Targets bigBed 9 + CRISPR/Cas9 -NGG Targets 1 100 0 0 0 127 127 127 0 0 0 http://crispor.tefor.net/crispor.py?org=$D&pos=$S:${&pam=NGG

Description

\ \

\ This track shows regions of the genome within 200 bp of transcribed regions and\ DNA sequences targetable by CRISPR RNA guides using the Cas9 enzyme\ from S. pyogenes (PAM: NGG).\ CRISPR target sites were annotated with predicted specificity\ (off-target effects) and predicted efficiency (on-target cleavage) by various\ algorithms through the tool CRISPOR.\

\ \

Display Conventions and Configuration

\ \

\ The track "CRISPR Regions" shows the regions of the genome where target\ sites were analyzed, i.e. within 200 bp of transcribed regions as annotated by\ Ensembl transcript models.

\ \

\ The track "CRISPR Targets" shows the target sites in these regions.\ The target sequence of the guide is shown with a thick (exon) bar. The PAM\ motif match (NGG) is shown with a thinner bar. Guides\ are colored to reflect both predicted specificity and efficiency. Specificity\ reflects the "uniqueness" of a 20mer sequence in the genome; the less unique a\ sequence is, the more likely it is to cleave other locations of the genome\ (off-target effects). Efficiency is the frequency of cleavage at the target\ site (on-target efficiency).

\ \

Shades of gray stand for sites that are hard to target specifically, as the\ 20mer is not very unique in the genome:

\ \ \ \ \
impossible to target: target site has at least one identical copy in the genome and was not scored
hard to target: so many similar sequences in the genome that the alignment was stopped (possibly a repeat)
hard to target: target site was aligned but results in a low specificity score <= 50 (see below)
\ \

Colors highlight targets that are specific in the genome (MIT specificity > 50) but have different predicted efficiencies:

\ \ \ \ \ \
unable to calculate Doench/Fusi 2016 efficiency score
low predicted cleavage: Doench/Fusi 2016 Efficiency percentile <= 30
medium predicted cleavage: Doench/Fusi 2016 Efficiency percentile > 30 and < 55
high predicted cleavage: Doench/Fusi 2016 Efficiency percentile > 55
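
The two lists above can be read as a small decision rule. The following Python sketch is only an illustration of that rule, not the code used to build the track: the function name and the category labels are made up here, and the text does not say how a percentile of exactly 55 is treated, so counting it as "high" is an assumption.

```python
from typing import Optional


def classify_guide(mit_specificity: Optional[float],
                   doench_percentile: Optional[float]) -> str:
    """Map scores to a descriptive category (labels are illustrative only)."""
    if mit_specificity is None:
        # Guide was not scored, e.g. identical copy or alignment stopped.
        return "not scored"
    if mit_specificity <= 50:
        return "hard to target"
    if doench_percentile is None:
        return "specific, efficiency unknown"
    if doench_percentile <= 30:
        return "specific, low predicted cleavage"
    if doench_percentile < 55:
        return "specific, medium predicted cleavage"
    # Assumption: a percentile of exactly 55 is treated as high.
    return "specific, high predicted cleavage"
```

For example, `classify_guide(80, 42)` returns the "medium" category, while any guide with an MIT specificity of 50 or lower is reported as hard to target regardless of its efficiency.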

\ \

\ Mouse-over a target site to show predicted specificity and efficiency scores:
\

  1. The MIT Specificity score summarizes all off-targets into a single number from 0-100. The higher the number, the fewer off-target effects are expected. We recommend guides with an MIT specificity > 50.
  2. The efficiency score tries to predict whether a guide leads to rather strong or weak cleavage. According to Haeussler et al. 2016, the Doench 2016 Efficiency score should be used to select the guide with the highest cleavage efficiency when expressing guides from RNA PolIII promoters such as U6. Scores are given as percentiles, e.g. "70%" means that 70% of mammalian guides have a score equal to or lower than this guide. The raw score number is also shown in parentheses after the percentile.
  3. The Moreno-Mateos 2015 Efficiency score should be used instead of the Doench 2016 score when transcribing the guide in vitro with a T7 promoter, e.g. for injections in mouse, zebrafish or Xenopus embryos. The Moreno-Mateos score is given as a percentile with the raw value in parentheses, see the note above.
\
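
As a minimal sketch of what a percentile means here, the function below converts a raw score into the share of reference guides scoring equal or lower. The function name and the reference list are hypothetical; the actual reference distribution used by CRISPOR is not reproduced here.

```python
from bisect import bisect_right
from typing import Iterable


def score_percentile(raw_score: float, reference_scores: Iterable[float]) -> float:
    """Percent of reference scores that are equal to or lower than raw_score."""
    ranked = sorted(reference_scores)
    if not ranked:
        raise ValueError("reference_scores must not be empty")
    # bisect_right counts all entries <= raw_score in the sorted list.
    n_equal_or_lower = bisect_right(ranked, raw_score)
    return 100.0 * n_equal_or_lower / len(ranked)
```

With a toy reference of the scores 1 to 10, a raw score of 7 sits at the 70th percentile, because seven of the ten reference scores are equal or lower.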

\ \

Click on a feature to show all scores and predicted off-targets with up to four mismatches. The Out-of-Frame score by Bae et al. 2014 is correlated with the probability that mutations induced by the guide RNA will disrupt the open reading frame. The authors recommend out-of-frame scores > 66 to create knock-outs efficiently with a single guide.

\ \

Off-target sites are sorted by the CFD (Cutting Frequency Determination) score (Doench et al. 2016). The higher the CFD score, the more likely off-target cleavage is at that site. Off-targets with a CFD score < 0.023 are not shown on this page, but are available by following the link to the external CRISPOR tool. When compared against experimentally validated off-targets by Haeussler et al. 2016, the large majority of predicted off-targets with CFD scores < 0.023 were false positives.

\ \

Methods

\ \

Relationship between predictions and experimental data

\ \

\ Like most algorithms, the MIT specificity score is not always a perfect\ predictor of off-target effects. Despite low scores, many tested guides \ caused few and/or weak off-target cleavage when tested with whole-genome assays\ (Figure 2 from Haeussler\ et al. 2016), as shown below, and the published data contains few data points\ with high specificity scores. Overall though, the assays showed that the higher\ the specificity score, the lower the off-target effects.

\ \ \ \

Similarly, efficiency scoring is not very accurate: guides with low\ scores can be efficient and vice versa. As a general rule, however, the higher\ the score, the less likely that a guide is very inefficient. The\ following histograms illustrate, for each type of score, how the share of\ inefficient guides drops with increasing efficiency scores:\

\ \ \ \

When reading this plot, keep in mind that both scores were evaluated on\ their own training data. Especially for the Moreno-Mateos score, the\ results are too optimistic, due to overfitting. When evaluated on independent\ datasets, the correlation of the prediction with other assays was around 25%\ lower, see Haeussler et al. 2016. At the time of\ writing, there is no independent dataset available yet to determine the\ Moreno-Mateos accuracy for each score percentile range.

\ \

Track methods

\

Exons as predicted by Ensembl Gene models were extended by 200 base pairs on each side and searched for the -NGG motif. The 20mer guide sequences flanking each motif were aligned to the genome with BWA and scored with MIT Specificity scores using the command-line version of crispor.org. Non-unique guide sequences were skipped. Flanking sequences were extracted from the genome and used as input for Crispor efficiency scoring, available from the Crispor downloads page, which includes the Doench 2016, Moreno-Mateos 2015 and Bae 2014 algorithms, among others.
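
The motif search step can be illustrated with a short Python sketch. It only covers finding 20mers next to an NGG motif on both strands. The BWA alignment and the scoring are not shown, and the function names are made up for this example rather than taken from the Crispor code.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_ngg_guides(seq: str, guide_len: int = 20):
    """Return (site_start, strand, guide, pam) for every guide + NGG site.

    site_start is the 0-based forward-strand start of the whole site
    (guide plus PAM), for both strands.
    """
    seq = seq.upper()
    hits = []
    # Forward strand: guide followed by NGG. The lookahead allows overlaps.
    fwd = re.compile(rf"(?=([ACGT]{{{guide_len}}})([ACGT]GG))")
    for m in fwd.finditer(seq):
        hits.append((m.start(), "+", m.group(1), m.group(2)))
    # Reverse strand: appears on the forward strand as CCN followed by the guide.
    rev = re.compile(rf"(?=(CC[ACGT])([ACGT]{{{guide_len}}}))")
    for m in rev.finditer(seq):
        hits.append((m.start(), "-", revcomp(m.group(2)), revcomp(m.group(1))))
    return hits
```

Sites containing N are skipped by this sketch, since the character classes only accept A, C, G and T.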

\ \

Data Access

\

The raw data can be explored interactively with the Table Browser. For automated analysis, the genome annotation is stored in a bigBed file that can be downloaded from our download server. The files for this track are called crispr.bb and crisprDetails.tab and are located in the /gbdb/ci2/crispr directory of our downloads server. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range. The following example uses the human hg19 file, so adjust the URL and chromosome name for other assemblies: bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg19/crisprTargets/crispr.bb -chrom=chr21 -start=0 -end=10000000 stdout
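
For scripted use, the same call can be wrapped in Python. This is a sketch that assumes bigBedToBed is installed and on the PATH; the helper names are hypothetical, and only the options shown in the example above (-chrom, -start, -end) are used.

```python
import shutil
import subprocess
from typing import List


def bigbed_region_command(url: str, chrom: str, start: int, end: int,
                          out: str = "stdout") -> List[str]:
    """Build the bigBedToBed argument list for one region."""
    return ["bigBedToBed", url, f"-chrom={chrom}",
            f"-start={start}", f"-end={end}", out]


def fetch_region(url: str, chrom: str, start: int, end: int) -> List[str]:
    """Run bigBedToBed and return its BED output as a list of lines."""
    if shutil.which("bigBedToBed") is None:
        raise RuntimeError("bigBedToBed was not found on the PATH")
    cmd = bigbed_region_command(url, chrom, start, end)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

Keeping the command construction separate from the call makes it easy to log or inspect the exact command before running it.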

\ \

The file crisprDetails.tab includes the details of the off-targets. The last column of the bigBed file is the byte offset of the respective line in crisprDetails.tab. For example, if the last column is 14227033723, then the following command will extract the line with the corresponding off-target details: curl -s -r 14227033723-14227043723 http://hgdownload.soe.ucsc.edu/gbdb/hg19/crispr/crisprDetails.tab | head -n1. The off-target details currently cannot be joined using the Table Browser.
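
The curl call requests a byte range that starts at the offset and keeps only the first line. A Python equivalent might look like the sketch below. The helper names are hypothetical, and the 10000-byte window simply mirrors the range in the curl example; it is an assumption that one line always fits into it.

```python
import urllib.request
from typing import Dict


def range_header(offset: int, max_line_bytes: int = 10000) -> Dict[str, str]:
    """HTTP Range header covering max_line_bytes bytes from offset."""
    return {"Range": f"bytes={offset}-{offset + max_line_bytes}"}


def first_line(chunk: bytes) -> str:
    """Return the text up to the first newline of a downloaded chunk."""
    return chunk.split(b"\n", 1)[0].decode("utf-8")


def fetch_details_line(url: str, offset: int, max_line_bytes: int = 10000) -> str:
    """Download one line of crisprDetails.tab starting at a byte offset."""
    req = urllib.request.Request(url, headers=range_header(offset, max_line_bytes))
    with urllib.request.urlopen(req) as resp:
        return first_line(resp.read())
```

The server has to support range requests for this to download only the requested window.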

\ \

The file crisprDetails.tab is a tab-separated text file with two fields. The first field contains the number of off-targets for each mismatch count, e.g. "0,0,1,3,49" means 0 off-targets at zero mismatches, 0 at one mismatch, 1 at two mismatches, 3 at three and 49 off-targets at four mismatches. The second field is a pipe-separated list of semicolon-separated tuples with the genome coordinates and the CFD score. E.g. "chr10;123376795+;42|chr5;148353274-;39" describes two off-targets, with the first at chr10:123376795 on the positive strand and a CFD score of 0.42.
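
A line in this format can be parsed with a few string splits. The sketch below is illustrative only: the function and dictionary key names are made up, and dividing the stored score by 100 is inferred from the example, where "42" corresponds to a CFD score of 0.42.

```python
from typing import Dict, List, Tuple


def parse_details_line(line: str) -> Tuple[List[int], List[Dict]]:
    """Split one crisprDetails.tab line into mismatch counts and off-targets."""
    counts_field, offtargets_field = line.rstrip("\n").split("\t")
    # counts[i] is the number of off-targets with i mismatches.
    counts = [int(c) for c in counts_field.split(",")]
    off_targets = []
    for item in offtargets_field.split("|"):
        if not item:
            continue  # tolerate an empty off-target list
        chrom, pos_strand, score = item.split(";")
        off_targets.append({
            "chrom": chrom,
            "pos": int(pos_strand[:-1]),   # digits before the strand sign
            "strand": pos_strand[-1],      # "+" or "-"
            "cfd": int(score) / 100.0,     # assumption: stored as CFD * 100
        })
    return counts, off_targets
```

Applied to the example string from the text, the first off-target comes back as chr10, position 123376795, strand "+", with a CFD of 0.42.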

\ \

Credits

\ \

\ Track created by Maximilian Haeussler and Hiram Clawson, with helpful input from Jean-Paul Concordet (MNHN Paris) and Alberto Stolfi (NYU).\

\ \

References

\ \

\ Haeussler M, Schönig K, Eckert H, Eschstruth A, Mianné J, Renaud JB, Schneider-Maunoury S,\ Shkumatava A, Teboul L, Kent J et al.\ Evaluation of off-target and on-target scoring algorithms and integration into the\ guide RNA selection tool CRISPOR.\ Genome Biol. 2016 Jul 5;17(1):148.\ PMID: 27380939; PMC: PMC4934014\

\ \

\ Bae S, Kweon J, Kim HS, Kim JS.\ \ Microhomology-based choice of Cas9 nuclease target sites.\ Nat Methods. 2014 Jul;11(7):705-6.\ PMID: 24972169\

\ \

\ Doench JG, Fusi N, Sullender M, Hegde M, Vaimberg EW, Donovan KF, Smith I, Tothova Z, Wilen C,\ Orchard R et al.\ \ Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9.\ Nat Biotechnol. 2016 Feb;34(2):184-91.\ PMID: 26780180; PMC: PMC4744125\

\ \

\ Hsu PD, Scott DA, Weinstein JA, Ran FA, Konermann S, Agarwala V, Li Y, Fine EJ, Wu X, Shalem O\ et al.\ \ DNA targeting specificity of RNA-guided Cas9 nucleases.\ Nat Biotechnol. 2013 Sep;31(9):827-32.\ PMID: 23873081; PMC: PMC3969858\

\ \

\ Moreno-Mateos MA, Vejnar CE, Beaudoin JD, Fernandez JP, Mis EK, Khokha MK, Giraldez AJ.\ \ CRISPRscan: designing highly efficient sgRNAs for CRISPR-Cas9 targeting in vivo.\ Nat Methods. 2015 Oct;12(10):982-8.\ PMID: 26322839; PMC: PMC4589495\

\ genes 1 detailsTabUrls _offset=/gbdb/$db/crispr/crisprDetails.tab\ html crispr\ itemRgb on\ longLabel CRISPR/Cas9 -NGG Targets\ mouseOverField _mouseOver\ parent crispr\ scoreLabel MIT Guide Specificity Score\ shortLabel CRISPR Targets\ track crisprTargets\ type bigBed 9 +\ url http://crispor.tefor.net/crispor.py?org=$D&pos=$S:${&pam=NGG\ urlLabel Click here to show this guide on Crispor.org, with expression oligos, validation primers and more\ visibility dense\ ensGene Ensembl Genes genePred ensPep Ensembl Genes 0 100 150 0 0 202 127 127 0 0 0

Description

\ \

\ These gene predictions were generated by Ensembl.\

\ \

\ For more information on the different gene tracks, see our Genes FAQ.

\ \

Methods

\ \

\ For a description of the methods used in Ensembl gene predictions, please refer to\ Hubbard et al. (2002), also listed in the References section below. \

\ \

Data access

\

\ Ensembl Gene data can be explored interactively using the\ Table Browser or the\ Data Integrator. \ For local downloads, the genePred format files for ci2 are available in our\ \ downloads directory as ensGene.txt.gz or in our\ \ genes download directory in GTF format.

\ For programmatic access, the data can be queried from the \ REST API or\ directly from our public MySQL\ servers. Instructions on this method are available on our\ MySQL help page and on\ our blog.

\ \

\ Previous versions of this track can be found on our archive download server.\

\ \

Credits

\ \

\ We would like to thank Ensembl for providing these gene annotations. For more information, please see\ Ensembl's genome annotation page.\

\ \

References

\ \

\ Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J,\ Curwen V, Down T et al.\ The Ensembl genome database project.\ Nucleic Acids Res. 2002 Jan 1;30(1):38-41.\ PMID: 11752248; PMC: PMC99161\

\ genes 1 color 150,0,0\ exonNumbers on\ group genes\ longLabel Ensembl Genes\ shortLabel Ensembl Genes\ track ensGene\ type genePred ensPep\ visibility hide\ gap Gap bed 3 + Gap Locations 1 100 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows the position of gaps — represented by Ns — within \ the C. intestinalis assembly. Gaps of 50 or more bases were most \ likely introduced by the JGI JAZZ assembler.

\

\ For a discussion of gaps and the JAZZ assembler see \ Dehal, P. et al. (2002) in the References section below.
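
Since gaps are represented by Ns, their positions can be recovered from the sequence itself. The following Python sketch shows the idea on a plain string; the function name is hypothetical, and this is not how the track itself was built.

```python
import re
from typing import List, Tuple


def find_gaps(seq: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of runs of N."""
    return [(m.start(), m.end())
            for m in re.finditer(r"[Nn]+", seq)
            if m.end() - m.start() >= min_len]
```

Calling `find_gaps(seq, min_len=50)` restricts the result to the longer gaps mentioned above.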

\ \

Display Conventions and Configuration

\

\ Gaps are represented by boxes. If the relative order and orientation of \ the contigs on either side of the gap is known from mRNA, ESTs, or paired BAC \ end reads, it is a bridged gap, indicated by a white line drawn \ through the box. The display must be sufficiently zoomed in to view this \ feature. In full display mode, the item label indicates the type of gap and \ whether the gap is bridged.

\ \

References

\

Dehal P, Satou Y, Campbell RK, Chapman J, Degnan B, De Tomaso A, Davidson B, Di Gregorio A, Gelpke M, Goodstein DM et al. The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. Science. 2002 Dec 13;298(5601):2157-67. PMID: 12481130

\ map 1 group map\ longLabel Gap Locations\ shortLabel Gap\ track gap\ type bed 3 +\ visibility dense\ gc5Base GC Percent wig 0 100 GC Percent in 5-Base Windows 0 100 0 0 0 128 128 128 0 0 0

Description

\

\ The GC percent track shows the percentage of G (guanine) and C (cytosine) bases\ in 5-base windows. High GC content is typically associated with\ gene-rich areas.\
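
As a minimal sketch of the calculation, the function below computes the GC percentage for consecutive 5-base windows of a sequence. Treating the windows as non-overlapping and dropping a trailing partial window are assumptions made for this example, and the function name is hypothetical.

```python
from typing import List, Tuple


def gc_percent_windows(seq: str, window: int = 5) -> List[Tuple[int, float]]:
    """Return (window_start, gc_percent) for consecutive full windows."""
    seq = seq.upper()
    result = []
    # Step by the window size, so windows do not overlap.
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        result.append((start, 100.0 * gc / window))
    return result
```

With 5-base windows the only possible values are 0, 20, 40, 60, 80 and 100 percent.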

\

This track may be configured in a variety of ways to highlight different aspects of the displayed information. Click the "Graph configuration help" link for an explanation of the configuration options.

Credits

\

The data and presentation of this graph were prepared by\ Hiram Clawson.\

\ \ map 0 altColor 128,128,128\ autoScale Off\ color 0,0,0\ graphTypeDefault Bar\ gridDefault OFF\ group map\ longLabel GC Percent in 5-Base Windows\ maxHeightPixels 128:36:16\ shortLabel GC Percent\ spanList 5\ track gc5Base\ type wig 0 100\ viewLimits 30:70\ visibility hide\ windowingFunction Mean\ microsat Microsatellite bed 4 Microsatellites - Di-nucleotide and Tri-nucleotide Repeats 0 100 0 0 0 127 127 127 0 0 0

Description

\

This track displays regions that are likely to be useful as microsatellite markers. These are sequences of at least 15 perfect di-nucleotide or tri-nucleotide repeats, which tend to be highly polymorphic in the population.

\ \

Methods

\

\ The data shown in this track are a subset of the Simple Repeats track, \ selecting only those \ repeats of period 2 and 3, with 100% identity and no indels and with\ at least 15 copies of the repeat. The Simple Repeats track is\ created using the \ Tandem Repeats Finder. For more information about this \ program, see Benson (1999).
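
The selection criteria above amount to a simple filter over Simple Repeats records. In the sketch below the record class and its field names are hypothetical, loosely modelled on Tandem Repeats Finder output; they are not guaranteed to match the actual table columns.

```python
from dataclasses import dataclass


@dataclass
class SimpleRepeat:
    chrom: str
    start: int
    end: int
    period: int        # length of the repeat unit
    copy_num: float    # number of copies of the unit
    per_match: int     # percent identity between copies
    per_indel: int     # percent indels between copies


def is_microsatellite(rep: SimpleRepeat) -> bool:
    """Period 2 or 3, 100% identity, no indels, at least 15 copies."""
    return (rep.period in (2, 3)
            and rep.per_match == 100
            and rep.per_indel == 0
            and rep.copy_num >= 15)
```

A list of records can then be reduced with `[r for r in repeats if is_microsatellite(r)]`.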

\ \

Credits

\

\ Tandem Repeats Finder was written by \ Gary Benson.

\ \

References

\ \

\ Benson G.\ \ Tandem repeats finder: a program to analyze DNA sequences.\ Nucleic Acids Res. 1999 Jan 15;27(2):573-80.\ PMID: 9862982; PMC: PMC148217\

\ varRep 1 group varRep\ longLabel Microsatellites - Di-nucleotide and Tri-nucleotide Repeats\ shortLabel Microsatellite\ track microsat\ type bed 4\ visibility hide\ xenoMrna Other mRNAs psl xeno Non-C. intestinalis mRNAs from GenBank 1 100 0 0 0 127 127 127 1 0 0

Description

\

\ This track displays translated blat alignments of vertebrate and\ invertebrate mRNA in \ GenBank from organisms other than C. intestinalis.\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks. In dense display mode, the items that\ are more darkly shaded indicate matches of better quality.

\

\ The strand information (+/-) for this track is in two parts. The\ first + indicates the orientation of the query sequence whose\ translated protein produced the match (here always 5' to 3', hence +).\ The second + or - indicates the orientation of the matching \ translated genomic sequence. Because the two orientations of a DNA \ sequence give different predicted protein sequences, there are four \ combinations. ++ is not the same as --, nor is +- the same as -+.

\

\ The description page for this track has a filter that can be used to change \ the display mode, alter the color, and include/exclude a subset of items \ within the track. This may be helpful when many items are shown in the track \ display, especially when only some are relevant to the current task.

\

\ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the mRNA \ display. For example, to apply the filter to all mRNAs expressed in a specific\ organ, type the name of the organ in the tissue box. To view the list of \ valid terms for each text box, consult the table in the Table Browser that \ corresponds to the factor on which you wish to filter. For example, the \ "tissue" table contains all the types of tissues that can be \ entered into the tissue text box. Wildcards may also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only mRNAs that match all filter \ criteria will be highlighted. If "or" is selected, mRNAs that \ match any one of the filter criteria will be highlighted.\
  3. Choose the color or display characteristic that should be used to \ highlight or include/exclude the filtered items. If "exclude" is \ chosen, the browser will not display mRNAs that match the filter criteria. \ If "include" is selected, the browser will display only those \ mRNAs that match the filter criteria.\

\

\ This track may also be configured to display codon coloring, a feature that\ allows the user to quickly compare mRNAs against the genomic sequence. For more \ information about this option, click \ here.\

\ \

Methods

\

\ The mRNAs were aligned against the C. intestinalis genome using translated \ blat. When a single mRNA aligned in multiple places, the alignment having the\ highest base identity was found. Only those alignments having a base \ identity level within 1% of the best and at least 25% base identity with the \ genomic sequence were kept.

\ \

Credits

\

\ The mRNA track was produced at UCSC from mRNA sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, \ Wheeler DL. \ GenBank: update. Nucleic Acids Res. \ 2004 Jan 1;32(Database issue):D23-6.

\

\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.

\ rna 1 baseColorUseCds genbank\ baseColorUseSequence genbank\ group rna\ indelDoubleInsert on\ indelQueryInsert on\ longLabel Non-C. intestinalis mRNAs from GenBank\ shortLabel Other mRNAs\ showDiffBasesAllScales .\ spectrum on\ track xenoMrna\ type psl xeno\ visibility dense\ xenoRefGene Other RefSeq genePred xenoRefPep xenoRefMrna Non-C. intestinalis RefSeq Genes 1 100 12 12 120 133 133 187 0 0 0

Description

\

\ This track shows known protein-coding and non-protein-coding genes \ for organisms other than C. intestinalis, taken from the NCBI RNA reference\ sequences collection (RefSeq). The data underlying this track are \ updated weekly.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ gene prediction \ tracks.\ The color shading indicates the level of review the RefSeq record has \ undergone: predicted (light), provisional (medium), reviewed (dark).

\

\ The item labels and display colors of features within this track can be\ configured through the controls at the top of the track description page. \

\ \

Methods

\

\ The RNAs were aligned against the C. intestinalis genome using blat; those\ with an alignment of less than 15% were discarded. When a single RNA aligned \ in multiple places, the alignment having the highest base identity was \ identified. Only alignments having a base identity level within 0.5% of \ the best and at least 25% base identity with the genomic sequence were kept.\
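
The filtering rule can be sketched for the alignments of a single RNA as follows. This is an illustration under stated assumptions, not the UCSC pipeline: the dictionary keys are hypothetical, "an alignment of less than 15%" is read as query coverage, and "within 0.5% of the best" is read as percentage points of base identity.

```python
from typing import Dict, List


def filter_alignments(alignments: List[Dict[str, float]],
                      near_best: float = 0.5,
                      min_identity: float = 25.0,
                      min_coverage: float = 15.0) -> List[Dict[str, float]]:
    """Keep near-best alignments of one RNA; values are in percent."""
    # Step 1: discard alignments covering too little of the RNA.
    covered = [a for a in alignments if a["coverage"] >= min_coverage]
    if not covered:
        return []
    # Step 2: find the best base identity among the remaining alignments.
    best = max(a["identity"] for a in covered)
    # Step 3: keep alignments close to the best and above the identity floor.
    return [a for a in covered
            if a["identity"] >= best - near_best
            and a["identity"] >= min_identity]
```

The thresholds are parameters so that the same sketch can express other combinations of near-best and minimum identity values.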

\ \

Credits

\

\ This track was produced at UCSC from RNA sequence data\ generated by scientists worldwide and curated by the \ NCBI RefSeq project.

\ \

References

\

\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.

\ genes 1 color 12,12,120\ group genes\ longLabel Non-C. intestinalis RefSeq Genes\ shortLabel Other RefSeq\ track xenoRefGene\ type genePred xenoRefPep xenoRefMrna\ visibility dense\ pubs Publications bed 4 Publications: Sequences in Scientific Articles 1 100 0 0 0 127 127 127 0 0 0

Description

\

This track is based on text-mining of full-text biomedical articles and includes two types of subtracks:

\ \ \

Both sources of information are linked to the respective articles.\ Background information on how permission to full-text data was obtained can be found on the project website. \

Display Convention and Configuration

\

The sequence subtrack indicates the location of sequences in publications\ mapped back to the genome, annotated with the first author and the year of the\ publication. All matches of one article are grouped ("chained") together.\ Article titles are shown when you move the mouse cursor over the features.\ Thicker parts of the features (exons) represent matching sequences,\ connected by thin lines to matches from the same article within 30 kbp.
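
The grouping of matches can be pictured with a small Python sketch that chains the matches of one article when they lie on the same chromosome within 30 kbp of each other. The function name and the tuple layout are hypothetical, and the real pipeline may apply additional rules.

```python
from typing import Iterable, List, Tuple

Match = Tuple[str, int, int]  # (chrom, start, end)


def chain_matches(matches: Iterable[Match], max_gap: int = 30_000) -> List[List[Match]]:
    """Group the matches of one article into chains of nearby matches."""
    chains: List[List[Match]] = []
    chain_end = 0  # rightmost end of the chain currently being built
    for chrom, start, end in sorted(matches):
        same_chrom = bool(chains) and chains[-1][0][0] == chrom
        if same_chrom and start - chain_end <= max_gap:
            chains[-1].append((chrom, start, end))
            chain_end = max(chain_end, end)
        else:
            # Start a new chain on a new chromosome or after a large gap.
            chains.append([(chrom, start, end)])
            chain_end = end
    return chains
```

Each returned chain corresponds to one displayed feature, with its matches drawn as the thick parts and the gaps between them as thin lines.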

\ \

The subtrack "individual sequence matches" activates automatically when\ the user clicks a sequence match and follows the link "Show sequence matches individually" \ from the details page. Mouse-overs show flanking text around the sequence, and clicking\ features links to BLAT alignments.\

\ \

All other subtracks (i.e. bands, genes, SNPs) show the number of matching articles as\ the feature description. Clicking on them shows the sentences and sections in articles \ where the identifiers were found.

\ \

The track configuration includes a keyword and year filter. Keywords are space-separated\ and are searched in the article's title, author list, and abstract.

\ \

Data

\

The track is based on text from biomedical research articles, obtained as\ part of the UCSC Genocoding Project.

\ \

The current dataset consists of about 600,000 files (main text and\ supplementary files) from PubMed Central (Open-Access set) and around 6 million text\ files (main text) from Elsevier (as part of the Sciverse Apps program).

\ \

Methods

\

\ All file types (including XML, raw ASCII, PDFs and various Microsoft\ Office formats (Excel, Word, PowerPoint)) were converted to text. The results were processed \ to find groups of words that look like DNA/RNA sequences or\ words that look like protein sequences. These were then mapped with BLAT to the\ human genome and these model organisms: mouse (mm9), rat (rn4), zebrafish\ (danRer6), Drosophila melanogaster (dm3), X. tropicalis (xenTro2), Medaka\ (oryLat2), C. intestinalis (ci2), C. elegans (ce6) and yeast (sacCer2).\ \ The pipeline roughly proceeds through these steps:\

\ \

Note that due to the 90% identity filter, some sequences do not match\ anywhere in the genome. Examples include primers with added restriction sites,\ mutation primers, or any other sequence that joins or mixes two pieces of genomic\ DNA not part of RefSeq. Also note that some gene symbols correspond to \ English words which can sometimes lead to many false positives.

\ \

Credits

\

Software and processing by Maximilian Haeussler. UCSC Track visualisation by\ Larry Meyer and Hiram Clawson. Elsevier support by Max Berenstein, Raphael\ Sidi, Judd Dunham, Scott Robbins and colleagues. Original version written at the Bergman Lab,\ University of Manchester, UK. Testing by Mary Mangan, OpenHelix Inc, and Greg Roe, UCSC.

\ \

Feedback

\ Please send ideas, comments or feedback on this track to\ \ max@soe.ucsc.edu.\ \ We are very interested in getting access to more articles from publishers for this\ dataset; see the project website.\

\ \

References

\

\ Aerts S, Haeussler M, van Vooren S, Griffith OL, Hulpiau P, Jones SJ, Montgomery SB, Bergman CM,\ Open Regulatory Annotation Consortium.\ \ Text-mining assisted regulatory annotation.\ Genome Biol. 2008;9(2):R31.\ PMID: 18271954; PMC: PMC2374703\

\ \

\ Haeussler M, Gerner M, Bergman CM.\ \ Annotating genes and genomes with DNA sequences extracted from biomedical articles.\ Bioinformatics. 2011 Apr 1;27(7):980-6.\ PMID: 21325301; PMC: PMC3065681\

\ \

\ Van Noorden R.\ \ Trouble at the text mine.\ Nature. 2012 Mar 7;483(7388):134-5.\

\ pub 1 color 0,0,0\ compositeTrack on\ group pub\ longLabel Publications: Sequences in Scientific Articles\ nextExonText Next Match\ noInherit on\ prevExonText Prev Match\ pubsArticleTable hgFixed.pubsArticle\ pubsMarkerTable hgFixed.pubsMarkerAnnot\ pubsPslTrack pubsBlatPsl\ pubsSequenceTable hgFixed.pubsSequenceAnnot\ shortLabel Publications\ track pubs\ type bed 4\ visibility dense\ simpleRepeat Simple Repeats bed 4 + Simple Tandem Repeats by TRF 0 100 0 0 0 127 127 127 0 0 0

Description

\

\ This track displays simple tandem repeats (possibly imperfect repeats) located\ by Tandem Repeats\ Finder (TRF) which is specialized for this purpose. These repeats can\ occur within coding regions of genes and may be quite\ polymorphic. Repeat expansions are sometimes associated with specific\ diseases.

\ \

Methods

\

\ For more information about the TRF program, see Benson (1999).\

\ \

Credits

\

\ TRF was written by \ Gary Benson.

\ \

References

\ \

\ Benson G.\ \ Tandem repeats finder: a program to analyze DNA sequences.\ Nucleic Acids Res. 1999 Jan 15;27(2):573-80.\ PMID: 9862982; PMC: PMC148217\

\ varRep 1 group varRep\ longLabel Simple Tandem Repeats by TRF\ shortLabel Simple Repeats\ track simpleRepeat\ type bed 4 +\ visibility hide\ pubsBingBlat Web Sequences bed 12 + DNA Sequences in Web Pages Indexed by Bing.com / Microsoft Research 0 100 0 0 0 127 127 127 0 0 0

Description

\

This track is powered by Bing! and Microsoft Research. UCSC collaborators at Microsoft Research (Bob Davidson, David Heckerman) implemented a DNA sequence detector and processed thirty days of web crawler updates, covering roughly 40 billion webpages. The results were mapped with BLAT to the genome.

\ \

Display Convention and Configuration

\

The track indicates the location of sequences on web pages\ mapped to the genome, labelled with the web page URL. If the web page includes\ invisible meta data, then the first author and a year of publication \ is shown instead of the URL. All\ matches of one web page are grouped ("chained") together.\ Web page titles are shown when you move the mouse cursor over the features.\ Thicker parts of the features (exons) represent matching sequences,\ connected by thin lines to matches from the same web page within 30 kbp.

\ \ \ \

Methods

\

\ All file types (PDFs and various Microsoft Office formats) were converted to\ text. The results were processed to find groups of words that look like DNA/RNA\ sequences. These were then mapped with BLAT to the human genome using the same\ software as used in the Publication track.

\ \

Credits

\

DNA sequence detection by Bob Davidson at Microsoft Research. \ HTML parsing and sequence mapping by Maximilian Haeussler at UCSC.

\ \

References

\ \

\ Aerts S, Haeussler M, van Vooren S, Griffith OL, Hulpiau P, Jones SJ, Montgomery SB, Bergman CM, Open Regulatory Annotation Consortium.\ \ Text-mining assisted regulatory annotation.\ Genome Biol. 2008;9(2):R31.\ PMID: 18271954; PMC: PMC2374703\

\ \

\ Haeussler M, Gerner M, Bergman CM.\ \ Annotating genes and genomes with DNA sequences extracted from biomedical articles.\ Bioinformatics. 2011 Apr 1;27(7):980-6.\ PMID: 21325301; PMC: PMC3065681\

\ \

\ Van Noorden R.\ \ Trouble at the text mine.\ Nature. 2012 Mar 7;483(7388):134-5.\

\ pub 1 configurable off\ configureByPopup off\ group pub\ longLabel DNA Sequences in Web Pages Indexed by Bing.com / Microsoft Research\ nextExonText Next Match\ prevExonText Prev Match\ pubsArticleTable hgFixed.pubsBingArticle\ pubsMarkerTable hgFixed.pubsBingMarkerAnnot\ pubsPslTrack pubsBingBlatPsl\ pubsSequenceTable hgFixed.pubsBingSequenceAnnot\ shortLabel Web Sequences\ track pubsBingBlat\ type bed 12 +\ visibility hide\ rmsk RepeatMasker rmsk Repeating Elements by RepeatMasker 1 149.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track was created by using Arian Smit's RepeatMasker program, which screens DNA sequences \ for interspersed repeats and low complexity DNA sequences. The program\ outputs a detailed annotation of the repeats that are present in the\ query sequence (represented by this track), as well as a modified version\ of the query sequence in which all the annotated repeats have been masked\ (generally available on the\ Downloads page). RepeatMasker uses \ the Repbase Update library of repeats from the \ Genetic \ Information Research Institute (GIRI). \ Repbase Update is described in Jurka, J. (2000) in the References section below.

\ \

Display Conventions and Configuration

\

\ In full display mode, this track displays up to ten different classes of repeats:\

\

\ The level of color shading in the graphical display reflects the amount of \ base mismatch, base deletion, and base insertion associated with a repeat \ element. The higher the combined number of these, the lighter the shading.

\ \

Methods

\

\ UCSC has used the most current versions of the RepeatMasker software \ and repeat libraries available to generate these data. Note that these \ versions may be newer than those that are publicly available on the Internet. \

\

\ Data are generated using the RepeatMasker -s flag. Additional flags\ may be used for certain organisms. Repeats are soft-masked. Alignments may \ extend through repeats, but are not permitted to initiate in them. \ See the \ FAQ for \ more information.

\ \

Credits

\

\ Thanks to Arian Smit and GIRI\ for providing the tools and repeat libraries used to generate this track.

\ \

References

\

\ Jurka J.\ Repbase update: a database and an electronic journal of repetitive elements.\ Trends Genet. 2000 Sep;16(9):418-20.\ PMID: 10973072\

\ varRep 0 canPack off\ group varRep\ longLabel Repeating Elements by RepeatMasker\ priority 149.1\ shortLabel RepeatMasker\ spectrum on\ track rmsk\ type rmsk\ visibility dense\