cartVersion cartVersion cartVersion cartVersion 0 0 0 0 0 0 0 0 0 0 0 cartVersion cartVersion cartVersion 0 cartVersion 0 cons160way 160 Accessions bed 4 Multiz Alignment & Conservation (160 Virus Strains, Accession Names) 0 1 0 0 0 127 127 127 0 0 0
\ Downloads for data in this track are available:\
\ This track shows multiple alignments of 160 virus sequences,\ composed of 158 Ebola virus sequences and two Marburg virus sequences\ aligned to the Ebola virus reference sequence G3683/KM034562.1.\ It also includes measurements of evolutionary conservation using\ two methods (phastCons and phyloP) from the\ \ PHAST package, for all 160 virus sequences.\ The multiple alignments were generated using multiz and\ other tools in the UCSC/Penn State Bioinformatics\ comparative genomics alignment pipeline.\ Conserved elements identified by phastCons are also displayed in\ this track.
\\ PhastCons (which has been used in previous Conservation tracks) is a hidden\ Markov model-based method that estimates the probability that each\ nucleotide belongs to a conserved element, based on the multiple alignment.\ It considers not just each individual alignment column, but also its\ flanking columns. By contrast, phyloP separately measures conservation at\ individual columns, ignoring the effects of their neighbors. As a\ consequence, the phyloP plots have a less smooth appearance than the\ phastCons plots, with more "texture" at individual sites. The two methods\ have different strengths and weaknesses. PhastCons is sensitive to "runs"\ of conserved sites, and is therefore effective for picking out conserved\ elements. PhyloP, on the other hand, is more appropriate for evaluating\ signatures of selection at particular nucleotides or classes of nucleotides\ (e.g., third codon positions, or first positions of miRNA target sites).
\\ Another important difference is that phyloP can measure acceleration\ (faster evolution than expected under neutral drift) as well as\ conservation (slower than expected evolution). In the phyloP plots, sites\ predicted to be conserved are assigned positive scores (and shown in blue),\ while sites predicted to be fast-evolving are assigned negative scores (and\ shown in red). The absolute values of the scores represent -log p-values\ under a null hypothesis of neutral evolution. The phastCons scores, by\ contrast, represent probabilities of negative selection and range between 0\ and 1.
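As a small illustration of how the signed phyloP scores described above can be read, the sketch below converts a score back into a direction and a p-value. It assumes the logarithm is base 10 (the text only says "-log p-values"); the function name `phylop_to_pvalue` is hypothetical and not part of PHAST.

```python
def phylop_to_pvalue(score: float) -> tuple[str, float]:
    """Return (direction, p-value) for a signed phyloP score.

    Assumption: |score| = -log10(p). The sign gives the direction:
    positive = conserved, negative = fast-evolving (accelerated).
    """
    if score > 0:
        direction = "conserved"
    elif score < 0:
        direction = "accelerated"
    else:
        direction = "neutral"
    # Undo the -log10 transform on the magnitude of the score.
    return direction, 10.0 ** (-abs(score))


if __name__ == "__main__":
    for s in (2.0, -3.0, 0.0):
        print(s, phylop_to_pvalue(s))
```

Under this assumption a score of 2 corresponds to p = 0.01 in the conserved direction, and a score of -3 to p = 0.001 in the accelerated direction.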
\\ Both phastCons and phyloP treat alignment gaps and unaligned nucleotides as\ missing data.
\ \\ The data contained in the 160 Accessions and the\ 160 Strains tracks are the same. The only\ difference between these two tracks is the set of identifiers used to label the sequences. In the 160\ Accessions track, each sequence is labeled using its NCBI Nucleotide accession number. In the 160 Strains track, each sequence is labeled with a shortened\ version of the strain name from the NCBI Nucleotide entry; when this\ was unavailable, we constructed our own using the DEFINITION, /country, and\ /collection_date lines from the NCBI record.\
\\ The mapping between sequence identifiers and strain names is provided via a text file on our download server. \ Additional metadata from GenBank is provided in a tab-separated file.
\ \\ Pairwise alignments of each species to the Ebola virus genome are\ displayed as a series of colored blocks indicating the functional effect of polymorphisms (in pack\ mode), or as a wiggle (in full mode) that indicates alignment quality.\ In dense display mode, percent identity of the whole alignments is shown in grayscale using\ darker values to indicate higher levels of identity.\
\ In pack mode, regions that align with 100% identity are not shown. Where identity is below 100%,\ blocks of four colors are drawn.\
\ Checkboxes on the track configuration page allow selection of the\ species to include in the pairwise display.\ Configuration buttons are available to select all of the species\ (Set all), deselect all of the species (Clear all), or\ use the default settings (Set defaults).\
\ To view detailed information about the alignments at a specific\ position, zoom the display in to 30,000 or fewer bases, then click on\ the alignment.
\ \\ When zoomed in to the base-level display, the track shows the base\ composition of each alignment.\ The numbers and symbols on the Gaps\ line indicate the lengths of gaps in the Ebola virus sequence at those\ alignment positions relative to the longest non-Ebola virus sequence.\ If there is sufficient space in the display, the size of the gap is shown.\ If the space is insufficient and the gap size is a multiple of 3, a\ "*" is displayed; other gap sizes are indicated by "+".
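The Gaps line rule described above can be sketched as a small function. This is only an illustration of the stated rule, not the browser's actual drawing code; the name `gap_label` and the `chars_available` parameter (how many characters fit at that position) are hypothetical.

```python
def gap_label(gap_size: int, chars_available: int) -> str:
    """Return the label shown on the Gaps line for one gap.

    Rule from the track description:
      - if the number fits in the available space, show the gap size;
      - otherwise show "*" when the size is a multiple of 3,
        and "+" for any other size.
    `chars_available` is an assumed stand-in for "sufficient space".
    """
    if gap_size <= 0:
        raise ValueError("gap_size must be positive")
    text = str(gap_size)
    if len(text) <= chars_available:
        return text
    return "*" if gap_size % 3 == 0 else "+"


if __name__ == "__main__":
    print(gap_label(12, 2))  # enough room: prints the size
    print(gap_label(12, 1))  # too wide, multiple of 3
    print(gap_label(10, 1))  # too wide, not a multiple of 3
```

The "*" marker is useful in coding regions because a gap whose length is a multiple of 3 does not shift the reading frame.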
\\ Codon translation is available in base-level display mode if the\ displayed region is identified as a coding segment. To display this annotation, select the species\ for translation from the pull-down menu in the Codon\ Translation configuration section at the top of the page. Then, select one of\ the following modes:\
\ Pairwise alignments with the reference sequence were generated for\ each sequence using lastz version 1.03.52.\ Parameters used for each lastz alignment:\
\ # hsp_threshold = 2200\ # gapped_threshold = 4000 = L\ # x_drop = 910\ # y_drop = 3400 = Y\ # gap_open_penalty = 400\ # gap_extend_penalty = 30\ # A C G T\ # A 91 -90 -25 -100\ # C -90 100 -100 -25\ # G -25 -100 100 -90\ # T -100 -25 -90 91\ # seed=1110100110010101111 w/transition\ # step=1\\ Pairwise alignments were then linked into chains using a dynamic programming\ algorithm that finds maximally scoring chains of gapless subsections\ of the alignments organized in a kd-tree. Parameters used in\ the chaining (axtChain) step: -minScore=10 -linearGap=loose\ \
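To make the lastz scoring parameters listed above concrete, the sketch below scores an already-aligned pair of sequences with the same substitution matrix and gap penalties. It is not lastz itself, and it assumes that a gap of length n costs gap_open_penalty + n * gap_extend_penalty; the function name `score_alignment` is hypothetical, and only the bases A, C, G, T are handled.

```python
BASES = "ACGT"

# Substitution matrix copied from the lastz parameters above
# (rows and columns in the order A, C, G, T).
MATRIX = [
    [91, -90, -25, -100],
    [-90, 100, -100, -25],
    [-25, -100, 100, -90],
    [-100, -25, -90, 91],
]
SCORE = {
    (a, b): MATRIX[i][j]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
}

GAP_OPEN = 400    # gap_open_penalty
GAP_EXTEND = 30   # gap_extend_penalty


def score_alignment(ref: str, qry: str) -> int:
    """Score two aligned rows of equal length; '-' marks a gap.

    Assumption: each gap run costs GAP_OPEN once plus GAP_EXTEND
    per gap column.
    """
    if len(ref) != len(qry):
        raise ValueError("aligned rows must have equal length")
    total = 0
    gap_side = None  # which row is currently gapped: "ref", "qry", or None
    for r, q in zip(ref.upper(), qry.upper()):
        if r == "-" and q == "-":
            raise ValueError("column with gaps in both rows")
        if r == "-" or q == "-":
            side = "ref" if r == "-" else "qry"
            if side != gap_side:
                total -= GAP_OPEN  # a new gap run starts here
                gap_side = side
            total -= GAP_EXTEND
        else:
            total += SCORE[(r, q)]
            gap_side = None
    return total


if __name__ == "__main__":
    print(score_alignment("ACGT", "ACGT"))
    print(score_alignment("ACGT", "AC-T"))
```

With these values, four matching bases ACGT score 382, while deleting the G from the query row drops the score to -148. The matrix penalizes transitions (A/G, C/T at -25) far less than transversions.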
\ High-scoring chains were then placed along the genome, with\ gaps filled by lower-scoring chains, to produce an alignment net.\
\\ The multiple alignment was constructed from the resulting best-in-genome\ pairwise alignments progressively aligned using multiz/autoMZ,\ following a simple binary tree phylogeny:\
\ (((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((\ (((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((\ (KM034562v1 KJ660346v2) KJ660347v2) KJ660348v2) KM034554v1) KM034555v1) \ KM034557v1) KM034560v1) KM233039v1) KM233043v1) KM233045v1) KM233050v1) \ KM233051v1) KM233053v1) KM233056v1) KM233057v1) KM233063v1) KM233069v1) \ KM233070v1) KM233072v1) KM233089v1) KM233092v1) KM233096v1) KM233097v1) \ KM233098v1) KM233099v1) KM233103v1) KM233104v1) KM233109v1) KM233110v1) \ KM233113v1) AF086833v2) AF272001v1) AY142960v1) EU224440v2) KC242791v1) \ KC242792v1) KC242794v1) KC242796v1) KC242798v1) KC242799v1) KC242801v1) \ KM034551v1) KM034553v1) KM034556v1) KM034558v1) KM034559v1) KM034561v1) \ KM233035v1) KM233036v1) KM233037v1) KM233038v1) KM233040v1) KM233041v1) \ KM233042v1) KM233044v1) KM233046v1) KM233047v1) KM233048v1) KM233049v1) \ KM233052v1) KM233054v1) KM233055v1) KM233058v1) KM233059v1) KM233061v1) \ KM233062v1) KM233064v1) KM233065v1) KM233066v1) KM233067v1) KM233068v1) \ KM233071v1) KM233073v1) KM233074v1) KM233075v1) KM233076v1) KM233077v1) \ KM233078v1) KM233079v1) KM233080v1) KM233081v1) KM233082v1) KM233084v1) \ KM233085v1) KM233086v1) KM233087v1) KM233088v1) KM233093v1) KM233094v1) \ KM233095v1) KM233100v1) KM233101v1) KM233102v1) KM233105v1) KM233106v1) \ KM233107v1) KM233108v1) KM233111v1) KM233112v1) KM233114v1) KM233115v1) \ KM233116v1) KM233091v1) NC_002549v1) KM034552v1) KM233060v1) KM233083v1) \ KM233090v1) KM233117v1) KM233118v1) AY354458v1) KC242784v1) KC242785v1) \ KC242786v1) KC242787v1) KC242788v1) KC242789v1) KC242790v1) KC242793v1) \ KC242795v1) KC242797v1) KC242800v1) AF499101v1) JQ352763v1) HQ613402v1) \ HQ613403v1) KM034549v1) KM034550v1) KM034563v1) FJ217162v1) NC_014372v1) \ FJ217161v1) NC_014373v1) KC545395v1) KC545394v1) KC545393v1) KC545396v1) \ FJ621585v1) FJ621584v1) JX477166v1) AY769362v1) AB050936v1) EU338380v1) \ KC242783v2) JX477165v1) AF522874v1) NC_004161v1) 
FJ621583v1) KC589025v1) \ FJ968794v1) AY729654v1) NC_006432v1) KC545389v1) KC545390v1) KC545391v1) \ KC545392v1) JN638998v1) NC_024781v1) NC_001608v3)\\
\ (((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((\ (((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((\ (G3686v1_2014 Guinea_Kissidougou-C15_2014) Guinea_Gueckedou-C07_2014) \ Guinea_Gueckedou-C05_2014) G3676v1_2014) G3676v2_2014) G3677v2_2014) \ G3682v1_2014) EM112_2014) EM120_2014) EM124v1_2014) G3713v2_2014) G3713v3_2014) \ G3724_2014) G3735v1_2014) G3735v2_2014) G3764_2014) G3770v1_2014) G3770v2_2014) \ G3782_2014) G3814_2014) G3818_2014) G3822_2014) G3823_2014) G3825v1_2014) \ G3825v2_2014) G3831_2014) G3834_2014) G3846_2014) G3848_2014) G3856v1_2014) \ AF086833v2_1976) Mayinga_1976) Mayinga_2002) GuineaPig_Mayinga_2007) \ Bonduni_1977) Gabon_1994) 2Nza_1996) 13625Kikwit_1995) 1Ikot_Gabon_1996) \ 13709Kikwit_1995) deRoover_1976) EM096_2014) G3670v1_2014) G3677v1_2014) \ G3679v1_2014) G3680v1_2014) G3683v1_2014) EM104_2014) EM106_2014) EM110_2014) \ EM111_2014) EM113_2014) EM115_2014) EM119_2014) EM121_2014) EM124v2_2014) \ EM124v3_2014) EM124v4_2014) G3707_2014) G3713v4_2014) G3729_2014) G3734v1_2014) \ G3750v1_2014) G3750v2_2014) G3752_2014) G3758_2014) G3765v2_2014) G3769v1_2014) \ G3769v2_2014) G3769v3_2014) G3769v4_2014) G3771_2014) G3786_2014) G3787_2014) \ G3788_2014) G3789v1_2014) G3795_2014) G3796_2014) G3798_2014) G3799_2014) \ G3800_2014) G3805v1_2014) G3807_2014) G3808_2014) G3809_2014) G3810v1_2014) \ G3810v2_2014) G3819_2014) G3820_2014) G3821_2014) G3826_2014) G3827_2014) \ G3829_2014) G3838_2014) G3840_2014) G3841_2014) G3845_2014) G3850_2014) \ G3851_2014) G3856v3_2014) G3857_2014) NM042v1_2014) G3817_2014) \ NC_002549v1_1976) EM098_2014) G3750v3_2014) G3805v2_2014) G3816_2014) \ NM042v2_2014) NM042v3_2014) Zaire_1995) Luebo9_2007) Luebo0_2007) Luebo1_2007) \ Luebo23_2007) Luebo43_2007) Luebo4_2007) Luebo5_2007) 1Eko_1996) \ 1Mbie_Gabon_1996) 1Oba_Gabon_1996) Ilembe_2002) Mouse_Mayinga_2002) \ Kikwit_1995) 034-KS_2008) M-M_2007) EM095B_2014) EM095_2014) G3687v1_2014) \ 
Cote_dIvoire_CIEBOV_1994) Cote_dIvoire_1994) Bundibugyo_Uganda_2007) \ Bundibugyo_2007) EboBund-122_2012) EboBund-120_2012) EboBund-112_2012) \ EboBund-14_2012) Reston08-E_2008) Reston08-C_2008) Alice_TX_USA_MkCQ8167_1996) \ reconstructReston_2008) Reston_1996) Yambio_2004) Maleo_1979) Reston09-A_2009) \ Reston_PA_1990) Pennsylvania_1990) Reston08-A_2008) EboSud-639_2012) \ Boniface_1976) Gulu_Uganda_2000) Gulu_2000) EboSud-602_2012) EboSud-603_2012) \ EboSud-609_2012) EboSud-682_2012) Nakisamata_2011) \ Marburg_KitumCave_Kenya_1987) Marburg_MtElgon_Musoke_Kenya_1980)\\ Framing tables from the genes were constructed to enable\ visualization of codons in the multiple alignment display.\ \
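The guide trees above are written as nested parentheses with space-separated leaf names. A minimal sketch of reading that notation into nested tuples follows; the function names `parse_tree` and `leaves` are hypothetical, and the example uses only the first four accessions of the tree. Any backslash line-continuation characters from this dump would need to be removed from the text before parsing.

```python
import re


def parse_tree(text: str):
    """Parse '((A B) C)' style text into nested tuples of leaf names."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    stack = [[]]  # stack[0] collects the top-level node(s)
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            node = tuple(stack.pop())
            if not stack:
                raise ValueError("unbalanced ')'")
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("unbalanced parentheses or multiple roots")
    return stack[0][0]


def leaves(node) -> list[str]:
    """Return leaf names in left-to-right order."""
    if isinstance(node, str):
        return [node]
    names = []
    for child in node:
        names.extend(leaves(child))
    return names


if __name__ == "__main__":
    tree = parse_tree("(((KM034562v1 KJ660346v2) KJ660347v2) KJ660348v2)")
    print(tree)
    print(leaves(tree))
```

Because every internal node pairs the tree built so far with one new sequence, the leaf order is also the order in which sequences are added during progressive alignment.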
\ Both phastCons and phyloP are phylogenetic methods that rely\ on a tree model containing the tree topology, branch lengths representing\ evolutionary distance at neutrally evolving sites, the background distribution\ of nucleotides, and a substitution rate matrix.\ The\ all-species tree model for this track was\ generated using the phyloFit program from the PHAST package\ (REV model, EM algorithm, medium precision) using multiple alignments of\ 4-fold degenerate sites extracted from the 160-way alignment\ (msa_view). The 4d sites were derived from the NCBI gene set,\ filtered to select single-coverage long transcripts.
\\ This same tree model was used in the phyloP calculations; however, the\ background frequencies were modified to maintain reversibility.\ The resulting tree model:\ all species.\
\ \\ The phastCons program computes conservation scores based on a phylo-HMM, a\ type of probabilistic model that describes both the process of DNA\ substitution at each site in a genome and the way this process changes from\ one site to the next (Felsenstein and Churchill 1996, Yang 1995, Siepel and\ Haussler 2005). PhastCons uses a two-state phylo-HMM, with a state for\ conserved regions and a state for non-conserved regions. The value plotted\ at each site is the posterior probability that the corresponding alignment\ column was "generated" by the conserved state of the phylo-HMM. These\ scores reflect the phylogeny (including branch lengths) of the species in\ question, a continuous-time Markov model of the nucleotide substitution\ process, and a tendency for conservation levels to be autocorrelated along\ the genome (i.e., to be similar at adjacent sites). The general reversible\ (REV) substitution model was used. Unlike many conservation-scoring programs,\ phastCons does not rely on a sliding window\ of fixed size; therefore, short highly conserved regions and long moderately\ conserved regions can both obtain high scores.\ More information about\ phastCons can be found in Siepel et al., 2005.
\\ The phastCons parameters used were: expected-length=45,\ target-coverage=0.3, rho=0.3.
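To show what the expected-length and target-coverage settings above imply for the two-state phylo-HMM, the sketch below derives the state-transition probabilities from them. It assumes the parameterization described by Siepel et al. (2005), in which expected-length = 1/mu and target-coverage = nu/(mu + nu), where mu is the probability of leaving the conserved state and nu the probability of entering it. The function name `hmm_transition_rates` is hypothetical.

```python
def hmm_transition_rates(expected_length: float,
                         target_coverage: float) -> tuple[float, float]:
    """Return (mu, nu) for a two-state conserved/non-conserved HMM.

    Assumed relations (Siepel et al. 2005 parameterization):
      expected_length = 1 / mu
      target_coverage = nu / (mu + nu)
    """
    if expected_length <= 0:
        raise ValueError("expected_length must be positive")
    if not 0.0 < target_coverage < 1.0:
        raise ValueError("target_coverage must be strictly between 0 and 1")
    mu = 1.0 / expected_length
    # Solve target_coverage = nu / (mu + nu) for nu.
    nu = mu * target_coverage / (1.0 - target_coverage)
    return mu, nu


if __name__ == "__main__":
    mu, nu = hmm_transition_rates(expected_length=45, target_coverage=0.3)
    print(f"mu = {mu:.5f}, nu = {nu:.5f}")
```

Under these assumptions, the settings used for this track give a per-site probability of about 0.022 of leaving the conserved state and about 0.0095 of entering it.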
\ \\ The phyloP program supports several different methods for computing\ p-values of conservation or acceleration, for individual nucleotides or\ larger elements (http://compgen.cshl.edu/phast/). Here it was used\ to produce separate scores at each base (--wig-scores option), considering\ all branches of the phylogeny rather than a particular subtree or lineage\ (i.e., the --subtree option was not used). The scores were computed by\ performing a likelihood ratio test at each alignment column (--method LRT),\ and scores for both conservation and acceleration were produced (--mode\ CONACC).
\ \\ The conserved elements were predicted by running phastCons with the\ --most-conserved option. The predicted elements are segments of the alignment\ that are likely to have been "generated" by the conserved state of the\ phylo-HMM. Each element is assigned a log-odds score equal to its log\ probability under the conserved model minus its log probability under the\ non-conserved model. The "score" field associated with this track contains\ transformed log-odds scores, taking values between 0 and 1000. (The scores\ are transformed using a monotonic function of the form a * log(x) + b.) The\ raw log-odds scores are retained in the "name" field and can be seen on the\ details page or in the browser when the track's display mode is set to\ "pack" or "full".
\ \This track was created using the following programs:\
\ Gire SK, Goba A, Andersen KG, Sealfon RS, Park DJ, Kanneh L, Jalloh S, Momoh M,\ Fullah M, Dudas G et al.\ Genomic surveillance elucidates Ebola virus origin and transmission \ during the 2014 outbreak.\ Science 2014 Sep 12;345(6202):1369-72.\ PMID: 25214632;\ Supplemental Materials and Methods\
\ \\ Felsenstein J, Churchill GA.\ A Hidden Markov Model approach to\ variation among sites in rate of evolution.\ Mol Biol Evol. 1996 Jan;13(1):93-104.\ PMID: 8583911\
\ \\ Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A.\ \ Detection of nonneutral substitution rates on mammalian phylogenies.\ Genome Res. 2010 Jan;20(1):110-21.\ PMID: 19858363; PMC: PMC2798823\
\ \\ Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K,\ Clawson H, Spieth J, Hillier LW, Richards S, et al.\ Evolutionarily conserved elements in vertebrate, insect, worm,\ and yeast genomes.\ Genome Res. 2005 Aug;15(8):1034-50.\ PMID: 16024819; PMC: PMC1182216\
\ \\ Siepel A, Haussler D.\ Phylogenetic Hidden Markov Models.\ In: Nielsen R, editor. Statistical Methods in Molecular Evolution.\ New York: Springer; 2005. pp. 325-351.\
\ \\ Yang Z.\ A space-time process model for the evolution of DNA\ sequences.\ Genetics. 1995 Feb;139(2):993-1005.\ PMID: 7713447; PMC: PMC1206396\
\ \\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron:\ duplication, deletion, and rearrangement in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.\ PMID: 14500911; PMC: PMC208784\
\ \\ Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM,\ Baertsch R, Rosenbloom K, Clawson H, Green ED, et al.\ Aligning multiple genomic sequences with the threaded blockset aligner.\ Genome Res. 2004 Apr;14(4):708-15.\ PMID: 15060014; PMC: PMC383317\
\ \\ Chiaromonte F, Yap VB, Miller W.\ Scoring pairwise genomic sequence alignments.\ Pac Symp Biocomput. 2002:115-26.\ PMID: 11928468\
\ \\ Harris RS.\ Improved pairwise alignment of genomic DNA.\ Ph.D. Thesis. Pennsylvania State University, USA. 2007.\
\ \\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-mouse alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.\ PMID: 12529312; PMC: PMC430961\
\ compGeno 1 compositeTrack on\ dragAndDrop subTracks\ group compGeno\ longLabel Multiz Alignment & Conservation (160 Virus Strains, Accession Names)\ priority 1\ shortLabel 160 Accessions\ subGroup1 view Views align=Multiz_Alignments phyloP=Basewise_Conservation_(phyloP) phastcons=Element_Conservation_(phastCons) elements=Conserved_Elements\ track cons160way\ type bed 4\ visibility hide\ strainCons160way 160 Strains bed 4 Multiz Alignment & Conservation (160 Virus Strains, Strain Names) 3 1 0 0 0 127 127 127 0 0 0\ Downloads for data in this track are available:\
\ This track shows multiple alignments of 160 virus sequences,\ composed of 158 Ebola virus sequences and two Marburg virus sequences\ aligned to the Ebola virus reference sequence G3683/KM034562.1.\ It also includes measurements of evolutionary conservation using\ two methods (phastCons and phyloP) from the\ \ PHAST package, for all 160 virus sequences.\ The multiple alignments were generated using multiz and\ other tools in the UCSC/Penn State Bioinformatics\ comparative genomics alignment pipeline.\ Conserved elements identified by phastCons are also displayed in\ this track.
\\ PhastCons (which has been used in previous Conservation tracks) is a hidden\ Markov model-based method that estimates the probability that each\ nucleotide belongs to a conserved element, based on the multiple alignment.\ It considers not just each individual alignment column, but also its\ flanking columns. By contrast, phyloP separately measures conservation at\ individual columns, ignoring the effects of their neighbors. As a\ consequence, the phyloP plots have a less smooth appearance than the\ phastCons plots, with more "texture" at individual sites. The two methods\ have different strengths and weaknesses. PhastCons is sensitive to "runs"\ of conserved sites, and is therefore effective for picking out conserved\ elements. PhyloP, on the other hand, is more appropriate for evaluating\ signatures of selection at particular nucleotides or classes of nucleotides\ (e.g., third codon positions, or first positions of miRNA target sites).
\\ Another important difference is that phyloP can measure acceleration\ (faster evolution than expected under neutral drift) as well as\ conservation (slower than expected evolution). In the phyloP plots, sites\ predicted to be conserved are assigned positive scores (and shown in blue),\ while sites predicted to be fast-evolving are assigned negative scores (and\ shown in red). The absolute values of the scores represent -log p-values\ under a null hypothesis of neutral evolution. The phastCons scores, by\ contrast, represent probabilities of negative selection and range between 0\ and 1.
\\ Both phastCons and phyloP treat alignment gaps and unaligned nucleotides as\ missing data.
\ \\ The data contained in the 160 Accessions and the\ 160 Strains tracks are the same. The only\ difference between these two tracks is the set of identifiers used to label the sequences. In the 160\ Accessions track, each sequence is labeled using its NCBI Nucleotide accession number. In the 160 Strains track, each sequence is labeled with a shortened\ version of the strain name from the NCBI Nucleotide entry; when this\ was unavailable, we constructed our own using the DEFINITION, /country, and\ /collection_date lines from the NCBI record.\
\\ The mapping between sequence identifiers and strain names is provided via a text file on our download server. \ Additional metadata from GenBank is provided in a tab-separated file.
\ \\ Pairwise alignments of each species to the Ebola virus genome are\ displayed as a series of colored blocks indicating the functional effect of polymorphisms (in pack\ mode), or as a wiggle (in full mode) that indicates alignment quality.\ In dense display mode, percent identity of the whole alignments is shown in grayscale using\ darker values to indicate higher levels of identity.\
\ In pack mode, regions that align with 100% identity are not shown. Where identity is below 100%,\ blocks of four colors are drawn.\
\ Checkboxes on the track configuration page allow selection of the\ species to include in the pairwise display.\ Configuration buttons are available to select all of the species\ (Set all), deselect all of the species (Clear all), or\ use the default settings (Set defaults).\
\ To view detailed information about the alignments at a specific\ position, zoom the display in to 30,000 or fewer bases, then click on\ the alignment.
\ \\ When zoomed in to the base-level display, the track shows the base\ composition of each alignment.\ The numbers and symbols on the Gaps\ line indicate the lengths of gaps in the Ebola virus sequence at those\ alignment positions relative to the longest non-Ebola virus sequence.\ If there is sufficient space in the display, the size of the gap is shown.\ If the space is insufficient and the gap size is a multiple of 3, a\ "*" is displayed; other gap sizes are indicated by "+".
\\ Codon translation is available in base-level display mode if the\ displayed region is identified as a coding segment. To display this annotation, select the species\ for translation from the pull-down menu in the Codon\ Translation configuration section at the top of the page. Then, select one of\ the following modes:\
\ Pairwise alignments with the reference sequence were generated for\ each sequence using lastz version 1.03.52.\ Parameters used for each lastz alignment:\
\ # hsp_threshold = 2200\ # gapped_threshold = 4000 = L\ # x_drop = 910\ # y_drop = 3400 = Y\ # gap_open_penalty = 400\ # gap_extend_penalty = 30\ # A C G T\ # A 91 -90 -25 -100\ # C -90 100 -100 -25\ # G -25 -100 100 -90\ # T -100 -25 -90 91\ # seed=1110100110010101111 w/transition\ # step=1\\ Pairwise alignments were then linked into chains using a dynamic programming\ algorithm that finds maximally scoring chains of gapless subsections\ of the alignments organized in a kd-tree. Parameters used in\ the chaining (axtChain) step: -minScore=10 -linearGap=loose\ \
\ High-scoring chains were then placed along the genome, with\ gaps filled by lower-scoring chains, to produce an alignment net.\
\\ The multiple alignment was constructed from the resulting best-in-genome\ pairwise alignments progressively aligned using multiz/autoMZ,\ following a simple binary tree phylogeny:\
\ (((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((\ (((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((\ (KM034562v1 KJ660346v2) KJ660347v2) KJ660348v2) KM034554v1) KM034555v1) \ KM034557v1) KM034560v1) KM233039v1) KM233043v1) KM233045v1) KM233050v1) \ KM233051v1) KM233053v1) KM233056v1) KM233057v1) KM233063v1) KM233069v1) \ KM233070v1) KM233072v1) KM233089v1) KM233092v1) KM233096v1) KM233097v1) \ KM233098v1) KM233099v1) KM233103v1) KM233104v1) KM233109v1) KM233110v1) \ KM233113v1) AF086833v2) AF272001v1) AY142960v1) EU224440v2) KC242791v1) \ KC242792v1) KC242794v1) KC242796v1) KC242798v1) KC242799v1) KC242801v1) \ KM034551v1) KM034553v1) KM034556v1) KM034558v1) KM034559v1) KM034561v1) \ KM233035v1) KM233036v1) KM233037v1) KM233038v1) KM233040v1) KM233041v1) \ KM233042v1) KM233044v1) KM233046v1) KM233047v1) KM233048v1) KM233049v1) \ KM233052v1) KM233054v1) KM233055v1) KM233058v1) KM233059v1) KM233061v1) \ KM233062v1) KM233064v1) KM233065v1) KM233066v1) KM233067v1) KM233068v1) \ KM233071v1) KM233073v1) KM233074v1) KM233075v1) KM233076v1) KM233077v1) \ KM233078v1) KM233079v1) KM233080v1) KM233081v1) KM233082v1) KM233084v1) \ KM233085v1) KM233086v1) KM233087v1) KM233088v1) KM233093v1) KM233094v1) \ KM233095v1) KM233100v1) KM233101v1) KM233102v1) KM233105v1) KM233106v1) \ KM233107v1) KM233108v1) KM233111v1) KM233112v1) KM233114v1) KM233115v1) \ KM233116v1) KM233091v1) NC_002549v1) KM034552v1) KM233060v1) KM233083v1) \ KM233090v1) KM233117v1) KM233118v1) AY354458v1) KC242784v1) KC242785v1) \ KC242786v1) KC242787v1) KC242788v1) KC242789v1) KC242790v1) KC242793v1) \ KC242795v1) KC242797v1) KC242800v1) AF499101v1) JQ352763v1) HQ613402v1) \ HQ613403v1) KM034549v1) KM034550v1) KM034563v1) FJ217162v1) NC_014372v1) \ FJ217161v1) NC_014373v1) KC545395v1) KC545394v1) KC545393v1) KC545396v1) \ FJ621585v1) FJ621584v1) JX477166v1) AY769362v1) AB050936v1) EU338380v1) \ KC242783v2) JX477165v1) AF522874v1) NC_004161v1) 
FJ621583v1) KC589025v1) \ FJ968794v1) AY729654v1) NC_006432v1) KC545389v1) KC545390v1) KC545391v1) \ KC545392v1) JN638998v1) NC_024781v1) NC_001608v3)\\
\ (((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((\ (((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((\ (G3686v1_2014 Guinea_Kissidougou-C15_2014) Guinea_Gueckedou-C07_2014) \ Guinea_Gueckedou-C05_2014) G3676v1_2014) G3676v2_2014) G3677v2_2014) \ G3682v1_2014) EM112_2014) EM120_2014) EM124v1_2014) G3713v2_2014) G3713v3_2014) \ G3724_2014) G3735v1_2014) G3735v2_2014) G3764_2014) G3770v1_2014) G3770v2_2014) \ G3782_2014) G3814_2014) G3818_2014) G3822_2014) G3823_2014) G3825v1_2014) \ G3825v2_2014) G3831_2014) G3834_2014) G3846_2014) G3848_2014) G3856v1_2014) \ AF086833v2_1976) Mayinga_1976) Mayinga_2002) GuineaPig_Mayinga_2007) \ Bonduni_1977) Gabon_1994) 2Nza_1996) 13625Kikwit_1995) 1Ikot_Gabon_1996) \ 13709Kikwit_1995) deRoover_1976) EM096_2014) G3670v1_2014) G3677v1_2014) \ G3679v1_2014) G3680v1_2014) G3683v1_2014) EM104_2014) EM106_2014) EM110_2014) \ EM111_2014) EM113_2014) EM115_2014) EM119_2014) EM121_2014) EM124v2_2014) \ EM124v3_2014) EM124v4_2014) G3707_2014) G3713v4_2014) G3729_2014) G3734v1_2014) \ G3750v1_2014) G3750v2_2014) G3752_2014) G3758_2014) G3765v2_2014) G3769v1_2014) \ G3769v2_2014) G3769v3_2014) G3769v4_2014) G3771_2014) G3786_2014) G3787_2014) \ G3788_2014) G3789v1_2014) G3795_2014) G3796_2014) G3798_2014) G3799_2014) \ G3800_2014) G3805v1_2014) G3807_2014) G3808_2014) G3809_2014) G3810v1_2014) \ G3810v2_2014) G3819_2014) G3820_2014) G3821_2014) G3826_2014) G3827_2014) \ G3829_2014) G3838_2014) G3840_2014) G3841_2014) G3845_2014) G3850_2014) \ G3851_2014) G3856v3_2014) G3857_2014) NM042v1_2014) G3817_2014) \ NC_002549v1_1976) EM098_2014) G3750v3_2014) G3805v2_2014) G3816_2014) \ NM042v2_2014) NM042v3_2014) Zaire_1995) Luebo9_2007) Luebo0_2007) Luebo1_2007) \ Luebo23_2007) Luebo43_2007) Luebo4_2007) Luebo5_2007) 1Eko_1996) \ 1Mbie_Gabon_1996) 1Oba_Gabon_1996) Ilembe_2002) Mouse_Mayinga_2002) \ Kikwit_1995) 034-KS_2008) M-M_2007) EM095B_2014) EM095_2014) G3687v1_2014) \ 
Cote_dIvoire_CIEBOV_1994) Cote_dIvoire_1994) Bundibugyo_Uganda_2007) \ Bundibugyo_2007) EboBund-122_2012) EboBund-120_2012) EboBund-112_2012) \ EboBund-14_2012) Reston08-E_2008) Reston08-C_2008) Alice_TX_USA_MkCQ8167_1996) \ reconstructReston_2008) Reston_1996) Yambio_2004) Maleo_1979) Reston09-A_2009) \ Reston_PA_1990) Pennsylvania_1990) Reston08-A_2008) EboSud-639_2012) \ Boniface_1976) Gulu_Uganda_2000) Gulu_2000) EboSud-602_2012) EboSud-603_2012) \ EboSud-609_2012) EboSud-682_2012) Nakisamata_2011) \ Marburg_KitumCave_Kenya_1987) Marburg_MtElgon_Musoke_Kenya_1980)\\ Framing tables from the genes were constructed to enable\ visualization of codons in the multiple alignment display.\ \
\ Both phastCons and phyloP are phylogenetic methods that rely\ on a tree model containing the tree topology, branch lengths representing\ evolutionary distance at neutrally evolving sites, the background distribution\ of nucleotides, and a substitution rate matrix.\ The\ all-species tree model for this track was\ generated using the phyloFit program from the PHAST package\ (REV model, EM algorithm, medium precision) using multiple alignments of\ 4-fold degenerate sites extracted from the 160-way alignment\ (msa_view). The 4d sites were derived from the NCBI gene set,\ filtered to select single-coverage long transcripts.
\\ This same tree model was used in the phyloP calculations; however, the\ background frequencies were modified to maintain reversibility.\ The resulting tree model:\ all species.\
\ \\ The phastCons program computes conservation scores based on a phylo-HMM, a\ type of probabilistic model that describes both the process of DNA\ substitution at each site in a genome and the way this process changes from\ one site to the next (Felsenstein and Churchill 1996, Yang 1995, Siepel and\ Haussler 2005). PhastCons uses a two-state phylo-HMM, with a state for\ conserved regions and a state for non-conserved regions. The value plotted\ at each site is the posterior probability that the corresponding alignment\ column was "generated" by the conserved state of the phylo-HMM. These\ scores reflect the phylogeny (including branch lengths) of the species in\ question, a continuous-time Markov model of the nucleotide substitution\ process, and a tendency for conservation levels to be autocorrelated along\ the genome (i.e., to be similar at adjacent sites). The general reversible\ (REV) substitution model was used. Unlike many conservation-scoring programs,\ phastCons does not rely on a sliding window\ of fixed size; therefore, short highly conserved regions and long moderately\ conserved regions can both obtain high scores.\ More information about\ phastCons can be found in Siepel et al., 2005.
\\ The phastCons parameters used were: expected-length=45,\ target-coverage=0.3, rho=0.3.
\ \\ The phyloP program supports several different methods for computing\ p-values of conservation or acceleration, for individual nucleotides or\ larger elements (http://compgen.cshl.edu/phast/). Here it was used\ to produce separate scores at each base (--wig-scores option), considering\ all branches of the phylogeny rather than a particular subtree or lineage\ (i.e., the --subtree option was not used). The scores were computed by\ performing a likelihood ratio test at each alignment column (--method LRT),\ and scores for both conservation and acceleration were produced (--mode\ CONACC).
\ \\ The conserved elements were predicted by running phastCons with the\ --most-conserved option. The predicted elements are segments of the alignment\ that are likely to have been "generated" by the conserved state of the\ phylo-HMM. Each element is assigned a log-odds score equal to its log\ probability under the conserved model minus its log probability under the\ non-conserved model. The "score" field associated with this track contains\ transformed log-odds scores, taking values between 0 and 1000. (The scores\ are transformed using a monotonic function of the form a * log(x) + b.) The\ raw log-odds scores are retained in the "name" field and can be seen on the\ details page or in the browser when the track's display mode is set to\ "pack" or "full".
\ \This track was created using the following programs:\
\ Gire SK, Goba A, Andersen KG, Sealfon RS, Park DJ, Kanneh L, Jalloh S, Momoh M,\ Fullah M, Dudas G et al.\ Genomic surveillance elucidates Ebola virus origin and transmission \ during the 2014 outbreak.\ Science 2014 Sep 12;345(6202):1369-72.\ PMID: 25214632;\ Supplemental Materials and Methods\
\ \\ Felsenstein J, Churchill GA.\ A Hidden Markov Model approach to\ variation among sites in rate of evolution.\ Mol Biol Evol. 1996 Jan;13(1):93-104.\ PMID: 8583911\
\ \\ Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A.\ \ Detection of nonneutral substitution rates on mammalian phylogenies.\ Genome Res. 2010 Jan;20(1):110-21.\ PMID: 19858363; PMC: PMC2798823\
\ \\ Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K,\ Clawson H, Spieth J, Hillier LW, Richards S, et al.\ Evolutionarily conserved elements in vertebrate, insect, worm,\ and yeast genomes.\ Genome Res. 2005 Aug;15(8):1034-50.\ PMID: 16024819; PMC: PMC1182216\
\ \\ Siepel A, Haussler D.\ Phylogenetic Hidden Markov Models.\ In: Nielsen R, editor. Statistical Methods in Molecular Evolution.\ New York: Springer; 2005. pp. 325-351.\
\ \\ Yang Z.\ A space-time process model for the evolution of DNA\ sequences.\ Genetics. 1995 Feb;139(2):993-1005.\ PMID: 7713447; PMC: PMC1206396\
\ \\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron:\ duplication, deletion, and rearrangement in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.\ PMID: 14500911; PMC: PMC208784\
\ \\ Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM,\ Baertsch R, Rosenbloom K, Clawson H, Green ED, et al.\ Aligning multiple genomic sequences with the threaded blockset aligner.\ Genome Res. 2004 Apr;14(4):708-15.\ PMID: 15060014; PMC: PMC383317\
\ \\ Chiaromonte F, Yap VB, Miller W.\ Scoring pairwise genomic sequence alignments.\ Pac Symp Biocomput. 2002:115-26.\ PMID: 11928468\
\ \\ Harris RS.\ Improved pairwise alignment of genomic DNA.\ Ph.D. Thesis. Pennsylvania State University, USA. 2007.\
\ \\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-mouse alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.\ PMID: 12529312; PMC: PMC430961\
\ compGeno 1 compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html cons160way\ longLabel Multiz Alignment & Conservation (160 Virus Strains, Strain Names)\ priority 1\ shortLabel 160 Strains\ subGroup1 view Views align=Multiz_Alignments phyloP=Basewise_Conservation_(phyloP) phastcons=Element_Conservation_(phastCons) elements=Conserved_Elements\ track strainCons160way\ type bed 4\ visibility pack\ iedbsupp2A01 A*01 bigBed 12 . IEDB Predicted binding Macaque HLA T-Cell Class I A*01 1 1 20 20 0 137 137 127 1 0 0 immu 1 longLabel IEDB Predicted binding Macaque HLA T-Cell Class I A*01\ parent iedbPredClassIMac\ shortLabel A*01\ track iedbsupp2A01\ type bigBed 12 .\ visibility dense\ iedbsupp1A0101 A*01:01 bigBed 12 . IEDB predicted binding Human HLA T-Cell Class I A*01:01 1 1 0 0 0 127 127 127 1 0 0 immu 1 longLabel IEDB predicted binding Human HLA T-Cell Class I A*01:01\ parent iedbPred1\ shortLabel A*01:01\ track iedbsupp1A0101\ type bigBed 12 .\ visibility dense\ cons160wayViewphyloP Basewise Conservation (phyloP) bed 4 Multiz Alignment & Conservation (160 Virus Strains, Accession Names) 2 1 0 0 0 127 127 127 0 0 0 compGeno 1 longLabel Multiz Alignment & Conservation (160 Virus Strains, Accession Names)\ parent cons160way\ shortLabel Basewise Conservation (phyloP)\ track cons160wayViewphyloP\ view phyloP\ viewLimits -3:0.5\ viewLimitsMax -4.611:0.934\ visibility full\ strainCons160wayViewphyloP Basewise Conservation (phyloP) bed 4 Multiz Alignment & Conservation (160 Virus Strains, Strain Names) 2 1 0 0 0 127 127 127 0 0 0 compGeno 1 longLabel Multiz Alignment & Conservation (160 Virus Strains, Strain Names)\ parent strainCons160way\ shortLabel Basewise Conservation (phyloP)\ track strainCons160wayViewphyloP\ view phyloP\ viewLimits -3:0.5\ viewLimitsMax -4.611:0.934\ visibility full\ cons160wayViewelements Conserved Elements bed 4 Multiz Alignment & Conservation (160 Virus Strains, Accession Names) 1 1 0 0 0 127 127 127 0 0 0 compGeno 1 longLabel Multiz 
Alignment & Conservation (160 Virus Strains, Accession Names)\ parent cons160way\ shortLabel Conserved Elements\ track cons160wayViewelements\ view elements\ visibility dense\ strainCons160wayViewelements Conserved Elements bed 4 Multiz Alignment & Conservation (160 Virus Strains, Strain Names) 1 1 0 0 0 127 127 127 0 0 0 compGeno 1 longLabel Multiz Alignment & Conservation (160 Virus Strains, Strain Names)\ parent strainCons160way\ shortLabel Conserved Elements\ track strainCons160wayViewelements\ view elements\ visibility dense\ cons160wayViewphastcons Element Conservation (phastCons) bed 4 Multiz Alignment & Conservation (160 Virus Strains, Accession Names) 2 1 0 0 0 127 127 127 0 0 0 compGeno 1 longLabel Multiz Alignment & Conservation (160 Virus Strains, Accession Names)\ parent cons160way\ shortLabel Element Conservation (phastCons)\ track cons160wayViewphastcons\ view phastcons\ visibility full\ strainCons160wayViewphastcons Element Conservation (phastCons) bed 4 Multiz Alignment & Conservation (160 Virus Strains, Strain Names) 2 1 0 0 0 127 127 127 0 0 0 compGeno 1 longLabel Multiz Alignment & Conservation (160 Virus Strains, Strain Names)\ parent strainCons160way\ shortLabel Element Conservation (phastCons)\ track strainCons160wayViewphastcons\ view phastcons\ visibility full\ cons160wayViewalign Multiz Alignments bed 4 Multiz Alignment & Conservation (160 Virus Strains, Accession Names) 3 1 0 0 0 127 127 127 0 0 0 compGeno 1 longLabel Multiz Alignment & Conservation (160 Virus Strains, Accession Names)\ parent cons160way\ shortLabel Multiz Alignments\ track cons160wayViewalign\ view align\ viewUi on\ visibility pack\ strainCons160wayViewalign Multiz Alignments bed 4 Multiz Alignment & Conservation (160 Virus Strains, Strain Names) 3 1 0 0 0 127 127 127 0 0 0 compGeno 1 longLabel Multiz Alignment & Conservation (160 Virus Strains, Strain Names)\ parent strainCons160way\ shortLabel Multiz Alignments\ track strainCons160wayViewalign\ view align\ viewUi on\ 
visibility pack\ ncbiGene NCBI Genes genePred NCBI Genes from KM034562 GenBank Record 3 1 12 12 120 133 133 187 0 0 0\ This track contains genes extracted from the GenBank nuccore entry for\ KM034562.1.\
\ \\ This track follows the display conventions for\ \ gene prediction tracks.\
\ \\ We downloaded the GenBank record for\ KM034562.1,\ extracted the entries for each gene and then loaded them into the UCSC database. Additional\ entries were added for the various forms of the GP gene.\
\ genes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ color 12,12,120\ group genes\ itemDetailsHtmlTable geneDesc\ longLabel NCBI Genes from KM034562 GenBank Record\ priority 1\ shortLabel NCBI Genes\ track ncbiGene\ type genePred\ visibility pack\ newSeqs New sequences bed 4 Recent related sequences from GenBank 0 1 0 0 0 127 127 127 0 0 0\ This track shows the alignments of recently sequenced Ebola\ samples to the Ebola virus reference sequence G3683/KM034562.1.\
\ \\ Pairwise alignments of each species to the Ebola virus genome are\ displayed as a series of colored blocks indicating the functional effect of polymorphisms (in pack\ mode), or as a wiggle (in full mode) that indicates alignment quality.\ In dense display mode, percent identity of the whole alignments is shown in grayscale using\ darker values to indicate higher levels of identity.\
\ In pack mode, regions that align with 100% identity are not shown. When identity is less than\ 100%, blocks of four colors are drawn.\
\ Checkboxes on the track configuration page allow selection of the\ species to include in the pairwise display.\ Configuration buttons are available to select all of the species\ (Set all), deselect all of the species (Clear all), or\ use the default settings (Set defaults).\
\ To view detailed information about the alignments at a specific\ position, zoom the display in to 30,000 or fewer bases, then click on\ the alignment.
\ \\ When zoomed-in to the base-level display, the track shows the base\ composition of each alignment.\ The numbers and symbols on the Gaps\ line indicate the lengths of gaps in the Ebola virus sequence at those\ alignment positions relative to the longest non-Ebola virus sequence.\ If there is sufficient space in the display, the size of the gap is shown.\ If the space is insufficient and the gap size is a multiple of 3, a\ "*" is displayed; other gap sizes are indicated by "+".
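The rule for the Gaps line can be expressed as a small function. This is a minimal sketch of the rule as stated above, not the browser's own code; `gap_label` and its `fits_in_display` flag are hypothetical names.

```python
def gap_label(gap_size, fits_in_display):
    """Return the text shown on the Gaps line for one gap.

    gap_size: length of the gap in bases.
    fits_in_display: True if there is room to print the number itself.
    """
    if gap_size <= 0:
        return ""  # no gap at this position
    if fits_in_display:
        return str(gap_size)
    # Not enough room: "*" marks a multiple of 3, "+" any other size.
    return "*" if gap_size % 3 == 0 else "+"
```

A multiple-of-3 gap is flagged separately because, in a coding region, it does not shift the reading frame.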
\\ Codon translation is available in base-level display mode if the\ displayed region is identified as a coding segment. To display this annotation, select the species\ for translation from the pull-down menu in the Codon\ Translation configuration section at the top of the page. Then, select one of\ the following modes:\
\ Ebola sequences are found in NCBI Nucleotide with the search term:\ (ebola[title] or ebolavirus[title]) and genome\
\\ The sequences are aligned to the reference sequence with an ordinary\ Smith-Waterman alignment command\ faAlign from the 'kent' source utilities.\
\ \\
psl score of alignments

reference | chrStart | chrEnd | query | query size | score | identity | collection date | country | isolate |
---|---|---|---|---|---|---|---|---|---|
KM034562v1 | 0 | 18957 | KP096422v1 | 18958 | 18941 | 100.00 | Mar-2014 | Guinea | H.sapiens-tc/GIN/14/WPG-C15 |
KM034562v1 | 0 | 18957 | KP178538v1 | 18958 | 18941 | 100.00 | 03-Aug-2014 | Liberia | Ebola virus/H.sapiens-wt/LBR/2014/Makona-201403007 |
KM034562v1 | 0 | 18957 | KP342330v1 | 18958 | 18941 | 100.00 | Oct-2014 | Guinea: Conacry | H.sapiens-wt/GIN/2014/Conacry-192 |
KM034562v1 | 0 | 18957 | KP096421v1 | 18958 | 18937 | 100.00 | Mar-2014 | Guinea | H.sapiens-tc/GIN/14/WPG-C07 |
KM034562v1 | 0 | 18957 | KP096420v1 | 18958 | 18935 | 100.00 | Mar-2014 | Guinea | H.sapiens-tc/GIN/14/WPG-C05 |
KM034562v1 | 0 | 18957 | KP260799v1 | 18958 | 18933 | 100.00 | 2014 | Mali | Ebola virus H.sapiens/MLI/14/Manoka-Mali-DPR1 |
KM034562v1 | 1 | 18957 | KP184503v1 | 18957 | 18932 | 100.00 | 25-Aug-2014 | UK: GB | Ebola virus /H.sapiens-tc/GBR/2014/Makona-UK1.1 |
KM034562v1 | 0 | 18957 | KP260800v1 | 18958 | 18927 | 100.00 | 2014 | Mali | Ebola virus H.sapiens/MLI/14/Manoka-Mali-DPR2 |
KM034562v1 | 0 | 18957 | KP260801v1 | 18958 | 18925 | 100.00 | 2014 | Mali | Ebola virus H.sapiens/MLI/14/Manoka-Mali-DPR3 |
KM034562v1 | 0 | 18957 | KP260802v1 | 18958 | 18923 | 100.00 | 2014 | Mali | Ebola virus H.sapiens/MLI/14/Manoka-Mali-DPR4 |
KM034562v1 | 36 | 18956 | KP120616v1 | 18920 | 18898 | 100.00 | 25-Aug-2014 | UK: GB | H.sapiens-wt/GBR/2014/ManoRiver-UK1 |
KM034562v1 | 29 | 18957 | KP658432v1 | 18929 | 18894 | 100.00 | 29-Dec-2014 | UK: GB | Ebola virus/H.sapiens-wt/GBR/2014/Makona-UK2 |
KM034562v1 | 3 | 18956 | KM519951v1 | 18953 | 17741 | 96.90 | 2014 | DRC | Ebola virus/H.sap-wt/COD/2014/Boende-Lokolia |
KM034562v1 | 16 | 18955 | KP271018v1 | 18941 | 17713 | 96.80 | 20-Aug-2014 | DRC | Ebola virus/H.sapiens/COD/2014/Lomela-Lokolia16 |
KM034562v1 | 45 | 18841 | KM655246v1 | 18797 | 17682 | 97.10 | 1976 | Zaire | H.sapiens-tc/COD/1976/Yambuku-Ecran |
KM034562v1 | 53 | 18914 | KP271020v1 | 18861 | 17635 | 96.80 | 20-Aug-2014 | DRC | Ebola virus/H.sapiens/COD/2014/Lomela-Lokolia19 |
KM034562v1 | 176 | 18936 | KP271019v1 | 18760 | 15340 | 96.70 | 20-Aug-2014 | DRC | Ebola virus/H.sapiens/COD/2014/Lomela-Lokolia17 |
KM034562v1 | 4 | 18336 | NC_016144v1 | 18927 | 3194 | 58.90 | 2003 | Spain | Lloviu virus |
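Rows of the table above can be read programmatically. The sketch below parses one pipe-delimited row into a dictionary keyed by the column names in the header and computes the length of the aligned reference span. The function names are hypothetical, and it assumes no cell contains a `|` character.

```python
COLUMNS = ["reference", "chrStart", "chrEnd", "query", "query size",
           "score", "identity", "collection date", "country", "isolate"]


def parse_row(line):
    """Split one table row into a dict, converting numeric columns."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    row = dict(zip(COLUMNS, cells))
    for key in ("chrStart", "chrEnd", "query size", "score"):
        row[key] = int(row[key])
    row["identity"] = float(row["identity"])
    return row


def aligned_span(row):
    """Number of reference bases between chrStart and chrEnd."""
    return row["chrEnd"] - row["chrStart"]


example = ("KM034562v1 | 4 | 18336 | NC_016144v1 | 18927 | 3194 | 58.90 "
           "| 2003 | Spain | Lloviu virus |")
lloviu = parse_row(example)
```

Here `aligned_span(lloviu)` gives the span covered on the reference, which can be compared with the `score` and `identity` columns when ranking alignments.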
\ This track shows protein sequence annotations from the UniProt/SwissProt database,\ mapped to genomic coordinates.\ The data has been curated from scientific publications by the UniProt staff.\
\ \\ Genomic locations of UniProt/SwissProt annotations are labeled with a short name\ for the type of annotation (e.g. "glyco", "disul", "signal pep", etc.). A click\ on them shows the full annotation and provides a link to the UniProt/SwissProt\ record for all details.
\ \\ Mouse over a mutation to see the UniProt comments.\
\ \\ Modified residues are highlighted in light blue, transmembrane regions in blue,\ glycosylation sites in yellow, disulfide bonds in grey, topological domains in\ red.\
\ \\ UniProt sequences were aligned to RefSeq sequences first with BLAT, then lifted\ to genome positions with pslMap. UniProt variants were parsed from the UniProt\ XML file. The variants were then mapped to the genome through the alignment\ using the pslMap program. This mapping approach\ draws heavily on the LS-SNP pipeline by Mark Diekhans. The complete script is\ part of the kent source tree and is located in src/hg/utils/uniprotMutations. \
\ \\ This track was created by Maximilian Haeussler, with advice from Mark Diekhans and Brian Raney.\ Thanks to UniProt for making all data available for download.\
\ \\ UniProt Consortium.\ Activities at the Universal Protein Resource (UniProt). \ Nucleic Acids Res. 2014 Jan;42(Database issue):D191-8. \ PMID: 24253303; PMC: PMC3965022\
\ \\ Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A.\ \ The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure\ information on human protein variants.\ Hum Mutat. 2004 May;23(5):464-70.\ PMID: 15108278\
\ genes 1 exonNumbers off\ group genes\ itemRgb on\ longLabel UniProt/SwissProt Protein Annotations\ mouseOverField comments\ noScoreFilter on\ parent spUniprot\ shortLabel UniProt Annot.\ track spAnnot\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#section_features" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility hide\ iedbsupp2A02 A*02 bigBed 12 . IEDB Predicted binding Macaque HLA T-Cell Class I A*02 1 2 20 20 0 137 137 127 1 0 0 immu 1 longLabel IEDB Predicted binding Macaque HLA T-Cell Class I A*02\ parent iedbPredClassIMac\ shortLabel A*02\ track iedbsupp2A02\ type bigBed 12 .\ visibility dense\ iedbsupp1A0201 A*02:01 bigBed 12 . IEDB predicted binding Human HLA T-Cell Class I A*02:01 1 2 0 0 0 127 127 127 1 0 0 immu 1 longLabel IEDB predicted binding Human HLA T-Cell Class I A*02:01\ parent iedbPred1\ shortLabel A*02:01\ track iedbsupp1A0201\ type bigBed 12 .\ visibility dense\ patBulk Bulk patents bigBed 12 + Patent Lens Bulk patents 0 2 0 0 0 127 127 127 0 0 0 pub 1 group pub\ longLabel Patent Lens Bulk patents\ parent patSeq\ priority 2\ shortLabel Bulk patents\ track patBulk\ type bigBed 12 +\ visibility hide\ unipAliTrembl TrEMBL Aln. 
bigPsl UCSC alignment of TrEMBL proteins to genome 0 2 0 0 0 127 127 127 0 0 0 genes 1 baseColorDefault genomicCodons\ baseColorTickColor contrastingColor\ baseColorUseCds given\ bigDataUrl /gbdb/eboVir3/uniprot/unipAliTrembl.bb\ indelDoubleInsert on\ indelQueryInsert on\ itemRgb on\ labelFields name,acc,uniprotName,geneName,hgncSym,refSeq,refSeqProt,ensProt\ longLabel UCSC alignment of TrEMBL proteins to genome\ mouseOverField protFullNames\ parent uniprot off\ priority 2\ searchIndex name,acc\ shortLabel TrEMBL Aln.\ showDiffBasesAllScales on\ skipFields isMain\ track unipAliTrembl\ type bigPsl\ urls acc="https://www.uniprot.org/uniprot/$$" hgncId="https://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=$$" refseq="https://www.ncbi.nlm.nih.gov/nuccore/$$" refSeqProt="https://www.ncbi.nlm.nih.gov/protein/$$" ncbiGene="https://www.ncbi.nlm.nih.gov/gene/$$" entrezGene="https://www.ncbi.nlm.nih.gov/gene/$$" ensGene="https://www.ensembl.org/Gene/Summary?g=$$"\ visibility hide\ spStruct UniProt Structure bigBed 12 + UniProt/SwissProt Protein Primary/Secondary Structure Annotations 0 2 0 0 0 127 127 127 0 0 0\ This track shows the genomic positions of protein secondary structures and amino acid modifications\ in the UniProt/SwissProt database.\ These data have been curated from scientific publications by the UniProt staff.\
\ \\ Genomic locations of UniProt/SwissProt protein secondary structures and amino acid modifications\ are labeled with the feature name (e.g. helix, beta, coiled-coil, disulf bond, glyco) at a given\ position. \
\ \\ Mouse over a feature to see the UniProt comments.\
\ \\ UniProt sequences were aligned to RefSeq sequences first with BLAT, then lifted\ to genome positions with pslMap. UniProt protein secondary structures and amino acid modifications\ were parsed from the UniProt XML file. The features were then mapped to the genome through the\ alignment using the pslMap program. This mapping approach\ draws heavily on the LS-SNP pipeline by Mark Diekhans. The complete script is\ part of the kent source tree and is located in src/hg/utils/uniprotMutations.\
\ \\ This track was created by Maximilian Haeussler, with advice from Mark Diekhans and Brian Raney.\
\ \\ UniProt Consortium.\ Activities at the Universal Protein Resource (UniProt). \ Nucleic Acids Res. 2014 Jan;42(Database issue):D191-8. \ PMID: 24253303; PMC: PMC3965022\
\ \\ Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A.\ \ The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure\ information on human protein variants.\ Hum Mutat. 2004 May;23(5):464-70.\ PMID: 15108278\
\ genes 1 exonNumbers off\ group genes\ itemRgb on\ longLabel UniProt/SwissProt Protein Primary/Secondary Structure Annotations\ mouseOverField comments\ noScoreFilter on\ parent spUniprot\ shortLabel UniProt Structure\ track spStruct\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#section_features" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility hide\ iedbsupp1A0203 A*02:03 bigBed 12 . IEDB predicted binding Human HLA T-Cell Class I A*02:03 1 3 0 0 0 127 127 127 1 0 0 immu 1 longLabel IEDB predicted binding Human HLA T-Cell Class I A*02:03\ parent iedbPred1\ shortLabel A*02:03\ track iedbsupp1A0203\ type bigBed 12 .\ visibility dense\ iedbsupp2A07 A*07 bigBed 12 . IEDB Predicted binding Macaque HLA T-Cell Class I A*07 1 3 20 20 0 137 137 127 1 0 0 immu 1 longLabel IEDB Predicted binding Macaque HLA T-Cell Class I A*07\ parent iedbPredClassIMac\ shortLabel A*07\ track iedbsupp2A07\ type bigBed 12 .\ visibility dense\ unipLocSignal Signal Peptide bigBed 12 + UniProt Signal Peptides 1 3 255 0 150 255 127 202 0 0 0 genes 1 bigDataUrl /gbdb/eboVir3/uniprot/unipLocSignal.bb\ color 255,0,150\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ itemRgb off\ longLabel UniProt Signal Peptides\ parent uniprot\ priority 3\ shortLabel Signal Peptide\ track unipLocSignal\ type bigBed 12 +\ visibility dense\ iedbsupp1A0206 A*02:06 bigBed 12 . IEDB predicted binding Human HLA T-Cell Class I A*02:06 1 4 0 0 0 127 127 127 1 0 0 immu 1 longLabel IEDB predicted binding Human HLA T-Cell Class I A*02:06\ parent iedbPred1\ shortLabel A*02:06\ track iedbsupp1A0206\ type bigBed 12 .\ visibility dense\ iedbsupp2A11 A*11 bigBed 12 . 
IEDB Predicted binding Macaque HLA T-Cell Class I A*11 1 4 20 20 0 137 137 127 1 0 0 immu 1 longLabel IEDB Predicted binding Macaque HLA T-Cell Class I A*11\ parent iedbPredClassIMac\ shortLabel A*11\ track iedbsupp2A11\ type bigBed 12 .\ visibility dense\ unipLocExtra Extracellular bigBed 12 + UniProt Extracellular Domain 1 4 0 150 255 127 202 255 0 0 0 genes 1 bigDataUrl /gbdb/eboVir3/uniprot/unipLocExtra.bb\ color 0,150,255\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ itemRgb off\ longLabel UniProt Extracellular Domain\ parent uniprot\ priority 4\ shortLabel Extracellular\ track unipLocExtra\ type bigBed 12 +\ visibility dense\ unipInterest Interest bigBed 12 + UniProt Regions of Interest 1 4 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/eboVir3/uniprot/unipInterest.bb\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ itemRgb off\ longLabel UniProt Regions of Interest\ parent uniprot\ priority 4\ shortLabel Interest\ track unipInterest\ type bigBed 12 +\ visibility dense\ phyloP160way PhyloP wig -4.711 0.934 158 Ebola strains and 2 Marburg strains Basewise Conservation by PhyloP 2 4 60 60 140 140 60 60 0 0 0 compGeno 0 altColor 140,60,60\ autoScale off\ color 60,60,140\ configurable on\ longLabel 158 Ebola strains and 2 Marburg strains Basewise Conservation by PhyloP\ maxHeightPixels 100:50:11\ noInherit on\ parent cons160wayViewphyloP on\ priority 4\ shortLabel PhyloP\ spanList 1\ subGroups view=phyloP\ track phyloP160way\ type wig -4.711 0.934\ viewLimits -3.107:0.934\ windowingFunction mean\ strainPhyloP160way PhyloP wig -4.711 0.934 158 Ebola strains and 2 Marburg strains Basewise Conservation by PhyloP 2 4 60 60 140 140 60 60 0 0 0 compGeno 0 altColor 140,60,60\ autoScale off\ color 60,60,140\ configurable on\ longLabel 158 Ebola strains and 2 Marburg strains Basewise Conservation by PhyloP\ maxHeightPixels 100:50:11\ noInherit on\ parent strainCons160wayViewphyloP on\ priority 4\ shortLabel PhyloP\ 
spanList 1\ subGroups view=phyloP\ track strainPhyloP160way\ type wig -4.711 0.934\ viewLimits -3.107:0.934\ windowingFunction mean\ iedbsupp1A0301 A*03:01 bigBed 12 . IEDB predicted binding Human HLA T-Cell Class I A*03:01 1 5 0 0 0 127 127 127 1 0 0 immu 1 longLabel IEDB predicted binding Human HLA T-Cell Class I A*03:01\ parent iedbPred1\ shortLabel A*03:01\ track iedbsupp1A0301\ type bigBed 12 .\ visibility dense\ iedbsupp2A102201 A1*02201 bigBed 12 . IEDB Predicted binding Macaque HLA T-Cell Class I A1*02201 1 5 20 20 0 137 137 127 1 0 0 immu 1 longLabel IEDB Predicted binding Macaque HLA T-Cell Class I A1*02201\ parent iedbPredClassIMac\ shortLabel A1*02201\ track iedbsupp2A102201\ type bigBed 12 .\ visibility dense\ unipLocTransMemb Transmembrane bigBed 12 + UniProt Transmembrane Domains 1 5 0 150 0 127 202 127 0 0 0 genes 1 bigDataUrl /gbdb/eboVir3/uniprot/unipLocTransMemb.bb\ color 0,150,0\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ itemRgb off\ longLabel UniProt Transmembrane Domains\ parent uniprot\ priority 5\ shortLabel Transmembrane\ track unipLocTransMemb\ type bigBed 12 +\ visibility dense\ iedbsupp1A1101 A*11:01 bigBed 12 . IEDB predicted binding Human HLA T-Cell Class I A*11:01 1 6 0 0 0 127 127 127 1 0 0 immu 1 longLabel IEDB predicted binding Human HLA T-Cell Class I A*11:01\ parent iedbPred1\ shortLabel A*11:01\ track iedbsupp1A1101\ type bigBed 12 .\ visibility dense\ iedbsupp2A102601 A1*02601 bigBed 12 . 
IEDB Predicted binding Macaque HLA T-Cell Class I A1*02601 1 6 20 20 0 137 137 127 1 0 0 immu 1 longLabel IEDB Predicted binding Macaque HLA T-Cell Class I A1*02601\ parent iedbPredClassIMac\ shortLabel A1*02601\ track iedbsupp2A102601\ type bigBed 12 .\ visibility dense\ unipLocCytopl Cytoplasmic bigBed 12 + UniProt Cytoplasmic Domains 1 6 255 150 0 255 202 127 0 0 0 genes 1 bigDataUrl /gbdb/eboVir3/uniprot/unipLocCytopl.bb\ color 255,150,0\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ itemRgb off\ longLabel UniProt Cytoplasmic Domains\ parent uniprot\ priority 6\ shortLabel Cytoplasmic\ track unipLocCytopl\ type bigBed 12 +\ visibility dense\ iedbsupp1A2301 A*23:01 bigBed 12 . IEDB predicted binding Human HLA T-Cell Class I A*23:01 1 7 0 0 0 127 127 127 1 0 0 immu 1 longLabel IEDB predicted binding Human HLA T-Cell Class I A*23:01\ parent iedbPred1\ shortLabel A*23:01\ track iedbsupp1A2301\ type bigBed 12 .\ visibility dense\ iedbsupp2A20102 A2*0102 bigBed 12 . IEDB Predicted binding Macaque HLA T-Cell Class I A2*0102 1 7 20 20 0 137 137 127 1 0 0 immu 1 longLabel IEDB Predicted binding Macaque HLA T-Cell Class I A2*0102\ parent iedbPredClassIMac\ shortLabel A2*0102\ track iedbsupp2A20102\ type bigBed 12 .\ visibility dense\ unipChain Chains bigBed 12 + UniProt Mature Protein Products (Polypeptide Chains) 1 7 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/eboVir3/uniprot/unipChain.bb\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ longLabel UniProt Mature Protein Products (Polypeptide Chains)\ parent uniprot\ priority 7\ shortLabel Chains\ track unipChain\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#ptm_processing" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility dense\ iedbsupp1A2402 A*24:02 bigBed 12 . 
IEDB predicted binding Human HLA T-Cell Class I A*24:02 1 8 0 0 0 127 127 127 1 0 0 immu 1 longLabel IEDB predicted binding Human HLA T-Cell Class I A*24:02\ parent iedbPred1\ shortLabel A*24:02\ track iedbsupp1A2402\ type bigBed 12 .\ visibility dense\ iedbsupp2A70103 A7*0103 bigBed 12 . IEDB Predicted binding Macaque HLA T-Cell Class I A7*0103 1 8 20 20 0 137 137 127 1 0 0 immu 1 longLabel IEDB Predicted binding Macaque HLA T-Cell Class I A7*0103\ parent iedbPredClassIMac\ shortLabel A7*0103\ track iedbsupp2A70103\ type bigBed 12 .\ visibility dense\ unipDisulfBond Disulf. Bonds bigBed 12 + UniProt Disulfide Bonds 1 8 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/eboVir3/uniprot/unipDisulfBond.bb\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ longLabel UniProt Disulfide Bonds\ parent uniprot\ priority 8\ shortLabel Disulf. Bonds\ track unipDisulfBond\ type bigBed 12 +\ visibility dense\ unipDomain Domains bigBed 12 + UniProt Domains 1 8 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/eboVir3/uniprot/unipDomain.bb\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ longLabel UniProt Domains\ parent uniprot\ priority 8\ shortLabel Domains\ track unipDomain\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#family_and_domains" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility dense\ iedbsupp1A2601 A*26:01 bigBed 12 . 
IEDB predicted binding Human HLA T-Cell Class I A*26:01 1 9 0 0 0 127 127 127 1 0 0 immu 1 longLabel IEDB predicted binding Human HLA T-Cell Class I A*26:01\ parent iedbPred1\ shortLabel A*26:01\ track iedbsupp1A2601\ type bigBed 12 .\ visibility dense\ unipModif AA Modifications bigBed 12 + UniProt Amino Acid Modifications 1 9 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/eboVir3/uniprot/unipModif.bb\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ longLabel UniProt Amino Acid Modifications\ parent uniprot\ priority 9\ shortLabel AA Modifications\ track unipModif\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#aaMod_section" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility dense\ iedbsupp2B00101 B*00101 bigBed 12 . IEDB Predicted binding Macaque HLA T-Cell Class I B*00101 1 9 20 20 0 137 137 127 1 0 0 immu 1 longLabel IEDB Predicted binding Macaque HLA T-Cell Class I B*00101\ parent iedbPredClassIMac\ shortLabel B*00101\ track iedbsupp2B00101\ type bigBed 12 .\ visibility dense\ gire2014SpecificSnps 2014 Specific bigBed 9 + Sites that Carry a Unique Base in 81 Sequences from 2014 Outbreak 0 10 0 0 0 127 127 127 0 0 0\ This track displays variants identified by Gire et al. in 81 isolates\ from the 2014 strain of Ebola virus (78 from Sierra Leone and 3 from Guinea)\ that are specific to the 2014 strain. At least one 2014 isolate carries a base not found \ elsewhere in the EBOV alignment.\
\ \\ Items are labeled by ancestral nucleotide and 2014-derived nucleotide.\ Non-coding variants are black, synonymous variants are green,\ and missense variants are red and also labeled by gene and amino acid change.\ Click on an item to view more details such as the derived allele frequency\ in 2014 isolates and, for missense changes, the\ BLOSUM62 substitution score.\
\ \\ Blood samples were collected from 78 patients at Kenema Government Hospital\ in Sierra Leone. For details of RNA preservation, PCR, human RNA depletion,\ library construction and sequencing, see Supplemental Materials and Methods\ of Gire et al.\
\\ Gire et al. analyzed the 78 Sierra Leone patient sequences together with\ 3 sequences from the 2014 outbreak in Guinea (Baize et al.;\ suspected sequencing errors were masked, see Supplemental Materials and Methods of\ Gire et al.), for a total of 81 sequences from 2014.\ In addition, some analyses included 20 sequences from past outbreaks of Zaire Ebola\ virus, 1976-2008, for a total of 101 sequences. Sequence variants were extracted\ directly from multiple sequence alignments of the group of 101 sequences (1976-2014).\ A custom release of\ SnpEff\ (v4.0, build 2014-07-01, to support ribosomal slippage in transcription of GP gene)\ was used to predict functional effect of variants on genes (noncoding, synonymous\ or missense).\
\ \ \\ Baize S, Pannetier D, Oestereich L, Rieger T, Koivogui L, Magassouba N, Soropogui B, Sow MS,\ Keïta S, De Clerck H et al.\ \ Emergence of Zaire Ebola virus disease in Guinea.\ N Engl J Med. 2014 Oct 9;371(15):1418-25.\ PMID: 24738640\
\ \\
Gire SK, Goba A, Andersen KG, Sealfon RS, Park DJ, Kanneh L, Jalloh S, Momoh M,\
Fullah M, Dudas G et al.\
Genomic surveillance elucidates Ebola virus origin and transmission \
during the 2014 outbreak.\
Science 2014 Sep 12;345(6202):1369-72.\
PMID: 25214632\
\
Supplemental Materials and Methods\
\ This track shows protein sequences recognized by antibodies \ as annotated by the National Institute for Allergy and Infectious Diseases (NIAID) \ Immune Epitope Database (IEDB). \ Only sequences with a positive assay outcome are shown on the track. All fields annotated by\ IEDB and exported in their "compact" file are shown on the details page, which also \ provides links back to the IEDB.
\ \\ See also the detailed explanation of the curated Ebola data in the \ IEDB Knowledgebase.\
\ \\ Peptide matches are labeled with the name of the antibody and the host organism, separated by a \ slash.\ When the antibody does not have a name in IEDB, the first six letters of the peptide are shown \ instead.\
\ \\ Mouse over the features to see the authors of the study that described the epitope.\
\ \\ Vita R, Zarebski L, Greenbaum JA, Emami H, Hoof I, Salimi N, Damle R, Sette A, Peters B.\ \ The immune epitope database 2.0.\ Nucleic Acids Res. 2010 Jan;38(Database issue):D854-62.\ PMID: 19906713; PMC: PMC2808938\
\ immu 1 color 0,90,100\ group immu\ longLabel Immune Epitope Database and Analysis Resource (IEDB) B-Cell Epitopes\ mouseOverField author\ priority 10\ shortLabel IEDB B Cell\ track iedbBcell\ type bigBed 12 .\ urls pubMedID=https://www.ncbi.nlm.nih.gov/pubmed/$$ bCellID=http://www.iedb.org/assayId/$$ referenceID=http://www.iedb.org/refId/$$ epitopeID=http://www.iedb.org/epId/$$ epitopeSourceMoleculeAccession=https://www.ncbi.nlm.nih.gov/protein/$$ antigenSourceMoleculeAccession=https://www.ncbi.nlm.nih.gov/protein/$$\ visibility dense\ unipMut Mutations bigBed 12 + UniProt Amino Acid Mutations 1 10 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/eboVir3/uniprot/unipMut.bb\ longLabel UniProt Amino Acid Mutations\ parent uniprot\ priority 10\ shortLabel Mutations\ track unipMut\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#pathology_and_biotech" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$" variationId="http://www.uniprot.org/uniprot/$$"\ visibility dense\ gire2014 2014 Variants vcfTabix Variants from 81 Sequences from 2014 Outbreak 0 11 0 0 0 127 127 127 0 0 0\ This track displays variants identified in 81 samples from the Zaire clade\ of Ebola viruses found by Gire et al., 2014.\
\ \\ In "dense" mode, a vertical line is drawn at the position of each variant.\ In "full" mode, in addition to the vertical line, a label to\ the left shows the reference allele first and variant alleles below\ (A = red, C = blue,\ G = green, T = magenta,\ Indels = black).\ Hovering the pointer over any variant displays the number of occurrences of each\ allele in Gire et al., 2014. Clicking on any variant displays\ full details of that variant.
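The allele color scheme listed above can be captured in a lookup. This is only an illustrative sketch: treating every allele that is not a single A, C, G or T as an indel is an assumption, and `allele_color` is a hypothetical name.

```python
ALLELE_COLORS = {"A": "red", "C": "blue", "G": "green", "T": "magenta"}


def allele_color(allele):
    """Return the display color for an allele string."""
    allele = allele.upper()
    if len(allele) == 1 and allele in ALLELE_COLORS:
        return ALLELE_COLORS[allele]
    # Anything else (multi-base or empty alleles) is treated as an indel.
    return "black"
```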
\\ By default, in "pack" mode, the\ display shows a clustering of haplotypes in the viewed range, sorted\ by similarity of alleles weighted by proximity to a central variant.\ The clustering view can highlight local patterns of linkage.
\\ Each variant is a vertical bar with white (invisible) representing the reference allele\ and black representing the non-reference allele(s).\ Tick marks are drawn at the top and bottom of each variant's vertical bar\ to make the bar more visible when most alleles are reference alleles.\ The vertical bar for the central variant used in clustering is outlined in purple.\ In order to avoid long compute times, the range of alleles used in clustering\ may be limited; alleles used in clustering have purple tick marks at the\ top and bottom.
\\ The clustering tree is displayed to the left of the main image.\ It does not represent relatedness of individuals; it simply shows the arrangement\ of local haplotypes by similarity. When a rightmost branch is purple, it means\ that all haplotypes in that branch are identical, at least within the range of\ variants used in clustering.
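To give a feel for what "similarity of alleles weighted by proximity to a central variant" could mean, the sketch below computes a distance between two haplotypes in which mismatches near the central variant count more than distant ones. The weighting 1 / (1 + distance) is an assumption chosen for illustration; the description does not specify the browser's actual weighting, and `weighted_distance` is a hypothetical name.

```python
def weighted_distance(hap_a, hap_b, center):
    """Proximity-weighted mismatch count between two haplotypes.

    hap_a, hap_b: equal-length sequences of allele codes
                  (e.g. "0" = reference, "1" = non-reference).
    center: index of the central variant used for clustering.
    """
    total = 0.0
    for i, (a, b) in enumerate(zip(hap_a, hap_b)):
        if a != b:
            # Mismatches closer to the central variant weigh more.
            total += 1.0 / (1 + abs(i - center))
    return total
```

Identical haplotypes have distance 0 under this measure, which is consistent with identical haplotypes being grouped into a single branch.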
\ \\ Blood samples were collected from 78 patients at Kenema Government Hospital\ in Sierra Leone. For details of RNA preservation, PCR, human RNA depletion,\ library construction and sequencing, see Supplemental Materials and Methods\ of Gire et al.\
\\ Gire et al. analyzed the 78 Sierra Leone patient sequences together with\ 3 sequences from the 2014 outbreak in Guinea (Baize et al.;\ suspected sequencing errors were masked, see Supplemental Materials and Methods of\ Gire et al.), for a total of 81 sequences from 2014.\ In addition, some analyses included 20 sequences from past outbreaks of Zaire Ebola\ virus, 1976-2008, for a total of 101 sequences. Sequence variants were extracted\ directly from multiple sequence alignments of the group of 101 sequences (1976-2014).\ A custom release of\ SnpEff\ (v4.0, build 2014-07-01, to support ribosomal slippage in transcription of GP gene)\ was used to predict functional effect of variants on genes (noncoding, synonymous\ or missense).\
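As a rough illustration of extracting variants directly from a multiple sequence alignment, the sketch below compares each aligned sequence with a reference row, column by column. It is not the procedure used by Gire et al.; the function name `extract_variants`, the toy alignment, and the handling of gaps and `N` bases are assumptions made for the example.

```python
from collections import Counter


def extract_variants(reference, samples):
    """Return {reference_position: Counter of non-reference alleles}.

    reference and samples are aligned strings of equal length; '-' is a gap.
    Positions are 1-based coordinates on the ungapped reference.
    Columns where the reference has a gap (insertions) are skipped, and
    'N' bases (masked or unknown) are ignored. Both are simplifications.
    """
    if any(len(s) != len(reference) for s in samples):
        raise ValueError("all aligned sequences must have the same length")
    variants = {}
    ref_pos = 0
    for col, ref_base in enumerate(reference):
        if ref_base == "-":
            continue  # insertion relative to the reference: not handled here
        ref_pos += 1
        alts = Counter(
            s[col] for s in samples
            if s[col] != ref_base and s[col] != "N"
        )
        if alts:
            variants[ref_pos] = alts
    return variants


# Toy alignment (hypothetical data, not taken from the study).
reference = "ACG-TAC"
samples = ["ACG-TAC", "ATG-TAC", "ATGATNC", "ACG-T-C"]
variants = extract_variants(reference, samples)
```

In this toy case, position 2 carries a `T` allele in two samples and position 5 carries a deletion in one sample. Predicting the functional effect of each variant (the SnpEff step) is outside the scope of the sketch.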
\ \ \\ Baize S, Pannetier D, Oestereich L, Rieger T, Koivogui L, Magassouba N, Soropogui B, Sow MS,\ Keïta S, De Clerck H et al.\ \ Emergence of Zaire Ebola virus disease in Guinea.\ N Engl J Med. 2014 Oct 9;371(15):1418-25.\ PMID: 24738640\
\ \\
Gire SK, Goba A, Andersen KG, Sealfon RS, Park DJ, Kanneh L, Jalloh S, Momoh M,\
Fullah M, Dudas G et al.\
Genomic surveillance elucidates Ebola virus origin and transmission \
during the 2014 outbreak.\
Science 2014 Sep 12;345(6202):1369-72.\
PMID: 25214632\
\
Supplemental Materials and Methods\
\ This track displays protein changing variants identified in 81 samples from the Zaire clade of Ebola viruses found by Gire et al., 2014.\
\ \\ In "dense" mode, a vertical line is drawn at the position of each variant.\ In "full" mode, in addition to the vertical line, a label to\ the left shows the reference allele first and variant alleles below\ (A = red, C = blue,\ G = green, T = magenta,\ Indels = black).\ Hovering the pointer over any variant displays the number of occurrences of each\ allele in Gire et al., 2014. Clicking on any variant displays\ full details of that variant.
\\ By default, in "pack" mode, the\ display shows a clustering of haplotypes in the viewed range, sorted\ by similarity of alleles weighted by proximity to a central variant.\ The clustering view can highlight local patterns of linkage.
\\ Each variant is a vertical bar with white (invisible) representing the reference allele\ and black representing the non-reference allele(s).\ Tick marks are drawn at the top and bottom of each variant's vertical bar\ to make the bar more visible when most alleles are reference alleles.\ The vertical bar for the central variant used in clustering is outlined in purple.\ In order to avoid long compute times, the range of alleles used in clustering\ may be limited; alleles used in clustering have purple tick marks at the\ top and bottom.
\\ The clustering tree is displayed to the left of the main image.\ It does not represent relatedness of individuals; it simply shows the arrangement\ of local haplotypes by similarity. When a rightmost branch is purple, it means\ that all haplotypes in that branch are identical, at least within the range of\ variants used in clustering.\
\ \\ Blood samples were collected from 78 patients at Kenema Government Hospital\ in Sierra Leone. For details of RNA preservation, PCR, human RNA depletion,\ library construction and sequencing, see Supplemental Materials and Methods\ of Gire et al.\
\\ Gire et al. analyzed the 78 Sierra Leone patient sequences together with\ 3 sequences from the 2014 outbreak in Guinea (Baize et al.;\ suspected sequencing errors were masked, see Supplemental Materials and Methods of\ Gire et al.), for a total of 81 sequences from 2014.\ In addition, some analyses included 20 sequences from past outbreaks of Zaire Ebola\ virus, 1976-2008, for a total of 101 sequences. Sequence variants were extracted\ directly from multiple sequence alignments of the group of 101 sequences (1976-2014).\ A custom release of\ SnpEff\ (v4.0, build 2014-07-01, to support ribosomal slippage in transcription of GP gene)\ was used to predict functional effect of variants on genes (noncoding, synonymous\ or missense).\
\ \ \\ Baize S, Pannetier D, Oestereich L, Rieger T, Koivogui L, Magassouba N, Soropogui B, Sow MS,\ Keïta S, De Clerck H et al.\ \ Emergence of Zaire Ebola virus disease in Guinea.\ N Engl J Med. 2014 Oct 9;371(15):1418-25.\ PMID: 24738640\
\ \\
Gire SK, Goba A, Andersen KG, Sealfon RS, Park DJ, Kanneh L, Jalloh S, Momoh M,\
Fullah M, Dudas G et al.\
Genomic surveillance elucidates Ebola virus origin and transmission \
during the 2014 outbreak.\
Science 2014 Sep 12;345(6202):1369-72.\
PMID: 25214632\
\
Supplemental Materials and Methods\
\ This track displays variants identified in 101 samples from the Zaire clade of Ebola \ viruses found by Gire et al.\
\ \\ In "dense" mode, a vertical line is drawn at the position of each variant.\ In "full" mode, in addition to the vertical line, a label to\ the left shows the reference allele first and variant alleles below\ (A = red, C = blue,\ G = green, T = magenta,\ Indels = black).\ Hovering the pointer over any variant displays the number of occurrences of each\ allele in Gire et al., 2014. Clicking on any variant displays\ full details of that variant.
\\ By default, in "pack" mode, the\ display shows a clustering of haplotypes in the viewed range, sorted\ by similarity of alleles weighted by proximity to a central variant.\ The clustering view can highlight local patterns of linkage.
\\ Each variant is a vertical bar with white (invisible) representing the reference allele\ and black representing the non-reference allele(s).\ Tick marks are drawn at the top and bottom of each variant's vertical bar\ to make the bar more visible when most alleles are reference alleles.\ The vertical bar for the central variant used in clustering is outlined in purple.\ In order to avoid long compute times, the range of alleles used in clustering\ may be limited; alleles used in clustering have purple tick marks at the\ top and bottom.
\\ The clustering tree is displayed to the left of the main image.\ It does not represent relatedness of individuals; it simply shows the arrangement\ of local haplotypes by similarity. When a rightmost branch is purple, it means\ that all haplotypes in that branch are identical, at least within the range of\ variants used in clustering.
\ \\ Blood samples were collected from 78 patients at Kenema Government Hospital\ in Sierra Leone. For details of RNA preservation, PCR, human RNA depletion,\ library construction and sequencing, see Supplemental Materials and Methods\ of Gire et al.\
\\ Gire et al. analyzed the 78 Sierra Leone patient sequences together with\ 3 sequences from the 2014 outbreak in Guinea (Baize et al.;\ suspected sequencing errors were masked, see Supplemental Materials and Methods of\ Gire et al.), for a total of 81 sequences from 2014.\ In addition, some analyses included 20 sequences from past outbreaks of Zaire Ebola\ virus, 1976-2008, for a total of 101 sequences. Sequence variants were extracted\ directly from multiple sequence alignments of the group of 101 sequences (1976-2014).\ A custom release of\ SnpEff\ (v4.0, build 2014-07-01, to support ribosomal slippage in transcription of GP gene)\ was used to predict functional effect of variants on genes (noncoding, synonymous\ or missense).\
\ \ \\ Baize S, Pannetier D, Oestereich L, Rieger T, Koivogui L, Magassouba N, Soropogui B, Sow MS,\ Keïta S, De Clerck H et al.\ \ Emergence of Zaire Ebola virus disease in Guinea.\ N Engl J Med. 2014 Oct 9;371(15):1418-25.\ PMID: 24738640\
\ \\
Gire SK, Goba A, Andersen KG, Sealfon RS, Park DJ, Kanneh L, Jalloh S, Momoh M,\
Fullah M, Dudas G et al.\
Genomic surveillance elucidates Ebola virus origin and transmission \
during the 2014 outbreak.\
Science 2014 Sep 12;345(6202):1369-72.\
PMID: 25214632\
\
Supplemental Materials and Methods\
\ This track displays protein changing variants identified in 101 samples from the Zaire clade of\ Ebola viruses found by Gire et al., 2014.\
\ \\ In "dense" mode, a vertical line is drawn at the position of each variant.\ In "full" mode, in addition to the vertical line, a label to\ the left shows the reference allele first and variant alleles below\ (A = red, C = blue,\ G = green, T = magenta,\ Indels = black).\ Hovering the pointer over any variant displays the number of occurrences of each\ allele in Gire et al., 2014. Clicking on any variant displays\ full details of that variant.
\\ By default, in "pack" mode, the\ display shows a clustering of haplotypes in the viewed range, sorted\ by similarity of alleles weighted by proximity to a central variant.\ The clustering view can highlight local patterns of linkage.
\\ Each variant is a vertical bar with white (invisible) representing the reference allele\ and black representing the non-reference allele(s).\ Tick marks are drawn at the top and bottom of each variant's vertical bar\ to make the bar more visible when most alleles are reference alleles.\ The vertical bar for the central variant used in clustering is outlined in purple.\ In order to avoid long compute times, the range of alleles used in clustering\ may be limited; alleles used in clustering have purple tick marks at the\ top and bottom.
\\ The clustering tree is displayed to the left of the main image.\ It does not represent relatedness of individuals; it simply shows the arrangement\ of local haplotypes by similarity. When a rightmost branch is purple, it means\ that all haplotypes in that branch are identical, at least within the range of\ variants used in clustering.
\ \\ Blood samples were collected from 78 patients at Kenema Government Hospital\ in Sierra Leone. For details of RNA preservation, PCR, human RNA depletion,\ library construction and sequencing, see Supplemental Materials and Methods\ of Gire et al.\
\\ Gire et al. analyzed the 78 Sierra Leone patient sequences together with\ 3 sequences from the 2014 outbreak in Guinea (Baize et al.;\ suspected sequencing errors were masked, see Supplemental Materials and Methods of\ Gire et al.), for a total of 81 sequences from 2014.\ In addition, some analyses included 20 sequences from past outbreaks of Zaire Ebola\ virus, 1976-2008, for a total of 101 sequences. Sequence variants were extracted\ directly from multiple sequence alignments of the group of 101 sequences (1976-2014).\ A custom release of\ SnpEff\ (v4.0, build 2014-07-01, to support ribosomal slippage in transcription of GP gene)\ was used to predict functional effect of variants on genes (noncoding, synonymous\ or missense).\
\ \ \\ Baize S, Pannetier D, Oestereich L, Rieger T, Koivogui L, Magassouba N, Soropogui B, Sow MS,\ Keïta S, De Clerck H et al.\ \ Emergence of Zaire Ebola virus disease in Guinea.\ N Engl J Med. 2014 Oct 9;371(15):1418-25.\ PMID: 24738640\
\ \\
Gire SK, Goba A, Andersen KG, Sealfon RS, Park DJ, Kanneh L, Jalloh S, Momoh M,\
Fullah M, Dudas G et al.\
Genomic surveillance elucidates Ebola virus origin and transmission \
during the 2014 outbreak.\
Science 2014 Sep 12;345(6202):1369-72.\
PMID: 25214632\
\
Supplemental Materials and Methods\
\ This track represents intrahost variants for a subset of the 78 Sierra Leone EVD patients described\ in Gire et al., 2014. For these patients, researchers took multiple blood\ samples at various time points, then isolated and sequenced the Ebola virus from\ each sample. This allowed them to track the mutation and evolution of the virus within a single\ patient or host.\
\ \\ In "dense" mode, a vertical line is drawn at the position of each variant.\ In "full" mode, in addition to the vertical line, a label to\ the left shows the reference allele first and variant alleles below\ (A = red, C = blue,\ G = green, T = magenta,\ Indels = black).\ Hovering the pointer over any variant displays the number of occurrences of each\ allele in Gire et al., 2014. Clicking on any variant displays\ full details of that variant.
\\ By default, in "pack" mode, the\ display shows a clustering of haplotypes in the viewed range, sorted\ by similarity of alleles weighted by proximity to a central variant.\ The clustering view can highlight local patterns of linkage.
\\ Each variant is a vertical bar with white (invisible) representing the reference allele\ and black representing the non-reference allele(s).\ Tick marks are drawn at the top and bottom of each variant's vertical bar\ to make the bar more visible when most alleles are reference alleles.\ The vertical bar for the central variant used in clustering is outlined in purple.\ In order to avoid long compute times, the range of alleles used in clustering\ may be limited; alleles used in clustering have purple tick marks at the\ top and bottom.
\\ The clustering tree is displayed to the left of the main image.\ It does not represent relatedness of individuals; it simply shows the arrangement\ of local haplotypes by similarity. When a rightmost branch is purple, it means\ that all haplotypes in that branch are identical, at least within the range of\ variants used in clustering.\
\ \\ Blood samples were collected from 78 patients at Kenema Government Hospital\ in Sierra Leone. For details of RNA preservation, PCR, human RNA depletion,\ library construction and sequencing, see Supplemental Materials and Methods\ of Gire et al.\
\\ Gire et al. analyzed the 78 Sierra Leone patient sequences together with\ 3 sequences from the 2014 outbreak in Guinea (Baize et al.;\ suspected sequencing errors were masked, see Supplemental Materials and Methods of\ Gire et al.), for a total of 81 sequences from 2014.\ In addition, some analyses included 20 sequences from past outbreaks of Zaire Ebola\ virus, 1976-2008, for a total of 101 sequences. Sequence variants were extracted\ directly from multiple sequence alignments of the group of 101 sequences (1976-2014).\ A custom release of\ SnpEff\ (v4.0, build 2014-07-01, to support ribosomal slippage in transcription of GP gene)\ was used to predict functional effect of variants on genes (noncoding, synonymous\ or missense).\
\ \ \\ Baize S, Pannetier D, Oestereich L, Rieger T, Koivogui L, Magassouba N, Soropogui B, Sow MS,\ Keïta S, De Clerck H et al.\ \ Emergence of Zaire Ebola virus disease in Guinea.\ N Engl J Med. 2014 Oct 9;371(15):1418-25.\ PMID: 24738640\
\ \\
Gire SK, Goba A, Andersen KG, Sealfon RS, Park DJ, Kanneh L, Jalloh S, Momoh M,\
Fullah M, Dudas G et al.\
Genomic surveillance elucidates Ebola virus origin and transmission \
during the 2014 outbreak.\
Science 2014 Sep 12;345(6202):1369-72.\
PMID: 25214632\
\
Supplemental Materials and Methods\
\ This track displays variants identified by Gire et al. in 81 isolates\ from the 2014 strain of Ebola virus (78 from Sierra Leone and 3 from Guinea)\ that are specific to the 2014 strain, fixed (i.e. present in all 81 isolates),\ and that cause protein-coding changes.\
\ \\ Items are labeled by gene and amino acid change. The labels of two items also contain "*CONS"\ to indicate that the changed position is fully conserved across the Ebola genus.\ Click on an item to view the\ BLOSUM62 substitution score.\
\ \\ Blood samples were collected from 78 patients at Kenema Government Hospital\ in Sierra Leone. For details of RNA preservation, PCR, human RNA depletion,\ library construction and sequencing, see Supplemental Materials and Methods\ of Gire et al.\
\\ Gire et al. analyzed the 78 Sierra Leone patient sequences together with\ 3 sequences from the 2014 outbreak in Guinea (Baize et al.;\ suspected sequencing errors were masked, see Supplemental Materials and Methods of\ Gire et al.), for a total of 81 sequences from 2014.\ In addition, some analyses included 20 sequences from past outbreaks of Zaire Ebola\ virus, 1976-2008, for a total of 101 sequences. Sequence variants were extracted\ directly from multiple sequence alignments of the group of 101 sequences (1976-2014).\ A custom release of\ SnpEff\ (v4.0, build 2014-07-01, to support ribosomal slippage in transcription of GP gene)\ was used to predict functional effect of variants on genes (noncoding, synonymous\ or missense).\
\ \ \\ Baize S, Pannetier D, Oestereich L, Rieger T, Koivogui L, Magassouba N, Soropogui B, Sow MS,\ Keïta S, De Clerck H et al.\ \ Emergence of Zaire Ebola virus disease in Guinea.\ N Engl J Med. 2014 Oct 9;371(15):1418-25.\ PMID: 24738640\
\ \\
Gire SK, Goba A, Andersen KG, Sealfon RS, Park DJ, Kanneh L, Jalloh S, Momoh M,\
Fullah M, Dudas G et al.\
Genomic surveillance elucidates Ebola virus origin and transmission \
during the 2014 outbreak.\
Science 2014 Sep 12;345(6202):1369-72.\
PMID: 25214632\
\
Supplemental Materials and Methods\
\ This track displays variants identified by Gire et al. in 81 isolates\ from the 2014 strain of Ebola virus (78 from Sierra Leone and 3 from Guinea)\ that are specific to the 2014 strain, polymorphic in 2014 isolates,\ and that cause protein-coding changes.\
\ \\ Items are labeled by gene and amino acid change. The labels of two items also contain "*CONS"\ to indicate that the changed position is fully conserved across the Ebola genus.\ Click on an item to view the\ BLOSUM62 substitution score.\
\ \\ Blood samples were collected from 78 patients at Kenema Government Hospital\ in Sierra Leone. For details of RNA preservation, PCR, human RNA depletion,\ library construction and sequencing, see Supplemental Materials and Methods\ of Gire et al.\
\\ Gire et al. analyzed the 78 Sierra Leone patient sequences together with\ 3 sequences from the 2014 outbreak in Guinea (Baize et al.;\ suspected sequencing errors were masked, see Supplemental Materials and Methods of\ Gire et al.), for a total of 81 sequences from 2014.\ In addition, some analyses included 20 sequences from past outbreaks of Zaire Ebola\ virus, 1976-2008, for a total of 101 sequences. Sequence variants were extracted\ directly from multiple sequence alignments of the group of 101 sequences (1976-2014).\ A custom release of\ SnpEff\ (v4.0, build 2014-07-01, to support ribosomal slippage in transcription of GP gene)\ was used to predict functional effect of variants on genes (noncoding, synonymous\ or missense).\
\ \ \\ Baize S, Pannetier D, Oestereich L, Rieger T, Koivogui L, Magassouba N, Soropogui B, Sow MS,\ Keïta S, De Clerck H et al.\ \ Emergence of Zaire Ebola virus disease in Guinea.\ N Engl J Med. 2014 Oct 9;371(15):1418-25.\ PMID: 24738640\
\ \\
Gire SK, Goba A, Andersen KG, Sealfon RS, Park DJ, Kanneh L, Jalloh S, Momoh M,\
Fullah M, Dudas G et al.\
Genomic surveillance elucidates Ebola virus origin and transmission \
during the 2014 outbreak.\
Science 2014 Sep 12;345(6202):1369-72.\
PMID: 25214632\
\
Supplemental Materials and Methods\
\ This track displays variants identified by Gire et al. in 81 isolates\ from the 2014 strain of Ebola virus (78 from Sierra Leone and 3 from Guinea)\ relative to a Kikwit isolate\ (JQ352763)\ that cause protein-coding changes in the GP gene.\
\ \\ Items are labeled by amino acid change and the frequency of the change in 2014 isolates.\
\ \\ Blood samples were collected from 78 patients at Kenema Government Hospital\ in Sierra Leone. For details of RNA preservation, PCR, human RNA depletion,\ library construction and sequencing, see Supplemental Materials and Methods\ of Gire et al.\
\\ Gire et al. analyzed the 78 Sierra Leone patient sequences together with\ 3 sequences from the 2014 outbreak in Guinea (Baize et al.;\ suspected sequencing errors were masked, see Supplemental Materials and Methods of\ Gire et al.), for a total of 81 sequences from 2014.\ In addition, some analyses included 20 sequences from past outbreaks of Zaire Ebola\ virus, 1976-2008, for a total of 101 sequences. Sequence variants were extracted\ directly from multiple sequence alignments of the group of 101 sequences (1976-2014).\ A custom release of\ SnpEff\ (v4.0, build 2014-07-01, to support ribosomal slippage in transcription of GP gene)\ was used to predict functional effect of variants on genes (noncoding, synonymous\ or missense).\
\ \ \\ Baize S, Pannetier D, Oestereich L, Rieger T, Koivogui L, Magassouba N, Soropogui B, Sow MS,\ Keïta S, De Clerck H et al.\ \ Emergence of Zaire Ebola virus disease in Guinea.\ N Engl J Med. 2014 Oct 9;371(15):1418-25.\ PMID: 24738640\
\ \\
Gire SK, Goba A, Andersen KG, Sealfon RS, Park DJ, Kanneh L, Jalloh S, Momoh M,\
Fullah M, Dudas G et al.\
Genomic surveillance elucidates Ebola virus origin and transmission \
during the 2014 outbreak.\
Science 2014 Sep 12;345(6202):1369-72.\
PMID: 25214632\
\
Supplemental Materials and Methods\
\ This track displays variants identified by Gire et al. in 81 isolates\ from the 2014 strain of Ebola virus (78 from Sierra Leone and 3 from Guinea)\ relative to a Mayinga isolate\ (NC_002549)\ that cause protein-coding changes in the GP gene.\
\ \\ Items are labeled by amino acid change and the frequency of the change in 2014 isolates.\
\ \\ Blood samples were collected from 78 patients at Kenema Government Hospital\ in Sierra Leone. For details of RNA preservation, PCR, human RNA depletion,\ library construction and sequencing, see Supplemental Materials and Methods\ of Gire et al.\
\\ Gire et al. analyzed the 78 Sierra Leone patient sequences together with\ 3 sequences from the 2014 outbreak in Guinea (Baize et al.;\ suspected sequencing errors were masked, see Supplemental Materials and Methods of\ Gire et al.), for a total of 81 sequences from 2014.\ In addition, some analyses included 20 sequences from past outbreaks of Zaire Ebola\ virus, 1976-2008, for a total of 101 sequences. Sequence variants were extracted\ directly from multiple sequence alignments of the group of 101 sequences (1976-2014).\ A custom release of\ SnpEff\ (v4.0, build 2014-07-01, to support ribosomal slippage in transcription of GP gene)\ was used to predict functional effect of variants on genes (noncoding, synonymous\ or missense).\
\ \ \\ Baize S, Pannetier D, Oestereich L, Rieger T, Koivogui L, Magassouba N, Soropogui B, Sow MS,\ Keïta S, De Clerck H et al.\ \ Emergence of Zaire Ebola virus disease in Guinea.\ N Engl J Med. 2014 Oct 9;371(15):1418-25.\ PMID: 24738640\
\ \\
Gire SK, Goba A, Andersen KG, Sealfon RS, Park DJ, Kanneh L, Jalloh S, Momoh M,\
Fullah M, Dudas G et al.\
Genomic surveillance elucidates Ebola virus origin and transmission \
during the 2014 outbreak.\
Science 2014 Sep 12;345(6202):1369-72.\
PMID: 25214632\
\
Supplemental Materials and Methods\
\ This track shows protein sequences assayed but not recognized by antibodies.\ They were annotated by the National Institute of Allergy and Infectious Diseases (NIAID) \ Immune Epitope Database (IEDB). \ Only sequences with at least one negative assay outcome are shown on the track.\ All fields annotated by IEDB and exported in their "compact" file are shown on\ the details page, which also provides links back to the IEDB.
\ \\ See also the detailed explanation of the curated Ebola data in the \ IEDB Knowledgebase.\
\ \\ Peptide matches are labeled with the name of the antibody and the host organism, separated by a \ slash.\ When the antibody does not have a name in IEDB, the first six letters of the peptide are shown \ instead.\
\ \\ Mouse over the features to see the authors of the study that described the epitope.\
\ \\ Vita R, Zarebski L, Greenbaum JA, Emami H, Hoof I, Salimi N, Damle R, Sette A, Peters B.\ \ The immune epitope database 2.0.\ Nucleic Acids Res. 2010 Jan;38(Database issue):D854-62.\ PMID: 19906713; PMC: PMC2808938\
\ immu 1 color 0,90,100\ group immu\ longLabel Immune Epitope Database and Analysis Resource (IEDB) B-Cell Epitopes with Negative Assay Result\ mouseOverField author\ priority 20\ shortLabel IEDB B Cell Neg\ track iedbBcellNeg\ type bigBed 12 .\ urls pubMedID=https://www.ncbi.nlm.nih.gov/pubmed/$$ bCellID=http://www.iedb.org/assayId/$$ referenceID=http://www.iedb.org/refId/$$ epitopeID=http://www.iedb.org/epId/$$ epitopeSourceMoleculeAccession=https://www.ncbi.nlm.nih.gov/protein/$$ antigenSourceMoleculeAccession=https://www.ncbi.nlm.nih.gov/protein/$$\ visibility hide\ iedbsupp1B4001 B*40:01 bigBed 12 . IEDB predicted binding Human HLA T-Cell Class I B*40:01 1 21 0 0 0 127 127 127 1 0 0 immu 1 longLabel IEDB predicted binding Human HLA T-Cell Class I B*40:01\ parent iedbPred1\ shortLabel B*40:01\ track iedbsupp1B4001\ type bigBed 12 .\ visibility dense\ iedbsupp1B4402 B*44:02 bigBed 12 . IEDB predicted binding Human HLA T-Cell Class I B*44:02 1 22 0 0 0 127 127 127 1 0 0 immu 1 longLabel IEDB predicted binding Human HLA T-Cell Class I B*44:02\ parent iedbPred1\ shortLabel B*44:02\ track iedbsupp1B4402\ type bigBed 12 .\ visibility dense\ iedbsupp1B4403 B*44:03 bigBed 12 . IEDB predicted binding Human HLA T-Cell Class I B*44:03 1 23 0 0 0 127 127 127 1 0 0 immu 1 longLabel IEDB predicted binding Human HLA T-Cell Class I B*44:03\ parent iedbPred1\ shortLabel B*44:03\ track iedbsupp1B4403\ type bigBed 12 .\ visibility dense\ phastConsElements160way Cons. Elements bed 5 . 158 Ebola strains and 2 Marburg strains Conserved Elements 1 23 110 10 40 182 132 147 0 0 0 compGeno 1 color 110,10,40\ longLabel 158 Ebola strains and 2 Marburg strains Conserved Elements\ noInherit on\ parent cons160wayViewelements on\ priority 23\ shortLabel Cons. Elements\ subGroups view=elements\ track phastConsElements160way\ type bed 5 .\ strainPhastConsElements160way Cons. Elements bed 5 . 
158 Ebola strains and 2 Marburg strains Conserved Elements 1 23 110 10 40 182 132 147 0 0 0 compGeno 1 color 110,10,40\ longLabel 158 Ebola strains and 2 Marburg strains Conserved Elements\ noInherit on\ parent strainCons160wayViewelements on\ priority 23\ shortLabel Cons. Elements\ subGroups view=elements\ track strainPhastConsElements160way\ type bed 5 .\ iedbsupp1B5101 B*51:01 bigBed 12 . IEDB predicted binding Human HLA T-Cell Class I B*51:01 1 24 0 0 0 127 127 127 1 0 0 immu 1 longLabel IEDB predicted binding Human HLA T-Cell Class I B*51:01\ parent iedbPred1\ shortLabel B*51:01\ track iedbsupp1B5101\ type bigBed 12 .\ visibility dense\ iedbsupp1B5301 B*53:01 bigBed 12 . IEDB predicted binding Human HLA T-Cell Class I B*53:01 1 25 0 0 0 127 127 127 1 0 0 immu 1 longLabel IEDB predicted binding Human HLA T-Cell Class I B*53:01\ parent iedbPred1 on\ shortLabel B*53:01\ track iedbsupp1B5301\ type bigBed 12 .\ visibility dense\ iedbsupp1B5701 B*57:01 bigBed 12 . IEDB predicted binding Human HLA T-Cell Class I B*57:01 1 26 0 0 0 127 127 127 1 0 0 immu 1 longLabel IEDB predicted binding Human HLA T-Cell Class I B*57:01\ parent iedbPred1\ shortLabel B*57:01\ track iedbsupp1B5701\ type bigBed 12 .\ visibility dense\ iedbsupp1B5801 B*58:01 bigBed 12 . IEDB predicted binding Human HLA T-Cell Class I B*58:01 1 27 0 0 0 127 127 127 1 0 0 immu 1 longLabel IEDB predicted binding Human HLA T-Cell Class I B*58:01\ parent iedbPred1\ shortLabel B*58:01\ track iedbsupp1B5801\ type bigBed 12 .\ visibility dense\ iedbTcellI IEDB T Cell I bigBed 12 . Immune Epitope Database and Analysis Resource (IEDB) Curated T-Cell Epitopes, MHC Class I 0 30 0 90 100 127 172 177 0 0 0\ This track shows protein sequences displayed by virus-infected cells to T-cells\ as annotated by the National Institute of Allergy and Infectious Diseases (NIAID) \ Immune Epitope Database (IEDB). \ Only sequences with a positive assay outcome are shown on the track. 
All fields annotated by\ IEDB and exported in their "compact" file are shown on the details page, which also \ provides links back to the IEDB.
\ \\ See also the detailed explanation of the curated Ebola data in the \ IEDB Knowledgebase.\
\ \\ To enrich for MHC Class I epitopes, this track shows epitopes with annotated\ Class I alleles or, if the allele was not annotated, epitopes of 12 or fewer\ amino acids.
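The enrichment rule can be sketched as a small function: an annotated allele class takes precedence, and peptide length is used only as a fallback, with peptides of 12 or fewer amino acids going to Class I and longer ones to Class II. This is a minimal sketch, not the code used to build the track; the function name and the `allele_class` argument are hypothetical and do not correspond to IEDB field names.

```python
def mhc_class_for_track(allele_class, peptide):
    """Decide which curated T-cell track an epitope belongs to.

    allele_class: "I", "II", or None when no MHC allele is annotated
                  (hypothetical representation of the annotation).
    peptide:      epitope amino acid sequence.
    """
    if allele_class in ("I", "II"):
        return allele_class  # the annotation wins over the length heuristic
    # Fallback: short peptides are treated as Class I, longer ones as Class II.
    return "I" if len(peptide) <= 12 else "II"


# Placeholder peptides (poly-alanine strings, not real epitopes).
short_unannotated = mhc_class_for_track(None, "A" * 9)
long_unannotated = mhc_class_for_track(None, "A" * 15)
annotated = mhc_class_for_track("II", "A" * 9)
```

The length cutoff is only a heuristic for unannotated entries, which is why the track is described as enriched for Class I epitopes rather than restricted to them.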
\ \\ Matching peptide sequences are shown. Mouse over the features to see the\ exact MHC allele, if annotated.\
\ \\ Vita R, Zarebski L, Greenbaum JA, Emami H, Hoof I, Salimi N, Damle R, Sette A, Peters B.\ \ The immune epitope database 2.0.\ Nucleic Acids Res. 2010 Jan;38(Database issue):D854-62.\ PMID: 19906713; PMC: PMC2808938\
\ immu 1 color 0,90,100\ group immu\ longLabel Immune Epitope Database and Analysis Resource (IEDB) Curated T-Cell Epitopes, MHC Class I\ mouseOverField mHCAlleleName\ priority 30\ shortLabel IEDB T Cell I\ track iedbTcellI\ type bigBed 12 .\ urls pubMedID=https://www.ncbi.nlm.nih.gov/pubmed/$$ bCellID=http://www.iedb.org/assayId/$$ referenceID=http://www.iedb.org/refId/$$ epitopeID=http://www.iedb.org/epId/$$ epitopeSourceMoleculeAccession=https://www.ncbi.nlm.nih.gov/protein/$$ antigenSourceMoleculeAccession=https://www.ncbi.nlm.nih.gov/protein/$$\ visibility hide\ pdb PDB bed 12 Protein Data Bank (PDB) Sequence Matches 0 30 0 0 0 127 127 127 0 0 0 This track shows alignments of sequences with known protein structures in the \ Protein Data Bank (PDB).\
\ \\ Genomic locations of PDB matches are labeled with the accession number. \ Clicking a match opens a standard feature details page with the PDB page embedded in it. \ The protein structure is shown on the PDB page.\
\ \\ PDB sequences were downloaded from the \ PDB website and aligned \ with BLAST (tblastn). Only alignments with a minimum identity of 80%\ that span at least 80% of the query sequence were kept.\
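The two thresholds can be illustrated with a short filter over alignment records. This is a sketch under stated assumptions: the `Hit` record, its field names, and the definition of coverage as aligned query span divided by query length are hypothetical, and the example hits are invented rather than real tblastn output.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str                # placeholder label, not a real PDB accession
    percent_identity: float  # 0 to 100
    query_start: int         # 1-based, inclusive
    query_end: int           # 1-based, inclusive
    query_length: int        # full length of the PDB query sequence


def keep_hit(hit, min_identity=80.0, min_coverage=0.80):
    """Keep alignments with >= 80% identity that span >= 80% of the query."""
    coverage = (hit.query_end - hit.query_start + 1) / hit.query_length
    return hit.percent_identity >= min_identity and coverage >= min_coverage


hits = [
    Hit("hitA", 95.0, 1, 90, 100),   # 90% of the query covered: kept
    Hit("hitB", 79.9, 1, 100, 100),  # identity below 80%: dropped
    Hit("hitC", 99.0, 1, 50, 100),   # only 50% of the query covered: dropped
]
kept = [h.name for h in hits if keep_hit(h)]
```

Requiring both conditions removes short, high-identity fragments as well as full-length matches to only distantly related proteins.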
\ \\ Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE.\ \ The Protein Data Bank.\ Nucleic Acids Res. 2000 Jan 1;28(1):235-42.\ PMID: 10592235; PMC: PMC102472\
\ \\ Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A.\ \ The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure\ information on human protein variants.\ Hum Mutat. 2004 May;23(5):464-70.\ PMID: 15108278\
\ genes 1 group genes\ iframeOptions width='1000' height='800' scrolling='yes'\ iframeUrl http://www.pdb.org/pdb/explore/explore.do?structureId=$$\ longLabel Protein Data Bank (PDB) Sequence Matches\ priority 30\ shortLabel PDB\ track pdb\ type bed 12\ visibility hide\ iedbTcellII IEDB T Cell II bigBed 12 . Immune Epitope Database and Analysis Resource (IEDB) Curated T-Cell Epitopes, MHC Class II 0 40 0 90 100 127 172 177 0 0 0\ This track shows protein sequences displayed by virus-infected cells to T-cells\ as annotated by the National Institute of Allergy and Infectious Diseases (NIAID) \ Immune Epitope Database (IEDB). \ Only sequences with a positive assay outcome are shown on the track. All fields annotated by\ IEDB and exported in their "compact" file are shown on the details page, which also \ provides links back to the IEDB.
\ \\ See also the detailed explanation of the curated Ebola data in the \ IEDB Knowledgebase.\
\ \\ To enrich for MHC Class II epitopes, this track shows epitopes with annotated Class II alleles or, \ if the allele was not annotated, epitopes of more than 12 amino acids.\
\ \\ Matching peptide sequences are shown. Mouse over the features to see the \ exact MHC allele, if annotated.\
\ \\ Vita R, Zarebski L, Greenbaum JA, Emami H, Hoof I, Salimi N, Damle R, Sette A, Peters B.\ \ The immune epitope database 2.0.\ Nucleic Acids Res. 2010 Jan;38(Database issue):D854-62.\ PMID: 19906713; PMC: PMC2808938\
\ immu 1 color 0,90,100\ group immu\ longLabel Immune Epitope Database and Analysis Resource (IEDB) Curated T-Cell Epitopes, MHC Class II\ mouseOverField mHCAlleleName\ priority 40\ shortLabel IEDB T Cell II\ track iedbTcellII\ type bigBed 12 .\ urls pubMedID=https://www.ncbi.nlm.nih.gov/pubmed/$$ bCellID=http://www.iedb.org/assayId/$$ referenceID=http://www.iedb.org/refId/$$ epitopeID=http://www.iedb.org/epId/$$ epitopeSourceMoleculeAccession=https://www.ncbi.nlm.nih.gov/protein/$$ antigenSourceMoleculeAccession=https://www.ncbi.nlm.nih.gov/protein/$$\ visibility hide\ iedbPred1 IEDB Pred Human I bigBed 12 . IEDB Epitope Predictions, Human T-Cell Class I 0 50 0 0 0 127 127 127 1 0 0\ This track shows peptides that are predicted to be displayed by \ human class I MHCs.\
\ \\ \ \ immu 1 altColor 0,0,0\ color 0,0,0\ compositeTrack on\ group immu\ longLabel IEDB Epitope Predictions, Human T-Cell Class I\ noScoreFilter on\ parent iedbPred dense\ priority 50\ scoreMax 1000\ scoreMin 850\ shortLabel IEDB Pred Human I\ spectrum on\ track iedbPred1\ type bigBed 12 .\ visibility hide\ iedbPredClassIMac IEDB Pred Macaque I bigBed 12 . IEDB Epitope Predictions, Macaque T-Cell Class I 0 60 20 20 0 137 137 127 1 0 0
\ This track shows peptides that are predicted to be displayed by \ macaque class I MHCs.\
\ \\ \ \ immu 1 color 20,20,0\ compositeTrack on\ group immu\ longLabel IEDB Epitope Predictions, Macaque T-Cell Class I\ noScoreFilter on\ parent iedbPred dense\ priority 60\ scoreMax 1000\ scoreMin 0\ shortLabel IEDB Pred Macaque I\ spectrum on\ track iedbPredClassIMac\ type bigBed 12 .\ visibility hide\ iedbSupp3 IEDB Pred II bigBed 12 . IEDB Epitope Predictions, Human T-Cell Class II 0 70 20 20 0 137 137 127 1 0 0
\ This track shows peptides that are predicted to be displayed by \ human class II MHCs.
\ \\ This track shows the sequences used in the Jun. 2014 Ebola virus genome assembly.\
\\
Genome assembly procedures are covered in the NCBI\
assembly documentation.
\
NCBI also provides\
specific information about this assembly.\
\ The definition of this assembly is from the\ AGP file delivered with the sequence. The NCBI document\ AGP Specification describes the format of the AGP file.\
\\ In dense mode, this track depicts the contigs that make up the \ currently viewed scaffold. \ Contig boundaries are distinguished by the use of alternating gold and brown \ coloration. Where gaps\ exist between contigs, spaces are shown between the gold and brown\ blocks. The relative order and orientation of the contigs\ within a scaffold are always known; therefore, a line is drawn in the graphical\ display to bridge the blocks.
\\ Component types found in this track (with counts of that type in parentheses):\
\ This track shows the gaps in the Jun. 2014 Ebola virus genome assembly.\
\\
Genome assembly procedures are covered in the NCBI\
assembly documentation.
\
NCBI also provides\
specific information about this assembly.\
\ The definition of the gaps in this assembly is from the\ AGP file delivered with the sequence. The NCBI document\ AGP Specification describes the format of the AGP file.\
\\ Gaps are represented as black boxes in this track.\ If the relative order and orientation of the contigs on either side\ of the gap are supported by read-pair data, \ it is a bridged gap and a white line is drawn \ through the black box representing the gap. \
\ \ map 1 group map\ html gap\ longLabel Gap Locations\ shortLabel Gap\ track gap\ type bed 3 +\ visibility hide\ gc5BaseBw GC Percent bigWig 0 100 GC Percent in 5-Base Windows 0 100 0 0 0 128 128 128 0 0 0\ The GC percent track shows the percentage of G (guanine) and C (cytosine) bases\ in 5-base windows. High GC content is typically associated with\ gene-rich areas.\
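As a rough illustration of what a GC percent value per 5-base window means, the following Python sketch computes it for a short sequence. It assumes consecutive, non-overlapping windows and ignores a trailing partial window; the function name is hypothetical and this is not the code used to build the track.

```python
from typing import List


def gc_percent_windows(seq: str, window: int = 5) -> List[float]:
    """Percentage of G and C bases in consecutive, non-overlapping windows."""
    seq = seq.upper()
    percents = []
    # Step through the sequence one full window at a time; a trailing
    # partial window is skipped (an assumption of this sketch).
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc_count = sum(1 for base in chunk if base in "GC")
        percents.append(100.0 * gc_count / window)
    return percents
```

For example, the window GGCCA contains four G or C bases out of five, giving 80 percent.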
\\ This track may be configured in a variety of ways to highlight different\ aspects of the displayed information. Click the\ "Graph configuration help"\ link for an explanation of the configuration options.\ \
The data and presentation of this graph were prepared by\ Hiram Clawson.\
\ \ map 0 altColor 128,128,128\ autoScale Off\ color 0,0,0\ graphTypeDefault Bar\ gridDefault OFF\ group map\ html gc5Base\ longLabel GC Percent in 5-Base Windows\ maxHeightPixels 128:36:16\ shortLabel GC Percent\ track gc5BaseBw\ type bigWig 0 100\ viewLimits 30:70\ visibility hide\ windowingFunction Mean\ iedbPred IEDB Prediction bigBed 12 . Immune Epitope Database and Analysis Resource (IEDB) HLA binding predictions 0 100 0 0 0 127 127 127 0 0 0\ The subtracks of this track show peptides that are predicted to be displayed by \ human and macaque class I MHCs and human class II MHCs. Class I predictions\ are split by HLA allele; class II predictions are summarized into a single track.\
\ \\ \ Class II T cell epitope prediction: For human epitope predictions, 15-mer\ peptides overlapping by 10 aa residues were generated from aligned sequences,\ to avoid redundant peptides that share the same 9-mer binding core.\ For each 15-mer peptide, the binding affinity was predicted (expressed\ as the IEDB consensus percentile rank) for seven class II human HLA alleles\ (HLA-DRB1*03:01, HLA-DRB1*07:01, HLA-DRB1*15:01, HLA-DRB3*01:01,\ HLA-DRB3*02:02, HLA-DRB4*01:01 and HLA-DRB5*01:01), using the IEDB MHC class\ II binding prediction tool ("IEDB recommended" method). Predicted binders were\ selected based on the median consensus percentile rank estimated from the\ consensus percentile ranks for the seven alleles. All peptides with median\ consensus percentile rank <= 20.0 were selected as predicted binders, the\ threshold being previously optimized for capturing 50% of class II human T cell\ responses.\ \ \ immu 1 group immu\ longLabel Immune Epitope Database and Analysis Resource (IEDB) HLA binding predictions\ noScoreFilter on\ shortLabel IEDB Prediction\ superTrack on\ track iedbPred\ type bigBed 12 .\ patSeq Lens Patents bigBed 12 + Lens PatSeq Patent Document Sequences 0 100 0 0 0 127 127 127 0 0 0
This track shows genome matches to biomedical sequences submitted with patent application\ documents to patent offices around the world. The sequences, their mappings, and selected\ patent information were graciously provided by PatSeq, a search tool that is part of The Lens,\ Cambia.
\ \This track contains more data than the NCBI GenBank Division "Patents", as the\ sequences were extracted from patents directly.
\ \The data is split into two subtracks: one for sequences that are only part of patents that\ have submitted more than 100 sequences ("bulk patents")\ and a second track for all other sequences ("non-bulk patents").
\ \A sequence can be\ part of many patent documents, with some being found in several thousand patents.\ This track shows only a single alignment for every sequence, colored based on\ its occurrence in the different patent documents and using a color scheme similar to The Lens.\ \
\
Based on the first sequence match, the four different item colors follow this priority ranking in descending order:
\
the sequence is referenced in the claims of a granted patent
the sequence is disclosed in a granted patent
the sequence is referenced in the claims of a patent application
the sequence is disclosed in a patent application
Sequences referenced in the claims section of a\ patent document define the scope of the invention and are important during\ litigation. Therefore, they are given priority in the color scheme. Patent\ grant documents form the basis of patent protection and are prioritized over\ applications.
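One possible reading of this priority ranking is sketched below in Python. The category labels, function names, and the (granted, in_claims) input pairs are all hypothetical; whether the track evaluates every document or only the first match is not spelled out here, so the sketch simply picks the highest-priority category among the documents it is given.

```python
from typing import Iterable, Tuple

# Categories in descending priority, following the ranking listed above.
CATEGORIES = [
    "granted, in claims",
    "granted",
    "application, in claims",
    "application",
]


def category_rank(granted: bool, in_claims: bool) -> int:
    """Rank of one patent document; 0 is the highest priority."""
    if granted:
        return 0 if in_claims else 1
    return 2 if in_claims else 3


def best_category(documents: Iterable[Tuple[bool, bool]]) -> str:
    """Highest-priority category over (granted, in_claims) pairs.

    Raises ValueError if no documents are supplied.
    """
    return CATEGORIES[min(category_rank(g, c) for g, c in documents)]
```

In this reading, a sequence that is merely disclosed in a granted patent still outranks one that is claimed only in an application, because grants are prioritized over applications.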
\ \Hover over a feature with the mouse to see\ the total number of documents where the sequence has been referenced, how many\ of these documents are granted patents and how often the sequence has been\ referenced in the claims. A randomly selected document title is also shown in\ the mouseover.
\ \Clicking on a feature will bring up the details page, which contains information about\ the sequence and alignment of that feature.\ The link at the top of the page opens the PatSeq Analyzer with\ the chromosomal region covered by the feature that was clicked. The PatSeq Analyzer\ is a specialized genome browser that allows for the viewing and filtering of patent\ sequence matches in detail.\
\ \The next section of the details page is a list of up to ten patent documents that include this\ sequence, with the number of occurrences within each document in parentheses.\ This is followed by up to thirty links to patent documents. The patent documents listed in these\ sections are displayed in order of the number of sequence occurrences in the document. Shown below\ these are the links to the sequence in The Lens, in the format\ "patentDocumentIdentifier-SEQIDNO (docSequenceCount)". The "SEQ ID NO"\ is an integer number, the unique identifier of a patent sequence in a patent\ document. When a protein sequence has been annotated on a nucleotide sequence,\ the "SEQ ID NO" contains the reading frame separated by a ".", e.g.\ "1.1" would indicate the first frame of SEQIDNO 1.\ The total number of sequences submitted with the patent document ("docSequenceCount") is\ shown in parentheses after the SEQIDNO. The links to the sequence are separated into the\ categories "granted and in claims", "granted", "in claims"\ and "applications" (=all others). Sequence identifiers link to the respective pages on PatSeq. A maximum of thirty documents\ are linked from this page per category listed in order of the number of sequence occurrences;\ please use PatSeq Analyzer to view all matching documents.\
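The link label format "patentDocumentIdentifier-SEQIDNO (docSequenceCount)" described above can be taken apart with a small parser. The following Python sketch is an assumption-laden illustration: the class and function names are hypothetical, and the example identifiers used with it are made up rather than taken from real patent documents.

```python
import re
from typing import NamedTuple, Optional


class SeqLink(NamedTuple):
    document: str             # patent document identifier
    seq_id_no: int            # SEQ ID NO within that document
    frame: Optional[int]      # reading frame, if one follows the "."
    doc_sequence_count: int   # sequences submitted with the document


# document, "-", SEQ ID NO, optional ".frame", then " (count)"
_LINK_RE = re.compile(
    r"^(?P<doc>.+)-(?P<seq>\d+)(?:\.(?P<frame>\d+))? \((?P<count>\d+)\)$"
)


def parse_seq_link(label: str) -> SeqLink:
    """Split a label such as 'DOC-1.1 (25)' into its parts."""
    match = _LINK_RE.match(label.strip())
    if match is None:
        raise ValueError("unrecognized sequence link label: %r" % label)
    frame = match.group("frame")
    return SeqLink(
        document=match.group("doc"),
        seq_id_no=int(match.group("seq")),
        frame=int(frame) if frame is not None else None,
        doc_sequence_count=int(match.group("count")),
    )
```

Because the document part is matched greedily, identifiers that themselves contain hyphens are still split at the last hyphen before the SEQ ID NO.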
\ \The score of the features in this track is the number of documents where the\ sequence appears in the claims. For example, by setting the score filter to 1, only\ sequences are shown that have been referenced at least once in the claims.\
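The same filter can be applied offline to BED text, where the score is the fifth tab-separated column. The Python sketch below assumes the lines come from a BED export of this track; the function name and the example rows in the usage note are hypothetical.

```python
from typing import Iterable, List


def filter_by_score(bed_lines: Iterable[str], min_score: int = 1) -> List[str]:
    """Keep BED lines whose score (column 5) is at least min_score."""
    kept = []
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        # BED columns: chrom, start, end, name, score, ...
        if int(fields[4]) >= min_score:
            kept.append(line)
    return kept
```

With `min_score=1`, a made-up row such as `chrExample\t100\t200\tseqA\t0\t+` would be dropped, while the same row with a score of 3 would be kept.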
\ \\ More than 96 million patent document files were collected by The Lens. The\ ST.25-formatted\ sequences were extracted and mapped to genomes with the aligners BLAT and BWA. The minimal\ identity of the query over the alignment is 95%. Note that for hg19, no patents are shown\ on chrM, as the mitochondrial chromosome used for the mapping was the one from\ the Ensembl genome FASTA files. \
\ \\ Thanks to the team behind The Lens,\ in particular,\ Osmat Jefferson\ and Deniz Koellhofer, for making these data available.\
\ \\ Send suggestions on the way data in this track is visualized to our support\ address\ \ genome@soe.ucsc.edu.\ \ Questions on the data itself are best directed to \ support@cambia.org.\ \
\ \\ The raw data can be explored interactively with the Table Browser.\ For automated download and analysis, the genome annotation is stored in a bigBed file that\ can be downloaded from\ our download server.\ The files for this track are called patNonBulk.bb and patBulk.bb. Individual\ regions or the whole genome annotation can be obtained using our tool bigBedToBed\ which can be compiled from the source code or downloaded as a precompiled\ binary for your system. Instructions for downloading source code and binaries can be found\ here.\
\ \\
The command to obtain the data as a tab-separated table looks like this:
\
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/eboVir3/bbi/patNonBulk.bb -chrom=chr5 -start=1000000 -end=2000000 output.tsv\ A full log of the commands that were used to build this annotation is available\ from our database\ build description. In this text file, search for "patNonBulk" to find the right section.\ \
\ \ Editorial: The patent bargain\ Nature. 2013 Dec 12;504(7479):187-188.\
\ \\ \ Patently transparent.\ Nat Biotechnol. 2006 May;24(5):474.\ PMID: 16680110\
\ \\ Jefferson OA, Köllhofer D, Ehrich TH, Jefferson RA.\ \ Transparency tools in gene patenting for informing policy and practice.\ Nat Biotechnol. 2013 Dec;31(12):1086-93.\ PMID: 24316644\
\ pub 1 compositeTrack on\ exonNumbers off\ group pub\ itemRgb on\ linkIdInName on\ longLabel Lens PatSeq Patent Document Sequences\ mouseOverField mouseOver\ sepFields claimGrantSeqIds\ shortLabel Lens Patents\ skipFields mouseOver,fprint\ track patSeq\ type bigBed 12 +\ urlLabel Open Lens PatSeq Analyzer with this chromosomal range\ urls intDocIds="http://www.lens.org/lens/patent/$$" claimGrantSeqIds="http://www.lens.org/lens/patent/$$" claimSeqIds="http://www.lens.org/lens/patent/$$" grantSeqIds="http://www.lens.org/lens/patent/$$" appSeqIds="http://www.lens.org/lens/patent/$$"\ visibility hide\ multiz160way Multiz Align wigMaf 0.0 1.0 Multiz Genome Alignments of 158 Ebola strains and 2 Marburg strains 3 100 0 10 100 0 90 10 0 0 0 compGeno 1 altColor 0,90,10\ color 0, 10, 100\ frames multiz160wayFrames\ group compGeno\ itemFirstCharCase noChange\ longLabel Multiz Genome Alignments of 158 Ebola strains and 2 Marburg strains\ mafDot on\ mafShowSnp on\ noInherit on\ parent cons160wayViewalign on\ priority 100\ sGroup_Bundibugyo_2007 KC545393v1 KC545394v1 KC545395v1 KC545396v1 FJ217161v1 NC_014373v1 FJ217162v1 NC_014372v1\ sGroup_DRC_2007 KC242800v1 HQ613402v1 HQ613403v1 KC242784v1 KC242785v1 KC242786v1 KC242787v1 KC242788v1 KC242789v1 KC242790v1 AY354458v1 JQ352763v1 KC242796v1 KC242799v1 KC242792v1 KC242793v1 KC242794v1 KC242795v1 KC242797v1 KC242798v1\ sGroup_Ebola_2014 KJ660346v2 KJ660347v2 KJ660348v2 KM034549v1 KM034550v1 KM034551v1 KM034552v1 KM034553v1 KM034554v1 KM034555v1 KM034556v1 KM034557v1 KM034558v1 KM034559v1 KM034560v1 KM034561v1 KM034563v1 KM233035v1 KM233036v1 KM233037v1 KM233038v1 KM233039v1 KM233040v1 KM233041v1 KM233042v1 KM233043v1 KM233044v1 KM233045v1 KM233046v1 KM233047v1 KM233048v1 KM233049v1 KM233050v1 KM233051v1 KM233052v1 KM233053v1 KM233054v1 KM233055v1 KM233056v1 KM233057v1 KM233058v1 KM233059v1 KM233060v1 KM233061v1 KM233062v1 KM233063v1 KM233064v1 KM233065v1 KM233066v1 KM233067v1 KM233068v1 KM233069v1 KM233070v1 KM233071v1 KM233072v1 
KM233073v1 KM233074v1 KM233075v1 KM233076v1 KM233077v1 KM233078v1 KM233079v1 KM233080v1 KM233081v1 KM233082v1 KM233083v1 KM233084v1 KM233085v1 KM233086v1 KM233087v1 KM233088v1 KM233089v1 KM233090v1 KM233091v1 KM233092v1 KM233093v1 KM233094v1 KM233095v1 KM233096v1 KM233097v1 KM233098v1 KM233099v1 KM233100v1 KM233101v1 KM233102v1 KM233103v1 KM233104v1 KM233105v1 KM233106v1 KM233107v1 KM233108v1 KM233109v1 KM233110v1 KM233111v1 KM233112v1 KM233113v1 KM233114v1 KM233115v1 KM233116v1 KM233117v1 KM233118v1\ sGroup_Marburg_1987 NC_024781v1 NC_001608v3\ sGroup_Reston_1989-90 AF522874v1 JX477165v1 NC_004161v1 AY769362v1 FJ621583v1 FJ621584v1 AB050936v1 JX477166v1 FJ621585v1\ sGroup_Sudan_1976-9 KC545389v1 KC545390v1 KC545391v1 KC545392v1 KC589025v1 JN638998v1 FJ968794v1 KC242783v2 EU338380v1 AY729654v1 NC_006432v1\ sGroup_Zaire(DRC)_1976-7 EU224440v2 AF499101v1 AY142960v1 AF086833v2 AF272001v1 KC242791v1 KC242801v1 NC_002549v1\ shortLabel Multiz Align\ snpTable mafSnp160way\ speciesCodonDefault eboVir3\ speciesGroups Ebola_2014 DRC_2007 Zaire(DRC)_1976-7 Bundibugyo_2007 Reston_1989-90 Sudan_1976-9 Marburg_1987\ subGroups view=align\ track multiz160way\ type wigMaf 0.0 1.0\ strainName160way Multiz Align wigMaf 0.0 1.0 Multiz Genome Alignments of 158 Ebola strains and 2 Marburg strains 3 100 0 10 100 0 90 10 0 0 0 compGeno 1 altColor 0,90,10\ color 0, 10, 100\ frames strainName160wayFrames\ group compGeno\ itemFirstCharCase noChange\ longLabel Multiz Genome Alignments of 158 Ebola strains and 2 Marburg strains\ mafDot on\ mafShowSnp on\ noInherit on\ parent strainCons160wayViewalign on\ priority 100\ sGroup_Bundibugyo_2007 EboBund-112_2012 EboBund-120_2012 EboBund-122_2012 EboBund-14_2012 Bundibugyo_Uganda_2007 Bundibugyo_2007 Cote_dIvoire_CIEBOV_1994 Cote_dIvoire_1994\ sGroup_DRC_2007 Ilembe_2002 034-KS_2008 M-M_2007 Luebo9_2007 Luebo0_2007 Luebo1_2007 Luebo23_2007 Luebo43_2007 Luebo4_2007 Luebo5_2007 Zaire_1995 Kikwit_1995 13625Kikwit_1995 13709Kikwit_1995 Gabon_1994 
1Eko_1996 2Nza_1996 1Mbie_Gabon_1996 1Oba_Gabon_1996 1Ikot_Gabon_1996\ sGroup_Ebola_2014 Guinea_Kissidougou-C15_2014 Guinea_Gueckedou-C07_2014 Guinea_Gueckedou-C05_2014 EM095B_2014 EM095_2014 EM096_2014 EM098_2014 G3670v1_2014 G3676v1_2014 G3676v2_2014 G3677v1_2014 G3677v2_2014 G3679v1_2014 G3680v1_2014 G3682v1_2014 G3683v1_2014 G3687v1_2014 EM104_2014 EM106_2014 EM110_2014 EM111_2014 EM112_2014 EM113_2014 EM115_2014 EM119_2014 EM120_2014 EM121_2014 EM124v1_2014 EM124v2_2014 EM124v3_2014 EM124v4_2014 G3707_2014 G3713v2_2014 G3713v3_2014 G3713v4_2014 G3724_2014 G3729_2014 G3734v1_2014 G3735v1_2014 G3735v2_2014 G3750v1_2014 G3750v2_2014 G3750v3_2014 G3752_2014 G3758_2014 G3764_2014 G3765v2_2014 G3769v1_2014 G3769v2_2014 G3769v3_2014 G3769v4_2014 G3770v1_2014 G3770v2_2014 G3771_2014 G3782_2014 G3786_2014 G3787_2014 G3788_2014 G3789v1_2014 G3795_2014 G3796_2014 G3798_2014 G3799_2014 G3800_2014 G3805v1_2014 G3805v2_2014 G3807_2014 G3808_2014 G3809_2014 G3810v1_2014 G3810v2_2014 G3814_2014 G3816_2014 G3817_2014 G3818_2014 G3819_2014 G3820_2014 G3821_2014 G3822_2014 G3823_2014 G3825v1_2014 G3825v2_2014 G3826_2014 G3827_2014 G3829_2014 G3831_2014 G3834_2014 G3838_2014 G3840_2014 G3841_2014 G3845_2014 G3846_2014 G3848_2014 G3850_2014 G3851_2014 G3856v1_2014 G3856v3_2014 G3857_2014 NM042v1_2014 NM042v2_2014 NM042v3_2014\ sGroup_Marburg_1987 Marburg_KitumCave_Kenya_1987 Marburg_MtElgon_Musoke_Kenya_1980\ sGroup_Reston_1989-90 Reston_PA_1990 Reston09-A_2009 Pennsylvania_1990 reconstructReston_2008 Reston08-A_2008 Reston08-C_2008 Reston_1996 Alice_TX_USA_MkCQ8167_1996 Reston08-E_2008\ sGroup_Sudan_1976-9 EboSud-602_2012 EboSud-603_2012 EboSud-609_2012 EboSud-682_2012 EboSud-639_2012 Nakisamata_2011 Boniface_1976 Maleo_1979 Yambio_2004 Gulu_Uganda_2000 Gulu_2000\ sGroup_Zaire(DRC)_1976-7 GuineaPig_Mayinga_2007 Mouse_Mayinga_2002 Mayinga_2002 AF086833v2_1976 Mayinga_1976 Bonduni_1977 deRoover_1976 NC_002549v1_1976\ shortLabel Multiz Align\ snpTable mafSnpStrainName160way\ 
speciesCodonDefault eboVir3\ speciesGroups Ebola_2014 DRC_2007 Zaire(DRC)_1976-7 Bundibugyo_2007 Reston_1989-90 Sudan_1976-9 Marburg_1987\ subGroups view=align\ track strainName160way\ type wigMaf 0.0 1.0\ muPIT muPIT protein map bed 4 muPIT - Mapping Genomic Positions on Protein Structures 0 100 0 0 0 127 127 127 0 0 0 http://mupit-ebola.icm.jhu.edu/ResidueFileUpload?residues=chr1%20${\ This track indicates the mapped locations viewable in the MuPIT\ interactive system. Use the URL link MuPIT protein structures\ to enter the viewing system.\
\\ MuPIT interactive is a tool that allows you to map a sequence variant from\ its position in the human genome onto a protein structure. Viewing a variant\ on a protein structure can be useful in interpreting its potential biological\ consequences. After you have done the\ mapping, you can play with the protein structure by turning it around,\ zooming in and out, and turning color-coded annotations about the protein\ on and off. \
\ \\ A data mapping pipeline was developed in the\ Karchin lab\ to map from any genomic location to a position in PDB structure(s).\
\\ In collaboration with:\
\\ The web server and visualization scripting was developed by In Silico Solutions:\
\ \\ Funding:\
\ \\ Niknafs N, Kim D, Kim R, Diekhans M, Ryan M, Stenson PD, Cooper DN, Karchin R.\ \ MuPIT interactive: webserver for mapping variant positions to annotated, interactive 3D\ structures.\ Hum Genet. 2013 Nov;132(11):1235-43.\ PMID: 23793516; PMC: PMC3797853\
\ genes 1 group genes\ longLabel muPIT - Mapping Genomic Positions on Protein Structures\ shortLabel muPIT protein map\ track muPIT\ type bed 4\ url http://mupit-ebola.icm.jhu.edu/ResidueFileUpload?residues=chr1%20${\ urlLabel MuPIT protein structures\ visibility hide\ ncbiGenePfam Pfam in NCBI Gene bed 12 Pfam Domains in NCBI Genes 0 100 20 0 250 137 127 252 0 0 0 http://pfam.sanger.ac.uk/family/$$\ Most proteins are composed of one or more conserved functional regions called\ domains. This track shows the high-quality, manually curated\ Pfam-A\ domains found in transcripts from the NCBI Genes track.\
\ \\ This track follows the display conventions for\ gene tracks.\
\ \\ The amino acid sequences from the NCBI Genes \ are submitted to the set of Pfam-A HMMs, which annotate regions within the\ predicted peptide that are recognizable as Pfam protein domains. These regions\ are then mapped to the transcripts themselves using the\ pslMap utility.\
\ \\ pslMap was written by Mark Diekhans at UCSC.\
\ \\ Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G,\ Forslund K et al.\ \ The Pfam protein families database.\ Nucleic Acids Res. 2010 Jan;38(Database issue):D211-22.\ PMID: 19920124; PMC: PMC2808889\
\ genes 1 color 20,0,250\ group genes\ longLabel Pfam Domains in NCBI Genes\ shortLabel Pfam in NCBI Gene\ track ncbiGenePfam\ type bed 12\ url http://pfam.sanger.ac.uk/family/$$\ spUniprot UniProt bigBed 12 + UniProt/SwissProt Annotations 1 100 0 0 0 127 127 127 0 0 0\ This track shows protein sequence annotations from the UniProt/SwissProt database,\ mapped to genomic coordinates.\ The data has been curated from scientific publications by the UniProt staff.\ The annotations are divided into two subtracks, one\ for all secondary structure annotations and another one for all other\ annotations.
\\ For the mutations curated by UniProt/SwissProt, please open the track\ "UniProt Variants" in the track group "Phenotype and Literature".\
\ \\ Genomic locations of UniProt/SwissProt annotations are labeled with a short name for\ the type of annotation (e.g. "glyco", "disul", "signal pep" etc.).\ A click on them shows the full annotation and provides a link to the UniProt/SwissProt\ record for more details.
\ \\ Mouse over a feature to see the full UniProt annotation comment.\
\ \\ Modified residues are highlighted in light blue, transmembrane regions in blue,\ glycosylation sites in yellow, disulfide bonds in grey, topological domains in\ red.\
\ \Note that for the human hg38 assembly, there also is a \ public track hub prepared by UniProt itself, with \ genome annotations produced and maintained by UniProt using their mapping\ method.
\ \\ UniProt sequences were aligned to UCSC/Gencode transcript sequences first with\ BLAT, then lifted to genome positions with pslMap. UniProt variants were\ obtained from the UniProt XML file. The variants were then mapped to the genome\ through the alignment using the pslMap program. This mapping approach\ draws heavily on the LS-SNP pipeline by Mark Diekhans. The complete script is\ part of the kent source tree and is located in src/utils/uniprotLift. The exact commands\ that were used to build this track can be found on GitHub.\
\ \\
The raw data can be explored interactively with the\
Table Browser, or the\
Data Integrator.\
For automated analysis, the genome annotation is stored in a bigBed file that \
can be downloaded from the\
download server.\
The files for this track are called spAnnot.bb and spStruct.bb. Individual\
regions or the whole genome annotation can be obtained using our tool bigBedToBed\
which can be compiled from the source code or downloaded as a precompiled\
binary for your system. Instructions for downloading source code and binaries can be found\
here.\
The tool can also be used to obtain only features within a given range, for example:\
\
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/eboVir3/bbi/spStruct.bb -start=0 -end=100000 stdout \
\
Please refer to our\
mailing list archives\
for questions, or our\
Data Access FAQ\
for more information. \
\ This track was created by Maximilian Haeussler, with advice from Mark Diekhans and Brian Raney.\ Thanks to UniProt for making all data available for download.\
\ \\ UniProt Consortium.\ \ Reorganizing the protein space at the Universal Protein Resource (UniProt).\ Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5.\ PMID: 22102590; PMC: PMC3245120\
\ \\ Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A.\ \ The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure\ information on human protein variants.\ Hum Mutat. 2004 May;23(5):464-70.\ PMID: 15108278\
\ genes 1 compositeTrack on\ exonNumbers off\ group genes\ itemRgb on\ longLabel UniProt/SwissProt Annotations\ shortLabel UniProt\ track spUniprot\ type bigBed 12 +\ visibility dense\ uniprot UniProt bigBed 12 + UniProt SwissProt/TrEMBL Protein Annotations 0 100 0 0 0 127 127 127 0 0 0\ This track shows protein sequences and annotations on them from the UniProt/SwissProt database,\ mapped to genomic coordinates. \
\\ UniProt/SwissProt data has been curated from scientific publications by the UniProt staff,\ UniProt/TrEMBL data has been predicted by various computational algorithms.\ The annotations are divided into multiple subtracks, based on their "feature type" in UniProt.\ The first two subtracks below - one for SwissProt, one for TrEMBL - show the\ alignments of protein sequences to the genome, all other tracks below are the protein annotations\ mapped through these alignments to the genome.\
Track Name | Description |
---|---|
UCSC Alignment, SwissProt = curated protein sequences | Protein sequences from SwissProt mapped to the genome. All other tracks are (start,end) SwissProt annotations on these sequences mapped through this alignment. Even protein sequences without a single curated annotation (splice isoforms) are visible in this track. Each UniProt protein has one main isoform, which is colored in dark. Alternative isoforms are sequences that do not have annotations on them and are colored in light-blue. They can be hidden with the TrEMBL/Isoform filter (see below). |
UCSC Alignment, TrEMBL = predicted protein sequences | Protein sequences from TrEMBL mapped to the genome. All other tracks below are (start,end) TrEMBL annotations mapped to the genome using this track. This track is hidden by default. To show it, click its checkbox on the track configuration page. |
UniProt Signal Peptides | Regions found in proteins destined to be secreted, generally cleaved from mature protein. |
UniProt Extracellular Domains | Protein domains with the comment "Extracellular". |
UniProt Transmembrane Domains | Protein domains of the type "Transmembrane". |
UniProt Cytoplasmic Domains | Protein domains with the comment "Cytoplasmic". |
UniProt Polypeptide Chains | Polypeptide chain in mature protein after post-processing. |
UniProt Regions of Interest | Regions that have been experimentally defined, such as the role of a region in mediating protein-protein interactions or some other biological process. |
UniProt Domains | Protein domains, zinc finger regions and topological domains. |
UniProt Disulfide Bonds | Disulfide bonds. |
UniProt Amino Acid Modifications | Glycosylation sites, modified residues and lipid moiety-binding regions. |
UniProt Amino Acid Mutations | Mutagenesis sites and sequence variants. |
UniProt Protein Primary/Secondary Structure Annotations | Beta strands, helices, coiled-coil regions and turns. |
UniProt Sequence Conflicts | Differences between Genbank sequences and the UniProt sequence. |
UniProt Repeats | Regions of repeated sequence motifs or repeated domains. |
UniProt Other Annotations | All other annotations, e.g. compositional bias |
\ For consistency and convenience for users of mutation-related tracks,\ the subtrack "UniProt/SwissProt Variants" is a copy of the track\ "UniProt Variants" in the track group "Phenotype and Literature", or \ "Variation and Repeats", depending on the assembly.\
\ \\ Genomic locations of UniProt/SwissProt annotations are labeled with a short name for\ the type of annotation (e.g. "glyco", "disulf bond", "Signal peptide"\ etc.). A click on them shows the full annotation and provides a link to the UniProt/SwissProt\ record for more details. TrEMBL annotations are always shown in \ light blue, except in the Signal Peptides,\ Extracellular Domains, Transmembrane Domains, and Cytoplasmic Domains subtracks.
\ \\ Mouse over a feature to see the full UniProt annotation comment. For variants, the mouse over will\ show the full name of the UniProt disease acronym.\
\ \\ The subtracks for domains related to subcellular location are sorted from outside to inside of \ the cell: Signal peptide, \ extracellular, \ transmembrane, and cytoplasmic.\
\ \\ In the "UniProt Modifications" track, lipidation sites are highlighted in \ dark blue, glycosylation sites in \ dark green, and phosphorylation sites in \ light green.
\ \\ Duplicate annotations are removed as far as possible: if a TrEMBL annotation\ has the same genome position and same feature type, comment, disease and\ mutated amino acids as a SwissProt annotation, it is not shown again. Two\ annotations mapped through different protein sequence alignments but with the same genome\ coordinates are only shown once.
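The de-duplication rule above can be sketched as a key-based filter. This Python example is a simplified illustration, not the track's build code: the dictionary field names (`source`, `chrom`, `start`, `end`, `feature_type`, `comment`, `disease`, `mutated_aa`) are hypothetical stand-ins for the properties the paragraph lists.

```python
from typing import Dict, List, Tuple


def annotation_key(a: Dict) -> Tuple:
    """Properties that must all match for two annotations to be duplicates."""
    return (a["chrom"], a["start"], a["end"], a["feature_type"],
            a["comment"], a["disease"], a["mutated_aa"])


def drop_duplicate_annotations(annotations: List[Dict]) -> List[Dict]:
    """Keep one annotation per key, preferring SwissProt over TrEMBL."""
    # Stable sort: SwissProt records come first, so a TrEMBL record with
    # the same key as a SwissProt record is the one that gets skipped.
    ordered = sorted(annotations, key=lambda a: a["source"] != "SwissProt")
    seen = set()
    kept = []
    for a in ordered:
        key = annotation_key(a)
        if key in seen:
            continue
        seen.add(key)
        kept.append(a)
    return kept
```

The same pass also collapses two annotations that reach identical genome coordinates through different protein alignments, since they share a key.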
\ \On the configuration page of this track, you can choose to hide any TrEMBL annotations.\ This filter will also hide the UniProt alternative isoform protein sequences because\ both types of information are less relevant to most users. Please contact us if you\ want more detailed filtering features.
\ \Note that for the human hg38 assembly and SwissProt annotations, there\ also is a public\ track hub prepared by UniProt itself, with \ genome annotations maintained by UniProt using their own mapping\ method based on those Gencode/Ensembl gene models that are annotated in UniProt\ for a given protein. For proteins that differ from the genome, UniProt's mapping method\ will, in most cases, map a protein and its annotations to an unexpected location\ (see below for details on UCSC's mapping method).
\ \\ Briefly, UniProt protein sequences were aligned to the transcripts associated\ with the protein, the top-scoring alignments were retained, and the result was\ projected to the genome through a transcript-to-genome alignment.\ Depending on the genome, the transcript-genome alignment was either\ provided by the source database (NCBI RefSeq), created at UCSC (UCSC RefSeq) or\ derived from the transcripts (Ensembl/Augustus). The transcript set is NCBI\ RefSeq for hg38 and UCSC RefSeq for hg19 (due to alt/fix haplotype misplacements \ in the NCBI RefSeq set on hg19). For other genomes, RefSeq, Ensembl and Augustus \ are tried, in this order. The resulting protein-genome alignments of this process \ are available in the file formats for liftOver or pslMap from our data archive\ (see "Data Access" section below).\
\ \An important step of the mapping process protein -> transcript ->\ genome is filtering the alignment from protein to transcript. Due to\ differences between the UniProt proteins and the transcripts (proteins were\ made many years before the transcripts were made, and human genomes have\ variants), the transcript with the highest BLAST score when aligning the\ protein to all transcripts is not always the correct transcript for a protein\ sequence. Therefore, the protein sequence is aligned to only a very short list\ of one or sometimes more transcripts, selected by a three-step procedure:\
\ For strategies 2 and 3, many of the transcripts found do not differ in coding\ sequence, so the resulting alignments on the genome will be identical.\ Therefore, any identical alignments are removed in a final filtering step. The\ details page of these alignments will contain a list of all transcripts that\ result in the same protein-genome alignment. On hg38, only a handful of edge\ cases (pseudogenes, very recently added proteins) remain in 2023 where strategy\ 3 has to be used.
\ \In other words, when an NCBI or UCSC RefSeq track is used for the mapping and to align a\ protein sequence to the correct transcript, we use a three-stage process:\
This system was designed to resolve the problem of incorrect mappings of\ proteins, mostly on hg38, due to differences between the SwissProt\ sequences and the genome reference sequence, which has changed since the\ proteins were defined. The problem is most pronounced for gene families\ composed of either very repetitive or very similar proteins. To make sure that\ the alignments always go to the best chromosome location, all _alt and _fix\ reference patch sequences are ignored for the alignment, so the patches are\ entirely free of UniProt annotations. Please contact us if you have feedback on\ this process or example edge cases. We are not aware of a way to evaluate the\ results completely and in an automated manner.
\\ Proteins were aligned to transcripts with TBLASTN, converted to PSL, filtered\ with pslReps (93% query coverage, keep alignments within top 1% score), lifted to genome\ positions with pslMap and filtered again with pslReps. UniProt annotations were\ obtained from the UniProt XML file. The UniProt annotations were then mapped to the\ genome through the alignment described above using the pslMap program. This approach\ draws heavily on the LS-SNP pipeline by Mark Diekhans.\ Like all Genome Browser source code, the main script used to build this track\ can be found on GitHub.\
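To make the two filtering criteria concrete (93% query coverage, score within the top 1%), here is a simplified Python sketch. It is not pslReps and does not reproduce its scoring; the function name and the dictionary fields (`query`, `query_size`, `aligned`, `score`) are hypothetical, and the score is assumed to be precomputed.

```python
from collections import defaultdict
from typing import Dict, List


def filter_alignments(alignments: List[Dict],
                      min_cover: float = 0.93,
                      near_top: float = 0.01) -> List[Dict]:
    """Keep alignments that cover enough of the query and score near the best."""
    # Step 1: require that the aligned bases cover at least min_cover
    # of the query sequence.
    covered = [a for a in alignments
               if a["aligned"] / a["query_size"] >= min_cover]

    # Step 2: find the best score per query among the remaining alignments.
    best = defaultdict(int)
    for a in covered:
        best[a["query"]] = max(best[a["query"]], a["score"])

    # Step 3: keep alignments scoring within near_top of that best score.
    return [a for a in covered
            if a["score"] >= best[a["query"]] * (1.0 - near_top)]
```

Note that in this sketch the coverage filter runs first, so a high-scoring but poorly covering alignment does not raise the bar for the others.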
\ \\ This track is automatically updated on an ongoing basis, every 2-3 months.\ The current version name is always shown on the track details page. It includes the\ release of UniProt, the version of the transcript set, and a unique MD5 that is\ based on the protein sequences, the transcript sequences, the mapping file\ between both, and the transcript-genome alignment. The exact transcript\ that was used for the alignment is shown when clicking a protein alignment\ in one of the two alignment tracks.\
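To illustrate why an MD5 over all inputs identifies a build, here is a minimal Python sketch. The text above does not specify how the inputs are actually combined, so the function combined_md5 and the simple in-order concatenation are assumptions made for illustration only.

```python
import hashlib


def combined_md5(chunks):
    """Return one MD5 hex digest over several byte strings, fed in order.

    In this sketch the chunks stand for the four inputs named in the text:
    protein sequences, transcript sequences, the mapping file between both,
    and the transcript-genome alignment.
    """
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()
```

Changing any single input changes the digest, so two builds with the same digest were made from the same inputs.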
\ \\ For reproducibility of older analysis results and for manual inspection, previous versions of this track\ are available for browsing in the form of the UCSC UniProt Archive Track Hub (click this link to connect the hub now). The underlying data of\ all releases of this track (past and current) can be obtained from our downloads server, including the UniProt\ protein-to-genome alignment.
\ \\ The raw data of the current track can be explored interactively with the\ Table Browser, or the\ Data Integrator.\ For automated analysis, the genome annotation is stored in a bigBed file that \ can be downloaded from the\ download server.\ The exact filenames can be found in the \ track configuration file. \ Annotations can be converted to ASCII text by our tool bigBedToBed\ which can be compiled from the source code or downloaded as a precompiled\ binary for your system. Instructions for downloading source code and binaries can be found\ here.\ The tool can also be used to obtain only features within a given range, for example:\
\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/eboVir3/uniprot/unipStruct.bb -chrom=chr6 -start=0 -end=1000000 stdout \
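For automated analysis of the text that bigBedToBed writes, the following Python sketch parses one tab-separated line of a "bigBed 12 +" file. The first 12 columns follow the standard BED12 layout; the function name parse_bed12 and the returned dictionary keys are hypothetical, and the track-specific extra columns are returned unparsed because their meaning depends on the track.

```python
def parse_bed12(line):
    """Parse one tab-separated BED12+ line into a dictionary.

    Block starts in BED12 are relative to chromStart, so they are converted
    here to absolute, 0-based, half-open genomic intervals.
    """
    fields = line.rstrip("\n").split("\t")
    start = int(fields[1])
    sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    rel_starts = [int(x) for x in fields[11].rstrip(",").split(",")]
    blocks = [(start + s, start + s + n) for s, n in zip(rel_starts, sizes)]
    return {
        "chrom": fields[0],
        "start": start,
        "end": int(fields[2]),
        "name": fields[3],
        "strand": fields[5],
        "item_rgb": fields[8],
        "blocks": blocks,
        "extra": fields[12:],   # track-specific columns beyond BED12
    }
```

A typical use is to pipe the bigBedToBed output into a script that calls parse_bed12 on each line of standard input and keeps only the features of interest.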
\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information. \ \ \\ \
To facilitate mapping protein coordinates to the genome, we provide the\ alignment files in formats that are suitable for our command line tools. Our\ command line programs liftOver or pslMap can be used to map\ coordinates on protein sequences to genome coordinates. The filenames are\ unipToGenome.over.chain.gz (liftOver) and unipToGenomeLift.psl.gz (pslMap).
\ \Example commands:\
\ wget -q https://hgdownload.soe.ucsc.edu/goldenPath/archive/hg38/uniprot/2022_03/unipToGenome.over.chain.gz\ wget -q https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/liftOver\ chmod a+x liftOver\ echo 'Q99697 1 10 annotationOnProtein' > prot.bed\ liftOver prot.bed unipToGenome.over.chain.gz genome.bed\ cat genome.bed\\ \ \
\ This track was created by Maximilian Haeussler at UCSC, with a lot of input from Chris\ Lee, Mark Diekhans and Brian Raney, feedback from the UniProt staff, Alejo\ Mujica, Regeneron Pharmaceuticals and Pia Riestra, GeneDx. Thanks to UniProt for making all data\ available for download.\
\ \\ UniProt Consortium.\ \ Reorganizing the protein space at the Universal Protein Resource (UniProt).\ Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5.\ PMID: 22102590; PMC: PMC3245120\
\ \\ Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A.\ \ The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure\ information on human protein variants.\ Hum Mutat. 2004 May;23(5):464-70.\ PMID: 15108278\
\ genes 1 allButtonPair on\ compositeTrack on\ dataVersion /gbdb/$D/uniprot/version.txt\ exonNumbers off\ group genes\ hideEmptySubtracks on\ itemRgb on\ longLabel UniProt SwissProt/TrEMBL Protein Annotations\ mouseOverField comments\ shortLabel UniProt\ track uniprot\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#section_features" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility hide\ spMut UniProt Mutations bigBed 12 + UniProt/SwissProt Amino Acid Substitutions 0 100 0 0 0 127 127 127 0 0 0 http://mupit-ebola.icm.jhu.edu/ResidueFileUpload?residues=chr1%20${\ This track shows the genomic positions of natural and artificial amino acid variants\ in the UniProt/SwissProt database.\ The data has been curated from scientific publications by the UniProt staff.\
\ \\ Genomic locations of UniProt/SwissProt variants are labeled with the amino acid\ change at a given position and, if known, the abbreviated disease name. A\ "?" is used if there is no disease annotated at this location, but the\ protein is described as being linked to only a single disease in UniProt.\
\ \\ Mouse over a mutation to see the UniProt comments.\
\ \\ Artificially introduced mutations are colored green and naturally occurring variants are colored\ red. For full information about a particular variant, click the "UniProt variant" linkout. \ The "UniProt record" linkout lists all variants of a particular protein sequence.\ The "Source articles" linkout lists the articles in PubMed that originally described\ the variant(s) and were used as evidence by the UniProt curators.\
\ \\ UniProt sequences were aligned to RefSeq sequences first with BLAT, then lifted\ to genome positions with pslMap. UniProt variants were parsed from the UniProt\ XML file. The variants were then mapped to the genome through the alignment\ using the pslMap program. This mapping approach\ draws heavily on the LS-SNP pipeline by Mark Diekhans. The complete script is\ part of the kent source tree and is located in src/hg/utils/uniprotMutations. \
\ \\ This track was created by Maximilian Haeussler, with advice from Mark Diekhans and Brian Raney.\
\ \\ UniProt Consortium.\ Activities at the Universal Protein Resource (UniProt). \ Nucleic Acids Res. 2014 Jan;42(Database issue):D191-8. \ PMID: 24253303; PMC: PMC3965022\
\ \\ Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A.\ \ The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure\ information on human protein variants.\ Hum Mutat. 2004 May;23(5):464-70.\ PMID: 15108278\
\ varRep 1 exonNumbers off\ group varRep\ html spMut\ itemRgb on\ longLabel UniProt/SwissProt Amino Acid Substitutions\ maxWindowCoverage 10000000\ mouseOverField comments\ noScoreFilter on\ shortLabel UniProt Mutations\ track spMut\ type bigBed 12 +\ url http://mupit-ebola.icm.jhu.edu/ResidueFileUpload?residues=chr1%20${\ urlLabel MuPIT protein structures\ urls variationId="http://www.uniprot.org/uniprot/$$" uniProtId="http://www.uniprot.org/uniprot/$$" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility hide\