These directories contain mouse/human alignments made
using the June 2002 human assembly (also known as build 30)
vs. the Feb. 2002 mouse assembly (also known as MGSCv3 or mm2).
The three directories are:
axtAll - contains all alignments
axtBest - contains alignments filtered so that only
the best alignment for any given region
of the human genome is left.
axtTight - contains only a relatively stringent
subset of the axtBest alignments.
All alignments are in 'axt' format. Each alignment
contains three lines and is separated from the next
alignment by a blank line:
Line 1 - summarizes the alignment.
Line 2 - contains the human sequence with inserts.
Line 3 - contains the mouse sequence with inserts.
The summary line contains nine blank-separated fields with the
following meanings:
1 - Alignment number. The first alignment in a file
is numbered 0, the next 1, and so forth.
2 - Human chromosome.
3 - Start in human chromosome. The first base is
numbered 1.
4 - End in human chromosome. The end base is included.
5 - Mouse chromosome.
6 - Start in mouse.
7 - End in mouse.
8 - Mouse strand. If this is '-' then the mouse start and
mouse end fields are relative to the reverse-complemented
mouse chromosome.
9 - Blastz score. The scoring matrix blastz uses is:
          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91
with a gap open penalty of 400 and a gap extension
penalty of 30. The minimum score for an alignment
to be kept was 3000 for the first pass and
2200 for the second pass, which searches only
the regions between two alignments found in
the first pass.
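
The record layout and summary fields described above can be read with
a short Python sketch like the one below. It is an illustration, not a
UCSC tool: the names AxtBlock, read_axt, and mouse_forward_coords are
hypothetical, the sample record is invented for the example (its score
is a placeholder, not a real blastz score), and the mouse chromosome
size needed for the strand conversion is not stored in the axt file
and must be supplied by the caller.

```python
import io
from typing import Iterator, NamedTuple, TextIO, Tuple


class AxtBlock(NamedTuple):
    number: int        # field 1: alignment number, starting at 0
    human_chrom: str   # field 2
    human_start: int   # field 3: 1-based
    human_end: int     # field 4: inclusive
    mouse_chrom: str   # field 5
    mouse_start: int   # field 6
    mouse_end: int     # field 7
    mouse_strand: str  # field 8: '+' or '-'
    score: int         # field 9: blastz score
    human_seq: str     # line 2 of the record
    mouse_seq: str     # line 3 of the record


def read_axt(handle: TextIO) -> Iterator[AxtBlock]:
    """Yield one AxtBlock per three-line record, skipping blank lines."""
    lines = (ln.strip() for ln in handle)
    lines = (ln for ln in lines if ln)
    for summary in lines:
        fields = summary.split()
        if len(fields) != 9:
            raise ValueError(f"expected 9 summary fields, got {len(fields)}")
        human_seq = next(lines, None)
        mouse_seq = next(lines, None)
        if human_seq is None or mouse_seq is None:
            raise ValueError("truncated record: missing sequence line")
        yield AxtBlock(
            int(fields[0]), fields[1], int(fields[2]), int(fields[3]),
            fields[4], int(fields[5]), int(fields[6]), fields[7],
            int(fields[8]), human_seq, mouse_seq,
        )


def mouse_forward_coords(block: AxtBlock, mouse_chrom_size: int) -> Tuple[int, int]:
    """Return 1-based inclusive mouse coordinates on the forward strand.

    For '-' strand blocks the stored coordinates count from the start of
    the reverse-complemented chromosome, so they are mirrored here.
    """
    if block.mouse_strand == "-":
        return (mouse_chrom_size - block.mouse_end + 1,
                mouse_chrom_size - block.mouse_start + 1)
    return (block.mouse_start, block.mouse_end)


# Invented record for illustration only.
SAMPLE_AXT = """0 chr1 11 20 chr4 5 13 - 3500
ACGTACGTAC
ACGTAC-TAC

"""
```

Calling list(read_axt(io.StringIO(SAMPLE_AXT))) yields a single block;
passing it to mouse_forward_coords with a made-up chromosome size shows
how the '-' strand coordinates are mirrored.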
The alignments were done with blastz, which is available
from Webb Miller's group at PSU. Each chromosome
was divided into 10,010,000-base chunks with 10,000 bases
of overlap. The axtAll alignments include this overlap.
The .lav format blastz output, which does not include
the sequence, was converted to .axt with PSU's lav2axt.
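
The chunking arithmetic can be sketched in Python as follows. This is
one reading of the description, not the actual pipeline code: it
assumes each chunk starts 10,000,000 bases after the previous one, so
that consecutive 10,010,000-base chunks share 10,000 bases, and that
the final chunk is simply cut off at the chromosome end. The function
name chunk_ranges is hypothetical.

```python
from typing import Iterator, Tuple


def chunk_ranges(chrom_length: int,
                 chunk_size: int = 10_010_000,
                 overlap: int = 10_000) -> Iterator[Tuple[int, int]]:
    """Yield 1-based inclusive (start, end) ranges covering a chromosome.

    Consecutive ranges share `overlap` bases; the last range is
    truncated at the chromosome end (an assumption of this sketch).
    """
    if chrom_length < 1:
        raise ValueError("chrom_length must be at least 1")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    start = 1
    while True:
        end = min(start + chunk_size - 1, chrom_length)
        yield (start, end)
        if end == chrom_length:
            break
        start += step
```

For example, list(chunk_ranges(25_000_000)) gives three ranges for a
hypothetical 25,000,000-base chromosome.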
The axtBest alignments were processed with axtBest from
Jim Kent at UCSC. The axtTight alignments were processed
with subsetAxt from Jim Kent using the matrix:
          A     C     G     T
    A   100  -200  -100  -200
    C  -200   100  -200  -100
    G  -100  -200   100  -200
    T  -200  -100  -200   100
with a gap open penalty of 2000 and a gap extension
penalty of 50. The minimum score was 3400. The axtTight
subset covers 6% of the human genome while axtBest covers
40%.
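
To make the two scoring schemes concrete, the Python sketch below
scores a pair of aligned sequence lines with either matrix. It is not
the blastz or subsetAxt implementation. It assumes that a gap of
length n costs the open penalty plus n times the extension penalty,
that adjacent gap columns form one gap regardless of which sequence
carries the '-', and that characters other than A, C, G, and T score
zero; none of these details is stated above. The names BLASTZ_MATRIX,
TIGHT_MATRIX, and score_alignment are hypothetical.

```python
from typing import List

BASES = "ACGT"

# Rows and columns are in A, C, G, T order, as in the tables above.
BLASTZ_MATRIX = [
    [91, -114, -31, -123],
    [-114, 100, -125, -31],
    [-31, -125, 100, -114],
    [-123, -31, -114, 91],
]
TIGHT_MATRIX = [
    [100, -200, -100, -200],
    [-200, 100, -200, -100],
    [-100, -200, 100, -200],
    [-200, -100, -200, 100],
]


def score_alignment(human_seq: str, mouse_seq: str,
                    matrix: List[List[int]],
                    gap_open: int, gap_extend: int) -> int:
    """Score two gapped sequence lines of equal length."""
    if len(human_seq) != len(mouse_seq):
        raise ValueError("sequence lines must have equal length")
    index = {base: i for i, base in enumerate(BASES)}
    score = 0
    in_gap = False
    for h, m in zip(human_seq.upper(), mouse_seq.upper()):
        if h == "-" or m == "-":
            if not in_gap:
                score -= gap_open      # charged once per gap (assumption)
                in_gap = True
            score -= gap_extend        # charged for every gap column
        else:
            in_gap = False
            if h in index and m in index:
                score += matrix[index[h]][index[m]]
            # any other character contributes 0 in this sketch
    return score


def passes_tight(human_seq: str, mouse_seq: str, min_score: int = 3400) -> bool:
    """Apply the axtTight matrix, gap penalties, and minimum score."""
    return score_alignment(human_seq, mouse_seq, TIGHT_MATRIX, 2000, 50) >= min_score
```

The same function covers both schemes: pass BLASTZ_MATRIX with
penalties 400 and 30 for the blastz scoring, or TIGHT_MATRIX with 2000
and 50 for the axtTight scoring.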
The mouse genome data has specific conditions for use. Please
refer to http://genome.ucsc.edu/goldenPath/credits.html for
details.
Name               Last modified      Size  Description
Parent Directory                      -
axtbest/           2002-10-27 12:07   -
axtTight/          2002-10-27 12:45   -
axtBest/           2002-10-27 12:07   -
axtAll/            2002-10-27 12:12   -