UCSC Genome Browser assembly ID: hg38
Sequencing/Assembly provider ID: Genome Reference Consortium Human GRCh38.p14 (GCA_000001405.29)
Assembly date: Dec. 2013 initial release; June 2022 patch release 14
Assembly accession: GCA_000001405.29
NCBI Genome ID: 51 (Homo sapiens (human))
NCBI Assembly ID: GCF_000001405.40 (GRCh38.p14, GCA_000001405.29)
BioProject ID: PRJNA31257
Search the assembly:
- By position or search term: Use the "position or search term"
box to find areas of the genome associated with many different attributes, such
as a specific chromosomal coordinate range; mRNA, EST, or STS marker names; or
keywords from the GenBank description of an mRNA.
- By gene name: Type a gene name into the "search term" box,
choose your gene from the drop-down list, then press "submit" to go
directly to the assembly location associated with that gene.
- By track type: Click the "track search" button
to find Genome Browser tracks that match specific selection criteria.
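A coordinate-range query of the kind the search box accepts can be illustrated with a small parser. This is a minimal sketch that assumes a `chrom:start-end` string, where the numbers may contain thousands separators. `parse_position` is a hypothetical helper and is not part of the Genome Browser. The browser's own search box accepts many more query forms, such as gene names, accessions, and keywords.

```python
import re
from typing import Tuple

# "chrom:start-end", where start and end may contain thousands separators,
# e.g. "chrX:155,701,382-156,030,895". Surrounding whitespace is ignored.
_POSITION = re.compile(r"\s*(?P<chrom>[^\s:]+):(?P<start>\d[\d,]*)-(?P<end>\d[\d,]*)\s*")


def parse_position(text: str) -> Tuple[str, int, int]:
    """Return (chrom, start, end) for a coordinate-range query string.

    Raises ValueError for anything that is not a chrom:start-end range,
    such as a gene name or keyword.
    """
    m = _POSITION.fullmatch(text)
    if m is None:
        raise ValueError(f"not a chrom:start-end range: {text!r}")
    start = int(m.group("start").replace(",", ""))
    end = int(m.group("end").replace(",", ""))
    if start > end:
        raise ValueError(f"start is after end: {text!r}")
    return m.group("chrom"), start, end


print(parse_position("chrY:10,000-2,781,479"))  # ('chrY', 10000, 2781479)
```

Non-coordinate queries such as a bare gene name raise ValueError here. A real search interface would then fall back to a name or keyword lookup.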
Assembly Details
The GRCh38 assembly is the first major revision of the human genome released in more than four
years. As with the previous GRCh37 assembly, the
Genome Reference Consortium (GRC)
is the primary source for the human genome assembly data submitted to GenBank. Beginning with this
release, the UCSC Genome Browser version numbers for the human assemblies match those of the
GRC to minimize version confusion: the GRCh38 assembly is referred to as "hg38"
in the Genome Browser datasets and documentation, whereas its predecessor, GRCh37, was released by UCSC as "hg19". For a glossary of assembly-related terms, see the
GRC Assembly Terminology page.
GRCh38 Highlights
- Alternate sequences: Several human chromosomal regions are too variable to be
adequately represented by a single sequence. To address this, the GRCh38 assembly provides
alternate sequences for selected variant regions in the form of alternate loci
scaffolds (alt loci). Alt loci are separately accessioned sequences that are aligned
to the reference chromosomes. The initial GRCh38 assembly contained 261 alt loci, many of which are
associated with the LRC/KIR region of chr19 and the MHC region of chr6. Subsequent GRC patch releases
have added more alt loci as well as fix patches. See the
sequences page for the current
list of reference chromosomes, alternate sequences, and patch sequences in GRCh38.
- Fix sequences: Fix patches, whose names end in "_fix" (e.g., chr7_KQ031388v1_fix), represent changes to the
existing sequence. These are generally error corrections (such as base changes, component
replacements or updates, switch point updates, or tiling path changes) or assembly improvements
(such as extension of sequence into gaps). Each fix patch scaffold is given chromosome
context through its alignment to the corresponding chromosome region. A list of all chromosomes,
including the _fix sequences, can be found on the
sequences page.
- Centromere representation: Debuting in this release, the large megabase-sized gaps that
represented centromeric regions in previous assemblies have been replaced by sequences from
centromere models created by
Karen Miga et al. (2014), using centromere databases developed during her work in
the Willard lab at Duke University (now at the University of Chicago) and analysis software developed while she worked in the
Kent lab at UCSC.
The models, which provide the approximate repeat number and order for each centromere, are
expected to be useful for read mapping and variation studies.
- Mitochondrial genome: The mitochondrial reference sequence included in the GRCh38 assembly
(named "chrM" in the UCSC Genome Browser) is the
Revised Cambridge Reference Sequence (rCRS) from
MITOMAP, with GenBank accession number
J01415.2 and RefSeq accession number NC_012920.1. This differs from the chrM sequence
(RefSeq accession number NC_001807) that the Genome Browser provides for hg19, which was not updated
when the GRCh37 assembly later adopted the rCRS.
- Sequence updates: Several erroneous bases and misassembled regions in GRCh37 have been
corrected in the GRCh38 assembly, and more than 100 gaps have been filled or reduced. Much of the
data used to improve the reference sequence was obtained from other genome sequencing and analysis
projects, such as the 1000 Genomes
Project.
- Analysis set: The GRCh38 assembly offers an "analysis set" created to
accommodate next-generation sequencing read alignment pipelines. To avoid false mapping of reads,
duplicate copies of centromeric arrays and WGS on several chromosomes have been hard-masked with
Ns. The two PARs on chromosome Y have also been hard-masked, so reads from these regions align only to the chrX copy. The Epstein-Barr virus (EBV)
sequence has been added as a decoy to attract reads from EBV contamination in samples. Two versions of the
analysis set are available on our
downloads page:
one without the alternate sequences from this assembly, and one that includes them.
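Hard-masking replaces bases with N instead of deleting them. Because the sequence length does not change, coordinates in a masked sequence stay identical to those in the reference. The sketch below shows that operation on a toy sequence. It assumes 0-based, half-open intervals. `hard_mask` is a hypothetical helper and is not the procedure the GRC or UCSC used to build the analysis set.

```python
def hard_mask(seq: str, intervals) -> str:
    """Return seq with every base in each [start, end) interval replaced by 'N'.

    Intervals are 0-based and half-open. Bases are substituted rather than
    deleted, so the sequence length and all downstream coordinates are unchanged.
    """
    bases = list(seq)
    for start, end in intervals:
        if not 0 <= start <= end <= len(bases):
            raise ValueError(f"interval ({start}, {end}) is outside the sequence")
        bases[start:end] = "N" * (end - start)
    return "".join(bases)


masked = hard_mask("ACGTACGTAC", [(2, 5)])
print(masked)  # ACNNNCGTAC
```

An aligner sees only Ns in the masked stretch, so reads that would otherwise match there are left to the remaining unmasked copy.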
Chromosome naming conventions
UCSC has introduced a few small changes to the Genome Browser chromosome naming scheme with
this release:
- Haplotype chromosome, unplaced contig, and unlocalized contig names now include
their NCBI accession number, with the parts of each name joined by underscores (e.g., chr6_GL000256v2_alt)
- The "v2" at the end of the accession number indicates the NCBI version number (GL000256.2 in NCBI notation)
- Haplotype chromosome names consist of the chromosome number, followed by the NCBI accession
number, followed by "alt"
- Fix sequence names consist of the chromosome number, followed by the NCBI accession number,
followed by "fix" (e.g., chr7_KQ031388v1_fix)
- Unlocalized contig names consist of the chromosome number, followed by the NCBI accession
number, followed by "random"
- Unplaced contig names (contigs whose associated chromosome is not known) consist of
"chrUn" followed by the NCBI accession number
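The naming rules above are regular enough to parse mechanically. The sketch below defines a hypothetical `parse_hg38_name` helper that splits a name into its sequence type, chromosome, and NCBI-style accession, converting the "v" version marker back to NCBI's "." notation. It assumes only the patterns listed here. Names outside them, such as the EBV decoy in the analysis set, raise ValueError.

```python
import re

# Primary assembly sequences: chr1-chr22, chrX, chrY, chrM.
_PRIMARY = re.compile(r"chr(?:[1-9]|1[0-9]|2[0-2]|X|Y|M)")
# Other sequences: chr<chrom>_<accession>v<version>, followed by _alt, _fix,
# or _random (the suffix is omitted only for chrUn contigs).
_OTHER = re.compile(
    r"chr(?P<chrom>[0-9]+|X|Y|Un)_(?P<acc>[A-Z]+[0-9]+)v(?P<ver>[0-9]+)"
    r"(?:_(?P<suffix>alt|fix|random))?"
)
_SUFFIX_TYPES = {"alt": "haplotype", "fix": "fix", "random": "unlocalized"}


def parse_hg38_name(name: str) -> dict:
    """Split an hg38 sequence name into type, chromosome, and NCBI accession."""
    if _PRIMARY.fullmatch(name):
        return {"type": "chromosome", "chrom": name, "accession": None}
    m = _OTHER.fullmatch(name)
    if m is None:
        raise ValueError(f"not an hg38-style sequence name: {name!r}")
    chrom, suffix = m.group("chrom"), m.group("suffix")
    # chrUn contigs carry no suffix; every other non-primary name must have one.
    if (chrom == "Un") != (suffix is None):
        raise ValueError(f"not an hg38-style sequence name: {name!r}")
    return {
        "type": "unplaced" if chrom == "Un" else _SUFFIX_TYPES[suffix],
        "chrom": None if chrom == "Un" else "chr" + chrom,
        # UCSC writes NCBI's "." version separator as "v": GL000256.2 -> GL000256v2.
        "accession": f"{m.group('acc')}.{m.group('ver')}",
    }


print(parse_hg38_name("chr6_GL000256v2_alt"))
# {'type': 'haplotype', 'chrom': 'chr6', 'accession': 'GL000256.2'}
```

Names such as chrUn_GL000195v1 (unplaced) and chr1_KI270706v1_random (unlocalized) follow the same patterns and parse the same way.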
Pseudoautosomal regions
The Y chromosome in this assembly contains two pseudoautosomal regions (PARs)
that were taken from the corresponding regions in the X chromosome and are
exact duplicates:
Region | chrX | chrY |
PAR1 | chrX:10,000-2,781,479 | chrY:10,000-2,781,479 |
PAR2 | chrX:155,701,382-156,030,895 | chrY:56,887,902-57,217,415 |
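Each chrY PAR is an exact copy of a chrX PAR, so a chrY position inside a PAR can be translated to chrX by a fixed offset. The sketch below uses the coordinates listed above. It assumes they are 0-based, half-open intervals, the convention of UCSC BED files; that reading of the listed numbers is an assumption. `y_par_to_x` is a hypothetical helper.

```python
from typing import Optional

# PAR coordinates as listed above. This sketch treats each pair as a
# 0-based, half-open interval [start, end); that convention is an assumption.
PARS = {
    "PAR1": {"chrX": (10_000, 2_781_479), "chrY": (10_000, 2_781_479)},
    "PAR2": {"chrX": (155_701_382, 156_030_895), "chrY": (56_887_902, 57_217_415)},
}


def y_par_to_x(pos: int) -> Optional[int]:
    """Map a chrY position inside a PAR to the matching chrX position.

    Returns None when pos lies outside both chrY PARs.
    """
    for region in PARS.values():
        y_start, y_end = region["chrY"]
        x_start, _ = region["chrX"]
        if y_start <= pos < y_end:
            # The PARs are exact copies, so the offset from the start carries over.
            return x_start + (pos - y_start)
    return None


# Because the copies are identical, each PAR has the same length on both chromosomes.
for name, region in PARS.items():
    x_len = region["chrX"][1] - region["chrX"][0]
    y_len = region["chrY"][1] - region["chrY"][0]
    print(name, x_len, y_len)
# PAR1 2771479 2771479
# PAR2 329513 329513
```

PAR1 sits at the same coordinates on both chromosomes, so its offset is zero. PAR2 is shifted by 155,701,382 - 56,887,902 = 98,813,480 bases.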
Assembly statistics
For a detailed set of statistics about this assembly, see the
GRCh38 GenBank record.
Chromosome | Haplotypes (alt) | Fixes | Unlocalized contigs |
chr1 | 22 | 12 | 9 |
chr2 | 22 | 10 | 2 |
chr3 | 22 | 10 | 1 |
chr4 | 17 | 9 | 1 |
chr5 | 17 | 6 | 1 |
chr6 | 18 | 6 | 0 |
chr7 | 12 | 5 | 0 |
chr8 | 17 | 9 | 0 |
chr9 | 7 | 5 | 4 |
chr10 | 5 | 8 | 0 |
chr11 | 16 | 15 | 1 |
chr12 | 16 | 9 | 0 |
chr13 | 8 | 8 | 0 |
chr14 | 6 | 3 | 8 |
chr15 | 11 | 6 | 1 |
chr16 | 10 | 5 | 1 |
chr17 | 22 | 9 | 3 |
chr18 | 12 | 3 | 0 |
chr19 | 59 | 6 | 0 |
chr20 | 4 | 2 | 0 |
chr21 | 7 | 4 | 0 |
chr22 | 14 | 5 | 9 |
chrX | 7 | 7 | 0 |
chrY | 0 | 4 | 1 |
chrM | 0 | 0 | 0 |
Sequence type | Total |
chromosomes | 25 |
haplotypes | 351 |
fixes | 166 |
unlocalized contigs | 42 |
unplaced contigs | 127 |
Total | 711 |
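The totals table can be reproduced from a list of sequence names by classifying each name with the suffix rules from the naming conventions section. The sketch below assumes input lines in a two-column chrom.sizes-style layout: sequence name, a tab, then length. `classify` and `tally` are hypothetical helpers, and the sample lines use made-up lengths. For the GRCh38.p14 sequence set described here, the five categories should sum to 25 + 351 + 166 + 42 + 127 = 711, provided the input lists exactly that set.

```python
from collections import Counter


def classify(name: str) -> str:
    """Assign an hg38 sequence name to a row of the totals table by its naming pattern."""
    if name.startswith("chrUn_"):
        return "unplaced contigs"
    if name.endswith("_alt"):
        return "haplotypes"
    if name.endswith("_fix"):
        return "fixes"
    if name.endswith("_random"):
        return "unlocalized contigs"
    # Anything else (chr1-chr22, chrX, chrY, chrM) is counted as a chromosome.
    return "chromosomes"


def tally(lines) -> Counter:
    """Count sequence types in chrom.sizes-style lines ("name<TAB>length")."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[classify(line.split("\t")[0])] += 1
    return counts


# Illustrative input; the lengths are made up.
sample = [
    "chr1\t1000",
    "chr6_GL000256v2_alt\t1000",
    "chr7_KQ031388v1_fix\t1000",
    "chr1_KI270706v1_random\t1000",
    "chrUn_GL000195v1\t1000",
]
counts = tally(sample)
print(sum(counts.values()))  # 5
```

Any name that matches no suffix rule falls into the chromosome row. Extra sequences, such as the analysis set's EBV decoy, would therefore inflate that count.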
For more information about the files included in the GRCh38 GenBank submission, see the
GRCh38 README.
Bulk downloads of the sequence and annotation data may be obtained from the Genome Browser
Downloads page or the
FTP server.
The annotation tracks for this browser were generated by UCSC and collaborators worldwide.
References
Miga KH, Newton Y, Jain M, Altemose N, Willard HF, Kent WJ.
Centromere reference models for human chromosomes X and Y satellite arrays.
Genome Res. 2014 Apr;24(4):697-707. Epub 2014 Feb 5.