Are There Multiple Human Genome Reference Sequences?
13.7 years ago
Blunders ★ 1.1k

Are there multiple human genome reference sequences, and if so, is there a standard means of hashing or differentiating them in a way which does not require whole sequence comparison?

For example, the reference sequences used by Ensembl and UCSC.

sequence genome human • 8.5k views
13.7 years ago
Bert Overduin ★ 3.7k

There is only one human reference genome sequence, but there are different versions of it, as people are still fiddling around with the assembly and fixing problems.

The current version is called GRCh37 (also referred to as hg19); older versions are NCBI36 (hg18), NCBI35 (hg17), etc.

The major browsers (Ensembl, UCSC, NCBI MapViewer) are all using the GRCh37 assembly.

For more information you can have a look at the site of the Genome Reference Consortium.
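
On the hashing part of the question: one convention in use is the per-sequence MD5 checksum stored in the M5 tag of SAM/BAM @SQ header lines, computed over the uppercased sequence with whitespace removed. Below is a minimal Python sketch of that idea, assuming plain FASTA input. The function names are made up for illustration, and it ignores edge cases such as the padding characters the SAM specification mentions.

```python
import hashlib


def sequence_md5(sequence: str) -> str:
    """MD5 of a sequence after removing whitespace and uppercasing."""
    normalized = "".join(sequence.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def fasta_checksums(lines):
    """Yield (name, md5) for each record in an iterable of FASTA lines.

    Hypothetical helper: hashes incrementally, so a whole chromosome
    never has to be held in memory at once.
    """
    name = None
    digest = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                yield name, digest.hexdigest()
            parts = line[1:].split()
            name = parts[0] if parts else ""
            digest = hashlib.md5()
        elif digest is not None:
            # Same normalization as sequence_md5, applied line by line.
            digest.update("".join(line.split()).upper().encode("ascii"))
    if name is not None:
        yield name, digest.hexdigest()
```

Calling `dict(fasta_checksums(open(path)))` on two downloads gives two name-to-checksum tables. Matching digests suggest the same sequence content even when the names differ (for example "chr1" versus "1"). Each file still has to be read once, but after that only the short digests need comparing.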


@Bert Overduin: Great, thanks!

13.7 years ago
Alex Stoddard ▴ 190

As Bert says, there is one reference sequence, GRCh37.

Unlike the previous assemblies (NCBI36 etc.), GRCh37 does include some alternative regions that attempt to cover parts of the genome known to be highly variable between individuals (the MHC region, for instance).

Note that the reference sequence is a mosaic haploid genome, meaning that it is assembled from multiple individuals but only ever lists one individual's sequence at any given locus. This can have the unfortunate (but rare) effect of some regions of the reference having a haplotype which has never existed in any human population. I believe GRCh37 is attempting to patch those regions as they are identified.

There is also the very confusingly (and perhaps hubristically) named "huRef" genome assembly. This is also available from the NCBI and is the de novo assembled diploid genome of Craig Venter; see the PLoS Biology paper.


Two other things that may be worth mentioning. First, because it is derived from just a few individuals, for some variants the reference sequence carries the minor allele. And second, in contrast to the huRef assembly, the GRCh37 assembly is haploid.


You said "perhaps hubristically". There is one word there that I don't understand: "perhaps" ;-)


+1. Nonetheless, the boundaries of haplotypes are very clear in the assembly. I do not think we can "patch those regions". There is also a third assembly: Celera. If you go to dbSNP and some other databases, you may see three types of coordinates.


Saw this on Twitter today (via edyong209 and @sciencegoddess): Venter was asked "What makes you think you can do a better job with life & genetics than God?" A: "We have computers." Just to illustrate why I doubt the "perhaps" ;-).
