Are there multiple human genome reference sequences, and if so, is there a standard means of hashing or differentiating them in a way which does not require whole sequence comparison?
For example, the reference sequences used by Ensembl and UCSC.
There is only one human reference genome sequence, but there are different versions of it, as people are still fiddling around with the assembly and fixing problems.
The current version is called GRCh37 (also referred to as hg19); older versions are NCBI36 (hg18), NCBI35 (hg17), and so on.
The major browsers (Ensembl, UCSC, NCBI MapViewer) are all using the GRCh37 assembly.
For more information you can have a look at the site of the Genome Reference Consortium.
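On the hashing part of the question: one possible approach, not something described in this thread, is to fingerprint each chromosome with an MD5 digest of its uppercased sequence, which is the same convention the SAM format uses for the `M5` tag in `@SQ` header lines. Each FASTA file still has to be read once, but after that two assemblies can be compared by digest alone. The sketch below is a minimal illustration; the function name `fasta_md5s` and the toy records are made up for the example.

```python
import hashlib
from typing import Dict, Iterable


def fasta_md5s(lines: Iterable[str]) -> Dict[str, str]:
    """Return {sequence name: MD5 hex digest} for FASTA-formatted lines.

    Sequences are uppercased and stripped of whitespace before hashing,
    so line wrapping and soft-masking (lowercase) do not change the digest.
    """
    digests: Dict[str, str] = {}
    name = None
    md5 = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Finish the previous record before starting a new one.
            if name is not None:
                digests[name] = md5.hexdigest()
            parts = line[1:].split()
            name = parts[0] if parts else ""
            md5 = hashlib.md5()
        elif name is not None:
            md5.update(line.replace(" ", "").upper().encode("ascii"))
    if name is not None:
        digests[name] = md5.hexdigest()
    return digests


# Toy records standing in for two downloads of the same chromosome:
# one soft-masked and wrapped with a "chr" prefix, one plain.
ucsc_style = fasta_md5s([">chr1 example", "acgt", "ACGT"])
ensembl_style = fasta_md5s([">1", "ACGTACGT"])
print(ucsc_style["chr1"] == ensembl_style["1"])
```

For a real file, something like `with open(path) as fh: fasta_md5s(fh)` would apply, with `path` pointing at whichever FASTA you downloaded. Since UCSC names chromosomes `chr1` and Ensembl names them `1`, it is safer to compare the digests than the names.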
As Bert says, there is one reference sequence, GRCh37.
Unlike previous assemblies such as NCBI36, GRCh37 includes some alternative regions that attempt to cover parts of the genome known to be highly variable between individuals (the MHC region, for instance).
Note that the reference sequence is a mosaic haploid genome, meaning that it is assembled from multiple individuals but only ever lists one individual's sequence at any given locus. This can have the unfortunate (but rare) effect that some regions of the reference carry a haplotype which has never existed in any human population. I believe the GRC is attempting to patch those regions in GRCh37 as they are identified.
There is also the very confusingly (and perhaps hubristically) named "HuRef" genome assembly. This is also available from the NCBI and is the de novo assembled diploid genome of Craig Venter; see the PLoS Biology paper.
related: Human Genome : Hg18/Build36 Vs Hg19/Build37