Question

How Many Human Genome Assemblies Are Available?

10

13.9 years ago

Alex ★ 1.5k

How many human genome assemblies are available for analysis? On the NCBI website I found three available genomes assembled into chromosomes:

the reference assembly
the Celera assembly
and Venter's diploid genome.

Additionally, there are three WGS assemblies that are not assembled into chromosomes:

Watson's genome
African genome
Asian genome

Are there any other available assemblies that are not listed by NCBI?

human genome assembly next-gen sequencing • 3.9k views

updated 13.9 years ago by lh3 33k • written 13.9 years ago by Alex ★ 1.5k

score 6 · Answer 1 · 2010-12-27

If you mean "browseable" assemblies, then yes: as far as I know, these are all the publicly available ones to date.

But if you want human genome assemblies for deeper analysis, doesn't the 1000 Genomes data suit your needs? You could also dig into the major NGS repositories, such as the American SRA or the European ENA.

score 6 · Answer 2 · 2010-12-27

See the description of the "Genome Variants" track in the UCSC Genome Browser:

This track displays variant base calls from the publicly released genome sequences of several individuals:

* 5 Sub-Saharan African genomes sequenced by Penn State University:
      o !Gubi (KB1),
      o G/aq'o (NB1),
      o !Ai (MD8),
      o D#kgao (TK1),
      o Archbishop Desmond Tutu (ABT), 
* 6 individuals from the 1000 Genomes Project high-coverage pilot:
      o a CEU daughter and parents (NA12878, NA12891, NA12892)
      o a YRI daughter and parents (NA19240, NA19238, NA19239) 
* and independently published genomes:
      o Craig Venter,
      o James Watson,
      o Anonymous Yoruba individual NA18507,
      o Anonymous Han Chinese individual (YH, YanHuang Project),
      o Seong-Jin Kim (SJK),
      o Anonymous Korean individual (AK1),
      o Stephen Quake,
      o Anonymous Irish male,
      o Extinct Palaeo-Eskimo Saqqaq individual

score 4 · Answer 3 · 2010-12-28

I do not know how one would define "assembly". But in the sense of de novo assembly, 5 are publicly available:

The official human reference genome
Celera assembly
Venter
YanHuang
NA18507

In the sense of mapping assembly, there are very few. For all the sequencing projects in the public domain, you can always get the raw reads, sometimes the list of SNPs, and occasionally the alignments, but these do not really amount to a mapping assembly. In my definition, a mapping assembly also has to tell you which regions are accessible and which are not, and this is rarely available.
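To make the "accessible regions" requirement concrete, here is a minimal sketch of what an accessibility mask might look like in practice. It is only an illustration: the interval data, chromosome lengths, and function names are all hypothetical, and it does not reproduce how any published data set was processed. Intervals are assumed to be 0-based and half-open, as in BED files.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def is_accessible(mask, chrom, pos):
    """Return True if 0-based position `pos` lies in an accessible interval.

    `mask` maps chromosome name -> sorted, non-overlapping intervals.
    """
    intervals = mask.get(chrom, [])
    starts = [s for s, _ in intervals]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < intervals[i][1]


def accessible_fraction(mask, chrom_lengths):
    """Fraction of the genome covered by the mask."""
    covered = sum(end - start for ivs in mask.values() for start, end in ivs)
    return covered / sum(chrom_lengths.values())


# Hypothetical toy data, not taken from any real genome.
raw = {"chr1": [(0, 100), (50, 150), (300, 400)], "chr2": [(10, 20)]}
chrom_lengths = {"chr1": 1000, "chr2": 100}
mask = {chrom: merge_intervals(ivs) for chrom, ivs in raw.items()}

print(mask["chr1"])                       # [(0, 150), (300, 400)]
print(is_accessible(mask, "chr1", 120))   # True
print(is_accessible(mask, "chr1", 200))   # False
print(accessible_fraction(mask, chrom_lengths))
```

With such a mask, a site that has no reported SNP can be told apart from a site where no call could be made at all, which is the information that raw reads or a bare SNP list do not provide.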

I have processed some of the published data sets in a uniform way. For people who are interested, they are here.