Are the Human Reference Genome Revisions (e.g. GRCh37) Still Based on the DNA of the Original Individuals?
7
10
14.2 years ago
Bio_X2Y ★ 4.4k

I understand that genetic material from a number of individuals was used to construct the public human reference genome. Since the GRC are releasing ongoing revisions (e.g. GRCh37), are they still looking at the DNA of the same individuals?

genome • 6.3k views
0

An interesting question, but could you give an example case where such information would be relevant?

0

Michael actually makes a pretty good point. Since the reference is a conglomeration anyway, it's pretty useless to consider it as a whole. It's clear that in the near future, we'll be assaying against panels of reference genomes, rather than some monolithic singular genome.

0

Thanks guys. I don't have a particular case in mind; I was just generally curious. I was hoping to get a feel for whether the genome represents contributions from 5, 10, or 100+ different individuals.

6
14.2 years ago

Don't forget how they get those sequences: they come from BAC clones. Each BAC comes from one individual. First they assemble the BACs and keep one for each region; in the overlapping region of two clones, "the sequence of the first clone is used until the first switch point and then the sequence of the second clone is used." See Figure 1.

This means that the genome is, overall, made from different people (a chunk from person A, then a chunk from person B), but each locus, each nucleotide, comes from one person; it is not an average or a consensus. At each junction the two clones should overlap, so a junction should not fall across a CNV or cut a unique sequence in half.
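
The switch-point idea described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the GRC's actual pipeline: the `Clone` class, the `stitch` function, the clone names, the donor labels, and the toy sequences are all hypothetical, and the clones are assumed to share one coordinate system with no indels in the overlap.

```python
from dataclasses import dataclass


@dataclass
class Clone:
    name: str   # hypothetical clone identifier
    donor: str  # hypothetical donor label
    start: int  # 0-based coordinate of the clone's first base
    seq: str

    @property
    def end(self) -> int:
        # Coordinate just past the clone's last base.
        return self.start + len(self.seq)


def stitch(clones, switch_points):
    """Join ordered, overlapping clones into one sequence.

    switch_points[i] is the coordinate at which the path leaves
    clones[i] and continues in clones[i + 1].
    Returns the joined sequence and the donor of every base.
    """
    if len(switch_points) != len(clones) - 1:
        raise ValueError("need one switch point per adjacent clone pair")
    pieces, donors = [], []
    pos = clones[0].start
    for i, clone in enumerate(clones):
        if i < len(switch_points):
            stop = switch_points[i]
            nxt = clones[i + 1]
            # The switch point must lie inside the overlap of the two clones.
            if not (nxt.start <= stop <= clone.end):
                raise ValueError(f"switch point {stop} is outside the overlap")
        else:
            stop = clone.end
        if stop < pos:
            raise ValueError("switch points must be in increasing order")
        piece = clone.seq[pos - clone.start:stop - clone.start]
        pieces.append(piece)
        donors.extend([clone.donor] * len(piece))
        pos = stop
    return "".join(pieces), donors


# Two toy clones that overlap at coordinates 6-9 and differ at coordinate 8.
clone_a = Clone("BAC_A", "person A", 0, "ACGTACGTAC")
clone_b = Clone("BAC_B", "person B", 6, "GTTCGGATTA")
sequence, donors = stitch([clone_a, clone_b], [7])
print(sequence)
print(donors[6], "->", donors[7])
```

In this toy case the base at coordinate 8 is taken from person B only; the differing base from person A is simply not used, which is the sense in which each position reflects one individual rather than a consensus.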

3
14.2 years ago

You might start by checking out the site of the Genome Reference Consortium. There may be answers or links to publications that answer your questions there.

I don't have a citation for this, but from what I understand, most of the BAC libraries that were used to create the reference genome are still around in one form or another. I'd be surprised if they weren't using that same DNA.

3
14.2 years ago
Michael 55k

Citing http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/index.shtml

GRCh37 is a haploid assembly, constructed from multiple individuals and can be divided into a 'primary assembly' and a set of 'alternate loci'. The primary assembly represents the assembled chromosomes, plus any unlocalized or unplaced sequence that represent the non-redundant, haploid assembly. The alternate loci represent regions for which there is large scale variation and an alternate tiling path is available for this region.

That makes no specific statement about who these multiple individuals are, but it also makes no claim that the set of individuals stays the same. To improve the quality of the assemblies and make the reference more comprehensive, it would be beneficial to include more data as they become available. So why would they limit themselves to the initial set of individuals? I wouldn't rely on the set being the same. Also, why would that be relevant?

Just speculation.

2
14.2 years ago
Neilfws 49k

My understanding, from the URL provided by Chris, is that the revisions refer to computational improvements in sequence assembly and gene/feature prediction. I would be surprised if DNA from new individuals were added to the process, since it would then cease to be a "reference".

However, it is difficult to know for sure: there is no mention of new samples, but then there is also no statement that new samples were not used.

2
14.2 years ago
Litali ▴ 50

I have an example where this question is interesting. We mapped 454 reads to GRCh36 and found some structural variations which seem to be mutations. However, when I take those specific reads and BLAST them, there is a hit to an 'alternative assembly'. So does the read match another individual, and not those who were included in GRCh36? Does it mean it is not a real mutation? Or that it is not correct to map against GRCh36 in the first place?

2
14.2 years ago
Suganthi ▴ 50

Stefano is correct. And I have also heard that the majority of BACs come from one individual in Buffalo. I think I did read it from a reliable source, though I don't remember exactly where now.

Litali: The alternative assembly is the Celera assembly of the initial human genome (a major portion of which supposedly comes from Venter's DNA). And yes, structural variations appear to be more common than we thought. For now, we are still using GRCh36 as the reference, but the notion of a single reference genome is becoming obsolete. You might want to check out Evan Eichler's papers.

Here is a paper very relevant to your question on structural variation: "Building the sequence map of the human pan-genome", http://www.nature.com/nbt/journal/v28/n1/abs/nbt.1596.html

1
14.2 years ago
jvijai ★ 1.2k

Anecdotally, I was told that 60% or more of the BACs initially used for the assembly were from the RPCI BAC set, and possibly from someone in Buffalo, NY. This is not corroborated; it could very well be an urban legend.
