Question

Error In Reference Genome

12.7 years ago

Ashutosh Pandey 12k

Hi Everyone,

Does anyone have an idea of the error rate (single-base errors) in the mouse reference genome assembly? I am not talking about gaps, but about errors in the nucleotide assignments of the reference genome.

Thanks

error reference genome • 2.7k views

updated 12.7 years ago by deanna.church ★ 1.1k • written 12.7 years ago by Ashutosh Pandey 12k

score 1 · Answer 1 · 2012-08-10

Although I do not have a good answer, perhaps the following is helpful:

The updated mouse reference paper (2007, PLoS Biology) might have more information and statistics on this.
One way to find out would be to sequence many other mice and then identify which SNPs are unique to the reference. But that alone wouldn't be sufficient: one would also need to distinguish mutations private to the sequenced animal from genuine errors.
Another way would be to retrieve the original BAC/YAC clones for a few randomly picked segments and resequence them to see how closely they match the reference. This also assumes few or no replication errors in the clones.
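Both ideas boil down to comparing aligned sequences base by base. The sketch below is illustrative only: it assumes gapless, equal-length sequences (no indels) with 'N' marking no-calls, and the function names `reference_private_sites` and `mismatch_rate` are hypothetical, not part of any existing tool. As noted above, a reference-private allele is only a candidate error, since it may be a real mutation in the reference animal.

```python
from typing import Dict, List


def reference_private_sites(
    reference: str, strain_calls: Dict[str, str], min_strains: int = 2
) -> List[int]:
    """0-based positions where all strains with a call agree on one base
    that differs from the reference base (hypothetical helper)."""
    for name, seq in strain_calls.items():
        if len(seq) != len(reference):
            raise ValueError(f"{name}: sequence not aligned to reference length")
    candidates = []
    for i, ref_base in enumerate(reference):
        # Bases called at this position across strains; 'N' means no call.
        called = [seq[i] for seq in strain_calls.values() if seq[i] != "N"]
        if len(called) < min_strains:
            continue  # too few strains called here to say anything
        if len(set(called)) == 1 and called[0] != ref_base:
            # Every strain disagrees with the reference in the same way:
            # either a reference error or a mutation private to the
            # reference animal. This check cannot tell the two apart.
            candidates.append(i)
    return candidates


def mismatch_rate(reference_segment: str, resequenced: str) -> float:
    """Fraction of compared positions where a resequenced clone disagrees
    with the reference; positions with 'N' in either sequence are skipped."""
    if len(reference_segment) != len(resequenced):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for r, s in zip(reference_segment, resequenced):
        if r == "N" or s == "N":
            continue
        compared += 1
        mismatches += r != s
    return mismatches / compared if compared else 0.0
```

For example, with reference `ACGTACGT` and three strains that all carry `T` at position 4, only position 4 is flagged; a position where the strains disagree among themselves is not.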

score 1 · Answer 2 · 2012-08-10

I think that is a very difficult question to answer. The Mouse Genomes Project states that strains were sequenced to an average depth of 25x and that the raw data were deposited in the Short Read Archive. If you really want to, you could look up the error rates of the Illumina GAII they used, which vary with read length. You might then track down errors in the assembly as well, and of course there is always the possibility of low-level contamination among your samples.
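To see why a per-read error rate does not translate directly into a per-site error rate at 25x depth, here is a rough sketch. It assumes a strict-majority consensus and independent read errors, and the 1% per-base error used below is a placeholder, not the actual GAII figure. Real sequencing errors are often systematic rather than independent, so this understates their impact.

```python
from math import comb


def majority_error_upper_bound(depth: int, per_read_error: float) -> float:
    """Probability that more than half of `depth` independent reads are wrong
    at a site. Under strict-majority voting, a wrong consensus base needs a
    majority of wrong reads that also agree on the same base, so this is an
    upper bound on the consensus error from read errors alone."""
    threshold = depth // 2 + 1
    return sum(
        comb(depth, k) * per_read_error**k * (1 - per_read_error) ** (depth - k)
        for k in range(threshold, depth + 1)
    )


# Placeholder error rate for illustration, not the measured GAII rate.
print(majority_error_upper_bound(25, 0.01))
```

Under these assumptions the bound at 25x is vanishingly small, which suggests that contamination, mapping artefacts, and systematic errors matter far more than random per-read errors.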

Have a look at this rather recent blog post, Hidden assembly problems exposed, where a question similar to yours is tackled for the human reference genome.

In the end, this does not really answer your question, but it will be interesting to see whether someone can actually produce a definitive answer.

score 0 · Answer 3 · 2012-08-14

You can get information about the mouse reference assembly here: http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/mouse/index.shtml

The single base error rate for the mouse reference assembly is roughly 1/100,000 (similar to human), but there are other issues to be dealt with as well. An updated version of the reference assembly was recently produced (GRCm38), but there are still a few hundred issues under review: http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/mouse/issues/index.shtml#status
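To put the quoted rate of roughly 1/100,000 in perspective, a back-of-the-envelope calculation helps. The genome length of about 2.7 billion bases is an approximate figure assumed here, and the error-free probability uses a Poisson approximation that assumes errors are independent and evenly spread, which real assembly errors need not be.

```python
from math import exp

# Assumed figures for illustration only.
ERROR_RATE = 1 / 100_000          # single-base error rate quoted above
GENOME_LENGTH_BP = 2_700_000_000  # approximate mouse assembly size (assumption)


def expected_errors(length_bp: int, rate: float = ERROR_RATE) -> float:
    """Expected number of single-base errors in a stretch of length_bp."""
    return length_bp * rate


def prob_error_free(length_bp: int, rate: float = ERROR_RATE) -> float:
    """Poisson approximation: chance a stretch contains no errors,
    assuming errors are independent and uniformly distributed."""
    return exp(-expected_errors(length_bp, rate))


print(round(expected_errors(GENOME_LENGTH_BP)))  # genome-wide
print(prob_error_free(1_000))                    # a 1 kb window
```

Under these assumptions that comes to about 27,000 single-base errors genome-wide, while any given 1 kb window has roughly a 99% chance of containing none.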