Error In Reference Genome
12.3 years ago

Hi Everyone,

Does anyone have an idea of the error rate (base errors) in the mouse reference genome assembly? I am not talking about gaps, but about errors in nucleotide assignment in the reference genome.

Thanks

error reference genome • 2.6k views
12.3 years ago
Deniz ▴ 140

Although I do not have a good answer, perhaps the following is helpful:

  • The updated mouse reference paper (2007, PLoS Biology) might have more information and statistics on this.

  • One way to figure it out would be to sequence many other mice, and then to find which SNPs are unique to the reference. But that wouldn't be sufficient: one would need to distinguish between mutations private to the sample sequenced versus genuine errors.

  • Another way would be to find the original BAC/YAC for a few randomly picked segments, and redo the sequencing to see how closely it matches the original. This also assumes few or no replication errors in the clones.
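
The second idea above (looking for differences that are unique to the reference) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `candidate_reference_errors`, the `min_samples` threshold, and the dictionary-based input format are all made up for the example, and real data would come from variant calls rather than hand-written dictionaries.

```python
from typing import Dict, List


def candidate_reference_errors(
    ref_alleles: Dict[int, str],
    sample_calls: List[Dict[int, str]],
    min_samples: int = 3,  # hypothetical threshold, not from the thread
) -> List[int]:
    """Return positions where all called samples agree with each other
    but disagree with the reference base."""
    candidates = []
    for pos, ref_base in sorted(ref_alleles.items()):
        # Collect the base call of every sample that covers this position.
        calls = [calls_by_pos[pos] for calls_by_pos in sample_calls if pos in calls_by_pos]
        if len(calls) < min_samples:
            continue  # too few samples to say anything
        # Unanimous non-reference call: the reference base is the odd one out.
        if len(set(calls)) == 1 and calls[0] != ref_base:
            candidates.append(pos)
    return candidates


# Toy data: position -> base.
reference = {100: "A", 200: "C", 300: "G"}
samples = [
    {100: "A", 200: "T", 300: "G"},
    {100: "A", 200: "T", 300: "T"},  # the T at 300 is seen in this sample only
    {100: "A", 200: "T", 300: "G"},
]

print(candidate_reference_errors(reference, samples))  # [200]
```

Position 200 is flagged because every sample reads T where the reference has C. As noted above, this alone cannot tell a genuine reference error from a real mutation private to the animal that was sequenced for the reference, so flagged sites are candidates only.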

12.3 years ago
Joachim ★ 2.9k

I think that is a very difficult question to answer. The Mouse Genomes Project states that strains were on average sequenced 25 times and that they deposited their raw data at the Short Read Archive. If you really want to, you could look up the error rates for the Illumina GAII they used, which will vary with read length. Then you might track down errors in the assembly too, and of course there is always the possibility of having to deal with low-level contamination among your samples.

Have a look at this rather recent blog post, Hidden assembly problems exposed, where a question similar to yours is tackled for the human reference genome.

Admittedly, this does not really answer your question, but it will be interesting to see whether someone can actually produce a definitive answer.

12.3 years ago
deanna.church ★ 1.1k

You can get information about the mouse reference assembly here: http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/mouse/index.shtml

The single base error rate for the mouse reference assembly is roughly 1/100,000 (similar to human), but there are other issues to be dealt with as well. An updated version of the reference assembly was recently produced (GRCm38), but there are still a few hundred issues under review: http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/mouse/issues/index.shtml#status
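
To put the 1/100,000 figure in perspective, here is a small back-of-the-envelope sketch. It converts the rate to a Phred-style quality score and estimates how many erroneous bases that would imply genome-wide. The genome size of about 2.7 Gb is an assumption made for the illustration, not a number from this thread, and the helper names are made up.

```python
import math


def phred_quality(error_rate: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(error probability)."""
    return -10 * math.log10(error_rate)


def expected_errors(genome_size_bp: int, error_rate: float) -> float:
    """Expected number of wrong bases if errors are spread uniformly."""
    return genome_size_bp * error_rate


ERROR_RATE = 1 / 100_000                # figure quoted above
ASSUMED_GENOME_SIZE_BP = 2_700_000_000  # assumption: roughly 2.7 Gb of mouse sequence

print(round(phred_quality(ERROR_RATE)))                            # 50
print(round(expected_errors(ASSUMED_GENOME_SIZE_BP, ERROR_RATE)))  # 27000
```

So under these assumptions the quoted rate corresponds to about Q50, or on the order of tens of thousands of incorrect bases across the whole assembly. Real errors are unlikely to be uniformly distributed, so this is only a rough scale.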
