Question

Estimating Assembly Error Rate For Given Sequencing Depth

1

11.1 years ago

Leszek 4.2k

According to Illumina, the error rate (the probability of calling a wrong base in a single read) for the GAIIx is ~1%.
Could you help me estimate the error rate (the probability of calling a wrong base) in a genome assembly at, let's say, 100X coverage? Is it simply 1%/100?

illumina • 4.1k views

ADD COMMENT • link updated 10.9 years ago by Biostar 20 • written 11.1 years ago by Leszek 4.2k

1

Wouldn't it be 1%^100? Of course, the error rate actually changes as a function of base position in the read, and then there are the Phred scores to consider, so I suspect the proper equation would be quite messy.

Edit: Err, 1%^100 would be the naive probability that all of the reads covering a base contain an error. Of course, you don't actually need all of them to contain an error, and they wouldn't all contain the same error anyway. Mea culpa!
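
To put rough numbers on the two naive quantities mentioned here, a minimal Python sketch follows. It assumes errors are independent across reads, a flat 1% per-base error rate, and that a miscall is equally likely to be any of the three wrong bases. None of these assumptions comes from Illumina, and the function names are made up for illustration. The values are reported as log10 because the probabilities themselves are vanishingly small.

```python
import math

def log10_all_reads_wrong(error_rate, depth):
    """log10 of P(every read covering a base is miscalled) = error_rate ** depth."""
    return depth * math.log10(error_rate)

def log10_all_reads_same_wrong_base(error_rate, depth):
    """log10 of P(every read covering a base shows the SAME wrong base).

    Assumption: a miscall picks each of the 3 wrong bases with equal
    probability, so one specific wrong base has probability error_rate / 3
    per read, and there are 3 wrong bases the reads could agree on.
    """
    return math.log10(3) + depth * math.log10(error_rate / 3)

# Values from the question: ~1% error rate, 100X coverage.
print(log10_all_reads_wrong(0.01, 100))
print(log10_all_reads_same_wrong_base(0.01, 100))
```

Under these assumptions the first value is about -200 and the second about -247, so requiring the reads to agree on the same wrong base makes an already negligible probability smaller still.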

ADD REPLY • link 11.1 years ago by Devon Ryan 104k

score 2 · Answer 1 · 2013-10-09

2

11.1 years ago

Istvan Albert 101k

You can't evaluate assemblies this way, because errors in reads can cause mis-assemblies whose effect cannot be described with classical probabilities.

The theoretical probability of most reads ending up with the same error at the exact same position by sheer chance is so small that it is not worth accounting for.

This is not to say that this event does not happen. It is just that when it does, it won't be due to random chance but to a systematic problem, in which case probabilistic estimation does not help.
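
For a sense of scale of the random-chance scenario only, here is a rough back-of-the-envelope sketch in Python. It assumes independent errors, a flat 1% per-base error rate, miscalls spread evenly over the three wrong bases, and a consensus base chosen by strict majority vote. All of these are simplifying assumptions, and `p_wrong_consensus` is a made-up name. It says nothing about systematic errors or mis-assemblies.

```python
from math import comb

def p_wrong_consensus(error_rate, depth):
    """P(a strict majority of reads at one position show the same wrong base).

    Assumptions (not from the original post): independent reads, a constant
    per-base error rate, and each miscall equally likely to be any of the
    3 wrong bases.
    """
    q = error_rate / 3            # chance that a read shows one specific wrong base
    threshold = depth // 2 + 1    # smallest count that is a strict majority
    # Binomial tail: at least `threshold` of `depth` reads show that wrong base.
    tail = sum(
        comb(depth, k) * q**k * (1 - q) ** (depth - k)
        for k in range(threshold, depth + 1)
    )
    # Only one base can hold a strict majority, so the three cases are
    # disjoint and their probabilities simply add.
    return 3 * tail

# Values from the question: ~1% error rate, 100X coverage.
print(p_wrong_consensus(0.01, 100))
```

With the numbers from the question this comes out on the order of 1e-97, which is neither 1%/100 nor 1%^100, and is far too small to matter next to systematic effects.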

ADD COMMENT • link 11.1 years ago by Istvan Albert 101k

1

That is right: the relationship between minor sequencing errors and errors in the assembly is weak. Even with hundreds of X coverage, a "simple" problem such as repeated sequence can significantly affect the quality of the resulting assembly. Thus, I would also guess that the contribution of minor sequencing errors (1% error at 100X coverage) is negligible.

ADD REPLY • link 10.9 years ago by Pavel Senin ★ 1.9k