Question

454 Coverage Quality

2

14.2 years ago

Pierre Lindenbaum 164k

I've been given a set of 454 sequences (exome sequencing). The coverage was:

 Mean coverage                          18,93
 1x                                   90,60%
 5x                                   68,09%
 10x                                  49,68%
 20x                                  33,75%
 40x                                  16,28%
 80x                                   0,76%
 100x                                  0,07%

I'm not used to this kind of data. As far as I understand, almost 10% of the bases haven't been covered at all.

If this were a set of Illumina GA data, I would say that it was badly covered.

But for 454, what should I think of that result?

Update:

(Human Genome, two chromosomes have been sequenced)

             Count         Average Length Total-Bases
Reads        563.561       328,89         185.348.167
Matched      473.235       328,26         155.345.247
Not matched  90.326        332,16          30.002.920
References   2             127.542.357    255.084.714
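
As a rough sanity check on the table above, the sketch below recomputes the fraction of reads that matched and the depth the matched bases would give if they were spread over the full length of both references. The figures are copied from the table and the variable names are illustrative only. The size of the capture target is not stated here, so the on-target mean coverage cannot be recomputed from these numbers alone.

```python
# Figures copied from the table above (thousands separators removed).
total_reads = 563_561
matched_reads = 473_235
matched_bases = 155_345_247
reference_bases = 255_084_714  # both reference chromosomes, full length

# Fraction of reads that were placed on the references.
matched_fraction = matched_reads / total_reads

# Depth if the matched bases were spread evenly over the whole references.
# This is NOT the depth over the exome targets, whose size is not given.
whole_reference_depth = matched_bases / reference_bases

print(f"matched reads: {matched_fraction:.1%}")
print(f"depth over full references: {whole_reference_depth:.2f}x")
```

The gap between this whole-reference depth and the reported mean coverage is what one would expect if the mean was computed over a much smaller target region, but that is an assumption about how the report was generated.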

coverage next-gen sequencing quality • 3.8k views

updated 10.4 years ago by Biostar 20 • written 14.2 years ago by Pierre Lindenbaum 164k

0

Hi Pierre, here are a few questions to help us assess your data: How many sequences did you get? What's their average length? What is the expected total exome length for your species? Cheers

14.2 years ago by Eric Normandeau 11k

0

Eric, thanks, I'll get this information tomorrow morning

14.2 years ago by Pierre Lindenbaum 164k

0

It would also help to know more about the origin and characteristics of the sample that was sequenced.

14.2 years ago by Istvan Albert 102k

score 1 · Answer 1 · 2010-10-16

From a purely probabilistic point of view, 18x coverage should lead to far fewer than 10% uncovered bases.

But since you are saying that this data comes from a more novel methodology (that of exon capture), there might be some larger inherent errors in the process. I recall some papers reporting a 96-98% recovery rate of exons in published results, and I would expect less successful results in day-to-day trials. It might just be that some exonic regions are not captured well.

score 1 · Answer 2 · 2010-10-16

Need more info: was this 454 Titanium or FLX, and what were the extraction technique and the targets?

You say that you targeted two chromosomes: do you mean the whole chromosomes or only their exomes? How was the extraction done?

For your mean depth of 18, a really naive Poisson model suggests that the fraction of bases with zero coverage should be far less than 10% (the probability that a Poisson variable with lambda=18 is zero is e^-18, about 1.5e-8), but it really depends on your extraction technology.
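
To make the naive Poisson comparison concrete, here is a minimal sketch that prints the fraction of bases expected at each depth threshold next to the fractions reported in the question. It assumes that the table rows mean "at least this depth" and that depth is Poisson with the reported mean of 18.93; the function names are illustrative only.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def poisson_at_least(k: int, lam: float) -> float:
    """P(X >= k): fraction of bases expected at depth k or more."""
    below = sum(poisson_pmf(i, lam) for i in range(k))
    return max(0.0, 1.0 - below)  # clamp tiny negative rounding errors

mean_depth = 18.93  # mean coverage reported in the question

# Observed fractions from the question's coverage table
# (assumed to mean "covered at this depth or more").
observed = {1: 0.9060, 5: 0.6809, 10: 0.4968, 20: 0.3375,
            40: 0.1628, 80: 0.0076, 100: 0.0007}

for depth, seen in observed.items():
    expected = poisson_at_least(depth, mean_depth)
    print(f">={depth:>3}x  Poisson {expected:9.4%}  observed {seen:9.4%}")
```

Under this model nearly every base would reach 1x and very few would reach 40x, whereas the observed table has both more uncovered bases and more very deep ones, which is the pattern that uneven capture would be expected to produce.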

score 1 · Answer 3 · 2010-10-18

14.2 years ago

Fiamh ▴ 220

As Istvan pointed out, it depends heavily on the extraction approach. The recovery figure I've seen mentioned most consistently is even lower, at 80-90% (for example, see Dan's recent report from the CSHL meeting).

updated 5.3 years ago by Ram 44k • written 14.2 years ago by Fiamh ▴ 220

score 0 · Answer 4 · 2010-10-25

There could also be a problem with how the coverage was measured. For example, you might be using an aligner (like BLAST) that masks low-complexity regions, or you might require unambiguous matches, in which case repeats might be missed.

Generally, read coverage isn't as evenly distributed as one would like, so stochastic models like Poisson only work as a rough approximation.
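
One way to gauge how much unevenness the reported numbers imply is to swap the Poisson for a gamma-Poisson (negative binomial) model, which lets the expected depth vary from base to base. This is only an illustration under that assumed model, not something the coverage report itself uses, and the function names are hypothetical. The sketch finds the dispersion parameter r that reproduces the roughly 9.4% of bases at zero depth given the mean of 18.93.

```python
import math

def nb_zero_fraction(mean: float, r: float) -> float:
    """P(X = 0) for a negative binomial with the given mean and dispersion r."""
    return (r / (r + mean)) ** r

def fit_dispersion(mean: float, zero_fraction: float) -> float:
    """Bisect (on a log scale) for the r whose zero fraction matches the target."""
    lo, hi = 1e-6, 1e6
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # geometric midpoint
        if nb_zero_fraction(mean, mid) > zero_fraction:
            lo = mid  # too many zeros: move toward larger r (less dispersion)
        else:
            hi = mid
    return math.sqrt(lo * hi)

mean_depth = 18.93          # mean coverage from the question
observed_zero = 1 - 0.9060  # bases not covered even once

r = fit_dispersion(mean_depth, observed_zero)
print(f"dispersion r matching the zero fraction: {r:.2f}")
```

A small r means strongly uneven coverage, and as r grows the model approaches the Poisson. Because this matches only one point of the table, it is a rough gauge rather than a fitted model.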