
Calculating Average Read Quality



13.1 years ago

hadasa ★ 1.0k

I would like to calculate the accuracy of 454 reads. My current approach is to sum the quality scores of each read and divide by the read length. This will obscure regions with particularly low or high quality scores (which may not matter at the moment). Is this a good approach for estimating the accuracy of a particular read? What would you suggest as a better alternative?
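A minimal Python sketch of the approach described in the question, assuming the per-base scores have already been decoded into a list of Phred integers (the function names `mean_quality` and `mean_accuracy` and the example scores are hypothetical). The second function converts each score Q to a per-base accuracy of 1 - 10^(-Q/10) before averaging, which is one way to see how strongly a few low-quality bases affect the estimate:

```python
def mean_quality(quals):
    """Sum of per-base Phred scores divided by read length (the question's approach)."""
    if not quals:
        raise ValueError("read has no quality scores")
    return sum(quals) / len(quals)


def mean_accuracy(quals):
    """Average per-base accuracy, converting each Phred score Q to 1 - 10**(-Q/10)."""
    if not quals:
        raise ValueError("read has no quality scores")
    return sum(1 - 10 ** (-q / 10) for q in quals) / len(quals)


read_quals = [40, 40, 38, 35, 30, 20, 10]  # hypothetical Phred scores for one read
print(round(mean_quality(read_quals), 2))       # 30.43
print(round(1 - mean_accuracy(read_quals), 4))  # expected error rate per base
```

With these example scores the mean Phred score is about 30.4, which on its own suggests an error rate of roughly 0.1%, while averaging the per-base error probabilities gives roughly 1.6%, almost all of it from the two lowest-quality bases.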

sequencing read quality 454 • 4.8k views

updated 13.1 years ago by Arun 2.4k • written 13.1 years ago by hadasa ★ 1.0k

score 4 · Answer 1 · 2012-07-17

One method I can think of is winsorisation, which overcomes the mean's sensitivity to extreme values. You can find an implementation of a winsorisation function in R here.
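A minimal sketch of winsorisation applied to one read's quality scores, written in Python rather than R. It assumes a symmetric, count-based cut: the lowest and highest `frac` share of scores are clamped to the nearest retained value. The function names, the default `frac=0.1`, and the example scores are illustrative assumptions, not the linked R implementation:

```python
import math


def winsorize(values, frac=0.1):
    """Clamp the lowest and highest `frac` share of values to the nearest retained value."""
    if not 0 <= frac < 0.5:
        raise ValueError("frac must be in [0, 0.5)")
    n = len(values)
    if n == 0:
        raise ValueError("no values to winsorise")
    k = math.floor(n * frac)  # number of values clamped in each tail
    ordered = sorted(values)
    low, high = ordered[k], ordered[n - 1 - k]  # smallest and largest retained values
    # Keep the original base order; only the extreme values change.
    return [min(max(v, low), high) for v in values]


def winsorized_mean(values, frac=0.1):
    """Arithmetic mean of the winsorised values."""
    clamped = winsorize(values, frac)
    return sum(clamped) / len(clamped)


quals = [40, 38, 37, 36, 35, 34, 33, 32, 5, 2]  # hypothetical Phred scores
print(winsorize(quals, 0.1))        # [38, 38, 37, 36, 35, 34, 33, 32, 5, 5]
print(winsorized_mean(quals, 0.1))  # 29.3
```

Here the single score 40 is lowered to 38 and the single score 2 is raised to 5, so the winsorised mean (29.3) barely moves from the plain mean (29.2); larger `frac` values clamp more of each tail.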

Also, you could have a look at the paper by Robinson et al. on a scaling normalization method for RNA-Seq data, which introduces TMM (trimmed mean of M-values, i.e. a mean computed after trimming off x% of the most extreme values). The basic idea is the same, but the paper treats it rigorously enough to be worth reading for perspective. Other than that, I believe simple winsorisation should be sufficient. The median is another alternative.
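For comparison, a sketch of a plain trimmed mean and the median on the same kind of data. This is not TMM itself, which trims log-ratios between RNA-Seq samples rather than one read's quality scores; it only illustrates the shared idea of discarding a fixed share of extreme values. `trimmed_mean`, `frac`, and the example scores are hypothetical:

```python
import math
import statistics


def trimmed_mean(values, frac=0.1):
    """Mean after discarding the lowest and highest `frac` share of values."""
    if not 0 <= frac < 0.5:
        raise ValueError("frac must be in [0, 0.5)")
    n = len(values)
    if n == 0:
        raise ValueError("no values to average")
    k = math.floor(n * frac)  # values dropped from each tail
    kept = sorted(values)[k:n - k]  # non-empty because frac < 0.5
    return sum(kept) / len(kept)


quals = [40, 38, 37, 36, 35, 34, 33, 32, 5, 2]  # hypothetical Phred scores
print(trimmed_mean(quals, 0.1))  # 31.25
print(statistics.median(quals))  # 34.5
```

On these scores the trimmed mean (31.25) and the median (34.5) both sit further from the two very low scores than the plain mean of 29.2; unlike winsorisation, trimming discards the extreme values instead of clamping them.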