Bbduk filters away good reads
5.6 years ago
dt ▴ 30

Hi,

I seem to be missing something obvious. I have a read (just one example out of many) that bbduk.sh filters out based on average quality, even though it shouldn't.

Read:

@NZ_AP014881.1_0_0/2
TGCAGCATTCTCCTGATGGCGGTCTTGATGAAGAGCTTTGTTACGGGGGTCATCCTCATCCATCAGGTCTGGTGCAGAAATAAAGCGCAAAGGCTTGGGGTTACCGCCTGCGCGCATGTCTGCCAGCATATCCTCTAGCGCGGCAGGCTCTGGGCAATTAATCTCAATCTGCTGACGGTCAGACTTTGGCAAATTGAGCAGGCGGTTGCGGGCCGACGTATCCAAAAGGCGATTGCACCAGCGCTGAACACGATATCCCGGACGATCTGGTAGCTGTTCTTCTTCCAGTTCCTCACGCAGA
+
<CCCCEGG6GGGGF,GEGGGC,FFGFFFCEGGCGGGGG@GGFCGGC<FCGGGGEFGFGGF@GCG,GGG@GGGGGG<GEGFGFGGGEGGFAEEC<GGGGFFFEECD<EFGGF8<G5GGGCDBGFFGEG<GCGFECG+FCGF,CC=,*DF5=,9<7C4E:EF,,=B@CGF>:GGECGG;;8G>C1,:,C:+E,9,FF<@,*6:,793G,4+13G*7*;3*=@6C7/85+C59C+<>***2C)*1*/**))<+<2)+**)4/)A)>)1+2)**51065.:091>1***0*)*).0+*(*2*90.

Command:

bbduk.sh in1=test.fq out1=test_out.fq maq=20

The BBDuk version is 38.46. The average quality of the read seems to be ~27. I hope someone can help, thanks in advance.

next-gen quality control preprocessing filtering • 2.2k views

Could it be that the reads underwent some trimming, causing them to fall below the maq threshold?

minavgquality=0 (maq) Reads with average quality (after trimming) below this will be discarded.


Not really, I provided the exact read and the exact command to reproduce the problem. Can you reproduce it?

5.6 years ago
dt ▴ 30

Ok, I believe I figured it out, it's explained in https://github.com/wdecoster/NanoPlot/issues/57:

BBDuk calculates average quality score by converting to probability scale, taking an average, and then converting back to Phred scale. So for example, a 2bp read with quality scores 10 and 20 would yield an average quality of (0.9+0.99)/2=0.945 -> Q12.6 rather than Q15 with a linear average.
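The two averaging methods described above can be compared with a minimal Python sketch. This is not BBDuk's actual code; the function names are hypothetical, and the Phred+33 offset is an assumption about the FASTQ encoding. It only reproduces the arithmetic from the quoted explanation.

```python
import math

def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (assumes Phred+33)."""
    return [ord(c) - offset for c in quality_string]

def linear_mean_q(scores):
    """Plain arithmetic mean of the Phred scores."""
    return sum(scores) / len(scores)

def error_based_mean_q(scores):
    """Average the per-base error probabilities, then convert back to Phred."""
    mean_error = sum(10 ** (-q / 10) for q in scores) / len(scores)
    return -10 * math.log10(mean_error)

# The 2 bp example from the quoted explanation: Q10 and Q20.
scores = [10, 20]
print(linear_mean_q(scores))                 # 15.0
print(round(error_based_mean_q(scores), 1))  # 12.6
```

Passing `phred_scores(...)` of a read's quality line to both functions shows how far apart the two values are for a read with a long low-quality tail.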

Essentially it means that, looking e.g. at the seqtk fqchk output, bbduk uses the value calculated in the errQ field rather than the avgQ field. I believe, this can be confusing and should be mentioned in the Bbduk documentation. If someone knows the developer, maybe you can let him know? Thanks.


"I believe, this can be confusing"

This is not confusing at all. Please make yourself familiar with the mathematics of Q values.

Q values represent a logarithmic transformation of the error rates. Logarithmic transformations are often used in science and engineering, but they have some pitfalls, especially when it comes to calculating the "mean". Why do you want to calculate the arithmetic mean of Q values? The arithmetic mean of Q values is equivalent to the geometric mean of the error rates, which is most likely not what you want.
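The equivalence stated above can be written out as a short derivation. The symbols are introduced here for illustration only: $p_i$ is the error probability of base $i$, $Q_i = -10\log_{10} p_i$ is its Phred score, and $n$ is the read length.

```latex
\frac{1}{n}\sum_{i=1}^{n} Q_i
  = \frac{1}{n}\sum_{i=1}^{n} \left(-10\log_{10} p_i\right)
  = -10\log_{10}\left(\prod_{i=1}^{n} p_i\right)^{1/n}
```

So the arithmetic mean of the Q values is the Phred score of the geometric mean of the error rates. Because the geometric mean never exceeds the arithmetic mean, the linear average of Q values is always at least as high as the value obtained by averaging the error probabilities first.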


Nice, and that indeed probably explains it.

For suggestions and comments on the BBTools package, you can find a link on their webpage: https://jgi.doe.gov/data-and-tools/bbtools/bbtools-faq-support-forums/
