bbduk flags 'tossbrokenreads' and 'nullifybrokenquality'
5.2 years ago
Anand Rao ▴ 640

I would like help understanding two bbduk.sh (BBMap) flags: 'tossbrokenreads' and 'nullifybrokenquality'.

These flags are mentioned in the STDERR of my bbduk.sh step (BBMap version 38.60), which I ran while decontaminating raw Illumina SE 100 nt reads via "Adapter and Quality Trimming". The relevant block of the STDERR is pasted below:

[E::bgzf_read] Read block operation failed with error 2 after 58624 of 65536 bytes
Error 3 in block starting at offset 1321362048(4EC26280)
java.lang.Exception: 
Mismatch between length of bases and qualities for read 17377414 (id=HWI-ST797:117:D091UACXX:4:1303:5955:45869 1:).
# qualities=27, # bases=101

CCCFFFFFHHHHHJJIIJJJIJIEIHJ
TTCCCGATCATCCCGAGAAGGAACGTCTGCCATAATCTTCTCCTGACCGCGCCAAAGAATTTTGTCAATGACCCCAAATTCCTTAGCCAATAATGCGTCCA

This can be bypassed with the flag 'tossbrokenreads' or 'nullifybrokenquality'
    at shared.KillSwitch.kill(KillSwitch.java:96)
    at stream.Read.validateQualityLength(Read.java:214)
    at stream.Read.validate(Read.java:104)
    at jgi.BBDuk$ProcessThread.run(BBDuk.java:2418)

However, the bbduk.sh help menu (too long to paste in full here) does not list these exact flags; the closest flag I see is tossjunk=f. Therefore, I am

A. confused about these messages,

B. curious when and why I would call these flags, and

C. wondering why I receive these error messages: do they imply corrupted reads in my FASTQ input?

Could forum members please help? Thanks!


Those two options are not available in bbduk.sh, so this seems to be a case of bbduk printing an incorrect suggested fix in its error message. You could point this out to Brian by creating a ticket here.

Your data appear to have been corrupted at some step. Hopefully this is a transient issue, which you can verify by rerunning the sample through your pipeline.
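
One way to test for corruption in a compressed input is to decompress it end to end and see whether the stream is intact. The `[E::bgzf_read]` line in the STDERR above suggests that a compressed block could not be read in full. Below is a minimal sketch, assuming the input is a gzip or BGZF-compressed FASTQ; the function name `gzip_is_intact` is hypothetical and not part of BBMap.

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream decompresses without error."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard the data in chunks; only errors matter here.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # OSError covers bad headers and CRC failures, EOFError covers
        # truncated files, zlib.error covers corrupt compressed data.
        return False
    return True
```

A result of False on the file fed to bbduk.sh, but True on the original download, would point to corruption introduced by an intermediate step or by storage.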


How did you perform 'decontamination'?

The error is clear: the length of the base string differs from the length of its associated quality string.

Personally, I would rather investigate the problem than try to work around it with bbmap.


I agree, Michael.

Here are my steps leading up to and including the bbduk.sh decontamination steps:

rename.sh in=$IN out=$OUT fixsra=t -Xmx64g # from release 38.61, all other steps from release 38.60
IN=$OUT
clumpify.sh -Xmx64g in=$IN out=$OUT dedupe optical
IN=$OUT
bbduk.sh -Xmx64g in=$IN out=$OUT ktrim=r k=23 mink=11 hdist=1 tbo tpe minlen=70 ref=adapters ftm=5 ordered
IN=$OUT
bbduk.sh -Xmx64g in=$IN out=$OUT k=31 ref=artifacts,phix ordered cardinality

What would you advise for "investigating" the underlying problem(s)?


You have the read ID, so check the output of each step to see where the mismatch between base and quality lengths first appears. Try to find the read in the original SRA file.

Check with a simple script whether this is the only such read. If it was introduced in one of your steps, try to reproduce the error. If the error is reproducible, contact the BB crew.
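
A "simple script" for this check could look like the sketch below. It assumes strict four-line FASTQ records (no wrapped sequence lines) and plain or gzipped input; the function name `find_length_mismatches` is made up for illustration. Note that if a record is truncated in the middle of a line, later records may fall out of register, so the first hit is the most informative one.

```python
import gzip


def find_length_mismatches(path):
    """Yield (record_number, read_id, n_bases, n_quals) for broken records.

    Assumes each record is exactly four lines: header, bases, '+', qualities.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        record = 0
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            seq = handle.readline().rstrip("\r\n")
            handle.readline()  # '+' separator line, ignored
            qual = handle.readline().rstrip("\r\n")
            record += 1
            if len(seq) != len(qual):
                # Drop the leading '@' from the header to get the read ID.
                yield record, header[1:].strip(), len(seq), len(qual)
```

Running it on the output of each pipeline step, for example `for hit in find_length_mismatches("step1.fastq"): print(hit)` (file name hypothetical), shows the first step at which the broken read appears.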

4.1 years ago
hegyihedi • 0

Well, you should just try the flag, even if it is not listed in the help menu. I had the same problem, used the flag as recommended in the error message, and it worked.

I used this command: bbduk.sh in=1.fastq out=clean1.fastq outm=1.fastq.outm tossbrokenreads=t 2>log.1.fastq

and it solved the problem!

3.1 years ago

I had this same issue and noticed that the failing reads contained quality characters that are valid in Phred+33 but not in Phred+64 (see encoding table). The issue was resolved by specifying qin=33 explicitly.
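
A quick way to check for such characters is to look at the lowest ASCII code among the quality strings. The sketch below is a heuristic only and rests on one assumption: that +64 encodings never use characters below ASCII 59 (';', the lowest Solexa+64 symbol), so anything lower can only be Phred+33. The function name `guess_phred_offset` is hypothetical.

```python
def guess_phred_offset(quality_strings):
    """Return 33 if the qualities can only be Phred+33, else None.

    None means the data are ambiguous (every character is also valid
    in a +64 encoding) or that no quality characters were given.
    """
    lowest = min(
        (ord(char) for qual in quality_strings for char in qual),
        default=None,
    )
    if lowest is None:
        return None
    # Characters below ASCII 59 do not occur in the +64 encodings.
    return 33 if lowest < 59 else None
```

For the quality string shown in the question, `CCCFFFFFHHHHHJJIIJJJIJIEIHJ`, the lowest character is `C` (ASCII 67), so this test alone is inconclusive and more reads would need to be sampled.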

I find the lack of documentation on 'tossbrokenreads' or 'nullifybrokenquality' disturbing.
