I would like help understanding two BBDuk (BBMap) flags: 'tossbrokenreads' and 'nullifybrokenquality'.
These flags are mentioned in the STDERR of my bbduk.sh step (BBMap version 38-60), which I run for "Adapter and Quality Trimming" while decontaminating raw Illumina SE 100 nt reads. The relevant block of the STDERR is copied below:
[E::bgzf_read] Read block operation failed with error 2 after 58624 of 65536 bytes
Error 3 in block starting at offset 1321362048(4EC26280)
java.lang.Exception:
Mismatch between length of bases and qualities for read 17377414 (id=HWI-ST797:117:D091UACXX:4:1303:5955:45869 1:).
# qualities=27, # bases=101
CCCFFFFFHHHHHJJIIJJJIJIEIHJ
TTCCCGATCATCCCGAGAAGGAACGTCTGCCATAATCTTCTCCTGACCGCGCCAAAGAATTTTGTCAATGACCCCAAATTCCTTAGCCAATAATGCGTCCA
This can be bypassed with the flag 'tossbrokenreads' or 'nullifybrokenquality'
at shared.KillSwitch.kill(KillSwitch.java:96)
at stream.Read.validateQualityLength(Read.java:214)
at stream.Read.validate(Read.java:104)
at jgi.BBDuk$ProcessThread.run(BBDuk.java:2418)
However, the bbduk.sh help menu does not list these exact flags (it is too long to paste here in full); the closest flag I see is tossjunk=f. Therefore, I am
A. confused about these messages,
B. curious when and why I would use these flags, and
C. unsure why I receive these error messages: do they imply corrupted reads in my FASTQ input?
Could forum members please help? Thanks!
Those two options are not available in bbduk.sh, so this seems to be a case of bbduk not printing the correct fix in its error message. You could point this out to Brian by creating a ticket here.
Your data appears to have become corrupted at some step. Hopefully this is a transient issue, which you can verify by rerunning the sample through your pipeline.
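The `[E::bgzf_read]` line in the log suggests the compressed input may be truncated or damaged. As a minimal sketch, assuming the input is a gzip-compressed FASTQ, the following decompresses the whole file and reports whether it can be read to the end. The function name `gzip_is_intact` is made up for this illustration and is not part of BBTools.

```python
import gzip
import sys
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Decompress the whole file and return (ok, message).

    A truncated file typically raises EOFError; a damaged one raises
    zlib.error or an OSError subclass such as gzip.BadGzipFile.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard the data in chunks to keep memory use low.
            while handle.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error) as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, "decompressed to the end without error"


if __name__ == "__main__":
    for fastq_path in sys.argv[1:]:
        ok, message = gzip_is_intact(fastq_path)
        print(f"{fastq_path}\t{'OK' if ok else 'BROKEN'}\t{message}")
```

Running this on the file produced by each pipeline step should show at which step, if any, the compressed file stops being readable.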
How did you perform 'decontamination'?
The error is clear: the length of the base string differs from the length of its associated quality string.
Personally, I would rather investigate the problem than try to work around it with bbmap.
I agree, Michael.
Here are my steps including and leading to the BBDUK decontamination step(s):
What is your advice on how to "investigate" the underlying problem(s)?
You have the read ID, so check the output of each step to see where this mismatch between base and quality lengths first appears. Try to find the read in the original sra file.
Check with a simple script whether this is the only case. If it was introduced in one of your steps, try to reproduce the error. If the error is reproducible, contact the BB crew.
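Such a script could look roughly like the sketch below. It assumes strict four-line FASTQ records (no wrapped sequence lines) and plain or gzip-compressed input. The function names are invented for this example.

```python
import gzip
import sys


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def find_length_mismatches(path):
    """Yield (record_number, read_id, n_bases, n_quals) for broken records.

    Assumes each record is exactly four lines: header, bases, '+', qualities.
    """
    with open_fastq(path) as handle:
        record_number = 0
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            seq = handle.readline().rstrip("\r\n")
            handle.readline()  # '+' separator line, ignored
            qual = handle.readline().rstrip("\r\n")
            record_number += 1
            if len(seq) != len(qual):
                yield record_number, header.rstrip("\r\n"), len(seq), len(qual)


if __name__ == "__main__":
    for fastq_path in sys.argv[1:]:
        for number, read_id, n_bases, n_quals in find_length_mismatches(fastq_path):
            print(f"{fastq_path}\trecord {number}\t{read_id}\t"
                  f"bases={n_bases}\tquals={n_quals}")
```

Pass the FASTQ file from every step as an argument. The first file that reports a mismatch points to the step that introduced it, and the number of reported records shows whether the read in the error message is the only case.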