Hello,
qtrim is defined as
qtrim=f Trim read ends to remove bases with quality below trimq.
Performed AFTER looking for kmers.
Values:
rl (trim both ends),
f (neither end),
r (right end only),
l (left end only),
w (sliding window).
What does it mean both ends, both ends of a read or both ends meaning forward and reverse read?
I'm asking this because using qtrim=r
, launched with the command:
/Shared/002-Software/bin/trimming.py/libs/bbmap/bbduk.sh
-Xmx1g
in=sequentia20160330T145739/Arancia_S1_R1_001.fastq.gz
in2=sequentia20160330T145739/Arancia_S1_R2_001.fastq.gz
out=Trimmed_output_default/Trimmed_files/Arancia_S1_R1_001.trimmed.fq.gz
out2=Trimmed_output_default/Trimmed_files/Arancia_S1_R2_001.trimmed.fq.gz
minlength=35
ziplevel=1
trimq=25
hdist=1
threads=1
overwrite=true
mink=11
ref=/Shared/002-Software/bin/trimming.py/libs/adapters.fa
qtrim=r
k=31
ktrim=r
ftl=0
ftr=0
ftr2=0
gives the following results:
the image above is composed of reads with qualities like these:
6--A-<-<C-,,,6++@76+8,C,6C,6+6+8,,,,C,6CF,,C,FE,,,,,,,,+:6@++8CC,,669,:,+,:6>,,,=,4,7,,,,9<,,:,,4,,,9,4+4+A,A,C4,,,++9,,C=+@4@B+8+636,@,,,,,7,8,,,2,6,+14+++++0**0<****52+**3*;99*2;F*/8))/))01)4+)))*)+;A+2;>5)0)))22))0).)))))).)*)()()))*)*02**)).06))21959***2:*)))))0))/*/**11**))0/**1+3+2<C,88A;CE
or
CCCC9;C,,C8@A,EEC8,CF99,6,,,<66,CFF,;,,,;,CF,CFC,6CC,,6,,,6,`<CFFGE,;6,,,,BE@FGDE5F<,,,,,,,:AFEFG7,5?,,A,E;:,+,,,,44,+C,,,,,+6=@8@D,,9@,9,,,,,99@+6=;,,,6,,6,,7++,3662+=DF?++=++63++9++*;**;:0*3****0+1*03*3*3+*1=6?)/)*+++*++++++=AG;)+)*+1+1+;CD<AAEFF@>D
and using qtrim=rl
, it gives the following ones:
both ends of a read. If you are using PE reads then both ends of both reads with rl option.
So, when using paired-end reads with option
qtrim=r
, will it trim the 3' ends of both reads?If trimming is needed, yes.
so I don't understand why I get some results... (see updated post)
Can you provide full bbduk command options you are using? Are you also scanning for adapters and trimming them?
The command has been added to the post. And yes, I look for adapters (in fact, there's a high level of adapters at the 3' end of this dataset).
@Brian recommends adding
tbe
andtpo
options with normal paired-end reads. Can you add those and try a re-run? In the qtrim=r the trimq does not appear to be working since you have Q-scores below 25.Trimming at 25 is very stringent. Generally 5 (or 10) is sufficient for re-sequencing data. Only if you are doing de novo assembly then you would want to be more stringent.
I repeated the trimming with
tbe=t
andtpo=t
and it gives the very same results.Are there any other things flagged by FastQC? Are you able to attach a full report? Have you looked at an uncompressed version of the FastQC graphs to see if there are specific cycles that have an issue? Does the per tile sequence quality plot show any uneven splotches of color?
Tagging Brian Bushnell. He may have additional comments.
I am assuming for each pair, the top graph is before quality trimming and the bottom graph is after quality trimming. Are they for read 1 or read 2? Or both?
Anyway, the results are strange and not what I would expect, so I am investigating and I'll get back to you. It looks like there is a special case when the entire read is extremely low quality except for the 3' end (which is a very unusual patter for Illumina reads) that is not being correctly handled by qtrim=r. However, without the "tbo" flag, you can get odd-looking results when adapter-trimming, particularly when most of the reads are short-insert. The reason is that adapter kmers will be found in the high-quality reads, and those reads will be trimmed. The adapter kmers in low-quality reads will have too many errors and thus not be found. So, if all of your reads have an insert size of around 200 bp, then the high-quality reads will be trimmed to 200 bp and the low-quality reads will not be adapter-trimmed (since they have too many errors to match the adapter sequence). So reads over 200bp will be highly enriched for low quality. Thus, it will look like the quality dropped in that region, but what actually happened is that the high-quality adapter bases were selectively trimmed. For this dataset, I strongly recommend adding the flags "tbo tpe" and using hdist=2 instead of 1. Also, it looks like you should use qtrim=rl until I address the special case where the entire read aside from the 3' end is low-quality.