I've been using Trimmomatic with the LEADING, TRAILING and SLIDINGWINDOW parameters. Recently we noticed that the ILLUMINACLIP parameter is also commonly used. Until then, we had assumed that because Trimmomatic was built for Illumina data, it would detect and trim Illumina adapters automatically. But there is nothing to indicate that it works that way.
1) What makes Trimmomatic specifically suited to Illumina data? Adapter sequences seem easy to obtain, so another tool could be used for Illumina adapters as well.
2) What do you suggest for using Trimmomatic? Should we always use the ILLUMINACLIP parameter, or only when the "Overrepresented sequences" section of the FastQC report shows adapters? If it should be used, what are the default parameter values? We have seen that <adaptername>:2:30:10 is generally used, but we would like to hear your opinions.
Thanks, I will start to use `ILLUMINACLIP`. What about the `TRAILING` and `MINLEN` parameters? Is it appropriate to use the `ILLUMINACLIP`, `TRAILING` and `MINLEN` parameters only for the high-quality FASTQ files? Since those files have high quality scores, we thought these parameters could be used that way (`TRAILING:30 MINLEN:50`).

Ok, so I personally would use the `SLIDINGWINDOW:5:20` parameter, since it clips bases that were called with low quality more precisely. It scans the read starting at the 5' end, looking at a window of 5 bases at a time, and cuts the read once the average Phred score within the window drops below 20.

If you use the `TRAILING:30` parameter, Trimmomatic just removes bases from the 3' end of the read as long as their Phred score is below 30. That means the clipping stops as soon as it reaches a single base with a score of 30 or higher, even if that base is surrounded by low-quality bases. `SLIDINGWINDOW` would not stop at that single base, because it also takes the neighbouring 4 bases into account. The Trimmomatic manual also recommends using `SLIDINGWINDOW` instead of `TRAILING`.

The `MINLEN:50` argument seems fine to me if your read length is around 100-150 nt. If you are looking at splicing, however, it has been shown to be beneficial to use a minimum read length of 75 nt.

I'm not sure I understood your question correctly, but I would of course process all FASTQ files the same way, and only remove samples if, for example, there is a very high amount of rRNA contamination or something like that.
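To make the `TRAILING` vs `SLIDINGWINDOW` difference concrete, here is a simplified Python sketch of the logic described above. It is not Trimmomatic's actual (Java) implementation and may differ from it in edge cases; the function names are hypothetical, qualities are plain integer Phred scores, and the sliding window here simply cuts at the start of the first failing window.

```python
from typing import List, Optional


def trailing_keep(quals: List[int], threshold: int) -> int:
    """TRAILING-style trim: drop bases from the 3' end while their quality is
    below `threshold`; stop at the first base at or above it.
    Returns the number of bases kept from the 5' end."""
    keep = len(quals)
    while keep > 0 and quals[keep - 1] < threshold:
        keep -= 1
    return keep


def sliding_window_keep(quals: List[int], window: int, required: int) -> int:
    """Simplified SLIDINGWINDOW-style trim: scan windows from the 5' end and
    cut at the start of the first window whose mean quality is below
    `required`. Reads shorter than `window` are left untouched here."""
    for start in range(len(quals) - window + 1):
        # Integer form of "mean < required", avoiding float rounding.
        if sum(quals[start:start + window]) < required * window:
            return start
    return len(quals)


def process_read(quals: List[int], window: int = 5, required: int = 20,
                 minlen: int = 50) -> Optional[int]:
    """Apply the sliding window, then the MINLEN check. Returns the kept
    length, or None if the trimmed read is shorter than `minlen` (dropped)."""
    keep = sliding_window_keep(quals, window, required)
    return keep if keep >= minlen else None


if __name__ == "__main__":
    # Ten Q35 bases, then a low-quality tail interrupted by a single Q33 base.
    read = [35] * 10 + [12, 12, 33, 12, 12, 12]
    print(trailing_keep(read, 30))           # 13: stops at the Q33 base
    print(sliding_window_keep(read, 5, 20))  # 10: the whole poor tail is cut
    print(process_read(read, minlen=8))      # 10
    print(process_read(read, minlen=50))     # None: dropped by the length filter
```

With a `TRAILING`-style cut at 30, the two Q12 bases before the Q33 base survive, while the window-based cut removes the whole tail. Trimmomatic applies its steps in the order they are given on the command line, which this sketch mirrors by running the window trim before the length filter.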