What tool do you use for removing adapters and low-quality portions of reads and why?
Our facility uses Trimmomatic, as it performs adapter trimming along with various other read filtering and trimming functions. It is pair-aware: it keeps filtered reads paired whilst moving singletons (reads whose mate was discarded) out of the paired output. It is simple to use and reasonably fast.
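Pair awareness is mostly bookkeeping: after trimming, a pair survives only if both mates pass, and a lone survivor goes to an unpaired file for its mate number. The following is a minimal Python sketch of that routing, not Trimmomatic's code. `filter_pairs`, the `Read` tuple and the `min_len` cut-off are hypothetical, and the length check simply stands in for whatever trimming steps were applied.

```python
from typing import List, Tuple

# Hypothetical record type: (read name, sequence, quality string).
Read = Tuple[str, str, str]


def filter_pairs(reads1: List[Read], reads2: List[Read], min_len: int):
    """Pair-aware length filter: returns (paired, singles_1, singles_2)."""
    paired, singles_1, singles_2 = [], [], []
    for r1, r2 in zip(reads1, reads2):  # mates are assumed to be in the same order
        keep1 = len(r1[1]) >= min_len
        keep2 = len(r2[1]) >= min_len
        if keep1 and keep2:
            paired.append((r1, r2))   # both mates survive: the two files stay in sync
        elif keep1:
            singles_1.append(r1)      # mate 2 failed: read 1 becomes a singleton
        elif keep2:
            singles_2.append(r2)      # mate 1 failed: read 2 becomes a singleton
        # if both mates fail, the pair is dropped entirely
    return paired, singles_1, singles_2
```

Keeping the singletons in their own lists, rather than discarding them, means downstream tools that expect strictly paired input still get synchronised files, while the surviving single-end data is not lost.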
This thread wouldn't be complete without BBDuk and Seal (both part of the BBMap suite). They are written in pure Java, so they will work on PC/Mac/*nix.
Which is best? I don't know, but there are lots of fairly comprehensive and efficient tools.
See this answer
It would be interesting to have a comparison of the different tools to assess their main differences.
Table 5.1 here could be a good start.
Personally, I'm happy with cutadapt, although I haven't explored other options apart from trim_galore, which is a wrapper around cutadapt. Things I like about cutadapt:
As for quality trimming, these days quality is very high up to 150+ bases, so I usually skip it altogether.
I superficially compared cutadapt with the trimmer that comes with the Illumina/BaseSpace pipeline, and the results were very similar. I think BaseSpace's trimmer was a little more aggressive, but the results were essentially the same.
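Adapter trimmers in the cutadapt mould search for the adapter with an error-tolerant alignment, and they also accept a partial adapter hanging off the 3' end of the read. The sketch below shows that idea in Python under simplifying assumptions: it counts mismatches only (cutadapt's real aligner also allows insertions and deletions), and the `max_error_rate=0.1` and `min_overlap=3` defaults are assumed to mirror cutadapt's defaults rather than taken from its code. `trim_3prime_adapter` is a hypothetical name.

```python
def trim_3prime_adapter(read: str, adapter: str,
                        max_error_rate: float = 0.1,
                        min_overlap: int = 3) -> str:
    """Remove a 3' adapter (possibly partial at the read end), mismatches only."""
    for start in range(len(read) - min_overlap + 1):
        window = read[start:start + len(adapter)]  # read segment aligned to the adapter
        overlap = len(window)                      # shorter than the adapter near the read end
        mismatches = sum(a != b for a, b in zip(window, adapter))
        if mismatches <= int(max_error_rate * overlap):
            return read[:start]                    # cut the adapter and everything after it
    return read                                    # no acceptable match: keep the read whole
```

For example, with the common Illumina adapter prefix `AGATCGGAAGAGC`, the read `ACGTACGTACAGATC` (insert followed by the first five adapter bases) is cut back to `ACGTACGTAC`. The small `min_overlap` is a trade-off: it catches short adapter remnants but will occasionally clip a few genuine bases that happen to match the adapter start.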
Our group at the Max Planck wrote a paper comparing the accuracy of various trimming algorithms. Our Bayesian trimmer leeHom (http://grenaud.github.io/leeHom/) outperformed the other algorithms in terms of accuracy and compared very favorably in terms of speed.
It also merges the overlapping portions of paired reads and detects chimeric reads. You do not need cutoffs for the % of matches etc., and it accepts both FASTQ and BAM.
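Merging works because, when the DNA fragment is shorter than the two reads combined, the end of read 1 and the reverse complement of read 2 cover the same bases. The Python sketch below finds that overlap with a plain mismatch count. It is explicitly not leeHom's method: leeHom uses a Bayesian model over base qualities, whereas this sketch ignores qualities, builds no consensus, and does not handle fragments shorter than a single read. `merge_pair`, `min_overlap` and `max_mismatches` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10, max_mismatches: int = 1):
    """Merge mates whose 3' ends overlap; returns None if no overlap is found."""
    rc2 = reverse_complement(r2)  # read 2 on the same strand as read 1
    # Try the longest overlap first, i.e. the shortest possible merged fragment.
    for overlap in range(min(len(r1), len(rc2)), min_overlap - 1, -1):
        suffix, prefix = r1[-overlap:], rc2[:overlap]
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        if mismatches <= max_mismatches:
            return r1 + rc2[overlap:]  # read 1, then the part of read 2 past the overlap
    return None
```

A quality-aware merger would, at each overlapping position, pick the base (or compute a posterior) from both mates' qualities instead of silently keeping read 1's base.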
As for low quality, I would not recommend trimming reads, as they become harder to map; instead, you can remove low-quality reads entirely. I coded something for that a while back.
It works on BAM files and can filter on the expected number of mismatches and on sequence entropy.
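Both filters are short to express. The expected number of mismatches of a read is the sum of its per-base error probabilities, where a Phred score Q corresponds to an error probability of 10^(-Q/10); sequence entropy flags low-complexity reads such as homopolymer runs. The sketch below works on plain strings rather than BAM records, uses single-base Shannon entropy (k-mer entropy is another common choice), and its thresholds and function names are hypothetical, not those of the tool mentioned above.

```python
import math
from collections import Counter


def expected_mismatches(qual: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities, 10^(-Q/10), from Phred+33 qualities."""
    return sum(10 ** (-(ord(c) - offset) / 10) for c in qual)


def sequence_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the base composition; low values flag repeats."""
    counts = Counter(seq)
    total = len(seq)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def keep_read(seq: str, qual: str, max_expected: float = 1.0,
              min_entropy: float = 1.0) -> bool:
    """Hypothetical filter: keep reads that are both accurate and complex enough."""
    return (expected_mismatches(qual) <= max_expected
            and sequence_entropy(seq) >= min_entropy)
```

Filtering whole reads this way leaves read lengths untouched, which is the point of the advice above about not cutting reads before mapping.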
Hope this helps!
You also need to take into account whether the trimmer can run in parallel (using several cores), and how fast it actually is.
For example, on a 20Gb genomic FASTQ file (only one end of the pair), a run with fastx-toolkit can take almost a day or even more, whereas seqtk needs a dozen minutes or so.
fastp is really good for processing FASTQ files; you can use the following command:
fastp -i SRR098401_1.fastq.gz -I SRR098401_2.fastq.gz -o SRR098401_1_QC.fastq.gz -O SRR098401_2_QC.fastq.gz -g -x -p
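In that command, `-g` enables poly-G tail trimming, `-x` poly-X tail trimming and `-p` overrepresented-sequence analysis. Poly-G tails matter for two-colour Illumina chemistry (NextSeq/NovaSeq), where a lack of signal is called as G, so low-signal read ends show up as runs of G. The sketch below shows the basic idea in Python and is not fastp's implementation: fastp also tolerates mismatches inside the run, whereas this version only strips an exact trailing G run. The 10-base minimum is an assumed value, and `trim_poly_g` is a hypothetical name.

```python
def trim_poly_g(seq: str, qual: str, min_len: int = 10):
    """Strip an exact trailing run of G's if it is at least min_len bases long."""
    keep = len(seq.rstrip("G"))           # length once all trailing G's are removed
    if len(seq) - keep >= min_len:
        return seq[:keep], qual[:keep]    # trim sequence and qualities together
    return seq, qual                      # run too short to be treated as an artefact
```

The minimum run length keeps short, genuine G stretches at read ends from being clipped.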
It's also important to note what these tools can do, as per the original question. BBDuk does adapter-removal and quality-trimming in a single pass, faster than anything else; seqtk cannot perform adapter-removal.
Accuracy is also worth mentioning... since, in my opinion, it is generally more important than speed. BBDuk and seqtk have equal accuracy for quality-trimming, as they use the same algorithm. Anyway, that's a solved problem - the algorithm is optimal and cannot be improved. Everything else I've tested (which is everything commonly used - trimmomatic, fastx, etc.) is dramatically inferior, since it uses a non-optimal algorithm. Of course, quality-trimming is a dark art and often it's not even a good idea, but if you DO do it, you should do it correctly. Adapter-trimming, on the other hand, is ALWAYS a good idea (if done correctly).
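"Optimal" quality trimming can be framed as a maximum-scoring-segment problem: give each base the score (threshold minus its error probability) and keep the contiguous stretch with the largest total, which Kadane's algorithm finds exactly in one pass. This is the classic Mott-style formulation; I am assuming it is what the answer means by the optimal algorithm, and have not checked it against BBDuk's or seqtk's source. The 0.05 error threshold and the function name are illustrative assumptions.

```python
def optimal_quality_trim(qual: str, error_threshold: float = 0.05,
                         offset: int = 33):
    """Return (start, end) of the maximal-scoring segment; score = threshold - P(error)."""
    best_start = best_end = 0
    best_score = 0.0
    run_start, run_score = 0, 0.0
    for i, c in enumerate(qual):
        p_error = 10 ** (-(ord(c) - offset) / 10)  # Phred+33 quality -> error probability
        run_score += error_threshold - p_error
        if run_score <= 0:
            # A prefix with non-positive total can never help: restart after this base.
            run_start, run_score = i + 1, 0.0
        elif run_score > best_score:
            best_start, best_end, best_score = run_start, i + 1, run_score
    return best_start, best_end  # keep bases [best_start, best_end); (0, 0) means drop all
```

Unlike a rule that stops at the first window below a cut-off, this scoring lets a single poor base inside an otherwise good stretch be outweighed by its neighbours, while a long low-quality tail is always removed.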
BBDuk is the best adapter-trimming software available, by a huge margin. Let's put speed aside - it's the fastest, but that is never a good enough reason to use software unless more-accurate software is fundamentally too slow to use.
I have verified through extensive synthetic testing that BBDuk has a higher true-positive rate and a lower false-positive rate of adapter-removal than anything else. In fact, those results were presented at AGBT a couple of weeks ago. Maybe someone here attended?
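With synthetic reads, the true adapter position of every read is known, so a trimmer's output can be scored directly. The sketch below shows one way to compute such rates; it is not the benchmark described above, and the metric definitions are assumptions: a true positive is an adapter-bearing read cut exactly at the adapter start, and a false positive is an adapter-free read that lost any bases. `adapter_trim_rates` and its record layout are hypothetical.

```python
from typing import List, Optional, Tuple


def adapter_trim_rates(records: List[Tuple[int, Optional[int], int]]):
    """records: (original_length, true_adapter_start or None, trimmed_length).

    Returns (true_positive_rate, false_positive_rate), using NaN for an empty class.
    """
    with_adapter = [r for r in records if r[1] is not None]
    without_adapter = [r for r in records if r[1] is None]
    # Exact-position criterion: over- or under-trimming both count as misses.
    tp = sum(1 for _, start, trimmed in with_adapter if trimmed == start)
    # Any base removed from an adapter-free read counts as a false positive.
    fp = sum(1 for length, _, trimmed in without_adapter if trimmed < length)
    tpr = tp / len(with_adapter) if with_adapter else float("nan")
    fpr = fp / len(without_adapter) if without_adapter else float("nan")
    return tpr, fpr
```

The exact-position criterion is deliberately strict; a benchmark could instead allow a tolerance of a few bases, which would change the numbers a tool achieves.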
In summary - I do not know of any useful read-trimming or read-filtering operation in which BBDuk is not the best tool in both accuracy and speed, aside from seqtk, which can exceed BBDuk's speed on single-ended reads.