Best trimming tool
8.8 years ago
int11ap1 ▴ 490

What tool do you use for removing adapters and low-quality portions of reads and why?

trimming • 26k views
8.8 years ago
Ian 6.1k

Our facility uses Trimmomatic, as it performs adapter trimming and various other read filtering/trimming functions. It is pair-aware: it keeps filtered reads paired while removing singletons. It is simple to use and reasonably fast.
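
To make "pair-aware" concrete, here is a minimal Python sketch of the idea. It is not Trimmomatic's code; the names `pair_aware_filter` and `min_length` and the length threshold are hypothetical and only serve the illustration. Pairs in which both mates pass the filter stay together, and mates whose partner was dropped are collected separately so they can be discarded or kept as singletons.

```python
from typing import Callable, Iterable, List, Tuple

# A read is (name, sequence, quality string). This layout is an assumption
# made for the sketch, not a format required by any tool.
Read = Tuple[str, str, str]


def pair_aware_filter(
    pairs: Iterable[Tuple[Read, Read]],
    keep: Callable[[Read], bool],
) -> Tuple[List[Tuple[Read, Read]], List[Read], List[Read]]:
    """Split read pairs into surviving pairs and orphaned mates."""
    paired: List[Tuple[Read, Read]] = []
    single_1: List[Read] = []  # mate 1 survived, mate 2 did not
    single_2: List[Read] = []  # mate 2 survived, mate 1 did not
    for r1, r2 in pairs:
        ok1, ok2 = keep(r1), keep(r2)
        if ok1 and ok2:
            paired.append((r1, r2))
        elif ok1:
            single_1.append(r1)
        elif ok2:
            single_2.append(r2)
        # if neither mate passes, the whole pair is dropped
    return paired, single_1, single_2


def min_length(n: int) -> Callable[[Read], bool]:
    """Hypothetical filter: keep reads with at least n bases."""
    return lambda read: len(read[1]) >= n
```

A downstream aligner that expects mates in the same order in both files only ever sees the `paired` list, which is the point of doing the filtering pair-wise.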

8.8 years ago
GenoMax 147k

This thread won't be complete without bbduk and seal (which are part of the BBMap suite). They are written in pure Java, so they will work on PC/Mac/*nix.

8.8 years ago
Juke34 8.9k

Best? I don't know, but there are a lot of tools that are fairly comprehensive and efficient.

See this answer

It would be interesting to have a comparison of the different tools in order to assess their main differences.

Table 5.1 here could be a good start.

8.8 years ago

Personally I'm happy with cutadapt, although I haven't explored other options apart from trim_galore, which is a wrapper around cutadapt. Things I like about cutadapt:

  • Fast enough, easy to use, flexible in how/what you want to trim and what to get back
  • Great documentation, well maintained.
  • Writes to stdout, so you can stream into bwa (or another tool) without writing massive files to disk
  • Recent releases read and write interleaved paired-end reads, which can also be streamed to bwa
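
For readers unfamiliar with the interleaved layout mentioned in the last bullet, here is a rough Python sketch of what "interleaved" means: mate 1 and mate 2 of each pair alternate in a single stream. It is not how cutadapt implements it; the function names are hypothetical, and the sketch assumes plain four-line FASTQ records (no wrapped sequence lines).

```python
from itertools import islice, zip_longest
from typing import Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of four lines (assumes unwrapped FASTQ)."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield [line.rstrip("\n") for line in record]


def interleave(r1_lines: Iterable[str], r2_lines: Iterable[str]) -> Iterator[str]:
    """Alternate mate 1 and mate 2 records into one stream of lines."""
    pairs = zip_longest(fastq_records(r1_lines), fastq_records(r2_lines))
    for rec1, rec2 in pairs:
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        yield from rec1
        yield from rec2
```

Because the result is a single stream, it can be piped from one program to the next without the two mate files ever drifting out of sync.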

As for quality trimming, these days quality is very high out to 150+ bases, so I usually skip it altogether.

I superficially compared cutadapt with the trimmer that comes with the pipeline in Illumina/Basespace and the results were very similar. I think Basespace's trimmer was a little more aggressive, but the results were essentially the same.

8.8 years ago
Gabriel R. ★ 2.9k

Our group at the Max Planck wrote a paper comparing the accuracy of various trimming algorithms. Our Bayesian trimmer leeHom (http://grenaud.github.io/leeHom/) outperformed other algorithms in terms of accuracy and compared very favorably in terms of speed.

It merges overlapping portions and detects chimeric reads. You do not need cutoffs for % of matches etc., and it accepts both FASTQ and BAM.

For low quality, I would not recommend cutting reads, as they will be harder to map, but you can remove low-quality reads. I coded something for that a while back.

It uses BAM files and can filter on the expected number of mismatches and on sequence entropy.
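
To illustrate the two filters named above, here is a minimal Python sketch. It is not the poster's tool: the function names and thresholds are hypothetical, Phred+33 quality encoding is assumed, and entropy is taken to mean Shannon entropy over single bases, which may differ from the definition the tool uses. The expected number of mismatches is the sum of the per-base error probabilities, 10^(-Q/10).

```python
import math
from collections import Counter


def expected_mismatches(quality: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities (assumes Phred+offset encoding)."""
    return sum(10 ** (-(ord(c) - offset) / 10) for c in quality)


def shannon_entropy(seq: str) -> float:
    """Shannon entropy in bits over single-base frequencies."""
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def keep_read(seq: str, quality: str,
              max_expected: float = 1.0, min_entropy: float = 1.0) -> bool:
    """Hypothetical whole-read filter: drop error-prone or low-complexity reads."""
    return (expected_mismatches(quality) <= max_expected
            and shannon_entropy(seq) >= min_entropy)
```

Note that the read is either kept whole or removed whole, which matches the advice above to filter low-quality reads rather than cut them.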

Hope this helps!

8.8 years ago

You also need to take into account whether the trimmer can run in parallel (using several cores) and how fast it actually is.

For example, on a 20Gb genomic FASTQ file (only one of the paired ends), a run with fastx-toolkit can take almost a day or even more, whereas seqtk needs a dozen minutes or so.

ADD COMMENT
3
Entering edit mode

It's also important to note what these tools can do, as per the original question. BBDuk does adapter-removal and quality-trimming in a single pass, faster than anything else; seqtk cannot perform adapter-removal.

Accuracy is also worth mentioning... since, in my opinion, it is generally more important than speed. BBDuk and seqtk have equal accuracy for quality-trimming, as they use the same algorithm. Anyway, that's a solved problem - the algorithm is optimal and cannot be improved. Everything else I've tested (which is everything commonly-used - trimmomatic, fastx, etc) is dramatically inferior, since it uses a non-optimal algorithm. Of course, quality-trimming is a dark art and often it's not even a good idea, but if you DO do it, you should do it correctly. Adapter-trimming, on the other hand, is ALWAYS a good idea (if done correctly).
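
The post does not name the algorithm. One well-known formulation of quality-trimming that is optimal in a defined sense is maximal-scoring-segment trimming: score each base as its quality minus a cutoff and keep the contiguous stretch with the highest total score. The Python sketch below illustrates that idea only; whether it matches what BBDuk or seqtk actually implement is an assumption, and the function names, the cutoff of 10, and the Phred+33 encoding are choices made for the example.

```python
from typing import List, Tuple


def best_segment(quals: List[int], cutoff: int = 10) -> Tuple[int, int]:
    """Return (start, end) of the slice maximizing sum(q - cutoff).

    Returns (0, 0) when no base scores above the cutoff.
    """
    best_sum, best = 0, (0, 0)
    run_sum, run_start = 0, 0
    for i, q in enumerate(quals):
        run_sum += q - cutoff
        if run_sum <= 0:
            # A non-positive prefix can never help; restart after this base.
            run_sum, run_start = 0, i + 1
        elif run_sum > best_sum:
            best_sum, best = run_sum, (run_start, i + 1)
    return best


def quality_trim(seq: str, quality: str,
                 cutoff: int = 10, offset: int = 33) -> Tuple[str, str]:
    """Trim both ends of a read to its maximal-scoring segment."""
    quals = [ord(c) - offset for c in quality]
    start, end = best_segment(quals, cutoff)
    return seq[start:end], quality[start:end]
```

Unlike a naive approach that cuts at the first base below the cutoff, this keeps an isolated low-quality base when it is surrounded by enough high-quality sequence, and it trims both ends in one pass.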

BBDuk is the best adapter-trimming software available, by a huge margin. Let's put speed aside - it's the fastest, but that is never a good enough reason to use software unless more-accurate software is fundamentally too slow to use.

I have verified through extensive synthetic testing that BBDuk has a greater true-positive and lower false-positive rate of adapter-removal than anything else. In fact, those results were presented at AGBT a couple weeks ago. Maybe someone here attended?

In summary - I do not know of any useful read-trimming or read-filtering operation in which BBDuk is not the best tool in both accuracy and speed, aside from seqtk, which can exceed BBDuk's speed on single-ended reads.

4.9 years ago

fastp is really good for processing FASTQ files; you can use the command below:

fastp -i SRR098401_1.fastq.gz -I SRR098401_2.fastq.gz -o SRR098401_1_QC.fastq.gz -O SRR098401_2_QC.fastq.gz -g -x -p