Question

What is the limiting factor for trimmomatic speed and how can it be increased?

7.7 years ago

Daniel ★ 4.0k

I'm using Trimmomatic mainly to remove adapter read-through from my paired-end Illumina data.

My command is as follows, and it produces the expected results:

java -jar trimmomatic-0.33.jar PE 01_R1.fastq 01_R2.fastq 01_R1-trimpair.fastq 01_R1-trimunpair.fastq 01_R2-trimpair.fastq 01_R2-trimunpair.fastq ILLUMINACLIP:TruSeq3-PE.fa:2:20:7 TRAILING:3 MINLEN:36

However, I couldn't work out how to set the number of cores to use (neither "node" nor "core" appears in the Trimmomatic manual). Edit: found it; the option is -threads. When run, Trimmomatic prints this message:

Multiple cores found: Using 16 threads

However, more cores are available to me, because I am submitting these jobs to a large compute cluster. Whether I assign 2, 16 or 32 cores, I still get the same message.
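A minimal sketch of passing the core count explicitly instead of relying on auto-detection, assuming a SLURM cluster (`SLURM_CPUS_PER_TASK` is SLURM-specific; `nproc` is the fallback outside a job). The `run` and `trim_pe` helpers and the `DRY_RUN` switch are hypothetical, added only so the command can be printed for inspection rather than executed. Per Trimmomatic's usage line, `-threads` goes directly after the `PE` keyword.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumption: SLURM sets SLURM_CPUS_PER_TASK when the job requests
# --cpus-per-task; outside SLURM, fall back to the cores nproc reports.
THREADS="${SLURM_CPUS_PER_TASK:-$(nproc)}"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s ' "$@"
    echo
  else
    "$@"
  fi
}

# Hypothetical wrapper around the command from the question, with the
# thread count passed explicitly (-threads must come right after PE).
trim_pe() {
  local s="$1"
  run java -jar trimmomatic-0.33.jar PE -threads "$THREADS" \
    "${s}_R1.fastq" "${s}_R2.fastq" \
    "${s}_R1-trimpair.fastq" "${s}_R1-trimunpair.fastq" \
    "${s}_R2-trimpair.fastq" "${s}_R2-trimunpair.fastq" \
    ILLUMINACLIP:TruSeq3-PE.fa:2:20:7 TRAILING:3 MINLEN:36
}

# Usage: DRY_RUN=1 trim_pe 01   (prints the command)
#        trim_pe 01             (runs it)
```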

Finally, a test run on one sample completed within the 1000 min wall time assigned to it (16 cores), so I submitted all 16 samples to the compute queue. However, each job failed at ~50% completion because it timed out after 1000 min. This makes me wonder whether it is limited by memory: when running alone it could expand its memory use, but with parallel jobs running perhaps they competed and slowed each other down. That is speculation; I don't know if it works like that. Alternatively, could Java be limiting the memory, so that I should raise the limit with -Xmx?
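If memory is the suspect, one hedged option is to size the Java heap to the job's allocation explicitly; by default the JVM typically caps the heap at a fraction of the machine's physical memory, which need not match what the scheduler granted. A minimal sketch, assuming SLURM, which sets `SLURM_MEM_PER_NODE` (in megabytes) only when the job requests `--mem`. The `java_heap_mb` helper, the 80% ratio and the 5120 MB fallback are hypothetical choices, and nothing here establishes that memory is actually the bottleneck.

```bash
#!/usr/bin/env bash

# Hypothetical helper: derive a Java heap limit (-Xmx) from the memory the
# scheduler granted, leaving ~20% headroom for JVM memory outside the heap.
# Assumption: SLURM_MEM_PER_NODE is in MB; 5120 is an arbitrary fallback.
java_heap_mb() {
  local granted_mb="${SLURM_MEM_PER_NODE:-5120}"
  echo $(( granted_mb * 80 / 100 ))
}

# Usage:
#   java "-Xmx$(java_heap_mb)m" -jar trimmomatic-0.33.jar PE -threads 16 ...
```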

Alternatively, I'm not tied to Trimmomatic and would use a different Illumina adapter trimmer if anyone can recommend one.

Thanks.

trimmomatic fastq

I wonder whether the limiting factor might be I/O (reading and writing the FASTQ files) rather than CPU (processing the reads) or memory. I don't know for sure, so it would be nice if someone could confirm this.

7.7 years ago by Carlo Yague 8.9k
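One way to check this is to compare CPU time with wall-clock time. A minimal sketch using bash's built-in `time` keyword, whose `%P` format reports (user + sys) / real as a percentage; the `cpu_percent` helper name is hypothetical. As a rough heuristic, a job that keeps 16 threads busy should report something close to 1600, while a much lower figure suggests the threads spend time waiting, for example on disk or network I/O. It is a hint, not a diagnosis.

```bash
#!/usr/bin/env bash

# Hypothetical helper: run a command and print its CPU utilisation,
# i.e. (user + sys CPU time) / wall-clock time, as a percentage.
# The command's own stdout/stderr are discarded so only the timing
# report is printed (Trimmomatic's log messages are lost here).
cpu_percent() {
  local TIMEFORMAT='%P'
  { time "$@" >/dev/null 2>&1; } 2>&1
}

# Usage: cpu_percent java -jar trimmomatic-0.33.jar PE -threads 16 ...
```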

score 3 · Answer 1 · 2017-03-09

BBDuk is substantially faster than Trimmomatic (and, in my testing, more accurate for adapter-trimming). With 16 cores, it can adapter-trim over 1 million 150bp paired-end reads per second on 2.5 GHz Intel E5-2670 CPUs, using recommended parameters.

For example:

bbduk.sh in=/dev/shm/r#.fq reads=4m ktrim=r k=23 mink=11 hdist=1 t=16 ref=adapters_a2.fa tbo tpe out=foo.fq

BBDuk version 37.02
Set threads to 16

Initial:
Memory: max=46902m, free=44944m, used=1958m

Added 7767 kmers; time:         0.225 seconds.
Memory: max=46902m, free=42497m, used=4405m

Input is being processed as paired
Processing time:                3.517 seconds.

Input:                          4000000 reads           604000000 bases.
KTrimmed:                       10626 reads (0.27%)     1176820 bases (0.19%)
Trimmed by overlap:             1658 reads (0.04%)      25632 bases (0.00%)
Total Removed:                  6422 reads (0.16%)      1202452 bases (0.20%)
Result:                         3993578 reads (99.84%)  602797548 bases (99.80%)

Time:                           3.755 seconds.
Reads Processed:       4000k    1065.30k reads/sec
Bases Processed:        604m    160.86m bases/sec
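Applied to the separate R1/R2 files from the question, the same k-mer trimming parameters could be given with BBDuk's `in1`/`in2`/`out1`/`out2` options, producing two paired output files. A minimal sketch: `adapters.fa` is a placeholder path (BBTools ships an adapter file under its `resources/` directory), the output names are hypothetical, and the `run`/`DRY_RUN` helper is hypothetical, used only to print the command instead of running it.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s ' "$@"
    echo
  else
    "$@"
  fi
}

# Same trimming parameters as the bbduk.sh example above, but with
# separate paired input and output files. adapters.fa is a placeholder.
bbduk_pe() {
  local s="$1" threads="${2:-16}"
  run bbduk.sh \
    in1="${s}_R1.fastq" in2="${s}_R2.fastq" \
    out1="${s}_R1-trimmed.fastq" out2="${s}_R2-trimmed.fastq" \
    ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo t="$threads"
}

# Usage: bbduk_pe 01 16
```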

score 2 · Answer 2 · 2017-03-09

7.7 years ago

Petr Ponomarenko ★ 2.8k

So you tried -threads with 2, 16 and 32, and got the same results? If I remember correctly, Trimmomatic analyses each read (or each pair, in paired-end mode) separately, and the trimming of one read does not affect any other read. To avoid an I/O problem, you can therefore slice your files into chunks of N reads, send them to K nodes as a batch, wait for everything to be processed, and then combine the results. It does sound as if Trimmomatic is not parallelizing efficiently on your cluster, for some reason.
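A minimal sketch of the slicing step, assuming plain (uncompressed) 4-line FASTQ records with R1 and R2 in the same read order, and GNU `split`. The `split_fastq` helper and the chunk naming are hypothetical. Because every chunk holds exactly N records, R1 and R2 chunks with the same number contain the same read pairs, so each chunk pair can be trimmed as an independent job.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: split one FASTQ file into chunks of N reads.
# Assumes uncompressed FASTQ with exactly 4 lines per record, so a chunk
# of 4*N lines never cuts a record in half.
# Output: <prefix>0000.fastq, <prefix>0001.fastq, ... (GNU split:
# -d numeric suffixes, -a 4 digits, i.e. at most 10000 chunks).
split_fastq() {
  local fastq="$1" reads_per_chunk="$2" prefix="$3"
  split -l $(( 4 * reads_per_chunk )) -d -a 4 \
    --additional-suffix=.fastq "$fastq" "$prefix"
}

# Usage (e.g. 1 million reads per chunk):
#   split_fastq 01_R1.fastq 1000000 01_R1_chunk_
#   split_fastq 01_R2.fastq 1000000 01_R2_chunk_
#   ...trim each 01_R1_chunk_NNNN.fastq / 01_R2_chunk_NNNN.fastq pair...
# The zero-padded suffixes sort correctly, so a glob such as
#   cat 01_R1_chunk_*-trimpair.fastq > 01_R1-trimpair.fastq
# merges the trimmed chunks back in the original order.
```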

I was going to suggest slicing the input FASTQ files. However, the slicing and merging make the whole process more complicated and error-prone, and I wonder whether it's worth it. If time is crucial, I would also (or instead) consider piping the output of the trimmer directly into the aligner. I put a simple how-to here: Trim & align paired-end reads in a single pass using cutadapt and bwa mem.

7.7 years ago by dariober 15k
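A minimal sketch of such a single-pass pipeline, assuming cutadapt, bwa and samtools are on the PATH and a bwa index for the reference exists. The `trim_and_align` helper name, the TruSeq adapter prefix `AGATCGGAAGAGC`, the minimum length and the thread count are illustrative choices, not taken from the linked post. With two input files, cutadapt's `--interleaved` option writes interleaved pairs to standard output, and `bwa mem -p` reads interleaved pairs from standard input (`-`), so no trimmed FASTQ files are written to disk.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: trim adapters and align in one pass.
#   cutadapt --interleaved : read R1/R2, write interleaved pairs to stdout
#   bwa mem -p             : treat stdin ("-") as interleaved paired reads
#   samtools sort          : sort alignments from stdin into a BAM file
trim_and_align() {
  local ref="$1" r1="$2" r2="$3" out_bam="$4" threads="${5:-16}"
  cutadapt --interleaved \
      -a AGATCGGAAGAGC -A AGATCGGAAGAGC -m 36 \
      "$r1" "$r2" \
    | bwa mem -p -t "$threads" "$ref" - \
    | samtools sort -@ "$threads" -o "$out_bam" -
}

# Usage: trim_and_align ref.fa 01_R1.fastq 01_R2.fastq 01.sorted.bam 16
```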