I'm using Trimmomatic mainly to filter out adapter read-through in my paired-end Illumina data.
My command is as follows, and produces expected results:
java -jar trimmomatic-0.33.jar PE 01_R1.fastq 01_R2.fastq 01_R1-trimpair.fastq 01_R1-trimunpair.fastq 01_R2-trimpair.fastq 01_R2-trimunpair.fastq ILLUMINACLIP:TruSeq3-PE.fa:2:20:7 TRAILING:3 MINLEN:36
However, I can't work out how to specify how many cores to use (the words "node" and "core" don't appear in the Trimmomatic manual). Edit: found it, under -threads. When I run the command I'm shown the message:
Multiple cores found: Using 16 threads
However, I have more available, as I'm submitting these jobs to a large compute cluster, and whether I assign 2, 16, or 32 cores I still get the same message.
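For reference, this is what I'm trying now, with -threads placed straight after the PE keyword (the 32 is just a placeholder for whatever core count I've requested from the scheduler):

# explicit thread count instead of relying on Trimmomatic's auto-detection
java -jar trimmomatic-0.33.jar PE -threads 32 01_R1.fastq 01_R2.fastq 01_R1-trimpair.fastq 01_R1-trimunpair.fastq 01_R2-trimpair.fastq 01_R2-trimunpair.fastq ILLUMINACLIP:TruSeq3-PE.fa:2:20:7 TRAILING:3 MINLEN:36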
Finally, a test on one sample completed within the 1000 min of wall time assigned to it (16 cores), so I submitted the full 16 samples to the compute queue, but each job failed at ~50% completion when it timed out at 1000 min. This makes me wonder whether it's being limited by memory: running alone, the job could grab as much memory as it needed, but with the jobs running in parallel they perhaps competed with each other and slowed down. That's speculation, though; I don't know if it works like that. Alternatively, could it be Java that's limiting memory, and should I push the limit higher with -Xmx?
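i.e. something along these lines, where the 8g heap size is just a guess on my part rather than a value from the manual:

# larger JVM heap via -Xmx, in case the default heap is the bottleneck
java -Xmx8g -jar trimmomatic-0.33.jar PE -threads 16 01_R1.fastq 01_R2.fastq 01_R1-trimpair.fastq 01_R1-trimunpair.fastq 01_R2-trimpair.fastq 01_R2-trimunpair.fastq ILLUMINACLIP:TruSeq3-PE.fa:2:20:7 TRAILING:3 MINLEN:36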
Alternatively, I'm not tied to Trimmomatic and would happily use a different Illumina adapter trimmer if anyone can recommend one.
Thanks.
I wonder if the limiting factor might be the I/O (reading/writing the FASTQ files) rather than the CPU (processing the reads) or the memory. I don't know for sure, so it would be nice if someone could confirm this.
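One rough way to check, assuming you can run a short test interactively on a compute node and that GNU time (/usr/bin/time -v) is available there, is to compare the CPU utilisation the job actually gets against what 16 threads could use:

# wrap a test run in GNU time to report elapsed time, CPU utilisation and peak memory
/usr/bin/time -v java -jar trimmomatic-0.33.jar PE -threads 16 01_R1.fastq 01_R2.fastq 01_R1-trimpair.fastq 01_R1-trimunpair.fastq 01_R2-trimpair.fastq 01_R2-trimunpair.fastq ILLUMINACLIP:TruSeq3-PE.fa:2:20:7 TRAILING:3 MINLEN:36
# "Percent of CPU this job got" well below 1600% would point at I/O (or memory contention) rather than CPU,
# and "Maximum resident set size" shows how much memory the run actually needed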