I am following a scRNA-seq pipeline and have run into a problem. I am using Trim Galore to trim paired-end reads with a quality cutoff of 20. For one particular sample it has been running for over 12 hours and has only finished 4 of the 19 paired-end file pairs. Is there a reason for this, or any way to speed it up? The FASTQ files for these samples are really large: some are over 5 million KB (roughly 5 GB), and there are 19 pairs of read files. Other samples ran fine before. Other tools do not work properly for me; only Trim Galore works well.
for i in *_1.fastq.gz
do
    trim_galore -q 20 --paired -o trimmed "$i" "${i%_1.fastq.gz}_2.fastq.gz"
done
I find that hard to believe. There are threaded trimming tools (e.g. bbduk.sh from the BBMap suite) that will run as fast as your disk I/O and the number of available cores allow. That said, if your computer is I/O bound (e.g. you are using a regular spinning disk), then things may already be at their peak.

In addition to the solution below, you can also look into using parallel: Gnu Parallel - Parallelize Serial Command Line Programs Without Changing Them
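
As a rough sketch of what that could look like for the loop in the question: the snippet below pairs up the files the same way and hands the pairs to GNU parallel so several run at once. It assumes GNU parallel and trim_galore are both on your PATH. The function names, the JOBS value of 4, and the trimmed output directory are placeholders picked for illustration, not something taken from your setup. If the disk is the bottleneck, running more jobs at once will not help much.

```shell
#!/usr/bin/env bash
# Sketch only: run trim_galore on several read pairs at once with GNU parallel.
# JOBS=4 is an assumed value; choose it based on your cores and disk speed.
JOBS=4

# Print one "R1<TAB>R2" line for every *_1.fastq.gz file in the current directory.
# (Hypothetical helper name; it only builds file names and runs nothing.)
list_pairs() {
    local r1
    for r1 in *_1.fastq.gz; do
        [ -e "$r1" ] || continue   # skip the literal pattern when nothing matches
        printf '%s\t%s\n' "$r1" "${r1%_1.fastq.gz}_2.fastq.gz"
    done
}

# Feed the pairs to parallel; {1} and {2} are the two tab-separated columns.
# (Hypothetical helper name; same trim_galore options as in the question.)
run_trimming() {
    list_pairs | parallel -j "$JOBS" --colsep '\t' \
        trim_galore -q 20 --paired -o trimmed {1} {2}
}
```

Run list_pairs first to check that the pairing looks right, then call run_trimming from the directory that holds the FASTQ files. Recent Trim Galore releases also have their own --cores option, so it may be worth checking trim_galore --help before adding parallel on top of it.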