Linux for loop for 2 inputs and 4 outputs for Trimmomatic fastq quality trimming
3
3
Entering edit mode
9.5 years ago

I need help to write a for loop to run Trimmomatic tool for quality trimming of paired end fastq files.

Input: file01_R1.fastq, file01_R2.fastq.

The first file contains first read of paired end reads and the second file contains second reads. I have 100 input files in the same directory (50 files contain *e.R1g.fastq and another 50 files, file02_R1.R2fastq, file02_R2.fastq suffixes) and so on.

Output: file01_R1_PE.fastq, file01_R1_SE.fastq, file01_R2_PE.fastq, file01_R2_SE.fastq

PE = paired end
SE = single end

I need to write a for loop so that I can run an executable for all 100 files. Any help please! Thanks!

fastq • 13k views
ADD COMMENT
1
Entering edit mode

Hello Rashedul Islam!

We believe that this post does not fit the main topic of this site.

This is not a question about bioinformatics, but about programming loops. You may find answers on Stack Overflow. I'd recommend also specifying which language or shell you are working with.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

ADD REPLY
2
Entering edit mode

Can we reopen this one? It's still a beginner question but relation to bioinformatics is specific enough. Running Trimmomatic on both PE and SE data has its own complications (need to differentiate PE from SE and chose correct adapter set).

ADD REPLY
1
Entering edit mode

Opening this question again for answers would be good.

ADD REPLY
1
Entering edit mode

I have edited my question. Please give a look.

ADD REPLY
6
Entering edit mode
9.0 years ago

Don't use for loops in bash, or possibly in any programming language.

Have a look at GNU/parallel:

find yourfastqfolder -iname 'file*_R1.fastq' | parallel -j 4 'trimmomatic_command {} {.}_R2.fastq'

This will launch all the jobs in parallel, in bunches of 4. Have a look at Gnu Parallel - Parallelize Serial Command Line Programs Without Changing Them for more examples.

ADD COMMENT
0
Entering edit mode

Thanks, this might come in handy. I am wondering, though, what is wrong with for loops? Care to elaborate?

ADD REPLY
5
Entering edit mode
9.3 years ago

Finally, I used this code to run Trimmomatic in a loop in linux.

for file in /path/to/*_R1.fastq
do
withpath="${file}"
filename=${withpath##*/}
base="${filename%*_*.fastq}"
echo "${base}"
java -jar trimmomatic-0.33.jar PE /path/to/"${base}"*_R1.fastq /path/to/"${base}"*_R2.fastq /path/to/"${base}"_R1.trimmed_PE.fastq /path/to/"${base}"_R1.trimmed_SE.fastq /path/to/"${base}"_R2.trimmed_PE.fastq /path/to/"${base}"_R2.trimmed_SE.fastq ILLUMINACLIP:/path/to//Trimmomatic-0.33/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 HEADCROP:14
done
ADD COMMENT
1
Entering edit mode

I know its a bit late. But above script won't be able to distinguish between lets say sample1_R1.fastq and sample12_R1.fastq.

Above script might get confused whether to select sample1 and sample12.

ADD REPLY
0
Entering edit mode

I think that you are now running all the files - SE and PE - through PE processing with PE adapter trimming only, but the the SE reads should be trimmed using SE processing using TruSeq-SE.fa adpater file. Therefore you should build in a conditional processing step depending on if the filename contains PE or SE.

ADD REPLY
0
Entering edit mode
7.5 years ago
pmiller ▴ 10

This is a late response but for anyone else who gets stuck, I finally found a way to do this for myself. The use of Gnu/parallel with "find" was a big fail for me - would not run. Then I finally realized after much searching that Trimmomatic does not have any kind of batch mode and that was the problem. I don't know why I did not understand this, but I found a workaround with a loop that is awesome at - http://www.datacarpentry.org/2015-08-24-ISU/lessons/06-readQC.html. I used it for a different tool in Trimmomatic (CROP) and it worked beautifully.

$ for infile in *.fastq
do
    outfile=$infile\_trim.fastq
    java -jar ~/Trimmomatic-0.32/trimmomatic-0.32.jar SE $infile $outfile SLIDINGWINDOW:4:20 MINLEN:20
done
ADD COMMENT

Login before adding your answer.

Traffic: 1752 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6