TrimGalore! on multiple paired fastq files
3
2
8.1 years ago
emblake ▴ 90

I have 60 PE fastq files that I would like to batch process using TrimGalore! I know a for...in loop would best serve my purpose, but I don't think I'm setting it up correctly. Would someone more experienced with scripting assist? Thank you!

File format: SMXX_R1_merged.fastq.gz, SMXX_R2_merged.fastq.gz

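#!/bin/bash
# Pair each R1 file with its R2 mate and trim the pair.
for f1 in *_R1_merged.fastq.gz
do
        # Swap the R1 suffix for the R2 suffix to get the mate's name.
        f2="${f1%_R1_merged.fastq.gz}_R2_merged.fastq.gz"
        trim_galore --illumina --paired --fastqc -o trim_galore/ "$f1" "$f2"
done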
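
A loop like this can be checked before launching all the jobs by printing the commands instead of running them. The sketch below is a hypothetical dry run, not part of the original post: the function name pair_command is made up for illustration, and it assumes the SMXX_R1_merged.fastq.gz / SMXX_R2_merged.fastq.gz naming given above.

```shell
#!/bin/sh
# pair_command (hypothetical name): print the trim_galore command for one
# R1 file without running it. The R2 name is derived by suffix replacement.
pair_command() {
    printf 'trim_galore --illumina --paired --fastqc -o trim_galore/ %s %s\n' \
        "$1" "${1%_R1_merged.fastq.gz}_R2_merged.fastq.gz"
}

# Dry run over the current directory.
for f1 in *_R1_merged.fastq.gz; do
    # Skip the literal pattern that is left behind when nothing matches.
    [ -e "$f1" ] || continue
    pair_command "$f1"
done
```

If the printed pairs look right, the printf can be swapped back for the real trim_galore call.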
rna-seq trimgalore! paired-end script • 19k views
3

You can run with GNU parallel.

find  path_to_fastq  -name "*_R1_merged.fastq.gz" | cut -d "_" -f1 | parallel -j 1 trim_galore --illumina --paired --fastqc -o trim_galore/ {}\_R1_merged.fastq.gz {}\_R2_merged.fastq.gz
1

You have a typo: *parallel ;-)

Might be best to include link as well: https://www.gnu.org/software/parallel/

0

I've installed GNU parallel and run:

find  /path/to/fastq  -name "*_R1_merged.fastq.gz" | cut -d "_" -f1 | parallel -j 1 trim_galore --illumina --paired --fastqc -o trim_galore/ {}\_R1_merged.fastq.gz {}\_R2_merged.fastq.gz

but it fails with:

gzip: /path/to/fastq/trim_R1_merged.fastq.gz: No such file or directory
Input file '/path/to/fastq/trim_R1_merged.fastq.gz' seems to be completely empty. Consider respecifying!

Path to Cutadapt set as: 'cutadapt' (default)
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Failed to write to file 'trim_R1_merged.fastq.gz_trimming_report.txt': No such file or directory 1.11

I can see that the file naming convention is incorrect, but I'm not sure how to fix it.

0

It looks like the path is not correct. Are you sure /path/to/fastq is the directory that contains your .gz files? Can you post the output of find /path/to/fastq -name "*_R1_merged.fastq.gz" | cut -d "_" -f1?

0

I checked the path, and there was an issue with a subfolder named 'trim_galore'. I corrected the error, and it seems to be executing just fine now. Thanks very much!
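
The error above is consistent with how cut -d "_" -f1 behaves: it keeps everything before the first underscore on the whole line, directory names included, so a file under a trim_galore/ subfolder is reduced to .../trim. A minimal sketch of the difference, using a made-up path (the sample name SM01 and both function names are illustrative only) and stripping the known suffix with sed instead:

```shell
#!/bin/sh
# prefix_with_cut: the approach used in the thread; keeps only the text
# before the FIRST underscore anywhere in the path.
prefix_with_cut() {
    printf '%s\n' "$1" | cut -d "_" -f1
}

# prefix_with_sed: removes only the trailing _R1_merged.fastq.gz suffix.
prefix_with_sed() {
    printf '%s\n' "$1" | sed 's/_R1_merged\.fastq\.gz$//'
}

# Made-up example path with an underscore in a directory name.
p=/path/to/fastq/trim_galore/SM01_R1_merged.fastq.gz
prefix_with_cut "$p"   # prints /path/to/fastq/trim
prefix_with_sed "$p"   # prints /path/to/fastq/trim_galore/SM01
```

In the pipeline from this thread, that would mean replacing the cut step with the sed step. Sample names that themselves contain underscores would then survive as well.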

4
2.1 years ago
DareDevil ★ 4.3k

Try running:

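ls *_1.fastq.gz | xargs -P15 -I@ bash -c 'trim_galore -q 20 --paired -o trimmed "$1" "${1%_1.*.*}_2.fastq.gz"' _ @

This runs up to 15 jobs at a time (-P15). Note that it assumes files named *_1.fastq.gz and *_2.fastq.gz; adjust the patterns to match your own file names.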
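
For readers unfamiliar with the bash -c '...' _ @ idiom: xargs substitutes each file name for @, bash -c receives _ as $0 and the file name as $1, and the ${1%...} expansion strips the _1 suffix to build the mate's name. A small sketch with echo in place of the real run, using made-up file names and no -P so that the output order is deterministic (the function name show_xargs_pairs is hypothetical):

```shell
#!/bin/sh
# show_xargs_pairs (hypothetical name): feed made-up R1 names through the
# same xargs / bash -c pattern, but echo the command instead of running it.
show_xargs_pairs() {
    printf '%s\n' sampleA_1.fastq.gz sampleB_1.fastq.gz |
        xargs -I@ bash -c 'echo trim_galore -q 20 --paired -o trimmed "$1" "${1%_1.fastq.gz}_2.fastq.gz"' _ @
}

show_xargs_pairs
# prints:
# trim_galore -q 20 --paired -o trimmed sampleA_1.fastq.gz sampleA_2.fastq.gz
# trim_galore -q 20 --paired -o trimmed sampleB_1.fastq.gz sampleB_2.fastq.gz
```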

3
8.1 years ago
ole.tange ★ 4.5k
parallel trim_galore --illumina --paired --fastqc -o trim_galore/ {} {=s/_R1_/_R2_/=} ::: *_R1_merged.fastq.gz
1
5.8 years ago

Alternatively, GNU parallel can easily handle multiple inputs:

parallel --xapply trim_galore --illumina --paired --fastqc -o trim_galore/ ::: *_R1_merged.fastq.gz ::: *_R2_merged.fastq.gz

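Note that the --xapply flag (also available as --link in newer GNU parallel releases) pairs the inputs by position: the first R1 file with the first R2 file, and so on. Without it, parallel runs every combination of the two lists, which is not what you want here.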
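
Because --xapply matches the two lists purely by position, both globs need to expand to the same samples in the same order; a single missing R2 file would mispair what follows. A hedged pre-flight sketch (the function name check_mates is hypothetical, and it assumes the _R1_merged / _R2_merged naming from the question) that reports any R1 file lacking its mate, demonstrated on made-up empty files in a temporary directory:

```shell
#!/bin/sh
# check_mates (hypothetical name): for each R1 file given, verify that the
# matching R2 file exists; return non-zero if any mate is missing.
check_mates() {
    missing=0
    for r1 in "$@"; do
        r2="${r1%_R1_merged.fastq.gz}_R2_merged.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "missing mate: $r2" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Demo with made-up files: SM02 deliberately has no R2 mate.
tmp=$(mktemp -d)
touch "$tmp/SM01_R1_merged.fastq.gz" "$tmp/SM01_R2_merged.fastq.gz" \
      "$tmp/SM02_R1_merged.fastq.gz"
if check_mates "$tmp"/*_R1_merged.fastq.gz; then
    echo "all pairs present"
else
    echo "fix missing files before running parallel"
fi
rm -r "$tmp"
```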
