Bash script for catenating many paired-end files
4.4 years ago
dpc ▴ 250

Hi friends!!! I have around 300 paired-end files from 150 samples (one forward and one reverse read file for each sample). I want to catenate the respective forward and reverse reads for each of the samples. The files are named like: SRR2155174_1.fastq SRR2155174_2.fastq SRR2155319_1.fastq SRR2155319_2.fastq. Can anyone please write me a bash script with a loop or something similar so that they can be catenated automatically? I'm not very familiar with bash scripting.

Thanks and Regards,

DC7

catenation • 1.5k views
cat *_1.fastq > concatenated_1.fastq
cat *_2.fastq > concatenated_2.fastq

should do it
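One thing worth checking after pooling (a minimal sketch, not part of the reply above): shell globs expand in sorted order, so both cat commands visit the samples in the same sequence, but mates only stay in sync if every sample has both files. The helper names count_reads and check_pairs are made up for illustration, and the count assumes plain four-line FASTQ records.

```shell
# Count FASTQ records, assuming 4 lines per record (no wrapped sequences).
count_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Succeeds only when both files hold the same number of records.
check_pairs() {
    [ "$(count_reads "$1")" -eq "$(count_reads "$2")" ]
}

# Example use after the two cat commands:
#   check_pairs concatenated_1.fastq concatenated_2.fastq && echo "read counts match"
```

Equal counts do not prove that every mate lines up, but a mismatch is a quick sign that a sample is missing one of its two files.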


When you say catenate, do you mean you want to combine all forward reads and all reverse reads into a single forward and a single reverse file, or do you want to combine R1 and R2 for each sample to get an interleaved fastq? Please try something and show the code; we will be happy to debug and guide you.

find . -type f -name "*_1.fastq" | while read -r line; do echo "pandaseq -F -f $line -r ${line/_1.fastq/_2.fastq} > ${line/_1.fastq/_merged.fastq} 2> ${line/_1.fastq/_merge_stats.txt}"; done

or

parallel --dry-run 'pandaseq -F -f {} -r {=s/_1/_2/=} > {=s/_1.fastq//=}_merged.fastq 2> {=s/_1.fastq//=}.merged_stats.txt' ::: *_1.fastq
4.3 years ago
Mark ★ 1.6k

I would suggest you use a toolkit like BBTools to perform this. This operation is referred to as interleaving fastq files:

https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/reformat-guide/

reformat.sh in1=read1.fq in2=read2.fq out=reads.fq

The above answer by user cpad0112 will allow you to automate this action across the whole dataset.
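A plain bash loop can do the same job per sample. This is only a sketch, assuming the files follow the <sample>_1.fastq / <sample>_2.fastq naming from the question and contain no spaces; the function name interleave_cmds and the _interleaved.fastq output suffix are made up for illustration. Like the dry-run commands above, it only prints the reformat.sh calls so they can be inspected first.

```shell
#!/usr/bin/env bash
# Dry run: print one reformat.sh command per sample found in a directory.
# Assumes <sample>_1.fastq / <sample>_2.fastq naming, as in the question.

interleave_cmds() {
    local dir="${1:-.}" fwd rev sample
    for fwd in "$dir"/*_1.fastq; do
        [ -e "$fwd" ] || continue          # unmatched glob stays literal: skip it
        sample="${fwd%_1.fastq}"           # strip the suffix to get the sample prefix
        rev="${sample}_2.fastq"
        if [ ! -e "$rev" ]; then
            echo "skipping $fwd: missing $rev" >&2
            continue
        fi
        # "_interleaved.fastq" is a hypothetical output name
        echo "reformat.sh in1=$fwd in2=$rev out=${sample}_interleaved.fastq"
    done
}

interleave_cmds .
```

Once the printed commands look right, piping them to a shell (interleave_cmds . | bash) would run them, provided reformat.sh is on the PATH.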
