How to cat two paired-end read files into one file per library using a loop?
3.5 years ago

Hello, let's say I have a directory that contains multiple paired-end read files, and I want to create one file per library containing both mates of the pair, something like this:

cat R1.fastq.gz R2.fastq.gz > R.fastq.gz

Since I have multiple libraries, doing this manually would take a lot of time. How can I automate this for all my libraries using a for loop?

I tried the code below, but it writes all of the input files into every output file:

R1='*_1.fastq.gz'
R2='*_2.fastq.gz'
for i in *_1.fastq.gz
do
base=$(basename $i "_1.fastq.gz")
cat $R1 $R2 > ${base}.fastq.gz
done

Thanks for reading :)

PS, IMPORTANT: concatenating reads this way is needed for the Mash program. For other programs (e.g. some assemblers), the best way to store these files is in interleaved format.

paired-end-reads cat loops bash • 5.1k views
3.5 years ago
ATpoint 85k
for i in *_1.fastq.gz
  do
  # Strip the _1.fastq.gz suffix to get the library name
  base=$(basename "$i" "_1.fastq.gz")
  cat "${base}_1.fastq.gz" "${base}_2.fastq.gz" > "${base}.fastq.gz"
  done

Does that make sense to you?
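
A slightly more defensive variant of the loop above, as a minimal sketch: it quotes every expansion, strips the suffix with parameter expansion instead of basename, and skips any library whose _2 mate is missing. The function name concat_pairs and its optional directory argument are illustrative assumptions, not part of the original answer; the _1/_2 suffixes are the ones used in this thread.

```shell
# concat_pairs [DIR]: hypothetical helper (name and argument are assumptions).
# For each DIR/<lib>_1.fastq.gz with a matching DIR/<lib>_2.fastq.gz,
# write both files back to back into DIR/<lib>.fastq.gz.
concat_pairs() {
    dir=${1:-.}
    for r1 in "$dir"/*_1.fastq.gz; do
        # If nothing matched, the pattern is left unexpanded; skip it.
        [ -e "$r1" ] || continue
        base=${r1%_1.fastq.gz}      # path without the _1.fastq.gz suffix
        r2=${base}_2.fastq.gz
        if [ ! -f "$r2" ]; then
            echo "skipping $r1: mate $r2 not found" >&2
            continue
        fi
        cat "$r1" "$r2" > "${base}.fastq.gz"
    done
}
```

Calling concat_pairs with no argument works on the current directory; concat_pairs /path/to/reads works on another one. Because gzip streams can be concatenated, each output is still a valid .gz file.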


this worked, thanks :)

3.5 years ago
GenoMax 147k

You should not be concatenating paired-end files in an end-to-end fashion this way. Most tools are not going to be able to understand these files, and you will likely end up with erroneous results. If you want a single file per sample, you could interleave the reads instead, which can be done with the BBMap suite. Keep in mind that not all tools understand interleaved data files.

reformat.sh in1=sample_R1.fq.gz in2=sample_R2.fq.gz out=sample.fq.gz 
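
To illustrate what interleaving means, as opposed to end-to-end concatenation, here is a rough plain-shell sketch. It is not how reformat.sh works internally and does no validation; it assumes strict 4-line FASTQ records, the same number and order of reads in both files, and no tab characters in any line. The function name interleave_pairs and its three arguments are hypothetical.

```shell
# interleave_pairs R1.gz R2.gz OUT.gz: hypothetical helper (assumption).
# Writes record 1 of R1, record 1 of R2, record 2 of R1, record 2 of R2, ...
interleave_pairs() {
    fifo_dir=$(mktemp -d) || return 1
    mkfifo "$fifo_dir/r2"
    # Fold every 4-line R2 record onto one tab-separated line, fed via a FIFO.
    gzip -dc "$2" | paste - - - - > "$fifo_dir/r2" &
    # Do the same for R1, put the two mates side by side on one line,
    # then turn the tabs back into newlines and compress.
    gzip -dc "$1" | paste - - - - | paste - "$fifo_dir/r2" \
        | tr '\t' '\n' | gzip > "$3"
    wait
    rm -rf "$fifo_dir"
}
```

For example, interleave_pairs sample_R1.fq.gz sample_R2.fq.gz sample.fq.gz would produce one file in which each read is immediately followed by its mate.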

Oh thanks, I was thinking the same, but the tutorial for the program I'm following suggests concatenating the reads that way: https://mash.readthedocs.io/en/latest/tutorials.html

But I get your point that for other programs the ideal thing to do is to interleave the reads, as you said.

thanks :)


Thanks for clarifying that.

2.1 years ago
ben@f ▴ 20
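find . -type f -name "*.fastq.gz" | xargs cat >> basename.fastq.gz

This command concatenates every *.fastq.gz file found under the directory you give to find (here ., the current one), regardless of basename. Note that it pools everything into a single output rather than one file per library, and that the output name itself matches the pattern, so write it outside the searched directory or it may be picked up as input.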
