bash script for merging R1 and R2 reads
2
0
Entering edit mode
2.4 years ago
ieie ▴ 10

Hi,

I am sure this question was already asked, although I could not find any information, I am new to sequences processing and I would like to know if there is a way to merge fastq.gz files:

I have this set of files:

CACTCTGAGT_S7_L001_R1_001.fastq.gz CACTCTGAGT_S7_L001_R2_001.fastq.gz
CACTCTGAGT_S8_L001_R1_001.fastq.gz CACTCTGAGT_S8_L001_R2_001.fastq.gz

I have 33 files and I would like to merge the R1 and R2 files with a for loop or another way that is not in bash.

Thanks a lot!

fastq bash illumina • 3.8k views
ADD COMMENT
4
Entering edit mode
2.4 years ago
GenoMax 147k

merging data may be one of the following.

  • Do you want to merge each pair of R1/R2 reads (if they overlap) to generate a longer representation? In this case you can use bbmerge.sh or PEAR or a similar program to merge the reads.
  • Or are you referring to interleaving R1/R2 data so the order of reads becomes R1_1,R2_1,R2_1,R2_2,R3_1,R3_2 etc in single data file. Individual reads are kept intact. In this case you will use reformat.sh -Xmx2g in1=R1.fq in2=R2.fq out=interleaved.fq.

While the second case is not technically merging, it would combine the data into one file.

ADD COMMENT
0
Entering edit mode

for example, I am following the vsearch example pipeline:

Process samples                                                               
for f in *_R1_*.fastq; dor=$(sed -e "s/_R1_/_R2_/" <<< "$f")
s=$(cut -d_ -f1 <<< "$f")

echo
echo ====================================
echo Processing sample $s
echo ====================================

$VSEARCH --fastq_mergepairs $f \
    --threads $THREADS \
    --reverse $r \
    --fastq_minovlen 200 \
    --fastq_maxdiffs 15 \
    --fastqout $s.merged.fastq \
    --fastq_eeout

however, i am new to the for loops so although I understand more or less the steps I cannot make it work with my data because I cannot make the for loop work.. maybe with the vsearch example I can explain better what I want to achieve with merging the pairs

ADD REPLY
0
Entering edit mode

Ah I see. Here VSEARCH is actually doing the read merging as described in point #1. You should be able to use this code as is.

You can try this to see what the loop is actually is doing (assuming VSEARCH can use gzipped files, I am not a VSEARCH user)

for f in *_R1_*.fastq.gz 

do
r=$(sed -e "s/_R1_/_R2_/" <<< "$f")
s=$(cut -d_ -f1 <<< "$f")

echo
echo ====================================
echo Processing sample $s
echo ====================================

echo $VSEARCH --fastq_mergepairs $f \
    --threads $THREADS \
    --reverse $r \
    --fastq_minovlen 200 \
    --fastq_maxdiffs 15 \
    --fastqout $s.merged.fastq.gz \
    --fastq_eeout

done
ADD REPLY
0
Entering edit mode

great thanks! at the end of the for loop should there be a done?

because I am running the script as it is i the terminal but it gets stuck.. Thanks again!

ADD REPLY
0
Entering edit mode

Use ctrl+c to kill the job. I simply copied what you had above. There was a done missing (and one more mistake). I corrected those. See if the new version above is better. This will just print all the command lines to screen.

If all looks good then remove the word echo before $VESEARCH to actually run the analysis.

ADD REPLY
0
Entering edit mode

great thanks a lot for your help!

ADD REPLY
0
Entering edit mode
2.4 years ago
Ram 44k
cat *R1*.fastq.gz > merged.R1.fastq.gz
cat *R2*.fastq.gz > merged.R2.fastq.gz

is super simple. Why "not bash" and "not loop" (although loop is not required here)?

ADD COMMENT
0
Entering edit mode

Thanks for the answer. I would need a for loop because I have multiple files and I was wondering if a part from bash maybe there is another way. However, if I use cat am I merging the R1 and R2 pairs or just the R1 files and R2 files? Now I have R1 and R2 but I would like to have just one file with these two merged.

Thanks again!

ADD REPLY
0
Entering edit mode

I would like to have just one file with these two merged.

Merging R1/R2 files with cat would simply append the contents of R2 file at end of R1 file. Are you sure that is what you want to do? No program can use data in this format.

ADD REPLY
0
Entering edit mode

Thanks! No I don't want to append, I would like to merge the R1 and R2 reads in the files to be processed later.

ADD REPLY

Login before adding your answer.

Traffic: 1064 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6