Hello,
I have 96 *.fastq.gz raw read files from 24 samples. Each sample was sequenced on two lanes, with paired-end reads (R1 and R2) on each lane.
I would like to merge the reads for each pair from both lanes into one output file, named with the sample identifier taken from the input file name (e.g. 2271_merged_R1_001.fastq.gz).
File names follow this pattern (sample IDs 2271 to 2294):
22[71-94]*R[1-2]_001.fastq.gz
**2271**_ID890_1_S1_L001_**R1_001.fastq.gz**
**2271**_ID890_1_S1_L002_**R1_001.fastq.gz**
**2271**_ID890_1_S1_L001_**R2_001.fastq.gz**
**2271**_ID890_1_S1_L002_**R2_001.fastq.gz**
I tried the following short script, but only two output files are generated (the first and the last).
For R1 files:
for rf in 22[71-94]*R1_001.fastq.gz; do zcat $rf > 22"${71-94}"_merged_R1_001.fastq.gz ; done
For R2 files:
for rf in 22[71-94]*R2_001.fastq.gz; do zcat $rf > 22"${71-94}"_merged_R2_001.fastq.gz ; done
My questions are:
1. Why are only two output files generated?
2. Why is the number of reads in the output files not the sum of the reads from both lanes?
3. Is there a clean way to merge the reads from both lanes for both read types (R1 and R2) in a single step, instead of running the loop once per read type?
What went wrong in the code, and how can I verify that the output files are completely merged?
Thanks
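On what went wrong in the two loops above: `[71-94]` is a bracket expression that matches a single character (7, anything from 1 to 9, or 4), not the numbers 71 to 94, and `"${71-94}"` is a parameter expansion meaning "positional parameter 71, or the text 94 if it is unset". So every iteration redirects to the same `2294_merged_...` file, and `>` truncates it each time, which would explain both the small number of output files and the read counts. The redirected `zcat` output is also uncompressed text despite the `.gz` name. The sketch below only demonstrates those two shell behaviours; `merged_name` and `matches_class` are hypothetical helper names made up for the illustration.

```shell
# What the output name in the original loop expands to.
# ${71-94} means: value of positional parameter 71, or "94" if it is unset.
merged_name() {
    echo "22${71-94}_merged_${1}_001.fastq.gz"
}

# What the bracket expression [71-94] matches: exactly one character
# out of the set {7, 1-9, 4}, i.e. any single digit from 1 to 9.
matches_class() {
    case $1 in
        [71-94]) echo yes ;;
        *)       echo no ;;
    esac
}

merged_name R1      # prints 2294_merged_R1_001.fastq.gz
matches_class 7     # prints yes
matches_class 0     # prints no
matches_class 71    # prints no (two characters, the class matches only one)
```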
For the 48 R1 files, the following code will work (take a backup of your work and try it on 1-2 sets before using it, and compare MD5 sums):
It works for R2 as well. The output file name for sample 2271 R1 would be 2271_merged_R1_001.fastq.gz.
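The snippet itself is not included in this post, so here is a minimal sketch of one way to do the merge, not the original code. It assumes the naming scheme from the question (`<sample>_..._L00<lane>_R<1|2>_001.fastq.gz`, samples 2271 to 2294) and relies on the fact that gzip files can be concatenated with plain `cat` and still decompress as one stream. The function name `merge_lanes` and its directory argument are made up for the illustration. It handles R1 and R2 in one pass.

```shell
# Merge all lanes of each sample into one file per read direction.
# Usage: merge_lanes /path/to/fastq_dir   (defaults to the current directory)
# Note: uses "set --", so it overwrites the function's own positional parameters.
merge_lanes() {
    dir=${1:-.}
    sample=2271
    while [ "$sample" -le 2294 ]; do
        for read in R1 R2; do
            # The glob expands in sorted order, so L001 comes before L002
            # for both R1 and R2 and the read pairs stay in sync.
            set -- "$dir/${sample}"_*_L00[0-9]_"${read}"_001.fastq.gz
            # Skip samples with no matching input (unmatched glob stays literal).
            if [ -e "$1" ]; then
                cat "$@" > "$dir/${sample}_merged_${read}_001.fastq.gz"
            fi
        done
        sample=$((sample + 1))
    done
}
```

The output names do not contain a `_L00x_` part, so rerunning the function does not pick up the merged files as input.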
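To verify that a merged file is complete, one option is to compare read counts and a checksum of the decompressed content against the lane files. The following is a rough sketch under the same naming assumptions as in the question. It assumes standard four-line FASTQ records, and the names `count_reads` and `check_merge` are hypothetical. It uses `cksum` on the decompressed streams rather than `md5sum` on the compressed files; `md5sum` on the decompressed streams would serve the same purpose.

```shell
# Number of reads = decompressed line count / 4 (four-line FASTQ records assumed).
count_reads() {
    echo $(( $(gzip -dc "$@" | wc -l) / 4 ))
}

# Usage: check_merge DIR SAMPLE READ   e.g. check_merge . 2271 R1
# Compares the lane files of one sample/read against its merged file.
check_merge() {
    dir=$1 sample=$2 read=$3
    merged="$dir/${sample}_merged_${read}_001.fastq.gz"
    set -- "$dir/${sample}"_*_L00[0-9]_"${read}"_001.fastq.gz
    n_lanes=$(count_reads "$@")
    n_merged=$(count_reads "$merged")
    # Checksums of the decompressed data, lanes in glob (sorted) order.
    sum_lanes=$(gzip -dc "$@" | cksum)
    sum_merged=$(gzip -dc "$merged" | cksum)
    if [ "$n_lanes" -eq "$n_merged" ] && [ "$sum_lanes" = "$sum_merged" ]; then
        echo "OK $sample $read: $n_merged reads"
    else
        echo "MISMATCH $sample $read: lanes=$n_lanes merged=$n_merged"
        return 1
    fi
}
```

For paired-end data it is also worth checking that the merged R1 and R2 files of a sample report the same read count.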