Dear All,
I have a lot of paired-end sequences (R1 and R2) from an NGS run. However, each sample was run on more than one lane of the flow cell, so the fastq files of the same sample from the different lanes must be merged. For example, my files look like:
POP_Sample1_L001_R1.fastq.gz
POP_Sample1_L002_R1.fastq.gz
POP_Sample1_L001_R2.fastq.gz
POP_Sample1_L002_R2.fastq.gz
...
POP_Sample2_L001_R1.fastq.gz
POP_Sample2_L002_R1.fastq.gz
...
All files are stored in a single folder (I have a lot of files). I would like to merge the files that belong to the same sample and read direction (from the different lanes) into one file, for example the L001 and L002 files of POP_Sample1 R1 listed above. For this purpose, I would like to write a script with an IF condition on their names. (This is just an idea; I'm sure that this script doesn't work. Within the IF, I would concatenate the files with cat.)
if [ -f $POP_Sample*_R1.fastq.gz == $POP_Sample*_R1.fastq.gz]
then
cat POP_Sample*_R1.fastq.gz > POP_Sample*_R1_concatenated.fastq.gz
fi
May I kindly ask for your help? I'm not even sure whether this is possible.
Best, Thend
Instead of using if, how about using two loops, "for S in sample1 sample2 sample3" and "for R in R1 R2", and using find to get the fastq.gz files to concatenate?
Bash string manipulation in combination with find will work in a single loop. You do not need a loop for this if you use parallel.
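For what it's worth, here is a minimal sketch of the single-loop idea using bash string manipulation (with a plain glob instead of find). It assumes the file names follow the <sample>_L<lane>_<R1|R2>.fastq.gz pattern shown in the question and that every sample has an L001 file. The function name merge_lanes is made up for illustration, and the _concatenated suffix is taken from the question.

```bash
# Sketch: merge the per-lane fastq.gz files of each sample and read direction.
# Assumption: names look like <sample>_L<3-digit lane>_<R1|R2>.fastq.gz and
# every sample has an L001 file (samples without one are skipped).
merge_lanes() {
    dir=$1
    # Visit only the L001 files: each one identifies a sample/read pair once.
    for first in "$dir"/*_L001_R[12].fastq.gz; do
        [ -e "$first" ] || continue      # the glob matched nothing
        mate=${first##*_L001_}           # e.g. R1.fastq.gz
        mate=${mate%.fastq.gz}           # e.g. R1
        sample=${first%_L001_*}          # e.g. <dir>/POP_Sample1
        # The glob expands in sorted order (L001, L002, ...), so R1 and R2
        # are concatenated in the same lane order and stay in sync.
        cat "$sample"_L[0-9][0-9][0-9]_"$mate".fastq.gz \
            > "${sample}_${mate}_concatenated.fastq.gz"
    done
}
```

Call it as "merge_lanes /path/to/fastq_folder" (or "merge_lanes ." from inside the folder). Concatenating gzip files with cat is fine as it is: the result is a valid multi-member gzip file, so there is no need to decompress first.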