I have a directory of paired-end FASTQ files, some of which come from the same individuals but were extracted from different DNA samples, and I would like to concatenate these pairs. It would honestly be faster for me to do this manually, pair by pair, but I am trying to improve my bash/shell scripting skills, and I'll need to be able to do this again in the future.
The files follow this pattern:
Kam_L_39.1.fq
Kam_L_39.2.fq
Kam_L_48.1.fq
Kam_L_48.2.fq
Kam_T_39.1.fq
Kam_T_39.2.fq
Kam_T_48.1.fq
Kam_T_48.2.fq
I want to concatenate the T and L files for each sample number (the first number after the underscores) that share the same read number, so for example concatenate Kam_L_39.1.fq with Kam_T_39.1.fq and Kam_L_39.2.fq with Kam_T_39.2.fq, and the same for sample 48. This directory also contains T file pairs that do not have a matching L pair; I don't need to worry about those.
I think this would require a for loop with a conditional, something like: if the end of the file name (_sample.read.fq) has a match, then concatenate the files, and repeat for all files in the directory. If possible, I would like to keep the original L and T files and name the concatenated files something like Kam_LT_39.1.fq and Kam_LT_39.2.fq.
I did find similar questions, but I'm not sure how to account for the extra variability in my naming convention. I tried to modify code from other questions, with no success. Any help is very much appreciated as I am new to for loops, thank you!
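One way the loop described above could look is sketched here. It is only a sketch and makes a few assumptions: every file is named PREFIX_L_SAMPLE.READ.fq or PREFIX_T_SAMPLE.READ.fq exactly as in the listing, the files are uncompressed, and the function name concat_LT is made up for illustration. It loops over the L files, builds the name of the T partner by swapping _L_ for _T_, and concatenates only when that partner exists, so T files without an L match are never touched.

```bash
#!/usr/bin/env bash
# Sketch: concatenate each L file with its matching T file.
# Assumes names of the form PREFIX_L_SAMPLE.READ.fq and PREFIX_T_SAMPLE.READ.fq.
# concat_LT is a hypothetical helper name, not an existing tool.
concat_LT() {
    local dir=${1:-.}                 # directory to process, default: current
    local l_file base prefix rest t_file out

    for l_file in "$dir"/*_L_*.fq; do
        [ -e "$l_file" ] || continue  # the glob matched nothing

        base=${l_file##*/}            # e.g. Kam_L_39.1.fq
        prefix=${base%%_L_*}          # e.g. Kam
        rest=${base#*_L_}             # e.g. 39.1.fq (sample.read.fq)

        t_file=$dir/${prefix}_T_${rest}
        out=$dir/${prefix}_LT_${rest}

        if [ -f "$t_file" ]; then
            # Same order (L, then T) for read 1 and read 2 keeps mates in sync.
            cat "$l_file" "$t_file" > "$out"
        else
            echo "no T match for $base, skipping" >&2
        fi
    done
}
```

After sourcing or pasting the function, calling concat_LT /path/to/fastq_dir (or just concat_LT inside the directory) should write Kam_LT_39.1.fq, Kam_LT_39.2.fq and so on next to the originals, which are left unchanged. Because the glob only matches _L_ and the outputs contain _LT_, re-running it overwrites the outputs rather than concatenating them again.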
fantastic explanation, thank you so much for your kind help!