Hi,
I have a set of FASTQ files that I want to align to a reference genome. Each sample was sequenced on 2 different runs (flow cells) and 2 different lanes, so I have 4 files per sample. I am not sure when I should merge my files: before or after alignment? Previous posts I have read suggest merging after alignment, but I am not sure what is best in my case. Could I merge the samples using samtools? Or do I simply cat one file onto the end of the other?
An example for sample1 is shown below (FC = flow cell, L = lane):
sample1.FC1.L1
sample1.FC1.L2
sample1.FC2.L1
sample1.FC2.L2
Thanks a lot in advance!
Thanks a lot for the reply. I was wondering why it is important to specify the read group (RG).
Read groups may be used to indicate which libraries are technical replicates of one another. That will help the variant caller decide how much variability comes from the instrument itself.
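To make this concrete, here is a rough sketch of one way to do it: give each flow cell/lane file its own read group at alignment time, then merge the per-lane BAMs afterwards. It only prints the commands rather than running them. The aligner (bwa mem), the single-end ".fastq" file names, the reference "ref.fa", the library name "lib1", and the ID/PU naming scheme are all assumptions for illustration, so adjust them to your data.

```python
import shlex

# Names taken from the example in the question; everything else is hypothetical.
SAMPLE = "sample1"
UNITS = [("FC1", "L1"), ("FC1", "L2"), ("FC2", "L1"), ("FC2", "L2")]
REFERENCE = "ref.fa"  # assumed reference FASTA, already indexed for bwa


def read_group(sample, flowcell, lane, library="lib1", platform="ILLUMINA"):
    """Build an @RG header line with a unique ID per flow cell and lane.

    Fields are joined with a literal backslash-t, which is the form
    bwa mem expects for its -R option.
    """
    fields = [
        "@RG",
        f"ID:{sample}.{flowcell}.{lane}",  # unique per sequencing unit
        f"SM:{sample}",                    # same sample name in all four files
        f"LB:{library}",                   # assumed: one library per sample
        f"PU:{flowcell}.{lane}",           # platform unit (assumed naming)
        f"PL:{platform}",
    ]
    return "\\t".join(fields)


def align_command(sample, flowcell, lane):
    """Align one FASTQ (assumed single-end) and write a sorted BAM."""
    prefix = f"{sample}.{flowcell}.{lane}"
    rg = shlex.quote(read_group(sample, flowcell, lane))
    return (
        f"bwa mem -R {rg} {REFERENCE} {prefix}.fastq "
        f"| samtools sort -o {prefix}.bam -"
    )


def merge_command(sample, units):
    """Merge the per-lane BAMs of one sample into a single BAM."""
    bams = " ".join(f"{sample}.{fc}.{lane}.bam" for fc, lane in units)
    return f"samtools merge {sample}.merged.bam {bams}"


for fc, lane in UNITS:
    print(align_command(SAMPLE, fc, lane))
print(merge_command(SAMPLE, UNITS))
```

Because every read is tagged with its read group before merging, the merged BAM should still record which flow cell and lane each read came from, while SM ties all four read groups to the same sample.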