Can anyone please confirm the following.
You should generate SAM/BAM files with read groups for downstream analysis (GATK).
If you have FASTQs from the same sample sequenced on different lanes of the flowcell, these would be assigned different read groups.
Therefore you should keep the FASTQs separate, align each one while adding its read group, and then merge after alignment.
By concatenating the FASTQs before the alignment step, you would lose the read group (lane) information.
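For illustration, here is a minimal sketch of deriving one read group per lane from the first FASTQ header of each lane file. It assumes Casava 1.8+ style read names (`@instrument:run:flowcell:lane:tile:x:y comment`). The ID/PU naming convention and the sample/library values (`sample1`, `lib1`) are hypothetical. The output string uses literal `\t` escapes, the form `bwa mem -R` accepts for a read group header line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IlluminaHeader:
    instrument: str
    run: str
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int


def parse_illumina_header(line: str) -> IlluminaHeader:
    """Parse a Casava 1.8+ style header, e.g.
    '@A00123:8:HABCDEFXX:1:1101:1000:2000 1:N:0:ACGT'.
    Assumes this naming scheme; anything else raises ValueError."""
    name = line.lstrip("@").split()[0]  # drop the comment after the first space
    fields = name.split(":")
    if len(fields) != 7:
        raise ValueError(f"unexpected read name format: {name!r}")
    instrument, run, flowcell, lane, tile, x, y = fields
    return IlluminaHeader(instrument, run, flowcell, int(lane), int(tile), int(x), int(y))


def read_group_line(header: IlluminaHeader, sample: str, library: str) -> str:
    """Build one @RG line per flowcell lane (hypothetical ID/PU convention)."""
    rg_id = f"{header.flowcell}.{header.lane}"  # differs between lanes
    fields = [
        "@RG",
        f"ID:{rg_id}",
        f"SM:{sample}",   # same sample on every lane -> same SM
        f"LB:{library}",  # same library prep -> same LB
        "PL:ILLUMINA",
        f"PU:{rg_id}.{sample}",
    ]
    # Join with the two characters backslash + 't', as passed to `bwa mem -R`
    return "\\t".join(fields)


if __name__ == "__main__":
    for first_header in (
        "@A00123:8:HABCDEFXX:1:1101:1000:2000 1:N:0:ACGT",
        "@A00123:8:HABCDEFXX:2:1101:1000:2000 1:N:0:ACGT",
    ):
        print(read_group_line(parse_illumina_header(first_header), "sample1", "lib1"))
```

Each lane's FASTQ would then be aligned with its own @RG line and the per-lane BAMs merged afterwards; SM and LB stay the same across lanes so that downstream tools treat them as one sample and one library.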
I'm aware that FASTQs don't have read groups and that we can add them at the alignment stage. I was wondering whether we need to preserve lane information for downstream applications such as MarkDuplicates, i.e. when marking optical duplicates.
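To make the optical-duplicate question concrete, here is a simplified model, not Picard's actual implementation. It assumes that tile, x and y are taken from the last three `:`-separated fields of the read name (so the lane field is not used), and that two duplicate reads count as optical only if they share a read group and tile and lie within a pixel distance in both x and y. The default of 100 follows the documented `OPTICAL_DUPLICATE_PIXEL_DISTANCE` default. The read names and read group IDs are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DuplicateRead:
    read_group: str  # e.g. one read group per flowcell lane
    tile: int
    x: int
    y: int


def last_three_fields(read_name: str) -> tuple[int, int, int]:
    """Take (tile, x, y) from the last three ':'-separated fields.
    Note that the lane field of an Illumina name is ignored here."""
    tile, x, y = read_name.split(":")[-3:]
    return int(tile), int(x), int(y)


def make_read(read_name: str, read_group: str) -> DuplicateRead:
    tile, x, y = last_three_fields(read_name)
    return DuplicateRead(read_group, tile, x, y)


def is_optical_pair(a: DuplicateRead, b: DuplicateRead, pixel_distance: int = 100) -> bool:
    """Simplified check: same read group, same tile, close in both x and y."""
    return (
        a.read_group == b.read_group
        and a.tile == b.tile
        and abs(a.x - b.x) <= pixel_distance
        and abs(a.y - b.y) <= pixel_distance
    )


if __name__ == "__main__":
    # Two duplicates from different lanes that happen to share tile/x/y
    lane1 = "A00123:8:HABCDEFXX:1:1101:1000:2000"
    lane2 = "A00123:8:HABCDEFXX:2:1101:1000:2000"
    # Per-lane read groups keep them apart
    print(is_optical_pair(make_read(lane1, "HABCDEFXX.1"), make_read(lane2, "HABCDEFXX.2")))
    # A single read group for concatenated lanes cannot tell them apart
    print(is_optical_pair(make_read(lane1, "all_lanes"), make_read(lane2, "all_lanes")))
```

Under these assumptions, losing the per-lane read group means reads from different lanes with coincidentally matching tile and coordinates could be miscounted as optical duplicates, which would affect the optical duplicate counts and library size estimate rather than which reads are flagged as duplicates.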