Dear All,
I am confused about one issue. I have samples that were each sequenced three times, on three lanes, to attain the required depth. I am running a pipeline for FASTQ quality checking, adapter removal, and alignment. My question: is it better to run the quality check and Trimmomatic PE on each of the three files and merge them at the BAM stage after alignment, or to merge them while still in FASTQ format and run everything on the merged files? How do the two approaches differ technically? For concatenating, I am going to use:
cat file1.fastq file2.fastq > mergedfile.fastq
Best regards,
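Setting aside issues such as memory footprint and file housekeeping, the two operations are not commutative: aligning and then merging does not necessarily give the same result as merging and then aligning. For example, primary and secondary alignments may get mixed up when merging at the BAM level.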
If the choice is between merging FASTQs and merging BAMs, I always merge the FASTQs. Unless you expect batch effects between the FASTQs (in which case you should treat them as separate samples throughout the pipeline anyway), merging FASTQs is seldom a bad idea, provided you have sufficient compute resources to align the larger files.
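For paired-end data, a FASTQ-level merge could look roughly like the sketch below. The file names (sample_L001_R1.fastq and so on) and the toy reads are made up for illustration; substitute your own lane files. The one thing that matters is listing the lanes in the same order for R1 and R2, so that mates stay on matching lines for Trimmomatic PE and the aligner.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so the sketch does not touch real data.
workdir=$(mktemp -d)
cd "$workdir"

# Toy input (hypothetical names): one single-read R1/R2 pair per lane.
# Replace this loop with your real lane files.
for lane in 1 2 3; do
  for mate in 1 2; do
    printf '@read%s/%s\nACGT\n+\nIIII\n' "$lane" "$mate" \
      > "sample_L00${lane}_R${mate}.fastq"
  done
done

# Concatenate the lanes in the SAME order for both mates so pairs stay in sync.
cat sample_L001_R1.fastq sample_L002_R1.fastq sample_L003_R1.fastq > merged_R1.fastq
cat sample_L001_R2.fastq sample_L002_R2.fastq sample_L003_R2.fastq > merged_R2.fastq

# Sanity check: both merged files should hold the same number of reads
# (a FASTQ record is 4 lines).
r1_reads=$(( $(wc -l < merged_R1.fastq) / 4 ))
r2_reads=$(( $(wc -l < merged_R2.fastq) / 4 ))
echo "R1 reads: $r1_reads, R2 reads: $r2_reads"
```

If the two read counts differ, the lanes were not merged consistently and the paired-end steps downstream will fail or mispair reads.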