I am trying to decide if I should merge two sets of fastq files from the same individuals. The sets are:
- Run 1: original sequencing with many short reads after trimming
- Run 2: re-do of the sequencing with a manual library preparation
To make use of the extra data I've got (coverage and quality look good in Run 1), I'm planning to concatenate (zcat) the fastq files from the two runs. I tested this on 5 samples by mapping run 1 and run 2 independently with bwa mem, and then once again after merging the fastqs before mapping. However, I've noticed that the total read counts in the flagstat output don't add up: total reads run 1 + total reads run 2 ≠ total reads merged.
Do you know what might be causing this?
Read counts:
Technical details:
fastq from Illumina NovaSeq. Reads trimmed at sequencing centre. Mapped with BWA MEM v0.7.17, default parameters.
Probably secondary/supplementary alignments. If you filter for primary alignments the read numbers should add up.
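To make the primary-only comparison concrete, here is a minimal Python sketch that tallies SAM records by their FLAG bits (0x100 = secondary, 0x800 = supplementary, per the SAM specification). The function name `tally_alignments` and the toy records are made up for illustration; they are not from your data. On real files, something like `samtools view -c -F 0x900 sample.bam` should give the same primary-only count.

```python
from collections import Counter

SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary (chimeric) alignment


def tally_alignments(sam_lines):
    """Count SAM records as primary, secondary, or supplementary."""
    counts = Counter(total=0, primary=0, secondary=0, supplementary=0)
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        counts["total"] += 1
        if flag & SECONDARY:
            counts["secondary"] += 1
        elif flag & SUPPLEMENTARY:
            counts["supplementary"] += 1
        else:
            counts["primary"] += 1
    return counts


# Hypothetical toy records; only the FLAG column matters for the tally.
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t99\tchr1\t10\t60\t50M\t=\t110\t150\t*\t*",
    "read1\t147\tchr1\t110\t60\t50M\t=\t10\t-150\t*\t*",
    "read1\t355\tchr1\t500\t0\t50M\t=\t600\t150\t*\t*",          # 256 + 99: secondary
    "read2\t2113\tchr1\t700\t60\t20S30M\t=\t800\t0\t*\t*",       # 2048 + 65: supplementary
]

if __name__ == "__main__":
    print(dict(tally_alignments(toy_sam)))
```

If the `primary` counts for run 1 and run 2 sum to the `primary` count of the merged mapping, the mismatch in the totals comes only from the extra secondary/supplementary records.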
Also, bwa mem is not deterministic if the read order changes. Shuffle your original FASTQs for one run and you might also observe similar changes.
GenoMax, I think you spotted the problem. The secondary alignments row shows the same difference as the total count. Though I still don't understand why secondary alignment counts should differ between the merged bam and the sum of the parts?