Question

Merged BAM has more reads than sum of parts

Asked 20 months ago by James Reeve

I am trying to decide whether I should merge two sets of FASTQ files from the same individuals. The sets are:

Run 1: original sequencing with many short reads after trimming
Run 2: re-do of the sequencing with a manual library preparation

To make use of the extra data (coverage and quality look good in Run 1), I'm planning to zcat the FASTQ files from both runs together. I tested this on 5 samples by mapping Run 1 and Run 2 independently with bwa mem, and then again after merging the FASTQs before mapping. However, the total read counts reported by samtools flagstat don't add up: total reads (Run 1) + total reads (Run 2) ≠ total reads (merged).
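Concatenating FASTQs should preserve read counts exactly, so a quick check at the FASTQ level can rule out the merge step itself. Below is a minimal Python sketch, assuming plain 4-line FASTQ records (no wrapped sequence lines); the file names and the `write_fastq` toy generator are hypothetical. Gzipped files can also be joined with a byte-level `cat`, because gzip readers, including Python's `gzip` module, read concatenated members as one stream. For paired-end data, the R1 files and the R2 files have to be concatenated in the same run order so that mates stay in sync.

```python
import gzip
import os
import shutil
import tempfile


def count_fastq_reads(path):
    """Count records in a gzipped FASTQ, assuming plain 4-line records."""
    # gzip.open transparently reads multi-member files (e.g. from `cat a.gz b.gz`)
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def concat_gzip(inputs, output):
    """Byte-level concatenation, like `cat run1.fq.gz run2.fq.gz > merged.fq.gz`."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def write_fastq(path, n_reads, prefix):
    """Write a toy gzipped FASTQ with n_reads records (hypothetical test data)."""
    with gzip.open(path, "wt") as handle:
        for i in range(n_reads):
            handle.write(f"@{prefix}_{i}\nACGT\n+\nIIII\n")


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    run1 = os.path.join(tmp, "run1.fq.gz")
    run2 = os.path.join(tmp, "run2.fq.gz")
    merged = os.path.join(tmp, "merged.fq.gz")
    write_fastq(run1, 3, "run1")
    write_fastq(run2, 5, "run2")
    concat_gzip([run1, run2], merged)
    print(count_fastq_reads(run1), count_fastq_reads(run2), count_fastq_reads(merged))
    # prints: 3 5 8
```

If the FASTQ-level counts add up, the discrepancy must arise during mapping rather than in the merge.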

Do you know what might be causing this?

Read counts: [image of read-count table]

Technical details:

FASTQ files from an Illumina NovaSeq. Reads were trimmed at the sequencing centre. Mapped with BWA-MEM v0.7.17 using default parameters.

Tags: fastq, WGS, samtools, BWA

Do you know what might be causing this?

Probably secondary/supplementary alignments. If you filter for primary alignments, the read numbers should add up.

Reply by GenoMax, 20 months ago
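GenoMax's suggestion can be checked by tallying records by their SAM FLAG: secondary alignments carry bit 0x100 and supplementary alignments bit 0x800, so primary records are those with neither bit set. On a real BAM, `samtools view -c -F 0x900 file.bam` should give the primary count directly. The Python sketch below does the same on SAM text lines, purely to make the flag logic explicit; the function name and the toy records are hypothetical.

```python
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def tally_sam_flags(sam_lines):
    """Tally SAM records into primary, secondary and supplementary."""
    counts = {"total": 0, "primary": 0, "secondary": 0, "supplementary": 0}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the 2nd mandatory column
        counts["total"] += 1
        if flag & SECONDARY:
            counts["secondary"] += 1
        elif flag & SUPPLEMENTARY:
            counts["supplementary"] += 1
        else:
            # neither 0x100 nor 0x800 set: what `-F 0x900` keeps
            counts["primary"] += 1
    return counts


if __name__ == "__main__":
    toy_sam = [  # hypothetical single-end records
        "@HD\tVN:1.6\tSO:unsorted",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r1\t256\tchr1\t500\t0\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t16\tchr2\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t2064\tchr3\t50\t60\t4M\t*\t0\t0\tACGT\tIIII",
    ]
    print(tally_sam_flags(toy_sam))
    # prints: {'total': 4, 'primary': 2, 'secondary': 1, 'supplementary': 1}
```

Because every input read gets exactly one primary record, the primary counts of the separate runs should sum to the primary count of the merged run.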

Also, bwa mem is not deterministic if the read order changes. If you shuffle the original FASTQs for one run, you might observe similar changes.

Reply by benformatics, 20 months ago
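A toy model may help explain benformatics' point. As far as I know, bwa mem reads input in batches of a fixed number of bases (the batch size scales with the thread count unless `-K` is set) and, for paired-end data, re-estimates the insert-size distribution for each batch. Merging runs therefore changes which reads share a batch, which can change pairing decisions and so which secondary alignments get reported. The sketch below is a hypothetical stand-in, not bwa's code: it assigns each read the median "insert size" of its batch and shows that a Run 1 read can get a different estimate once Run 2 reads land in the same batch.

```python
from statistics import median


def per_batch_estimates(insert_sizes, batch_size):
    """For each read, return the median insert size of the batch it falls in.

    Hypothetical toy model of a per-batch statistic; not bwa's actual code.
    """
    estimates = []
    for start in range(0, len(insert_sizes), batch_size):
        batch = insert_sizes[start:start + batch_size]
        batch_median = median(batch)
        estimates.extend([batch_median] * len(batch))
    return estimates


if __name__ == "__main__":
    run1 = [300, 310, 290, 305]  # made-up insert sizes
    run2 = [450, 460]            # made-up library with longer inserts
    alone = per_batch_estimates(run1, batch_size=3)
    merged = per_batch_estimates(run1 + run2, batch_size=3)
    print(alone)               # [300, 300, 300, 305]
    print(merged[:len(run1)])  # [300, 300, 300, 450]
```

The 4th Run 1 read sees a different estimate only because Run 2 reads joined its batch; the reads themselves are unchanged.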

GenoMax, I think you spotted the problem: the secondary alignments row shows the same difference as the total count. However, I still don't understand why the secondary alignment count should differ between the merged BAM and the sum of its parts.

Reply by James Reeve, 20 months ago
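The "in total" line of `samtools flagstat` counts alignment records, including secondary and supplementary ones, so the primary count can be recovered by subtraction. The sketch below compares per-field differences between the merged run and the sum of its parts; all numbers are made up for illustration. The expectation is that primary counts add up even when secondary counts do not.

```python
def primary_count(flagstat):
    """Primary records implied by flagstat's 'in total' line."""
    return flagstat["total"] - flagstat["secondary"] - flagstat["supplementary"]


def discrepancy(run1, run2, merged):
    """merged - (run1 + run2) for each flagstat field."""
    return {key: merged[key] - (run1[key] + run2[key]) for key in merged}


if __name__ == "__main__":
    # Hypothetical flagstat numbers, not taken from the question
    run1 = {"total": 1000, "secondary": 10, "supplementary": 20}
    run2 = {"total": 2000, "secondary": 15, "supplementary": 30}
    merged = {"total": 3007, "secondary": 32, "supplementary": 50}
    print(discrepancy(run1, run2, merged))
    # prints: {'total': 7, 'secondary': 7, 'supplementary': 0}
    print(primary_count(run1) + primary_count(run2), primary_count(merged))
    # prints: 2925 2925
```

When the whole difference in "total" shows up in the secondary row, as described above, the primary counts should match.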