Merged BAM has more reads than sum of parts
19 months ago
James Reeve ▴ 130

I am trying to decide if I should merge two sets of fastq files from the same individuals. The sets are:

  • Run 1: original sequencing with many short reads after trimming
  • Run 2: re-do of the sequencing with a manual library preparation

To make use of the extra data I've got (coverage and quality look good in Run 1), I'm planning to concatenate the fastq files from the two runs with zcat. I tested this on 5 samples by mapping Run 1 and Run 2 independently with bwa mem, and then again after merging the fastqs before mapping. However, I've noticed that the total read counts reported by samtools flagstat don't add up: total reads Run 1 + total reads Run 2 ≠ total reads merged.

Do you know what might be causing this?

Read counts:

[Image: table of read counts]

Technical details:

fastq files from an Illumina NovaSeq. Reads were trimmed at the sequencing centre and mapped with BWA MEM v.0.7.17 using default parameters.

fastq WGS samtools BWA • 732 views

Do you know what might be causing this?

Probably secondary/supplementary alignments. If you filter for primary alignments the read numbers should add up.
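
To make the suggestion above concrete, here is a minimal sketch that tallies SAM records by FLAG bits. It assumes the BAM has already been converted to SAM text; the function name `count_alignment_types` and the example records are made up for illustration. The bit values 0x100 (secondary) and 0x800 (supplementary) come from the SAM specification.

```python
import io

SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def count_alignment_types(sam_lines):
    """Count primary, secondary and supplementary records in SAM text.

    A read with extra alignments contributes more than one record,
    which is why record totals can exceed the number of input reads.
    """
    counts = {"primary": 0, "secondary": 0, "supplementary": 0}
    for line in sam_lines:
        # Skip blank lines and header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & SECONDARY:
            counts["secondary"] += 1
        elif flag & SUPPLEMENTARY:
            counts["supplementary"] += 1
        else:
            counts["primary"] += 1
    return counts


# Hypothetical records, for illustration only (not taken from the post).
example_sam = "\n".join([
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "r1\t147\tchr1\t300\t60\t50M\t=\t100\t-250\t*\t*",
    "r2\t0\tchr1\t500\t0\t50M\t*\t0\t0\t*\t*",
    "r2\t256\tchr2\t900\t0\t50M\t*\t0\t0\t*\t*",
    "r3\t16\tchr1\t700\t60\t30M20S\t*\t0\t0\t*\t*",
    "r3\t2048\tchr3\t50\t60\t30H20M\t*\t0\t0\t*\t*",
])

counts = count_alignment_types(io.StringIO(example_sam))
print(counts)
```

Only the "primary" tally is expected to add up across runs. On a real BAM, `samtools view -c -F 0x900` should give the comparable count of records that are neither secondary nor supplementary.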


Also, bwa mem output is not guaranteed to stay the same if the read order changes. Shuffle the original FASTQs for one run and you might observe similar differences.
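
If you want to try the shuffling experiment, a rough sketch follows. It assumes paired-end data in plain 4-line FASTQ records and that both files fit in memory; `read_fastq` and `shuffle_pairs` are hypothetical helper names, and the reads are toy data. The one important detail is that R1 and R2 get the same permutation so mates stay in sync.

```python
import io
import random


def read_fastq(handle):
    """Yield FASTQ records as 4-line tuples (assumes unwrapped records)."""
    while True:
        record = tuple(handle.readline() for _ in range(4))
        if not record[0]:  # empty string means end of file
            return
        yield record


def shuffle_pairs(r1_records, r2_records, seed=0):
    """Apply one random permutation to both mate lists so pairs stay in sync."""
    r1 = list(r1_records)
    r2 = list(r2_records)
    if len(r1) != len(r2):
        raise ValueError("R1 and R2 contain different numbers of records")
    order = list(range(len(r1)))
    random.Random(seed).shuffle(order)  # fixed seed keeps the shuffle repeatable
    return [r1[i] for i in order], [r2[i] for i in order]


# Toy reads, for illustration only.
r1_text = "".join(f"@read{i}/1\nACGT\n+\nIIII\n" for i in range(5))
r2_text = "".join(f"@read{i}/2\nTGCA\n+\nIIII\n" for i in range(5))

r1_shuffled, r2_shuffled = shuffle_pairs(
    read_fastq(io.StringIO(r1_text)),
    read_fastq(io.StringIO(r2_text)),
    seed=42,
)

# Write the shuffled records back out as FASTQ text.
shuffled_r1_text = "".join("".join(rec) for rec in r1_shuffled)
print(shuffled_r1_text.splitlines()[0])
```

Mapping the shuffled and unshuffled versions of the same run and comparing their flagstat output would show how much of the difference comes from read order alone.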


GenoMax, I think you spotted the problem. The secondary alignments row shows the same difference as the total count. I still don't understand, though, why the secondary alignment counts should differ between the merged BAM and the sum of the parts.
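
For anyone wanting to repeat the row-by-row comparison described above, this is a small sketch that parses `samtools flagstat` text output and reports merged minus (run 1 + run 2) for each row. The helper names are hypothetical and the numbers are invented placeholders, not the counts from this post.

```python
import re


def parse_flagstat(text):
    """Map each flagstat row label to its QC-passed count."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"(\d+) \+ (\d+) (.+)", line.strip())
        if match:
            # Drop trailing parenthesised details such as percentages.
            label = match.group(3).split(" (")[0]
            counts.setdefault(label, int(match.group(1)))
    return counts


def discrepancy(run1, run2, merged):
    """Return merged - (run1 + run2) for every row present in all three."""
    shared = set(run1) & set(run2) & set(merged)
    return {label: merged[label] - (run1[label] + run2[label]) for label in shared}


# Invented placeholder numbers, for illustration only.
run1_text = """1005 + 0 in total (QC-passed reads + QC-failed reads)
5 + 0 secondary
0 + 0 supplementary"""
run2_text = """2008 + 0 in total (QC-passed reads + QC-failed reads)
8 + 0 secondary
0 + 0 supplementary"""
merged_text = """3020 + 0 in total (QC-passed reads + QC-failed reads)
20 + 0 secondary
0 + 0 supplementary"""

diff = discrepancy(
    parse_flagstat(run1_text),
    parse_flagstat(run2_text),
    parse_flagstat(merged_text),
)
for label in sorted(diff):
    print(label, diff[label])
```

If the "in total" gap equals the "secondary" gap, as in this made-up example, the number of primary records is the same in both cases and only the extra alignments differ.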
