Stop GATK MergeBamAlignment from unmapping reads
0
0
Entering edit mode
6.0 years ago
multimeric ▴ 30

I'm using GATK 4.0.11.0-0's MergeBamAlignment, which is supposed to be able to combine an aligned BAM and an unaligned/unmapped BAM without losing any data or adding any duplicates.

I've been invoking it like this:

gatk MergeBamAlignment \
--ALIGNED_BAM aligned.bam \
--UNMAPPED_BAM unmapped.bam \
--OUTPUT merged.bam \
--REFERENCE_SEQUENCE reference.fasta \
--PAIRED_RUN=true

However, if I compare the output of samtools flagstat for the aligned BAM and the merged BAM, I notice that some reads are now unaligned:

Aligned BAM:

51963309 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
51711491 + 0 mapped (99.52% : N/A)
51963309 + 0 paired in sequencing
25872862 + 0 read1
26090447 + 0 read2
51324048 + 0 properly paired (98.77% : N/A)
51663088 + 0 with itself and mate mapped
48403 + 0 singletons (0.09% : N/A)
109625 + 0 with mate mapped to a different chr
98192 + 0 with mate mapped to a different chr (mapQ>=5)

Merged BAM:

95340924 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
51706794 + 0 mapped (54.23% : N/A)
95340924 + 0 paired in sequencing
47670462 + 0 read1
47670462 + 0 read2
48734352 + 0 properly paired (51.12% : N/A)
48815548 + 0 with itself and mate mapped
2891246 + 0 singletons (3.03% : N/A)
70356 + 0 with mate mapped to a different chr
63633 + 0 with mate mapped to a different chr (mapQ>=5)

Notice how the 51,711,491 mapped reads has decreased to 51,706,794. Why has this happened? Am I using MergeBamAlignment wrong? Is it a bug in the tool? I definitely don't want to lose any information from my original alignment.

gatk alignment gatk4 MergeBamAlignment bam • 2.4k views
ADD COMMENT
0
Entering edit mode

Hello ttmigueltt ,

even after reading the manual of MergeBamAlignment I still don't understand for what is it for. Could you please explain to me why you like to use it?

Thanks.

fin swimmer

ADD REPLY
0
Entering edit mode

I think it's for two main purposes:

  • When you are using the BAM format to store metadata about your unmapped reads, you sometimes have to convert your unmapped BAM back to fastq format in order to actually align the reads. For example, BWA doesn't yet accept an unmapped BAM as input. Thus, using this tool you can have a workflow that goes Sequencer → unmapped BAM → fastqs → aligned BAM → (GATK MergeBamAlignment using aligned BAM and unmapped BAM) → merged BAM, which lets you retain all the read metadata but also use these tools that require fastqs.

  • When you have analysis pipeline that strips out some of the reads, for example unaligned reads or secondary reads, and you want to have one mega BAM that contains the final alignment + all the original unaligned reads, you convert those fastqs to unmapped BAM, and then use MergeBamAlignment. The mega BAM can then be used for easy archival because you no longer need to store the fastqs, and you can even use CRAM format for really high compression.

ADD REPLY

Login before adding your answer.

Traffic: 2226 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6