BWA not reporting all similar alignments
1
0
Entering edit mode
5.1 years ago
Chadi Saad ▴ 110

I'm trying to align reads that belong to high similarity regions (pseudogenes)

I'm aligning reads with BWA (last version) using tha -a option.

When observing the BAM file, I notice that the reads are aligned with MAPQ0. Which is normal.

The problem is, every read is present once in the BAM file, i don't find all alignments (even with -a option).

I tried to extract one of these reads with MAPQ 0, and create a FASTQ (r1 & r2) file with only 1 read inside, and then align this read with BWA (using exactly same command as for the complet fq file), and I obtain this time a BAM file with all the possible alignments !!

How is it possible ?

thanks

alignment next-gen bwa bam • 2.9k views
ADD COMMENT
0
Entering edit mode

Could you please show the command you used and the corresponding output?

ADD REPLY
0
Entering edit mode
bwa mem    -aM     -t 8    -v 3    -R '@RG\tID:group1\tSM:SAMPLE\tPL:Illumina\tLB:Panel\tPU:TEST'         ucsc.hg19.fasta       test_R1.trimmed.fq.gz test_R2.trimmed.fq.gz  > test.sam
ADD REPLY
0
Entering edit mode

when aligning only one read:

M01878:748:000000000-BVLWH:1:2102:10051:6653    81      chr7    6013079 0       151M    =       6013017 -213    CTTCCATGGGGACAGTTCCAGGGGTGGTCCATCTCCCCCCTGTGGGTGATCAGTTTCTTCATCTCGCTTGTGTTAAGAGCAGTCCCAATCATCACCTGAGTGTGAGACACAATGGTTCAACGTTTTAGTAGTTTTTTGACGTCAGAATGGC GGGGGGGGGGGFGGGGGGGFGGGGHHHHGHHGGGGGGGHHGGGHHHHHHHHH
M01878:748:000000000-BVLWH:1:2102:10051:6653    321     chr7    6790824 0       151M    =       6013017 -777808 *       *       NM:i:1  MD:Z:111T39     MC:Z:151M       AS:i:146        RG:Z:group1
M01878:748:000000000-BVLWH:1:2102:10051:6653    161     chr7    6013017 0       151M    =       6013079 213     ACAGTGACTACGGTCAGTTCTGAGAAATGACACCCAGGTTGGCGATGTGTCTCATGGTTGGCCTTCCATGGGGACAGTTCCAGGGGTGGTCCATCTCCCCCCTGTGGGTGATCAGTTTCTTCATCTCGCTTGTGTTAAGAGCAGTCCCAAT 3ABBBFFFFFFBGGGGGGGGGGHHHHHHHHHHHGGGHGFHHGHGEEGHHHHH
M01878:748:000000000-BVLWH:1:2102:10051:6653    433     chr7    6790886 0       151M    =       6013079 -777808 *       *       NM:i:1  MD:Z:49T101     MC:Z:151M       AS:i:146        RG:Z:group1

When Algning the whole FQ file:

M01878:748:000000000-BVLWH:1:2102:10051:6653    83      chr7    6013079 0       151M    =       6013017 -213    CTTCCATGGGGACAGTTCCAGGGGTGGTCCATCTCCCCCCTGTGGGTGATCAGTTTCTTCATCTCGCTTGTGTTAAGAGCAGTCCCAATCATCACCTGAGTGTGAGACACAATGGTTCAACGTTTTAGTAGTTTTTTGACGTCAGAATGGC GGGGGGGGGGGFGGGGGGGFGGGGHHHHGHHGGGGGGGHHGGGHHHHHHHHH
M01878:748:000000000-BVLWH:1:2102:10051:6653    163     chr7    6013017 0       151M    =       6013079 213     ACAGTGACTACGGTCAGTTCTGAGAAATGACACCCAGGTTGGCGATGTGTCTCATGGTTGGCCTTCCATGGGGACAGTTCCAGGGGTGGTCCATCTCCCCCCTGTGGGTGATCAGTTTCTTCATCTCGCTTGTGTTAAGAGCAGTCCCAAT 3ABBBFFFFFFBGGGGGGGGGGHHHHHHHHHHHGGGHGFHHGHGEEGHHHHH
ADD REPLY
0
Entering edit mode

The length of the quality string is shorter that the sequence, I think part of the sam record is being left out - part that could help explain this behaviour, as it could contains the optional tags describing additional features from these reads.

ADD REPLY
3
Entering edit mode
5.1 years ago
Jianyu ▴ 580

I think the reason has shown in the flags. When aligning only one pair of reads (let's say R1 and R2), bwa didn't think your R1 and R2 are in proper pair, so the -a option (which will output all alignments for unpaired paired-end reads) led to the 4 alignments you see

But when aligning the whole FQ file, bwa marked R1 and R2 as proper pair, so the -a option won't report other alignments since it thought there was no need to report other alignments.

What made this difference I think is due to the criteria used by bwa on proper pair reads, it requires R1 and R2 have the right orientation and the correct insert size, but threshold for insert is determined by the insert size distribution from all data, that's why bwa mem has different behavior in your situation.

BWA MEM manual: http://bio-bwa.sourceforge.net/bwa.shtml Similar problem: Bwa-Mem And Sam-Flag "Read Paired In Proper Pair"

ADD COMMENT
0
Entering edit mode

Thanks for this explanation !

Then why the read have MAPQ 0 when aligning the whole FQ, while in the BAM file I have an uniq alignement ?

ADD REPLY
0
Entering edit mode

Though bwa mem marked them as proper pair, the MAPQ still shows they are not unique map reads. In both situations, R1 and R2 are non-unique mapped reads. It seems bwa mem chose one alignment per read to be a proper pair, but I don't know how this selection is done.

ADD REPLY
0
Entering edit mode

So to resume, BWA gave a MAPQ 0 to reads because they map to multiple regions (without taking into account the pair information). But In the BAM file, I found only one occurrence per read-pair because the others combinations are not Proper Pairs ? (so -a would be useful only in case of multiple proper pairs ?)

How can avoid I remove correct these MAPQ 0 in order to use these reads for variant calling later ?

ADD REPLY
0
Entering edit mode

Your understanding of MAPQ 0 is right.

Maybe I didn't explain clearly, the default behavior of bwa is to report one random position for each read if having multiple alignments, so you will have one occurrence per read-pair when aligning the whole FQ or single read pair without -a option.

What makes difference is, when aligning single read pair, bwa doesn't think this read pair is a proper pair, so the -a option lets it report the other two alignments it found. But when aligning the whole FQ, -a won't let bwa report other alignments because it thought this read pair is a proper pair

-a option: Output all found alignments for single-end or unpaired paired-end reads.

I didn't get the meaning of your last question, can you be more specific?

ADD REPLY
0
Entering edit mode

So their are is no way to get multiple Proper Pairs (even with -a) ? BWA will take a random choice ?

My second question: since the MAPQ 0 will impact the variant calling process, can I replace the MAPQ0 by the true MAPQ ?

ADD REPLY
0
Entering edit mode

As far as I know, no :(

I think you can give bowtie2 a try if you need a meaningful MAPQ

ADD REPLY
0
Entering edit mode

i will try Bowtie2. Thanks alot !

ADD REPLY

Login before adding your answer.

Traffic: 1859 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6