Bowtie2 - How can MAPQ be used to obtain uniquely mapped reads?
6.8 years ago
ej ▴ 70

Hi,

I am interested in extracting uniquely mapped reads from a bam file by filtering based on the MAPQ field.

I ran some tests to figure out what MAPQ threshold should be used but am having trouble understanding the results.

I ran bowtie2 on paired-end data and saw that when the reads are unique and map perfectly to the genome (no mismatches), the pair gets a MAPQ of 42.

When one mismatch is introduced in each read of the pair, the MAPQ drops to 24.

When both reads of the pair align to a repetitive region of the genome (no mismatches), the MAPQ is 32.

Why would the MAPQ of a pair that aligns to many regions in the genome (which I would like to filter out because it is not uniquely mapped) be higher than a pair that contains two mismatches (one in each read) but maps uniquely?

How should a threshold for uniquely mapped reads be determined?

Any help would be greatly appreciated.

sequencing ChIP-Seq alignment • 5.0k views

Not a direct answer, but 'unique' alignment can be done with bowtie version 1 using --best and -m 1. This option is no longer available in bowtie2.

In bowtie2, you can 'virtually' assure unique alignment. Please read previous threads, like this:


Thank you so much for your help. Do you know of an example when a read has multiple alignments but will not have the XS tag?


Also, from what I understand, AS is the alignment score of the read and XS is the score of the second best alignment. I used bowtie2 with default parameters so it is supposed to always choose the alignment with the best score, but I have some reads where the AS is lower than the XS. Why would this be?


A quick scan across the forums reveals that the interpretation of AS and XS provokes a lot of questions.

Yes, AS is the alignment score for the read. After Bowtie2 is comfortable that it has found the best alignment, it sets AS and then continues to look to see where else in the reference the read may align. That's where XS comes into play...

If Bowtie2 finds another alignment location for the read, it calculates and sets XS; thus, XS is the alignment score of the best alignment found for the read other than the primary reported alignment. If no other alignment is found, the XS tag is not written at all.
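As a rough illustration of how these tags can be read in practice, the sketch below pulls the MAPQ column and the optional AS:i / XS:i fields out of a SAM line, then keeps reads that have no XS tag and a MAPQ at or above a cutoff. The function names, the toy SAM lines and the cutoff of 10 are all made up for illustration; filtering on "no XS tag" is a heuristic for unique-looking reads, not an official Bowtie2 definition of uniqueness.

```python
def parse_sam_line(sam_line):
    """Return (flag, mapq, tags) for one tab-separated SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: MAPQ
    tags = {}
    for field in fields[11:]:              # optional fields start at column 12
        name, typ, value = field.split(":", 2)
        tags[name] = int(value) if typ == "i" else value
    return flag, mapq, tags


def looks_unique(sam_line, min_mapq=10):
    """Heuristic: mapped, no XS tag, and MAPQ >= min_mapq (cutoff is an assumption)."""
    flag, mapq, tags = parse_sam_line(sam_line)
    if flag & 0x4:                         # 0x4 = read is unmapped
        return False
    return "XS" not in tags and mapq >= min_mapq


# Toy records (hypothetical, not taken from the original post).
single_hit = "\t".join(
    ["r1", "0", "chr1", "100", "42", "4M", "*", "0", "0", "ACGT", "IIII",
     "AS:i:0", "NM:i:0"]
)
multi_hit = "\t".join(
    ["r2", "0", "chr1", "500", "1", "4M", "*", "0", "0", "ACGT", "IIII",
     "AS:i:0", "XS:i:0", "NM:i:0"]
)

single_is_unique = looks_unique(single_hit)
multi_is_unique = looks_unique(multi_hit)
```

On real data the lines would typically come from the text output of `samtools view` rather than from hand-written strings.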

Naturally, one would assume that XS can never exceed AS, but that is not the case for paired alignments. In a paired alignment it is possible, for example, for read1 to be reported at what would be a secondary alignment location on its own, whilst read2 is reported at its primary location. If this concordantly aligned pair is the best alignment for the pair as a whole, then read1 will have an XS greater than its AS.
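To check how often this happens in a given alignment, something like the following could tally reads whose XS exceeds AS, split by whether the read is flagged as part of a properly aligned pair (SAM flag bit 0x2). This is only a sketch: the helper names and the toy records are hypothetical.

```python
from collections import Counter


def flag_as_xs(sam_line):
    """Return (flag, AS, XS) for a SAM line; AS or XS is None if the tag is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    scores = {}
    for field in fields[11:]:              # optional fields start at column 12
        name, typ, value = field.split(":", 2)
        if name in ("AS", "XS") and typ == "i":
            scores[name] = int(value)
    return flag, scores.get("AS"), scores.get("XS")


def tally_xs_above_as(sam_lines):
    """Count reads with XS > AS, keyed by 'proper_pair' or 'other'."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):           # skip SAM header lines
            continue
        flag, as_score, xs_score = flag_as_xs(line)
        if as_score is None or xs_score is None:
            continue                       # nothing to compare
        if xs_score > as_score:
            counts["proper_pair" if flag & 0x2 else "other"] += 1
    return counts


# Toy records (hypothetical): a properly paired read1 with XS > AS,
# its mate with no XS tag, and a single-end read with XS < AS.
records = [
    "@HD\tVN:1.6",
    "\t".join(["p1", "99", "chr1", "100", "24", "4M", "=", "300", "204",
               "ACGT", "IIII", "AS:i:-12", "XS:i:0"]),
    "\t".join(["p1", "147", "chr1", "300", "24", "4M", "=", "100", "-204",
               "ACGT", "IIII", "AS:i:0"]),
    "\t".join(["s1", "0", "chr2", "50", "30", "4M", "*", "0", "0",
               "ACGT", "IIII", "AS:i:0", "XS:i:-6"]),
]

counts = tally_xs_above_as(records)
```

If most of the XS > AS cases fall under "proper_pair", that is consistent with the pairing explanation above.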

Given how a large chunk of the genome is made up of repetitive sequence and that >50% of genes have known processed or unprocessed pseudogenes, I imagine that this scenario is more common than people believe. Cannot confirm that, though.


That makes sense, thank you so much. I really appreciate your help!
