Question

How is the 2nd best hit found in bwa-mem

0

Entering edit mode

7.3 years ago

IP ▴ 780

Hi biostars!

I have a question about how is the 2nd best hit found in bwa-mem for calculating the mapping quality. From the bwa paper, it is stated that the mapping quality scores are calculated using a similar algorithm as in the MAQ aligner. In the MAQ paper one can found the following formula for the calculation of mapping quality scores:

mapq formula

Copying and pasting from the paper:

"q1 is the sum of quality values of mismatches of the best hit, q2 is the corresponding sum for the second best hit, n2 is the number of hits having the same number of mismatches as the second best hit, k is the minimum number of mismatches in the 28-bp seed, q is the average base quality in the 28-bp seed, 4.343 is 10/log10, and p1(k,28) is the probability that a perfect hit and a k-mismatch hit coexists given a 28-bp sequence that can be estimated during alignment."

My question comes situations as in the following example:

Lets imagine that we have a illumina sequenced read (100bp), that has a perfect alignment to the position chr1:100000-100100. Then, the second best alignment will be in a lot of cases those alignment positions +1 (chr1:100001-100101.). Which in the top of my head does not makes sense to consider for the 2nd hit

Other implementations of the Farrar Smith-Waterman algorithm find the 2nd best hit by masking the 1st hit. Does anybody know how this is handled in bwa-mem?

bwa-mem • 1.9k views

ADD COMMENT • link updated 7.2 years ago by Biostar 20 • written 7.3 years ago by IP ▴ 780

0

Entering edit mode

Did you just ... tag Heng Li? Could have just mentioned him, you know? lh3

ADD REPLY • link 7.3 years ago by Ram 45k

0

Entering edit mode

ops, that was my idea, thanks :)

ADD REPLY • link 7.3 years ago by IP ▴ 780