Hi biostars!
I have a question about how is the 2nd best hit found in bwa-mem for calculating the mapping quality. From the bwa paper, it is stated that the mapping quality scores are calculated using a similar algorithm as in the MAQ aligner. In the MAQ paper one can found the following formula for the calculation of mapping quality scores:
Copying and pasting from the paper:
"q1 is the sum of quality values of mismatches of the best hit, q2 is the corresponding sum for the second best hit, n2 is the number of hits having the same number of mismatches as the second best hit, k is the minimum number of mismatches in the 28-bp seed, q is the average base quality in the 28-bp seed, 4.343 is 10/log10, and p1(k,28) is the probability that a perfect hit and a k-mismatch hit coexists given a 28-bp sequence that can be estimated during alignment."
My question comes situations as in the following example:
Lets imagine that we have a illumina sequenced read (100bp), that has a perfect alignment to the position chr1:100000-100100. Then, the second best alignment will be in a lot of cases those alignment positions +1 (chr1:100001-100101.). Which in the top of my head does not makes sense to consider for the 2nd hit
Other implementations of the Farrar Smith-Waterman algorithm find the 2nd best hit by masking the 1st hit. Does anybody know how this is handled in bwa-mem?
Did you just ... tag Heng Li? Could have just mentioned him, you know? lh3
ops, that was my idea, thanks :)