Question

Making sense of genotype calculation equations


3.3 years ago

from the mountains ▴ 250

I am struggling to understand the first two of the following equations, which come from the Supplementary Materials of the Conpair paper and describe how genotypes are calculated: (image: Conpair equations)

For the first equation, is P(D|AA) really much lower when all my reads are A than when all my reads are B? Is e_j the per-read error rate, as I assume? I would expect a low e_j to mean that the call is more reliable. For example, with an error rate of 0.01 and D = {A,A,A,A}:

P(D|AA) = (0.01^4)(0.99^0) = 1e-8

But if D = {B,B,B,B}, the calculation comes to:

P(D|AA) = (0.01^0)(0.99^4) ≈ 0.96

For the second equation, are the occurrences of A not considered at all? The index and upper bound of both operators appear to be exactly the same, if I'm reading that right. Is it just me, or are there a bunch of typos here?

Citation: Bergmann EA, Chen BJ, Arora K, Vacic V, Zody MC. Conpair: concordance and contamination estimator for matched tumor-normal pairs. Bioinformatics. 2016 Oct 15;32(20):3196-3198. doi: 10.1093/bioinformatics/btw389. Epub 2016 Jun 26. PMID: 27354699; PMCID: PMC5048070.

modeling concordance conpair • 936 views



Are you suggesting reviewers are doing a lousy job? SHOCKING!

More seriously, if you go to Heng's note (p20), yes they got it wrong. They mislabeled AA and BB compared to Heng's 0 and m (plus other typos).

http://lh3lh3.users.sourceforge.net/download/samtools.pdf
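
To make the label swap concrete, here is a minimal Python sketch. It assumes a biallelic site and the usual per-read error model, in which read j shows the wrong allele with probability e_j. The function name `genotype_likelihood` and its parameters are illustrative only and are not taken from Conpair or samtools.

```python
from math import prod


def genotype_likelihood(reads, error_rates, n_a_alleles, ploidy=2):
    """P(D | genotype) for a genotype carrying n_a_alleles copies of allele A.

    Assumption: the site is biallelic (A or B), so a sequencing error on a
    read drawn from an A chromosome is observed as B, and vice versa.
    """
    frac_a = n_a_alleles / ploidy  # fraction of chromosomes carrying A
    terms = []
    for base, e in zip(reads, error_rates):
        # Probability that this read is observed as A under the genotype.
        p_obs_a = frac_a * (1 - e) + (1 - frac_a) * e
        terms.append(p_obs_a if base == "A" else 1 - p_obs_a)
    return prod(terms)


reads = ["A", "A", "A", "A"]
errors = [0.01] * 4

p_aa = genotype_likelihood(reads, errors, n_a_alleles=2)  # 0.99 ** 4
p_ab = genotype_likelihood(reads, errors, n_a_alleles=1)  # 0.5 ** 4
p_bb = genotype_likelihood(reads, errors, n_a_alleles=0)  # 0.01 ** 4

print(f"P(D|AA) = {p_aa:.4g}, P(D|AB) = {p_ab:.4g}, P(D|BB) = {p_bb:.4g}")
```

Under this model four A reads give P(D|AA) of about 0.96 and P(D|BB) of 1e-8, which is the reverse of what the formula as printed in the supplement produces, and is what you would expect if the AA and BB labels were swapped.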

3.3 years ago by Lemire ▴ 940


I thought only I was allowed to be lousy!

Thanks for the reference, so far the equations make more sense.

3.3 years ago by from the mountains ▴ 250