Question

Haplotype Frequencies And Maximum Likelihood Estimation / Expectation Maximization Algorithm

0

11.7 years ago

tommy.carstensen ▴ 40

Does anyone have a numerical example of how the EM algorithm can be used to estimate haplotype frequencies from genotype frequencies? I have searched extensively with Google and can't find a single numerical example. Thank you.

haplotype • 6.6k views

ADD COMMENT • link 11.7 years ago by tommy.carstensen ▴ 40

0

Is the end goal to associate a phenotype to a haplotype? If so, BEAGLE has auxiliary scripts to do this. Do you have phased data? Can you provide a little more background?

ADD REPLY • link 11.7 years ago by Zev.Kronenberg 12k

0

Hi Zev. The end goal is to understand how PLINK, WDIST, and other software packages estimate haplotype frequencies from genotype data using the EM algorithm; see http://pngu.mgh.harvard.edu/~purcell/plink/ld.shtml

ADD REPLY • link 11.7 years ago by tommy.carstensen ▴ 40

0

Can I close this question, as nobody seems to know the answer?

ADD REPLY • link 11.7 years ago by tommy.carstensen ▴ 40

0

tommy.carstensen See http://www.slideshare.net/awais77/measures-of-linkage-disequilibrium-10238151 (slides 14-16 have an empirical example). Google search used: "calculating linkage disequilibrium R squared".

ADD REPLY • link 11.7 years ago by Zev.Kronenberg 12k

0

Thanks. That example starts from known haplotypes. I am interested in how the haplotype frequencies are estimated from the genotypes with the EM algorithm.

ADD REPLY • link 11.7 years ago by tommy.carstensen ▴ 40

score 1 · Answer 1 · 2013-08-20

1

11.7 years ago

Zev.Kronenberg 12k

tommy.carstensen The expectation is explained on the Wikipedia page for linkage disequilibrium:

http://en.wikipedia.org/wiki/Linkage_disequilibrium

Take a close look at the example:

Example: Human Leukocyte Antigen (HLA) alleles

specifically:

and the estimated frequency of haplotype xy is

If this doesn't answer your question I will attempt to explain it in other terms.

ADD COMMENT • link 11.7 years ago by Zev.Kronenberg 12k

0

Thanks Zev. I will have a look at it this evening. Thank you.

ADD REPLY • link 11.7 years ago by tommy.carstensen ▴ 40

0

Let us say I have 10 samples in total. At SNP1 I have these genotypes: AA CC AC AA CC AC AA CC AC AA

At SNP2: GG GT GG GG GT TT TT GT GT TT

What, then, is the LD between these two SNPs?

ADD REPLY • link 11.7 years ago by tommy.carstensen ▴ 40

0

Which measure of LD do you want? R, R^2, D, D'?

ADD REPLY • link 11.7 years ago by Zev.Kronenberg 12k
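
For reference, the four measures named above are usually defined from haplotype frequencies as sketched below. The notation is an assumption made for this note and does not come from the thread: p_AB is the frequency of the haplotype carrying allele A at the first SNP and allele B at the second, and p_A and p_B are the corresponding allele frequencies. D' is given here as an absolute value to sidestep differing sign conventions.

```latex
\begin{aligned}
D    &= p_{AB} - p_A\,p_B \\
r    &= \frac{D}{\sqrt{p_A(1-p_A)\,p_B(1-p_B)}}, \qquad
r^2   = \frac{D^2}{p_A(1-p_A)\,p_B(1-p_B)} \\
|D'| &= \frac{|D|}{D_{\max}}, \qquad
D_{\max} =
\begin{cases}
\min\{\,p_A(1-p_B),\ (1-p_A)\,p_B\,\} & \text{if } D > 0 \\
\min\{\,p_A\,p_B,\ (1-p_A)(1-p_B)\,\} & \text{if } D < 0
\end{cases}
\end{aligned}
```

All four need p_AB, which is why unphased genotype data require a haplotype frequency estimate first.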

0

R squared. I can convert if necessary. I am not interested in the answer itself, which I can get with PLINK or WDIST; I am interested in the calculation. Thank you.

ADD REPLY • link 11.7 years ago by tommy.carstensen ▴ 40
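
The calculation for the ten samples listed above can be sketched as follows. This is a minimal illustration of the textbook two-SNP EM, assuming biallelic SNPs, Hardy-Weinberg proportions, and no missing genotypes; it is not taken from PLINK or WDIST, and the function names `em_haplotype_freqs` and `r_squared` are made up for this example. Only samples heterozygous at both SNPs (here AC/GT) have ambiguous phase, so the E-step splits them between the two possible phasings and the M-step recounts haplotypes.

```python
from itertools import product


def em_haplotype_freqs(geno1, geno2, alleles1, alleles2,
                       init=None, max_iter=100, tol=1e-12):
    """EM estimate of two-SNP haplotype frequencies from unphased genotypes.

    geno1, geno2: per-sample genotype strings such as "AC" and "GT".
    alleles1, alleles2: the two alleles at each SNP, e.g. ("A", "C").
    init: optional starting frequencies (hypothetical parameter); uniform if None.
    """
    a1, a2 = alleles1
    b1, b2 = alleles2
    haps = [a + b for a, b in product(alleles1, alleles2)]  # AG, AT, CG, CT

    # Samples homozygous at one or both SNPs have only one possible phasing,
    # so their two haplotypes can be counted directly.
    known = dict.fromkeys(haps, 0.0)
    n_double_het = 0
    for g1, g2 in zip(geno1, geno2):
        if g1[0] != g1[1] and g2[0] != g2[1]:
            n_double_het += 1  # phase is ambiguous
        else:
            known[g1[0] + g2[0]] += 1
            known[g1[1] + g2[1]] += 1

    n_chrom = 2 * len(geno1)
    freq = dict(init) if init else dict.fromkeys(haps, 1.0 / len(haps))

    for _ in range(max_iter):
        # E-step: probability that a double heterozygote carries
        # a1b1 / a2b2 rather than a1b2 / a2b1, given current frequencies.
        cis = freq[a1 + b1] * freq[a2 + b2]
        trans = freq[a1 + b2] * freq[a2 + b1]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5

        # M-step: add the expected counts from double heterozygotes
        # to the directly observed counts and renormalise.
        counts = dict(known)
        counts[a1 + b1] += n_double_het * w
        counts[a2 + b2] += n_double_het * w
        counts[a1 + b2] += n_double_het * (1 - w)
        counts[a2 + b1] += n_double_het * (1 - w)
        new_freq = {h: c / n_chrom for h, c in counts.items()}

        converged = max(abs(new_freq[h] - freq[h]) for h in haps) < tol
        freq = new_freq
        if converged:
            break
    return freq


def r_squared(freq, alleles1, alleles2):
    """r^2 from haplotype frequencies; assumes both SNPs are polymorphic."""
    a1, a2 = alleles1
    b1, b2 = alleles2
    p_a = freq[a1 + b1] + freq[a1 + b2]
    p_b = freq[a1 + b1] + freq[a2 + b1]
    d = freq[a1 + b1] - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


snp1 = "AA CC AC AA CC AC AA CC AC AA".split()
snp2 = "GG GT GG GG GT TT TT GT GT TT".split()
freqs = em_haplotype_freqs(snp1, snp2, ("A", "C"), ("G", "T"))
print(freqs)
print(r_squared(freqs, ("A", "C"), ("G", "T")))
```

For these ten samples the directly countable haplotypes are AG 5, AT 5, CG 4, CT 4, and there is a single double heterozygote (sample 9). Working it through by hand, the EM settles on splitting that sample evenly between the two phasings, giving roughly AG = AT = 0.275 and CG = CT = 0.225, so D and r squared come out at about 0. A data set with more double heterozygotes would take more iterations and show the E-step weight moving away from 0.5.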

score 1 · Answer 2 · 2013-08-29

1

11.7 years ago

tommy.carstensen ▴ 40

I found a paper that points me toward an answer to my question: "Linkage Disequilibrium Between Loci With Unknown Phase" http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2710162/pdf/GEN1823839.pdf

ADD COMMENT • link 11.7 years ago by tommy.carstensen ▴ 40

0

Ah, sorry, I should have sent that to you! I work with Chad Huff. I implemented his method and it works well.

ADD REPLY • link 11.7 years ago by Zev.Kronenberg 12k

0

Thanks for all of your replies. Is your code open source and on GitHub, by any chance?

ADD REPLY • link 11.7 years ago by tommy.carstensen ▴ 40