I am studying a set of 1800 SNPs. I have the allele frequencies in two related populations and the counts for each allele (i.e., the output of plink --freq --missing --within cluster --out output). For the most part, the allele frequencies track each other. My null hypothesis is that, for each individual SNP, the allele frequency is equal in the two populations. I want to compute a p-value for each SNP to decide whether to reject the null hypothesis.
I have access to R and to JMP Genomics. Any suggestions on how to do this?
Exactly equal or approximately equal?
Essentially, this sounds like an association analysis in which you want to check whether an allele is more common in one population than in the other beyond what would be expected by chance.
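To make that concrete, here is a minimal sketch of such a test for a single SNP, assuming you already have the four allele counts (two alleles in each of two populations). It is a plain Pearson chi-square test on the 2x2 table, written with the Python standard library only. The function name and arguments are made up for illustration and are not part of plink or JMP Genomics.

```python
import math


def allelic_chi2(a1_pop1, a2_pop1, a1_pop2, a2_pop2):
    """Pearson chi-square test (1 df, no continuity correction) on a
    2x2 table of allele counts. Returns (chi2, p_value)."""
    row1 = a1_pop1 + a2_pop1          # total alleles observed in population 1
    row2 = a1_pop2 + a2_pop2          # total alleles observed in population 2
    col1 = a1_pop1 + a1_pop2          # total copies of allele 1
    col2 = a2_pop1 + a2_pop2          # total copies of allele 2
    n = row1 + row2
    if 0 in (row1, row2, col1, col2):
        # Monomorphic SNP or an empty population: nothing to test.
        return 0.0, 1.0
    diff = a1_pop1 * a2_pop2 - a2_pop1 * a1_pop2
    chi2 = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts for one SNP: 60/40 in population 1, 40/60 in population 2.
chi2, p = allelic_chi2(60, 40, 40, 60)
print(f"chi2={chi2:.3f} p={p:.5f}")
```

With 1800 SNPs you would call this once per SNP on the counts from the frequency report. For SNPs with very small counts, an exact test is probably a better choice than the chi-square approximation.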
Approximately equal
Basically yes, but I want to do that for 1800 individual SNPs.
What about http://pngu.mgh.harvard.edu/~purcell/plink/anal.shtml then?
I am trying to figure out how to get that to work with Plink 1.9, which lets me specify which allele is which. Also, my populations are defined by a cluster file rather than by case/control status.
Any other ideas how to do this?
I don't see the problem with the populations not being case/control; you can just run an association test of population A vs. population B. You might have population stratification issues, but technically it shouldn't be a problem.
But how do I tell Plink to do it based on clusters instead of case/control?
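One possible approach, as a rough sketch: recode the cluster labels as a case/control phenotype and pass that to Plink as an alternate phenotype file. The code below assumes the cluster file has the usual three whitespace-separated columns (FID, IID, cluster name) and that there are exactly two clusters. The function name, the cluster labels, and the file names are all hypothetical.

```python
def clusters_to_pheno(cluster_lines, case_cluster):
    """Turn 'FID IID CLUSTER' lines into 'FID IID PHENO' lines.

    Samples in `case_cluster` are coded 2 ("case"); every other sample is
    coded 1 ("control"). Assumes only two clusters are present.
    """
    out = []
    for line in cluster_lines:
        parts = line.split()
        if len(parts) < 3:
            continue                      # skip blank or malformed lines
        fid, iid, cluster = parts[:3]
        pheno = 2 if cluster == case_cluster else 1
        out.append(f"{fid} {iid} {pheno}")
    return out


# Hypothetical example input; in practice, read these lines from the cluster file.
example = ["FAM1 IND1 popA", "FAM2 IND2 popB", "FAM3 IND3 popA"]
for row in clusters_to_pheno(example, case_cluster="popB"):
    print(row)
```

After writing the result to a file (say pop.pheno), something along the lines of plink --bfile mydata --pheno pop.pheno --assoc --allow-no-sex --out pop_assoc should then test population A against population B for each SNP. Here mydata and pop_assoc are placeholder names, and --allow-no-sex is only needed if sex is not recorded for your samples.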