I know there are a few populations genetics/molecular evolution types in the Biostar community, so hopefully someone has already run into this problem.
I've calculated Tajima's D statistic for a couple of DNA alignments using the R package PEGAS, which gives a p-value using a Beta distribution. It happens that the Beta distribution doesn't perfectly model the distribution of D under neutral evolution, so many authors simulate data under neutrality to make a null distribution. Unfortunately, all the methods for these papers tend to say about this procedure is "we used ms
to simulate the null...".
Does anyone know how to do this properly. My understanding was you simulated data with a matching number of segregating sites, so it you had 100 samples and 20 segregating sites you'd do
ms 100 1000 -s 20
giving 1000 simulations. However when I calculate D from those simulations I get a skewed distribution with a mean near to 1 (I would have though it would b 0) but a p-value pretty close to the one using the Beta distribution. It's entirely possible I messed up somewhere in the calculations, (you can see my function here) but I thought I'd make sure I'm actually simulating the distribution correctly before I pick that apart.
Yeah! Love population genetics questions!