Hi all,
I am new to selection studies, so I have poor intuition for interpreting Tajima's D. I have a dataset of 20,000 SNPs distributed across a haploid genome (~50 Mb in total) for ~150 samples. I have been calculating summary statistics such as Tajima's D from the SNP alignment rather than the whole-genome alignment, because the statistics are much faster to compute on ~3 Mb of sequence than on ~7.5 Gb. Does this affect my Tajima's D calculation? My understanding is that pi and theta are computed from segregating sites only, so as long as a window contains the same SNPs, its total length shouldn't matter. A graph of D along the genome should then look the same, assuming the window sizes are scaled proportionally.
I appreciate your input.
Alex
Hi,
As you said, Tajima's D is computed from segregating sites, so it's fine to use only the SNP information to obtain it.
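If you want to convince yourself, here is a minimal Python sketch (not taken from any particular package) that computes Tajima's D with the standard 1989 formula and compares a SNP-only alignment with the same alignment padded with invariant sites. The function name `tajimas_d` and the toy sequences are made up for illustration, and the sketch assumes haploid sequences of equal length, no missing data, and at least four samples. Note that this only holds for D computed from raw counts; if you report pi or theta per site, you still need the true window length as the denominator.

```python
from collections import Counter
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for equal-length haploid sequences (hypothetical helper).

    Assumes no missing data and at least 4 sequences.
    """
    n = len(seqs)
    n_pairs = n * (n - 1) / 2
    seg_sites = 0  # S, the number of segregating sites
    pi = 0.0       # mean number of pairwise differences (a count, not per site)
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) < 2:
            continue  # invariant site: contributes nothing to S or pi
        seg_sites += 1
        identical_pairs = sum(c * (c - 1) / 2 for c in counts.values())
        pi += (n_pairs - identical_pairs) / n_pairs
    if seg_sites == 0:
        return float("nan")  # D is undefined without segregating sites

    # Constants from Tajima (1989); they depend only on the sample size n.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Toy SNP-only alignment: 5 samples, 3 segregating sites.
snps = ["ATG", "ATC", "GTC", "GAC", "GAC"]
# Same SNPs with two invariant bases inserted after each one.
padded = ["".join(base + "CC" for base in s) for s in snps]

print(tajimas_d(snps))
print(tajimas_d(padded))
```

Both calls print the same value, because invariant columns are skipped before they can touch S or pi. The one thing to watch is how windows are defined: a window of a fixed number of SNPs is not the same as a window of a fixed number of base pairs, so the comparison only works when each window contains the same SNPs in both cases.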