I used Stacks::populations to generate summary statistics for my study. I sampled ~250 plants from ~100 sampling sites, with 1 to 5 individuals per sampling site. DNA was sequenced with genotyping-by-sequencing (GBS), a reduced-representation method that gave ~1% of the genome and about 2000 SNPs at ~400x coverage. I'm interested in understanding how sampling only one individual per sampling site affects these summary statistics.
One question I have is how pi is calculated for those sites. Here is a snippet from the populations.sumstats.tsv file, which shows "Summary statistics for each population" (for me, a Stacks "population" = a sampling site). I'll refer to sampling sites as populations going forward.
Here is a link to the table key: https://catchenlab.life.illinois.edu/stacks/manual/#pfiles
My understanding is that each row represents a single SNP in a single population, so each SNP should have only one row per population. Rows for populations with only one individual should have pi = 0, because there is no second individual against which to compare that SNP. Populations with at least 2 individuals, however, can have pi > 0, because the SNP could be polymorphic within the population.
Yet sometimes there are rows where populations with one individual have pi > 0. How can this be?
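To make the question concrete, here is a minimal sketch of one way per-site pi could be computed. It rests on an assumption I have not checked against the Stacks source: that pi is calculated over allele copies rather than over individuals, as 1 - sum(C(n_i, 2)) / C(n, 2), where n_i is the count of allele i and n is the total number of allele copies sampled. The function name `per_site_pi` is mine, not part of Stacks.

```python
from collections import Counter
from math import comb


def per_site_pi(alleles):
    """Per-site pi from a list of sampled allele copies (assumed formula).

    Returns the probability that two allele copies drawn without
    replacement from the sample are different.
    """
    n = len(alleles)
    if n < 2:
        return 0.0  # no pair of copies to compare
    counts = Counter(alleles)
    # Number of pairs of copies that carry the same allele
    same_pairs = sum(comb(c, 2) for c in counts.values())
    return 1.0 - same_pairs / comb(n, 2)


# One diploid individual contributes two allele copies per SNP.
print(per_site_pi(["A", "A"]))  # single homozygous individual
print(per_site_pi(["A", "G"]))  # single heterozygous individual
```

If that assumption holds, a single diploid individual that is heterozygous at a SNP would give pi = 1 at that site, while a homozygous one would give pi = 0, which might explain the rows I'm seeing.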
Thanks!