Hi there,
I've been generating lots of Stacks::populations outputs with varying -R and -min_maf values (see definitions below) to understand how these parameters change the site frequency spectrum. My hope is that knowing how many rare variants are removed in each scenario will help me choose a combination of -R and -min_maf for a demographic analysis, which requires retaining rare (but real) variants.
-R: minimum percentage of individuals across populations required to process a locus
-min_maf: a minimum minor allele frequency required to process a nucleotide site at a locus (0 < min_maf < 0.5; applied to the metapopulation)
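To make the two definitions concrete, here is a minimal Python sketch of what they imply for a single site. It is not Stacks' implementation: the function name `passes_filters` and its arguments are hypothetical, it applies the -R check to one site rather than to a whole locus, and it assumes the minor allele frequency is computed over non-missing genotypes only.

```python
from typing import Optional, Sequence


def passes_filters(
    genotypes: Sequence[Optional[str]],
    min_samples: float,  # stands in for -R (proportion of individuals genotyped)
    min_maf: float,      # stands in for -min_maf
) -> bool:
    """Return True if a biallelic site passes both illustrative filters.

    genotypes: one entry per individual across all populations,
    e.g. "0/1", or None when the individual is missing at this site.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False

    # -R analogue: proportion of individuals with a genotype call.
    if len(called) / len(genotypes) < min_samples:
        return False

    # Minor allele frequency over called chromosomes only (assumption).
    alleles = [a for g in called for a in g.replace("|", "/").split("/")]
    alt_count = sum(1 for a in alleles if a != "0")
    alt_freq = alt_count / len(alleles)
    maf = min(alt_freq, 1.0 - alt_freq)
    return maf >= min_maf
```

With four individuals, one of them missing and a single heterozygote, 75% of individuals are genotyped and the minor allele frequency is 1/6, so the site passes at min_samples = 0.5 and min_maf = 0.05 but fails if either threshold is raised above those values.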
I've generated 60 plots for each of 5 populations from the output VCFs using the vcf2sfs R package. The min_maf values range from ~0.01 to 0.05 and -R from 0.1 to 1. Generally, when min_maf is ~0.01-0.04, the plots look as expected: the number of variants decreases as -R increases. But when min_maf = 0.05, I see an unexpectedly sharp decrease in variants at -R = 0.3. The adjacent plots (the neighbouring -R values at the same min_maf) look fine. This happens for each population.
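One way to check whether the drop is already present in the VCF itself, rather than introduced at the plotting step, is to tally the folded SFS directly from the genotype columns. The sketch below is only an illustration under stated assumptions: the function name `folded_sfs` is hypothetical, sites are assumed biallelic with GT as the first FORMAT field, missing calls are simply skipped (no projection to a common sample size), and it is not a replacement for vcf2sfs.

```python
from collections import Counter
from typing import Dict, Iterable


def folded_sfs(vcf_lines: Iterable[str]) -> Dict[int, int]:
    """Map minor allele count -> number of sites, from VCF text lines."""
    sfs: Counter = Counter()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")

        # Sample columns start at index 9; GT is assumed to be the first subfield.
        alleles = []
        for sample in fields[9:]:
            gt = sample.split(":")[0].replace("|", "/")
            alleles.extend(a for a in gt.split("/") if a != ".")
        if not alleles:
            continue

        alt_count = sum(1 for a in alleles if a != "0")
        minor_count = min(alt_count, len(alleles) - alt_count)
        if minor_count > 0:  # ignore sites that are monomorphic in this sample set
            sfs[minor_count] += 1
    return dict(sfs)
```

Calling this on each output file (for example with `open(path)` as the line source) and comparing `sum(folded_sfs(...).values())` across the -R values at min_maf = 0.05 would show whether the file for -R = 0.3 really contains fewer variable sites than its neighbours.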
What might've happened here? Any ideas would be appreciated. Thanks!