I have a large WGS dataset of ~350 individuals from 6 populations, and I want to find the fixed differences between each population and the rest in order to calculate the direction of selection: DoS = Dn/(Dn + Ds) - Pn/(Pn + Ps).
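For reference, the DoS arithmetic itself is simple once the four counts are in hand. Below is a minimal sketch in shell/awk; the function name `dos` and the argument order are my own choices for illustration, and it simply prints NA when either denominator is zero.

```shell
# Direction of selection from four site counts.
# Usage: dos Dn Ds Pn Ps   (hypothetical helper, not part of any tool)
dos() {
  awk -v dn="$1" -v ds="$2" -v pn="$3" -v ps="$4" 'BEGIN {
    # DoS is undefined if there are no fixed or no polymorphic sites
    if (dn + ds == 0 || pn + ps == 0) { print "NA"; exit }
    printf "%.4f\n", dn / (dn + ds) - pn / (pn + ps)
  }'
}
```

For example, `dos 10 10 5 15` gives 10/20 - 5/20, i.e. 0.2500.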
I can easily calculate the within-population polymorphism counts (Pn/Ps) by splitting the VCF by population and running each subset through SnpEff separately, but it is the Dn/Ds ratio (fixed differences between a given population and the rest) that I'm having trouble with. Any ideas?
I have tried this (and thought it worked), but looking more closely at my output file, it doesn't retain only the SNPs specific to the population: it retains all SNPs for the population, regardless of whether they appear in the others. The command I am using is:
bcftools view -x all_no_outgroups.recode.vcf.gz --samples-file cluster_1.txt > cluster_1_private.vcf
Am I doing something wrong in this command? Many of the SNPs retained in cluster_1_private.vcf are also present in the other populations.
Never mind, I was looking at the wrong VCF! It did work, cheers and thanks!
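One caveat that may matter for Dn/Ds: `-x` (`--private`) keeps sites where only the listed samples carry a non-reference allele, so a private SNP can still be polymorphic within the cluster rather than fixed. A rough sketch of a stricter filter is below, in case it is useful. The function name `fixed_private_sites` is hypothetical, and it assumes an uncompressed VCF with biallelic, diploid calls and GT as the first FORMAT field. It treats "fixed" as: every genotyped cluster sample is homozygous ALT and no sample outside the cluster carries ALT.

```shell
# Usage: fixed_private_sites all.vcf cluster_1.txt
# Prints CHROM and POS of sites that are private AND fixed in the cluster.
# Assumptions: plain-text VCF, biallelic sites, diploid GT in first FORMAT field.
fixed_private_sites() {
  awk -F'\t' '
    NR == FNR { in_cluster[$1] = 1; next }     # first file: one sample ID per line
    /^##/     { next }                         # skip meta-information lines
    /^#CHROM/ { for (i = 10; i <= NF; i++) name[i] = $i; next }
    {
      is_fixed = 1; is_private = 1; seen = 0
      for (i = 10; i <= NF; i++) {
        split($i, f, ":"); gt = f[1]
        if (index(gt, ".") > 0) continue       # ignore missing genotypes
        if (name[i] in in_cluster) {
          seen++
          if (gt !~ "^1[/|]1$") is_fixed = 0   # cluster sample not homozygous ALT
        } else if (index(gt, "1") > 0) {
          is_private = 0                       # ALT allele seen outside the cluster
        }
      }
      if (seen && is_fixed && is_private) print $1 "\t" $2
    }
  ' "$2" "$1"
}
```

The resulting positions could then be used to subset the annotated VCF before counting Dn and Ds. Because the reference allele is not necessarily ancestral, sites where the cluster is fixed for REF and everyone else is fixed for ALT are missed by this sketch.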