9.5 years ago
Earendil
Having calculated pairwise Fst values with vcftools, I now need to find the threshold for outlier loci. I've decided to follow the second suggestion from an answer in this thread, "Calculating statistically significant outlier for Pairwise Fst obtained from VCFTools", which is:
2. Permute your genotypes and re-run Fst many times. This would be considered an empirical p-value, or probability.
Since I have no statistical background, is there a simple explanation of how to implement this?
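For readers landing on this question, the quoted suggestion can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method from the linked answer: genotypes are assumed to be already loaded into a NumPy array of alternate-allele counts (0, 1 or 2; rows are samples, columns are loci), there are exactly two populations, and Fst is computed with a simple (Ht - Hs) / Ht estimator rather than the Weir and Cockerham estimator that vcftools reports. The function names `per_locus_fst` and `empirical_p_values` are hypothetical.

```python
import numpy as np


def per_locus_fst(genotypes, is_pop1):
    """Simple per-locus Fst = (Ht - Hs) / Ht for two populations.

    Assumption: this is a stand-in estimator, NOT Weir & Cockerham's.
    genotypes: (n_samples, n_loci) array of alt-allele counts (0/1/2).
    is_pop1:   boolean array, True for samples in population 1.
    """
    n1 = is_pop1.sum()
    n2 = (~is_pop1).sum()
    p1 = genotypes[is_pop1].mean(axis=0) / 2.0   # alt allele freq, pop 1
    p2 = genotypes[~is_pop1].mean(axis=0) / 2.0  # alt allele freq, pop 2
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    h_t = 2 * p_bar * (1 - p_bar)                # total expected heterozygosity
    h_s = (n1 * 2 * p1 * (1 - p1) + n2 * 2 * p2 * (1 - p2)) / (n1 + n2)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Monomorphic loci (h_t == 0) are assigned Fst = 0.
        return np.where(h_t > 0, (h_t - h_s) / h_t, 0.0)


def empirical_p_values(genotypes, is_pop1, n_perm=1000, seed=0):
    """Shuffle population labels n_perm times and compare to observed Fst."""
    rng = np.random.default_rng(seed)
    observed = per_locus_fst(genotypes, is_pop1)
    exceed = np.zeros(genotypes.shape[1], dtype=int)
    for _ in range(n_perm):
        shuffled = rng.permutation(is_pop1)      # random reassignment of samples
        exceed += per_locus_fst(genotypes, shuffled) >= observed
    # The +1 counts the observed labelling itself, so p is never exactly 0.
    return observed, (exceed + 1) / (n_perm + 1)
```

Each p-value is the fraction of label shufflings that give an Fst at least as large as the observed one at that locus. Loci with a small p-value are candidate outliers. With many loci, these per-locus p-values would usually still need a correction for multiple testing.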
Dear Earendil,
I'd like to know whether you solved this problem. I'm new to NGS analysis and I'm stuck on the same issue. I hope you can help!
Dear Shangzhe,
That was a long time ago. I didn't find out how to do that, so instead I used the software Bayescan, which detects Fst outliers directly. You might want to take a look at that.
Dear Earendil,
Thanks for your reply. Coincidentally, one of my colleagues used this method to find Fst outliers.
In my opinion, this method is useful when your windowed Fst data are normally distributed with a mean close to 1, which makes it difficult to identify outliers using a p-value against the normal distribution defined by the mean and standard deviation.
The method is actually called multiple testing. For our data, it avoids the bias caused by windows that contain few SNP pairs but have relatively high Fst values.
Maybe this is ambiguous, so I have copied the links to my colleague's article and to the article describing the method below. I hope these help:
And thank you again. I will take a look at Bayescan.