Hi,
I am interested in computing a genome-wide FST estimate, and I am implementing Reynolds' FST formula (1983). I found this paper in Genetics that gives the formula both as a per-site and as a per-region FST measure:
https://s12.postimg.org/otjso4lct/fst.png
Here a is the between-population component of genetic differentiation and b is the within-population component. The formula is easy to apply to a region: you sum these components over all sites in the region before taking the ratio.
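For reference, here are the two forms as I read them from the description. This assumes the image shows the usual ratio form, with a_i and b_i the between- and within-population components at site i and R the set of sites in a region:

```latex
\hat{F}_{ST,i} = \frac{a_i}{a_i + b_i},
\qquad
\hat{F}_{ST,R} = \frac{\sum_{i \in R} a_i}{\sum_{i \in R} \left(a_i + b_i\right)}
```

A genome-wide estimate would then take R to be every site in the genome. The regional form is a ratio of sums, which in general is not equal to the average of the per-site ratios.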
My question is: to get a genome-wide estimate, is it OK to simply apply this second formula to all sites in the genome?
Also, several programs such as PLINK report both a weighted and an unweighted FST estimate. What is the difference between the two? I assume the weighted estimate corresponds to the second (per-region) formula above, whereas the unweighted one is just the mean of the per-site values?
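A minimal sketch of the distinction I am assuming (not confirmed against PLINK's documentation): "weighted" as a ratio of sums, "unweighted" as the mean of per-site ratios. The function names are my own, the a and b values are made-up placeholders, and computing a and b from allele frequencies is not shown.

```python
from typing import List, Sequence


def per_site_fst(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Per-site ratio a_i / (a_i + b_i); assumes a_i + b_i != 0 at every site."""
    return [ai / (ai + bi) for ai, bi in zip(a, b)]


def weighted_fst(a: Sequence[float], b: Sequence[float]) -> float:
    """Ratio of sums: sites with a larger a_i + b_i carry more weight."""
    return sum(a) / (sum(a) + sum(b))


def unweighted_fst(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean of per-site ratios: every site counts equally."""
    ratios = per_site_fst(a, b)
    return sum(ratios) / len(ratios)


# Hypothetical components for two sites (placeholders, not real data).
a = [1.0, 0.1]
b = [1.0, 9.9]
print(round(weighted_fst(a, b), 4))    # 0.0917
print(round(unweighted_fst(a, b), 4))  # 0.255
```

With these numbers the first site dominates the weighted estimate because its a_i + b_i is small relative to the second site's, yet in the unweighted mean both sites count equally, so the two summaries can differ substantially.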
Paper: http://www.genetics.org/content/genetics/early/2013/08/15/genetics.113.154740.full.pdf
Thanks for the reply! I will check the method in VCFLIB. Since you have implemented this FST estimation yourself, I was wondering what to do with sites that are fixed between the two populations. With Reynolds' FST I get undefined values at those sites; would it be sensible to treat them as zero?
You can only calculate FST for segregating sites.
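A sketch of filtering to segregating sites before summing, assuming per-population alternate-allele counts and allele totals are available (the input layout and function names are hypothetical). A site fixed for different alleles in the two populations still carries both alleles in the pooled sample, so this filter keeps it; only sites with a single allele across both samples are dropped. If a_i = b_i = 0 at the dropped sites, they would add nothing to the weighted sums anyway, whereas setting their per-site FST to zero would pull an unweighted mean downward.

```python
from typing import List, Sequence


def segregating_mask(alt1: Sequence[int], n1: Sequence[int],
                     alt2: Sequence[int], n2: Sequence[int]) -> List[bool]:
    """True where the pooled sample carries both alleles at the site."""
    return [0 < x1 + x2 < m1 + m2
            for x1, m1, x2, m2 in zip(alt1, n1, alt2, n2)]


def weighted_fst_segregating(a: Sequence[float], b: Sequence[float],
                             mask: Sequence[bool]) -> float:
    """Ratio of sums over segregating sites only."""
    num = sum(ai for ai, keep in zip(a, mask) if keep)
    den = sum(ai + bi for ai, bi, keep in zip(a, b, mask) if keep)
    return num / den


# Three hypothetical sites, 20 sampled alleles per population:
# site 1 polymorphic, site 2 a fixed difference, site 3 monomorphic.
alt1, n1 = [10, 20, 0], [20, 20, 20]
alt2, n2 = [2, 0, 0], [20, 20, 20]
mask = segregating_mask(alt1, n1, alt2, n2)
print(mask)  # [True, True, False]

# Placeholder components (made up, not computed from the counts above).
a = [0.3, 1.0, 0.0]
b = [0.5, 0.0, 0.0]
print(round(weighted_fst_segregating(a, b, mask), 4))  # 0.7222
```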