Question

Estimating FST per genome

8.4 years ago

GabrielMontenegro ▴ 680

Hi,

I am interested in computing an FST estimate for the whole genome. I am implementing the Reynolds (1983) FST formula. I found this paper in Genetics, which gives formulas for a per-site as well as a per-region FST measure:

[Image: formulas for the per-site and per-region FST] https://s12.postimg.org/otjso4lct/fst.png

Here a stands for the between-population genetic differentiation and b for the within-population genetic differentiation. The formula is easy to apply to a region: you just sum these values over all the sites within your region.
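For readers who cannot see the image, the two formulas can be sketched as follows, assuming the usual ratio form of the Reynolds estimator. The site index s and the number of sites m in the region are notation assumed here; the definitions of a and b themselves are given in the linked paper.

```latex
% Per-site estimate at site s
F_{ST}^{(s)} = \frac{a_s}{a_s + b_s}

% Per-region estimate over sites s = 1, \dots, m
F_{ST}^{\text{region}} = \frac{\sum_{s=1}^{m} a_s}{\sum_{s=1}^{m} \left(a_s + b_s\right)}
```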

My question is: if you would like to compute a per-genome estimate, is it OK to just use this second formula with all the sites in your genome?

Also, several programs such as PLINK report both a weighted and an unweighted estimate of FST. What is the difference between the two? I assume the weighted estimate corresponds to the second formula I am showing, whereas the unweighted one is just the mean over all sites?
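To make the two kinds of genome-wide summary concrete, here is a minimal C++ sketch. It assumes the per-site components a and b have already been computed, and that the "weighted" estimate is the ratio of sums (the second formula) while the "unweighted" one is the plain mean of per-site ratios; whether a given program defines them exactly this way should be checked against its documentation. `SiteComponents`, `ratioOfSumsFst`, and `meanOfRatiosFst` are hypothetical names, not taken from any library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical container for the per-site components.
struct SiteComponents {
    double a;  // between-population component
    double b;  // within-population component
};

// Ratio of sums: sum(a) / sum(a + b) over all sites.
// Returns NaN if the summed denominator is zero.
double ratioOfSumsFst(const std::vector<SiteComponents>& sites) {
    double num = 0.0, den = 0.0;
    for (const SiteComponents& s : sites) {
        // Inputs are expected to be finite numbers.
        assert(std::isfinite(s.a) && std::isfinite(s.b));
        num += s.a;
        den += s.a + s.b;
    }
    if (den == 0.0) return std::numeric_limits<double>::quiet_NaN();
    return num / den;
}

// Mean of ratios: average of a / (a + b) over sites where it is defined.
// Sites with a + b == 0 are skipped rather than counted as zero.
double meanOfRatiosFst(const std::vector<SiteComponents>& sites) {
    double total = 0.0;
    std::size_t used = 0;
    for (const SiteComponents& s : sites) {
        const double den = s.a + s.b;
        if (den == 0.0) continue;
        total += s.a / den;
        ++used;
    }
    if (used == 0) return std::numeric_limits<double>::quiet_NaN();
    return total / static_cast<double>(used);
}
```

The ratio of sums is equivalent to averaging the per-site ratios with each site weighted by its own a + b, whereas the mean of ratios gives every defined site the same weight.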

Paper: http://www.genetics.org/content/genetics/early/2013/08/15/genetics.113.154740.full.pdf

next-gen genome fst • 3.5k views

score 0 · Answer 1 · 2016-12-01

8.4 years ago

Zev.Kronenberg 12k

For a genome-wide average, I would compute Weir and Cockerham's FST (1984) at each site and then look at the distribution across the genome. You can also just take the average of the per-site FST values.

I've implemented this method in VCFLIB. If you're interested in learning more about FST, I've tried to name all the variables to match the paper.

https://github.com/vcflib/vcflib/blob/master/src/wcFst.cpp

Thanks for the reply! I will check the method in VCFLIB. Since you have implemented that FST estimator yourself, I was wondering what to do with sites that are fixed between two populations. With the Reynolds FST I was getting undefined values at those sites, but I assume it would be sensible to treat them as zero? Would you agree?

8.4 years ago by GabrielMontenegro ▴ 680

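You can only calculate FST for segregating sites. The code skips a site when the allele frequency is -1 in either population, or when both populations are fixed for the same allele (both 1 or both 0):

    // Skip the site if either population has af == -1
    // (not a valid frequency, so presumably a missing-value sentinel).
    if(populationTarget->af == -1 || populationBackground->af == -1){
        delete populationTarget;
        delete populationBackground;
        continue;
    }
    // Skip the site if both populations are fixed for the same allele.
    if(populationTarget->af == 1 && populationBackground->af == 1){
        delete populationTarget;
        delete populationBackground;
        continue;
    }
    if(populationTarget->af == 0 && populationBackground->af == 0){
        delete populationTarget;
        delete populationBackground;
        continue;
    }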
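The three checks above can be folded into a single predicate. This is a minimal sketch, assuming allele frequencies are plain doubles with -1 used as a missing-value sentinel, as the comparisons in the snippet suggest; `isUsableSite` is a hypothetical helper and not part of VCFLIB.

```cpp
#include <cassert>

// Hypothetical helper: true if the site passes the three checks.
// afTarget / afBackground: allele frequencies in each population,
// with -1 assumed to mean "no frequency available".
bool isUsableSite(double afTarget, double afBackground) {
    // Sanity check: each value is either the sentinel or a frequency in [0, 1].
    assert(afTarget == -1 || (afTarget >= 0 && afTarget <= 1));
    assert(afBackground == -1 || (afBackground >= 0 && afBackground <= 1));

    // Missing frequency in either population.
    if (afTarget == -1 || afBackground == -1) return false;
    // Both populations fixed for the same allele: the site is not segregating.
    if (afTarget == 1 && afBackground == 1) return false;
    if (afTarget == 0 && afBackground == 0) return false;
    return true;
}
```

Note that a site fixed for different alleles in the two populations (frequencies 0 and 1) passes these checks, so it is not skipped.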

8.4 years ago by Zev.Kronenberg 12k