Hello,
I have a VCF file with SNP data (whole genome sequencing) from 47 individuals belonging to 5 different populations. I would like to calculate the average inbreeding coefficient (F) and the expected and observed heterozygosity for each population. I used the --het option in VCFtools and tried two different approaches.
In the first approach, I ran the calculation on the VCF file containing all samples. In the second approach, I kept only the individuals of one population at a time and ran the command separately for each population. However, the two approaches gave different values for O(HOM), E(HOM), N_SITES, and F. Below, you will find an image of the output file for each approach. I also tried PLINK and got similar results.
vcftools --vcf SNPs_all_samples.vcf --het --out F_all_samples
vcftools --vcf SNPs_all_samples.vcf --keep pop1.txt --het --out F_pop1
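For reference, here is a minimal Python sketch of how the per-population averages could be derived from a --het output file. It assumes the usual tab-separated columns INDV, O(HOM), E(HOM), N_SITES and F, and that F relates to the other columns as (O(HOM) - E(HOM)) / (N_SITES - E(HOM)), which is my understanding of the VCFtools output. The function name summarize_het, the sample-to-population map, and the numbers in the example are hypothetical and only there to make the sketch runnable; they are not my real data.

```python
import csv
import io
from collections import defaultdict


def summarize_het(het_text, pop_of):
    """Average F, observed and expected heterozygosity per population.

    het_text: contents of a --het output file (tab-separated, with header).
    pop_of:   dict mapping sample ID -> population label (hypothetical helper input).
    """
    sums = defaultdict(lambda: {"n": 0, "F": 0.0, "Ho": 0.0, "He": 0.0})
    reader = csv.DictReader(io.StringIO(het_text), delimiter="\t")
    for row in reader:
        o_hom = float(row["O(HOM)"])
        e_hom = float(row["E(HOM)"])
        n_sites = float(row["N_SITES"])
        acc = sums[pop_of[row["INDV"]]]
        acc["n"] += 1
        # Assumed relationship between the columns (see lead-in).
        acc["F"] += (o_hom - e_hom) / (n_sites - e_hom)
        # Heterozygosity is the complement of homozygosity over the sites used.
        acc["Ho"] += (n_sites - o_hom) / n_sites
        acc["He"] += (n_sites - e_hom) / n_sites
    return {
        pop: {key: acc[key] / acc["n"] for key in ("F", "Ho", "He")}
        for pop, acc in sums.items()
    }


# Made-up example values, NOT real results.
example_het = (
    "INDV\tO(HOM)\tE(HOM)\tN_SITES\tF\n"
    "ind1\t60\t50.0\t100\t0.2\n"
    "ind2\t70\t50.0\t100\t0.4\n"
    "ind3\t40\t50.0\t100\t-0.2\n"
)
example_pops = {"ind1": "pop1", "ind2": "pop1", "ind3": "pop2"}

summary = summarize_het(example_het, example_pops)
for pop, stats in sorted(summary.items()):
    print(pop, {key: round(value, 3) for key, value in stats.items()})
```

With real data, het_text would be read from the .het file (for example F_all_samples.het) and the map built from the pop1.txt-style sample lists.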
Why did I get different results? Which approach is correct?
Thank you in advance!