Discrepancies in F Coefficient and Heterozygosity Estimates
7 months ago
eleftosi • 0

Hello,

I have a VCF file with SNP data (whole genome sequencing) from 47 individuals belonging to 5 different populations. I would like to calculate the average F coefficient, expected and observed heterozygosity for each population. I used the --het argument in VCFtools and tried two different approaches.

First, I ran the calculation on the VCF file containing all samples. In the second approach, I kept only the individuals of one population at a time and ran the command separately for each population. However, I obtained different values for O(HOM), E(HOM), N_SITES, and F between the two approaches. Below, you will find an image of the output file for each approach. I also tried PLINK and got similar results.

vcftools --vcf SNPs_all_samples.vcf --het --out F_all_samples

[Image: --het output file for all samples]

vcftools --vcf SNPs_all_samples.vcf --keep pop1.txt --het --out F_pop1

[Image: --het output file for pop1]

Why did I get different results? Which approach is correct?

Thank you in advance!

population-genetics F-coefficient plink vcftools • 415 views
7 months ago
tothepoint ▴ 940

Both approaches are correct; which one fits better depends on your experimental design and research question.

vcftools --vcf SNPs_all_samples.vcf --het --out F_all_samples

F_all_samples contains the heterozygosity statistics for each individual in the dataset, and the allele frequencies are calculated across the entire dataset.

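Whereas in

vcftools --vcf SNPs_all_samples.vcf --keep pop1.txt --het --out F_pop1

the --keep pop1.txt option specifies that the analysis should include only the samples listed in pop1.txt, so the allele frequencies are calculated from that population alone. Because the expected homozygosity E(HOM) is derived from those allele frequencies, it changes between the two runs, and F changes with it.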
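To make the dependence on allele frequencies concrete, here is a minimal sketch of the method-of-moments estimate that --het reports, F = (O(HOM) - E(HOM)) / (N_SITES - E(HOM)). The helper name f_from_counts and the input numbers are made up for illustration and are not taken from the poster's output.

```shell
# F as reported by --het: (O(HOM) - E(HOM)) / (N_SITES - E(HOM)).
# f_from_counts is a hypothetical helper, not part of VCFtools.
f_from_counts() {
    # $1 = O(HOM), $2 = E(HOM), $3 = N_SITES
    awk -v o="$1" -v e="$2" -v n="$3" \
        'BEGIN { printf "%.4f\n", (o - e) / (n - e) }'
}

# Same individual, same observed homozygosity, but two different
# expectations (illustrative numbers only):
f_from_counts 800 700 1000   # E(HOM) from one set of allele frequencies
f_from_counts 800 750 1000   # E(HOM) from another set of allele frequencies
```

In real output O(HOM) and N_SITES can differ between the two runs as well, so this only isolates the effect of E(HOM).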

IMO: If your aim is to understand heterozygosity across the entire dataset, or to compare individuals irrespective of their population, use the first command. If you want to analyze heterozygosity within specific populations and compare those groups, run the second command once for each population.
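For the per-population route, the following is a rough sketch of how the runs and the average F per population could be scripted. It assumes one sample-list file per population (pop1.txt comes from the question; any further file names are hypothetical) and the usual .het layout with a header row and the columns INDV, O(HOM), E(HOM), N_SITES, F. The function names mean_f and het_per_pop are invented here.

```shell
# mean_f: average the F column (5th field) of a VCFtools .het file.
# Hypothetical helper; skips the header row and prints NA if no rows.
mean_f() {
    awk 'NR > 1 && NF >= 5 { sum += $5; n++ }
         END { if (n > 0) printf "%.4f\n", sum / n; else print "NA" }' "$1"
}

# het_per_pop: run --het once per sample list, then print the mean F.
# Usage: het_per_pop <vcf> <pop list file> [<pop list file> ...]
het_per_pop() {
    vcf="$1"; shift
    for keep in "$@"; do
        pop="${keep%.txt}"                      # pop1.txt -> pop1
        vcftools --vcf "$vcf" --keep "$keep" --het --out "F_${pop}"
        printf '%s\t%s\n' "$pop" "$(mean_f "F_${pop}.het")"
    done
}
```

It could be called as het_per_pop SNPs_all_samples.vcf pop1.txt followed by the remaining population files. Rows with a missing or non-numeric F would need extra handling, which this sketch leaves out.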

For more detail, see the VCFtools manual: https://vcftools.github.io/man_latest.html
