5 weeks ago
hashim.rana11
I have samples of a single species from three different locations. I have already calculated nucleotide diversity and Tajima's D, but when I calculate FST (to estimate the distance between populations), the output shows -nan.
vcftools --vcf all_samples.vcf --weir-fst-pop Population_1_ethiopia_names --weir-fst-pop Population_2_ethiopia_names --out pop1_vs_2_FST
Is there a solution for this? Has anyone calculated FST this way? Any guidance would be appreciated. Thanks in advance; I look forward to your answers. Best regards,
This should work. I would check that the names in the population files are correct and exactly match the sample names present in the VCF file. Try
bcftools query -l all_samples.vcf
to get the sample names and compare. I would also check whether there are any variable positions at all between the groups (though that should give 0) and that the ploidy is <= 2.
My research focuses on the orthopteran species Schistocerca gregaria, which I collected from three locations: Kunming, Xi'an, and Yan'an. I am currently estimating the population distances using FST. In my VCF file, the samples are labeled S_gregaria_Kunming, S_gregaria_Xi'an, and S_gregaria_Yan'an. I merged the data using bcftools.
For my specific dataset, I adapted the command as follows:
vcftools --vcf all_samples.vcf --weir-fst-pop group_kunming.txt --weir-fst-pop group_xian.txt --weir-fst-pop group_yanan.txt --out pop1_vs_2_vs_3_FST
Here, the text files (group_kunming.txt, group_xian.txt, and group_yanan.txt) list the corresponding sample names as they appear in the VCF file. Despite following these steps, I still get -nan values in the output.
Could you kindly provide guidance or suggest potential solutions to resolve this issue? Your help would mean a great deal to me.
Thank you in advance for your time and support.
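The sample-name comparison suggested in the reply can be sketched in shell. This is only a minimal sketch, not a definitive diagnosis: the helper name missing_samples and the intermediate file vcf_samples.txt are hypothetical, and it assumes each population file lists one sample name per line. It prints every line of a population file that does not exactly match a sample name from the VCF, so hidden differences such as trailing spaces or Windows line endings also show up as mismatches.

```shell
# Hypothetical helper: print entries of a population file that are NOT
# present (as exact whole-line matches) in the list of VCF sample names.
#   $1 = file with VCF sample names, one per line
#   $2 = population file passed to --weir-fst-pop
missing_samples() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file.
    # grep exits non-zero when nothing is printed, hence "|| true".
    grep -vxFf "$1" "$2" || true
}

# Usage sketch (assumes bcftools is installed; file names taken from the post):
#   bcftools query -l all_samples.vcf > vcf_samples.txt
#   for f in group_kunming.txt group_xian.txt group_yanan.txt; do
#       echo "== $f"
#       missing_samples vcf_samples.txt "$f"
#   done
```

If any names are printed, those entries in the population file do not correspond to a sample in the VCF and should be corrected before re-running vcftools. If nothing is printed, the names match and the other checks from the reply (variable positions, ploidy) are the next things to look at.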