Hello, I'm trying to calculate Fst values with --fst on a VCF that comprises the 1000 Genomes individuals plus one sample of my own. I'm running the following analysis:
plink --vcf todos41.vcf --within IDsparahw.txt --fst --make-bed --out todos41fst
But it returns the following error: Error: --fst requires at least two nonempty clusters.
Any thoughts about what this means? Thanks!
PS: the IDsparahw.txt file has the following format:
head IDsparafreqs.txt
1_HG00096 1_HG00096 EUR
2_HG00097 2_HG00097 EUR
3_HG00099 3_HG00099 EUR
4_HG00100 4_HG00100 EUR
5_HG00101 5_HG00101 EUR
6_HG00102 6_HG00102 EUR
7_HG00103 7_HG00103 EUR
8_HG00105 8_HG00105 EUR
9_HG00106 9_HG00106 EUR
10_HG00107 10_HG00107 EUR
...
(matching the header ID names from the vcf that's being used)
I have to admit that I have not used plink with --fst before, so this is just a guess from the error message: is it possible that the IDsparahw.txt file does not define a second cluster besides EUR?
I defined several clusters. If you look at the whole file there are 7, so I don't know what the problem really is :/ The clusters are: AFR, AMR, EUR, EAS, SAS, IND and AFURU.
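Since the error talks about "nonempty" clusters, one thing that might be worth checking is whether the FID/IID pairs in the cluster file actually match the IDs plink assigned when it read the VCF. This is only a guess: as far as I understand, plink 1.9 may split VCF sample IDs at an underscore into FID and IID unless told otherwise, in which case "1_HG00096" would be loaded as FID "1" and IID "HG00096", and no row of the cluster file would match. Below is a rough Python sketch for that check. The function name `count_matched_clusters` and the example .fam rows are made up for illustration; only the cluster rows come from the post above.

```python
from collections import Counter


def count_matched_clusters(fam_lines, within_lines):
    """Count, per cluster, the cluster-file rows whose (FID, IID) appears in the .fam."""
    # A .fam row has the columns: FID IID father mother sex phenotype
    fam_ids = set()
    for line in fam_lines:
        fields = line.split()
        if len(fields) >= 2:
            fam_ids.add((fields[0], fields[1]))

    # A cluster-file row has the columns: FID IID cluster
    matched = Counter()
    for line in within_lines:
        fields = line.split()
        if len(fields) >= 3 and (fields[0], fields[1]) in fam_ids:
            matched[fields[2]] += 1
    return matched


# Cluster rows copied from the post above.
within_rows = [
    "1_HG00096 1_HG00096 EUR",
    "2_HG00097 2_HG00097 EUR",
]

# Hypothetical .fam rows, ASSUMING the sample IDs were split at the underscore.
fam_split = [
    "1 HG00096 0 0 0 -9",
    "2 HG00097 0 0 0 -9",
]

# Hypothetical .fam rows, ASSUMING the full sample ID was used for both FID and IID.
fam_double = [
    "1_HG00096 1_HG00096 0 0 0 -9",
    "2_HG00097 2_HG00097 0 0 0 -9",
]

print(dict(count_matched_clusters(fam_split, within_rows)))   # {}
print(dict(count_matched_clusters(fam_double, within_rows)))  # {'EUR': 2}
```

To run it on real data, pass open file handles for a .fam made from the same VCF and for the cluster file instead of the lists. If fewer than two clusters come back with a count above zero, that would be consistent with the error message, and comparing the first two columns of the .fam against the cluster file should show where the IDs differ.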