Dear all,
I’m working with a list of 1,200 SNPs associated with a complex disease, distributed across the chromosomes. I used vcftools to compute Fst between my population and the 1000 Genomes populations. Now, to assess whether the Fst results are meaningful, I would like to draw random SNPs from the 1000 Genomes populations and from my population whose allele frequencies are similar to those of the SNPs in my list. I found the simple command shuf -n 1200 file.vcf
1) But I don’t know how to match the allele frequencies of the random SNPs to those of my SNPs. Could you please share your suggestions?
2) My 1,200 SNPs are spread across 22 chromosomes, and the 1000 Genomes data come as a separate VCF file per chromosome. How many SNPs should be randomly extracted from each VCF file?
3) My focus here is on Fst analysis. Is matching on allele frequency sufficient for selecting the random SNPs, or should other factors also be considered?
Thanks in advance
Would some sort of outlier approach make sense here, where you test whether your list of variants differs from the global distribution of Fst values? There seems to be a nice thread on Biostars about this: Calculating statistically significant outlier for Pairwise Fst obtained from VCFTools.
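A minimal sketch of what such an outlier check could look like, in Python. It assumes the per-site Fst values for the candidate SNPs and for the genome-wide background have already been read into plain lists of floats, with missing or non-numeric values filtered out beforehand. The function name empirical_p is hypothetical, and this is only one simple way to rank candidates against a background distribution, not the method from the linked thread.

```python
from bisect import bisect_left

def empirical_p(candidate_fst, background_fst):
    """For each candidate Fst value, return the fraction of background
    Fst values that are greater than or equal to it (an empirical p-value)."""
    if not background_fst:
        raise ValueError("background_fst must not be empty")
    bg = sorted(background_fst)
    n = len(bg)
    # bisect_left gives the index of the first background value >= x,
    # so n - index is the number of background values >= x.
    return [(n - bisect_left(bg, x)) / n for x in candidate_fst]

# Toy example with made-up numbers, only to show the call.
background = [0.1, 0.2, 0.3, 0.4]
candidates = [0.35, 0.05]
print(empirical_p(candidates, background))  # [0.25, 1.0]
```

A small value means few background SNPs show an Fst as high as the candidate. With 1,200 candidates, some correction for multiple testing would still be needed before calling any single SNP an outlier.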
Do you mean calculating pFst?
However, I see random SNPs being used in various papers, so there should be a way to get the desired random SNPs.
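One possible way to draw frequency-matched random SNPs is to bin both sets by minor allele frequency (MAF) and sample, for every candidate SNP, one background SNP from the same chromosome and the same MAF bin. The sketch below is in Python and rests on several assumptions: allele frequencies have already been extracted into (chrom, pos, freq) tuples (for example from the output of vcftools --freq), the bin width of 0.05 is an arbitrary choice, and the names maf_bin and sample_matched are hypothetical. Note that running shuf directly on a VCF file would also shuffle the header lines, so sampling from a parsed table is safer.

```python
import random
from collections import defaultdict

def maf_bin(freq, width=0.05):
    """Fold an allele frequency to a minor allele frequency and return its bin index."""
    maf = min(freq, 1.0 - freq)
    n_bins = round(0.5 / width)
    # Small epsilon keeps values on a bin edge stable under floating point;
    # MAF = 0.5 is clamped into the last bin.
    return min(int(maf / width + 1e-9), n_bins - 1)

def sample_matched(candidates, background, width=0.05, seed=None):
    """Return one random background SNP per candidate SNP, matched on
    chromosome and MAF bin. Inputs are lists of (chrom, pos, freq) tuples."""
    rng = random.Random(seed)
    candidate_sites = {(chrom, pos) for chrom, pos, _ in candidates}

    # Group background SNPs by (chromosome, MAF bin), excluding the candidates themselves.
    pool = defaultdict(list)
    for chrom, pos, freq in background:
        if (chrom, pos) not in candidate_sites:
            pool[(chrom, maf_bin(freq, width))].append((chrom, pos, freq))

    # Count how many SNPs are needed from each (chromosome, MAF bin) group.
    need = defaultdict(int)
    for chrom, _, freq in candidates:
        need[(chrom, maf_bin(freq, width))] += 1

    matched = []
    for key, k in need.items():
        available = pool.get(key, [])
        if len(available) < k:
            raise ValueError(f"not enough background SNPs for group {key}")
        matched.extend(rng.sample(available, k))  # sampling without replacement
    return matched

# Toy example with made-up positions and frequencies.
candidates = [("1", 100, 0.12), ("1", 200, 0.13), ("2", 50, 0.40)]
background = [("1", 100, 0.12), ("1", 300, 0.11), ("1", 400, 0.14),
              ("1", 500, 0.88), ("2", 60, 0.41), ("2", 70, 0.62)]
random_set = sample_matched(candidates, background, seed=1)
print(len(random_set))  # 3
```

With this scheme the number of random SNPs taken from each chromosome file equals the number of candidate SNPs on that chromosome, so no separate per-chromosome quota has to be chosen. Repeating the draw many times with different seeds and computing the mean Fst of each random set gives a null distribution to compare the candidate set against. Whether to match on further properties is left open here.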