Hi all,
I used three populations to call SNPs with SAMtools and filtered for biallelic SNPs with vcftools. I need to calculate pairwise Fst between populations, so I divided the original VCF file into three sub-VCF files, each containing only two populations. Some variant sites in the original three-population VCF are no longer variant in a two-population sub-VCF: when only two populations are extracted, a site can be identical across those two populations and differ only in the third. Such a site is a variant in the original VCF but not in the sub-VCF. So my question is: how can I delete these non-variant sites from the sub-VCF files? At these sites, all genotypes are the same homozygote, either 0/0 or 1/1.
Are there any scripts or software that could address this?
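To make the request concrete, here is a minimal sketch in plain Python of the filtering described above: it keeps header lines and drops any site where every called genotype carries the same single allele (all 0/0 or all 1/1). The function names `is_polymorphic` and `drop_monomorphic` are made up for illustration, and the sketch assumes an uncompressed, tab-delimited VCF with a GT entry in the FORMAT column. Missing genotypes (./.) are ignored when deciding whether a site is variant.

```python
import re


def is_polymorphic(vcf_line):
    """Return True if the called genotypes on this line carry more than one allele."""
    fields = vcf_line.rstrip("\n").split("\t")
    fmt = fields[8].split(":")          # FORMAT column, e.g. GT:DP
    if "GT" not in fmt:
        return False                    # no genotypes, cannot be called variant
    gt_idx = fmt.index("GT")
    alleles = set()
    for sample in fields[9:]:           # sample columns start at the 10th field
        parts = sample.split(":")
        if gt_idx >= len(parts):
            continue
        # Split unphased (0/1) and phased (0|1) genotypes into alleles.
        for allele in re.split(r"[/|]", parts[gt_idx]):
            if allele != ".":           # skip missing calls
                alleles.add(allele)
    return len(alleles) > 1


def drop_monomorphic(lines):
    """Yield header lines and only those sites that are still variant."""
    for line in lines:
        if line.startswith("#") or is_polymorphic(line):
            yield line


# Small in-memory demo (made-up records, two samples).
demo = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/0:10\t0/0:8\n",   # all 0/0, dropped
    "1\t200\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:10\t0/0:8\n",   # variant, kept
    "1\t300\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1/1:10\t1/1:8\n",   # all 1/1, dropped
]
kept = list(drop_monomorphic(demo))
```

For a real file, the same generator could be fed an open file handle and its output written to a new file. Existing tools may already cover this; for example, the minor allele count filter in vcftools (`--mac`) is worth checking before writing a custom script.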
Thanks all!
Sincerely,
Dezhi
Why? I'm asking because all the information is already there. Why do you need to remove some samples?
Hi Pierre,
If multiple populations are used to call SNPs but only pairwise comparisons are needed, some sites may no longer be SNPs in a given pair. Also, calling with multiple populations would drop SNPs that are biallelic within the two focal populations, because such SNPs may become triallelic or tetra-allelic when the other populations are taken into account. But judging from the extensive literature, this doesn't seem to matter.
This is just my personal opinion, and I don't know whether it is correct. Your comments and criticism are very welcome.