Thank you, Brian, for your response.
Actually, I ran BayeScan for selection detection with default parameters and found 800 SNPs under directional selection at FDR 0.05,
out of 9,134 loci in total, in the comparison across all populations.
This seems like quite a high number, and the missingness rate in the data is high (up to 50%), which may bias the
result.
I think I could do one of the following to improve the quality:
1- Exclude individuals with a low genotyping rate. This is not applicable to my data, because I have already removed many individuals and only a few remain to include in the analysis.
2- Identify the SNPs with a high missingness rate in those individuals and exclude them from the analysis. I think I need to stick with this option. What do you think?
I don't know how to make the hashtable; however, PLINK may offer something similar: does the --missing option report, for each SNP, how many individuals are missing it?
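Option 2 can be sketched in a few lines. This is a minimal Python illustration, not a replacement for PLINK, assuming a genotype matrix with one row per individual and one column per SNP, genotypes coded 0/1/2 and `None` for a missing call; the function names, the toy data and the 0.25 threshold are all hypothetical. It counts, for each SNP, how many individuals are missing it (PLINK's `--missing` writes per-SNP counts like these to its `.lmiss` file) and then drops SNPs whose missing fraction exceeds the threshold, which is the kind of filter PLINK's `--geno` option applies.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical input format: one row per individual, one column per SNP.
# Genotypes are coded 0/1/2 (alternate-allele count); None marks a missing call.
Genotypes = List[List[Optional[int]]]


def snp_missingness(genotypes: Genotypes) -> List[Tuple[int, float]]:
    """Return (missing count, missing fraction) for each SNP column."""
    n_individuals = len(genotypes)
    n_snps = len(genotypes[0]) if genotypes else 0
    stats = []
    for j in range(n_snps):
        n_miss = sum(1 for row in genotypes if row[j] is None)
        stats.append((n_miss, n_miss / n_individuals))
    return stats


def filter_snps(
    genotypes: Genotypes, snp_ids: Sequence[str], max_missing: float
) -> Tuple[List[str], Genotypes]:
    """Keep only SNPs whose missing fraction is at most max_missing."""
    stats = snp_missingness(genotypes)
    keep = [j for j, (_, frac) in enumerate(stats) if frac <= max_missing]
    kept_ids = [snp_ids[j] for j in keep]
    kept_matrix = [[row[j] for j in keep] for row in genotypes]
    return kept_ids, kept_matrix


# Toy example (made-up data): 4 individuals x 4 SNPs.
genos = [
    [0, 1, None, 2],
    [1, None, None, 2],
    [0, 1, 2, None],
    [2, 1, None, 1],
]
ids = ["snp1", "snp2", "snp3", "snp4"]
stats = snp_missingness(genos)
kept_ids, kept = filter_snps(genos, ids, max_missing=0.25)
print(kept_ids)  # snp3 is missing in 3 of 4 individuals, so it is dropped
```

The threshold is a judgment call: a stricter cutoff removes more SNPs but leaves BayeScan with fewer loci, so it is worth rerunning the analysis at a couple of values and comparing the outlier lists.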
Any comments or thoughts are welcome :)
Best regards
Why would you even want to do that? If you remove the variants that make the "populations" different, they'd all be the same, so you may as well be studying only one population.
If that's what you want to do, then make a hashtable of your variants, mapping each variant to its count, and keep only the ones that occur in every population or every individual (that is, discard everything other than the ones with the highest count).
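The hashtable suggestion can be sketched as follows. This is a minimal Python illustration, assuming each population (or individual) is represented by a set of variant identifiers; the function name, population names and SNP IDs are hypothetical. A `collections.Counter` serves as the hashtable from variant to count, each variant is counted at most once per population, and only variants whose count equals the number of populations (the highest possible count) are kept.

```python
from collections import Counter
from typing import Dict, List, Set


def shared_variants(variants_by_pop: Dict[str, Set[str]]) -> List[str]:
    """Return variants present in every population, sorted by ID."""
    counts: Counter = Counter()  # hashtable: variant -> number of populations carrying it
    for variants in variants_by_pop.values():
        counts.update(set(variants))  # count each variant once per population
    n_pops = len(variants_by_pop)
    # Keep only variants with the highest possible count (present everywhere).
    return sorted(v for v, c in counts.items() if c == n_pops)


# Toy example (made-up data).
pops = {
    "popA": {"snp1", "snp2", "snp3"},
    "popB": {"snp1", "snp3"},
    "popC": {"snp1", "snp3", "snp4"},
}
result = shared_variants(pops)
print(result)
```

The same function works per individual: pass one set per individual instead of one per population.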