7.3 years ago
Inquisitive8995
Hi, how can I determine Ancestry Informative Markers (AIMs) for a dataset of multiple populations (8 distinct populations)? I want to calculate Fst for these populations. Could someone kindly help me with the steps to follow to determine Fst? Is there any online tool to do so? Help would be appreciated :)
You can always search Google for AIMs, but that will give you a set of 20 or 200 markers, which are good for broad continental distinctions.
A better approach could be to genotype against a large set of good markers, for example the Human Origins set (about 600k markers) or the markers derived from ancient DNA (about 1.2 million markers). These datasets can be downloaded from: https://reich.hms.harvard.edu/datasets
It's easy to convert these plink files to bed/vcf and annotate/genotype your samples with them.
I found it easiest to convert any vcf files (or whatever format you have) into plink files and then use plink to derive any population genetics metrics. You can also try eigensoft for PCA plotting or for projecting your samples if they do not have many SNPs matching the datasets you are genotyping against.
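To make the Fst step a bit more concrete, here is a minimal Python sketch of a per-SNP Fst across several populations, assuming you already have one allele frequency per population for each SNP (for example from plink's frequency reports). It uses the simple heterozygosity-based form, Fst = (Ht - Hs) / Ht, with all populations weighted equally and no sample-size correction, so treat it as an illustration rather than a replacement for a proper estimator. The function names and the input layout are my own assumptions, not part of any tool. As far as I know plink 1.9 also has a `--fst` option that is used together with `--within`; check the plink documentation for the details.

```python
def fst_per_snp(freqs):
    """Heterozygosity-based Fst for one biallelic SNP.

    freqs: list of allele frequencies (0..1), one per population.
    Populations are weighted equally; no sample-size correction is applied.
    """
    k = len(freqs)
    p_bar = sum(freqs) / k                            # mean allele frequency
    h_t = 2 * p_bar * (1 - p_bar)                     # expected total heterozygosity
    if h_t == 0:
        return 0.0                                    # monomorphic everywhere: uninformative
    h_s = sum(2 * p * (1 - p) for p in freqs) / k     # mean within-population heterozygosity
    return (h_t - h_s) / h_t


def rank_markers(freq_table, top_n):
    """Return the top_n (snp_id, fst) pairs, highest Fst first.

    freq_table: dict mapping a SNP ID to its list of per-population
    frequencies (hypothetical layout, chosen for this sketch).
    """
    scored = [(snp, fst_per_snp(freqs)) for snp, freqs in freq_table.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

Markers with the highest per-SNP Fst are the natural AIM candidates, so calling `rank_markers(freq_table, 200)` on a table with eight frequencies per SNP would give a first shortlist to inspect.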
Hey, thanks for your reply! Is there any specific software that you use to genotype the samples? My files are in plink format too.
Not really. If you have everything set up correctly (the genome versions are the same, so the marker positions correspond between the reference samples and your samples), you can even use plink to select only the markers that you want (by these I mean the markers from the reference file). This also counts as genotyping. Simple Python scripts are also fully sufficient if anything unexpected needs to be done.
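As an example of the kind of simple Python script meant here, the sketch below finds the markers your samples share with a reference panel by matching chromosome and base-pair position in two plink `.bim` files, then writes the matching IDs to a list that can be passed to plink's `--extract`. It assumes both files are on the same genome build and use the same chromosome naming, and it does not check alleles or strand. The file names and function names are hypothetical.

```python
def read_bim_positions(path):
    """Map (chromosome, bp position) -> variant ID from a plink .bim file.

    .bim columns: chromosome, variant ID, cM position, bp position,
    allele 1, allele 2 (whitespace separated).
    """
    positions = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 4:
                continue  # skip blank or malformed lines
            chrom, snp_id, _cm, bp = fields[:4]
            positions[(chrom, int(bp))] = snp_id
    return positions


def shared_marker_ids(sample_bim, reference_bim):
    """IDs (as named in the sample file) of markers also present in the reference."""
    sample = read_bim_positions(sample_bim)
    reference = read_bim_positions(reference_bim)
    return [sample[key] for key in sample if key in reference]


def write_extract_list(ids, out_path):
    """Write one variant ID per line, the format plink --extract expects."""
    with open(out_path, "w") as out:
        for snp_id in ids:
            out.write(snp_id + "\n")
```

With the list written to, say, `shared_snps.txt`, something like `plink --bfile mysamples --extract shared_snps.txt --make-bed --out mysamples_shared` should keep only those markers (the file prefixes here are placeholders).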
Thank you so much for the reply. It helps a lot :)