I downloaded allele frequencies for about 10K SNPs from 1000 Genomes and computed the average frequency for each population (I believe the standard reported frequency refers to the minor allele). In general, Europeans had the highest frequencies, around 40%, and all the other populations had lower ones. Can someone explain what causes this? I know that the original population is CEU, so the MAF is defined in reference to Europeans, but I don't see why this would make it systematically higher. I suppose a few European-specific SNPs will be absent from other populations, but there are not enough of them to explain such a big discrepancy. Here are the average frequencies for some populations:
CEU: 0.472 CHB: 0.331 FIN: 0.402 MXL: 0.369 YRI: 0.326
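For anyone wanting to reproduce this kind of per-population average, here is a minimal sketch. The data are made-up toy values, and the record layout (one dict per SNP, population code to frequency) and the function names are my assumptions, not the format 1000 Genomes or SNPsnap actually hands you. It also shows the "folded" average, i.e. taking the minor allele separately within each population, since the two averages differ whenever the tracked allele is the major allele in some population.

```python
from statistics import mean

# Hypothetical toy data: one dict per SNP, mapping population code to the
# frequency of the SAME tracked allele in that population.
snps = [
    {"CEU": 0.40, "CHB": 0.30, "YRI": 0.70},
    {"CEU": 0.50, "CHB": 0.20, "YRI": 0.10},
]

def mean_frequency_by_population(records):
    """Average the tracked-allele frequency over SNPs, per population."""
    pops = sorted({pop for rec in records for pop in rec})
    return {pop: mean(rec[pop] for rec in records if pop in rec) for pop in pops}

def fold(freq):
    """Minor allele frequency within a single population: min(f, 1 - f)."""
    return min(freq, 1.0 - freq)

def mean_folded_frequency_by_population(records):
    """Average the per-population minor allele frequency over SNPs."""
    pops = sorted({pop for rec in records for pop in rec})
    return {pop: mean(fold(rec[pop]) for rec in records if pop in rec) for pop in pops}

raw = mean_frequency_by_population(snps)
folded = mean_folded_frequency_by_population(snps)
for pop in raw:
    print(f"{pop}: raw={raw[pop]:.3f} folded={folded[pop]:.3f}")
```

In the toy data only YRI changes after folding, because it is the only population where the tracked allele goes above 0.5.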
How did you select your 10k SNPs? Are they Europe-specific? I am confused when you say the original population is CEU but that Europeans have higher frequencies while the others have lower ones.
I just randomly selected SNPs using SNPSNAP. They were not European specific. I assumed that the first population sequenced by 1000 Genomes was CEU and that there might be a bias towards European SNPs but I am not sure, just my conjecture.
Please use ADD COMMENT for comments to keep the post organized.
That's not true. Although the sequencing was done in different phases, there was no preferential sequencing for a particular population in any phase. You can get all the information here:
http://www.internationalgenome.org/about
http://www.bioinf.jku.at/research/sharingShortIBD/hapFabia1000Genomes_html/node34.html
That said, I will check again whether the selection of SNPs from SNPsnap is introducing some bias.
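One possible way to look for such a bias (just a sketch, not anything SNPsnap itself does) is to count, per population, how many of the selected SNPs are monomorphic, i.e. have a frequency of 0 or 1. If the SNPs were picked because they are variable in Europeans, other populations should show a noticeably larger share of fixed sites. The toy data and the function name below are hypothetical.

```python
# Hypothetical toy data: frequency of the tracked allele per population for four SNPs.
selected_snps = [
    {"CEU": 0.30, "YRI": 0.00},
    {"CEU": 0.10, "YRI": 0.20},
    {"CEU": 0.45, "YRI": 1.00},
    {"CEU": 0.20, "YRI": 0.00},
]

def fraction_monomorphic(records, eps=0.0):
    """Share of SNPs per population whose frequency is <= eps or >= 1 - eps."""
    pops = sorted({pop for rec in records for pop in rec})
    result = {}
    for pop in pops:
        values = [rec[pop] for rec in records if pop in rec]
        fixed = sum(1 for f in values if f <= eps or f >= 1.0 - eps)
        result[pop] = fixed / len(values)
    return result

mono = fraction_monomorphic(selected_snps)
for pop, share in mono.items():
    print(f"{pop}: {share:.2f} of selected SNPs are monomorphic")
```

A large gap between populations here would point at how the SNPs were chosen rather than at the sequencing itself.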