Until now, I had always assumed that these files contain only SNPs that are present in all the populations. For example, I thought that SNPs present only in the European population ("private" to Europeans) would be filtered out of this dataset.
The problem is that I can't find any reference or README file confirming that the VCF files in 1000 Genomes contain only 'cosmopolitan' SNPs. Can anyone please point me to a reference or documentation file?
There is no filter that removes population-specific SNPs, which is why you can't find a reference to one.
That said, it is rare to find, for example, SNPs at medium frequency in Europe that are absent from all other populations.
Singleton SNPs are by definition also population-specific, but in trying to reduce the false discovery rate (FDR), we have lower power to detect these.
Figure 3 of the Phase 1 paper discusses f2 variants (those that occur on exactly 2 chromosomes across all 1092 samples),
and there you can see how often these SNPs are shared between two populations.
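If you want to check this on the data yourself, the idea can be sketched in a few lines of Python. This is only a minimal illustration, not the method used in the paper: the function names, the toy genotypes, and the population labels (EUR, AFR, ASN) are all made up for the example, and in practice the genotypes and the sample-to-population mapping would have to be parsed from the VCF files and the accompanying sample table.

```python
from collections import Counter

def alt_copies_per_population(genotypes, sample_pop):
    """Count non-reference allele copies per population at one site.

    genotypes: dict of sample -> tuple of allele indices (0 = reference,
               None = missing call). Hypothetical in-memory format.
    sample_pop: dict of sample -> population label.
    """
    counts = Counter()
    for sample, alleles in genotypes.items():
        n_alt = sum(1 for a in alleles if a not in (0, None))
        if n_alt:
            counts[sample_pop[sample]] += n_alt
    return counts

def is_private(counts):
    """A variant is 'private' if all its alt copies sit in one population."""
    return len(counts) == 1

def f2_sharing(sites, sample_pop):
    """Tally which population pairs carry the two copies of each f2 variant."""
    sharing = Counter()
    for genotypes in sites:
        counts = alt_copies_per_population(genotypes, sample_pop)
        if sum(counts.values()) != 2:
            continue  # not an f2 variant
        pair = tuple(sorted(counts.elements()))
        sharing[pair] += 1
    return sharing

# Toy data (assumed, for illustration only).
sample_pop = {"s1": "EUR", "s2": "EUR", "s3": "AFR", "s4": "ASN"}
sites = [
    {"s1": (0, 1), "s2": (0, 0), "s3": (0, 0), "s4": (0, 0)},  # singleton
    {"s1": (0, 1), "s2": (0, 1), "s3": (0, 0), "s4": (0, 0)},  # f2 within EUR
    {"s1": (0, 0), "s2": (0, 1), "s3": (0, 1), "s4": (0, 0)},  # f2 shared AFR/EUR
    {"s1": (1, 1), "s2": (0, 1), "s3": (0, 1), "s4": (0, 1)},  # common variant
]

private_flags = [is_private(alt_copies_per_population(g, sample_pop)) for g in sites]
pair_counts = f2_sharing(sites, sample_pop)
```

With the toy data, the first two sites are private to one population and the last two are not, and the f2 tally records one within-population pair and one between-population pair. Since nothing is filtered out of the release, both kinds of site should turn up when the same check is run on the real files.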
Thank you very much. I knew that the 1000G paper discussed private variants, but for some reason I was convinced that these were not included in the data released on the FTP. I probably confused them with some of the intermediate folders that were on the FTP before the paper was published.
Hi Giovanni,
My name is Ahmed. I worked on 1000 Genomes data as the basis of my Master's project and published a paper (in Genome Biology and Evolution) on population stratification and inference of familial relationships from genomic data. I also computed a lot of data that was not published and that I still have, including continent-specific SNPs (African-, European-, and Asian-specific SNPs). If you would like, I can share those results with you.
Contact me by email at ahmedc3.ri@gmail.com if you are still interested in this data.
Thanks
Ahmed