Are There Population-Specific SNPs Included In The 1000 Genomes VCF Files?
10.7 years ago

Phase 1 of the 1000 Genomes Project published genotypes for 1,092 individuals and made them available on its FTP server: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/integrated_call_sets/

Until now, I had always assumed that these files contain only SNPs that are present in all populations. For example, I thought that SNPs present only in the European population ("private" to Europeans) would be filtered out of this dataset.

The problem is that I can't find any reference or README file confirming that the 1000 Genomes VCF files contain only 'cosmopolitan' SNPs. Can anyone please point me to a reference or documentation file?

Thanks in advance!

1000genomes vcf snp • 4.4k views

Hi Giovanni,

My name is Ahmed. I worked with 1000 Genomes data for my Master's project and published a paper (in Genome Biology and Evolution) on population stratification and the inference of familial relationships from genomic data. I also computed a lot of data that was never published and that I still have, including continent-specific SNPs (African-, European- and Asian-specific). If you would like, I can share those results with you.

Contact me at ahmedc3.ri@gmail.com if you are still interested in this data.

Thanks

Ahmed

10.7 years ago

Hi there

There is no filter that removes population-specific SNPs, which is why you can't find a reference to one. It is rare to find, for example, SNPs at medium frequency in Europe that are absent from all other populations. Singleton SNPs are by definition also population-specific, but because we tried to reduce the FDR, we have lower power to detect them. Figure 3 of the Phase 1 paper discusses f2 variants (those that occur on 2 chromosomes across all 1,092 samples) and shows how often these SNPs are shared between two populations.

Zam


Thank you very much. I knew that the 1000G paper discussed private variants, but for some reason I was convinced that these were not included in the data released on the FTP site. I probably confused the release with some of the intermediate folders that were on the FTP site before the paper was published.


No problem! I found it hard to keep up with the data except when I was tracking all the conference calls. Z

10.7 years ago
pd3 ▴ 350

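The VCFs contain per-population allele frequencies (AMR_AF, ASN_AF, AFR_AF, EUR_AF), which can be used to select sites that are more frequent in one population than in others. For example, to keep sites that are more frequent in Europeans than in the AMR and ASN groups: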

bcftools view -i'EUR_AF > AMR_AF & EUR_AF > ASN_AF' file.vcf.gz

(Link to bcftools.)
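
To go one step further and list sites seen in only one continental group, here is a minimal Python sketch that reads the same INFO tags. It rests on assumptions worth checking against your own files: that a population tag which is missing or zero means the allele was not observed in that group, and that records are biallelic, so each tag holds a single number. The function names (`parse_info`, `private_population`, `private_sites`) and the demo records are made up for illustration.

```python
# Population allele-frequency tags named in the answer above.
POP_TAGS = ("AFR_AF", "AMR_AF", "ASN_AF", "EUR_AF")


def parse_info(info):
    """Turn a VCF INFO string into a dict; flags without '=' map to None."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else None
    return fields


def private_population(info):
    """Return the only population tag with frequency > 0, else None.

    Assumption: a missing or non-numeric tag counts as frequency 0.
    """
    fields = parse_info(info)
    present = []
    for tag in POP_TAGS:
        try:
            af = float(fields.get(tag, 0.0))
        except (TypeError, ValueError):
            af = 0.0
        if af > 0:
            present.append(tag)
    return present[0] if len(present) == 1 else None


def private_sites(lines):
    """Yield (chrom, pos, id, tag) for each site private to one group."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        tag = private_population(cols[7])  # column 8 is INFO
        if tag is not None:
            yield cols[0], int(cols[1]), cols[2], tag


# Made-up records, only to show the call pattern.
demo = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\trsA\tA\tG\t100\tPASS\tAC=3;AF=0.001;EUR_AF=0.01\n",
    "1\t200\trsB\tC\tT\t100\tPASS\tAC=50;AF=0.02;AFR_AF=0.05;EUR_AF=0.01\n",
]
for site in private_sites(demo):
    print(site)
```

For a real file, the same generator can be fed from `gzip.open(path, "rt")`. If the frequencies in the file are rounded, very rare alleles may look absent from a group when they are not, so treat the output as a first pass.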
