How can I download SNP genotype files from the 1000 Genomes Project?
9.9 years ago
evo_genomics ▴ 60

How can I download the genotypes of specific SNPs (SNPs in coding regions) for African populations from the 1000 Genomes Project?

Thanks

SNP genotype population Genetics • 7.0k views
9.9 years ago
wangyi2412 ▴ 240

Visit ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ for the actual data.

The same FTP site also hosts files describing which population each sample belongs to. I don't have my notes on the exact directory to hand right now, but browsing the site's documentation should turn it up without much effort.

Hope this helps.


And you can check ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/current.tree if you want to find other things.

9.9 years ago
rbagnall ★ 1.8k

You can use tabix if you prefer not to download the large VCF files in full.

To retrieve a single SNP, let's say chromosome 6, nucleotide position 7580958 (1-based numbering on GRCh37, from the 1000 Genomes phase 3 data), the format is: tabix name-of-vcf-file chr:start-end

tabix ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5.20130502.sites.vcf.gz 6:7580958-7580959
6    7580958    rs2076299    A    G    100    PASS    AC=1018;AF=0.203275;AN=5008;NS=2504;DP=21936;EAS_AF=0.2867;AMR_AF=0.1772;AFR_AF=0.3139;EUR_AF=0.0358;SAS_AF=0.1585;AA=A|||

So the African allele frequency of rs2076299 in the 1000 Genomes data is AFR_AF=0.3139
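If you want just that one value rather than the whole INFO string, something like the following could work. This is only a sketch: info_value is a helper name I made up (it is not part of tabix), and it assumes standard tab-separated VCF records on stdin with INFO in column 8.

```shell
# info_value KEY: for each VCF record on stdin, print CHROM, POS, ID and the
# value of the INFO entry named KEY (hypothetical helper, not a tabix feature).
info_value() {
  awk -F'\t' -v key="$1" '
    /^#/ { next }                        # skip header lines
    {
      n = split($8, fields, ";")         # INFO is the 8th VCF column
      for (i = 1; i <= n; i++) {
        split(fields[i], kv, "=")        # kv[1] = key, kv[2] = value
        if (kv[1] == key) print $1 "\t" $2 "\t" $3 "\t" kv[2]
      }
    }'
}

# Usage, piping the tabix command shown above into the helper:
#   tabix <sites.vcf.gz URL> 6:7580958-7580959 | info_value AFR_AF
```

The key is matched exactly, so asking for AF does not also pick up AFR_AF or EUR_AF.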


Ah, now I see that I have shown how to get the allele frequency, when genotypes were asked for. You can still use tabix, but you will need to query the chromosome-specific VCF files of the 1000 Genomes data, which contain genotypes. (Note the ALL.chr6. part of the file path; change this to your chromosome of choice.)

tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr6.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz 6:7580958-7580959

In the above example, I have included the -h option, which prints the VCF header, including the sample IDs (e.g. NA21122 NA21123 NA21124 NA21125, etc.). After the header lines comes the variant information, including genotypes:

6    7580958    rs2076299    A    G    100    PASS    AC=1018;AF=0.203275;AN=5008;NS=2504;DP=21936;EAS_AF=0.2867;AMR_AF=0.1772;AFR_AF=0.3139;EUR_AF=0.0358;SAS_AF=0.1585;AA=A|||    GT    0|0    0|0    0|0    0|0    0|1    0|1    0|0    0|0    0|0    0|0    0|0    0|1    0|0    0|0    0|0    0|0    0|0    0|0    0|0    0|0
...etc
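If you have SNPs on several chromosomes, the per-chromosome file name can be built from the chromosome number. A minimal sketch, assuming the file name pattern quoted above also holds for the other autosomes (I have not checked every file, and names on the FTP site may differ for chrX/chrY or change over time, so look at the directory listing first). kg_genotype_url and snps.txt are names I made up for illustration.

```shell
# kg_genotype_url CHR: print the phase 3 genotype VCF URL for one chromosome,
# following the naming pattern of the chr6 file used in this thread.
# Assumption: other autosomes follow the same pattern.
kg_genotype_url() {
  printf 'ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr%s.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz\n' "$1"
}

# Hypothetical usage, with one "chromosome position" pair per line in snps.txt:
#   while read -r chr pos; do
#     tabix "$(kg_genotype_url "$chr")" "${chr}:${pos}-${pos}"
#   done < snps.txt
```

Using the same number for start and end in the region restricts the query to that single position.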

Now you need to know which population each sample ID belongs to, and you can find that information in this Excel file:

http://www.1000genomes.org/sites/1000genomes.org/files/documents/20101214_1000genomes_samples.xls

From this file I can see that samples NA19092 to NA19266 are YRI (Yoruba in Ibadan, Nigeria).
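Rather than picking sample columns out by eye, you could filter them with a sample-to-population table. The sketch below assumes you have saved that table as a tab-separated text file with the sample ID in column 1 and the super population code (e.g. AFR) in column 3; to my knowledge the phase 3 panel file in the release directory (integrated_call_samples_v3.20130502.ALL.panel) has that layout, but check the columns in your copy. keep_superpop is a made-up helper name.

```shell
# keep_superpop PANEL SUPERPOP: read a VCF (with header, e.g. from tabix -h)
# on stdin and print the nine fixed columns plus only the sample columns that
# PANEL assigns to SUPERPOP.
# Assumption: PANEL is tab-separated, sample ID in column 1, super population
# in column 3.
keep_superpop() {
  awk -F'\t' -v OFS='\t' -v sp="$2" '
    NR == FNR { if ($3 == sp) keep[$1] = 1; next }   # first input: the panel
    /^##/     { print; next }                        # meta lines pass through
    /^#CHROM/ {                                      # remember wanted columns
      for (i = 10; i <= NF; i++) if ($i in keep) cols[++n] = i
    }
    {                                                # header row and records
      line = $1
      for (i = 2; i <= 9; i++) line = line OFS $i
      for (j = 1; j <= n; j++) line = line OFS $(cols[j])
      print line
    }' "$1" -
}

# Usage, with the genotype query shown earlier in this thread:
#   tabix -h <genotypes.vcf.gz URL> 6:7580958-7580959 | keep_superpop samples.panel AFR
```

The -h option matters here, because the #CHROM header row is what maps sample IDs to columns.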


Thank you
