How can I download the genotypes of a specific SNP (a SNP in a coding region) for the African population from the 1000 Genomes Project?
Thanks
Visit ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ for the actual data.
There is also a description of which population each sample belongs to on the same FTP site. I don't have a note of the specific directory right now, but browsing the site's documentation should turn it up without much effort.
Hope this helps.
You can use tabix to query the remote files directly if you prefer not to download the large VCF files.
To retrieve a single SNP, let's say chromosome 6, position 7580958 (1-based coordinates on GRCh37, as used in the 1000 Genomes Phase 3 data), the format is: tabix name-of-vcf-file chr:start-end
tabix ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5.20130502.sites.vcf.gz 6:7580958-7580959
6 7580958 rs2076299 A G 100 PASS AC=1018;AF=0.203275;AN=5008;NS=2504;DP=21936;EAS_AF=0.2867;AMR_AF=0.1772;AFR_AF=0.3139;EUR_AF=0.0358;SAS_AF=0.1585;AA=A|||
So the African (AFR) frequency of the alternate allele (G) of rs2076299 in the 1000 Genomes data is AFR_AF=0.3139.
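If you want to pull AFR_AF out of a tabix result programmatically, here is a minimal sketch. It assumes the line is tab-separated, as the VCF format requires (the forum display above shows spaces), and that INFO is the 8th column. The function names are just illustrative.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column (key=value pairs separated by ';') into a dict.
    Flag entries without '=' are stored with the value True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def afr_allele_frequency(vcf_line: str) -> float:
    """Return AFR_AF (alternate-allele frequency in the AFR super-population)
    from one data line of the sites VCF. Assumes tab-separated columns."""
    columns = vcf_line.rstrip("\n").split("\t")
    info = parse_info(columns[7])  # INFO is the 8th VCF column
    return float(info["AFR_AF"])


line = ("6\t7580958\trs2076299\tA\tG\t100\tPASS\t"
        "AC=1018;AF=0.203275;AN=5008;NS=2504;DP=21936;EAS_AF=0.2867;"
        "AMR_AF=0.1772;AFR_AF=0.3139;EUR_AF=0.0358;SAS_AF=0.1585;AA=A|||")
print(afr_allele_frequency(line))  # 0.3139
```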
Ah, now I see I have shown how to get the allele frequency, when genotypes were asked for. You can still use tabix, but you need to query the chromosome-specific VCF files of the 1000 Genomes data, which contain the genotypes (note the ALL.chr6. part of the file path; change it to your chromosome of choice).
tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr6.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz 6:7580958-7580959
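To repeat this for other chromosomes, a small helper can build the URL and region string. This is only a sketch: it assumes every autosome (1-22) follows the same file-name pattern as the chr6 file above, with the version tag ("v5") kept as a parameter because file names on the server may have been updated since; check the directory listing before relying on it. Sex chromosomes and MT are deliberately left out. tabix regions are 1-based and inclusive, so "6:7580958-7580958" is exactly one position.

```python
BASE = "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/"


def genotype_vcf_url(chrom: int, version: str = "v5") -> str:
    """Per-chromosome Phase 3 genotype VCF URL for an autosome (1-22).
    The naming pattern is assumed from the chr6 example, not verified."""
    if not 1 <= chrom <= 22:
        raise ValueError("this sketch only covers autosomes 1-22")
    return (f"{BASE}ALL.chr{chrom}.phase3_shapeit2_mvncall_integrated_{version}"
            ".20130502.genotypes.vcf.gz")


def tabix_command(chrom: int, pos: int) -> list:
    """Argument list for 'tabix -h <url> chrom:pos-pos' (one 1-based position)."""
    return ["tabix", "-h", genotype_vcf_url(chrom), f"{chrom}:{pos}-{pos}"]


cmd = tabix_command(6, 7580958)
print(" ".join(cmd))
# To actually run it (needs tabix on PATH and network access):
# import subprocess; subprocess.run(cmd, check=True)
```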
In the tabix command above, I have included the -h option, which prints the VCF header, including the sample IDs on its last line (e.g. NA21122 NA21123 NA21124 NA21125, etc.). After the header lines comes the variant information, including the genotypes:
6 7580958 rs2076299 A G 100 PASS AC=1018;AF=0.203275;AN=5008;NS=2504;DP=21936;EAS_AF=0.2867;AMR_AF=0.1772;AFR_AF=0.3139;EUR_AF=0.0358;SAS_AF=0.1585;AA=A||| GT 0|0 0|0 0|0 0|0 0|1 0|1 0|0 0|0 0|0 0|0 0|0 0|1 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
...etc
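To match genotypes to sample IDs, pair the columns of the "#CHROM ..." header line with the same columns of the data line; in VCF the sample columns start right after FORMAT (index 9). Below is a minimal sketch assuming tab-separated input. The sample IDs S1 to S6 are hypothetical placeholders, the INFO field is replaced by "." for brevity, and the genotypes are the first six shown above.

```python
def genotypes_by_sample(header_line: str, data_line: str) -> dict:
    """Map each sample ID in the '#CHROM ...' header to its GT value
    on one data line. Sample columns follow FORMAT (column index 8)."""
    samples = header_line.rstrip("\n").split("\t")[9:]
    values = data_line.rstrip("\n").split("\t")
    gt_index = values[8].split(":").index("GT")  # position of GT in FORMAT
    return {s: v.split(":")[gt_index] for s, v in zip(samples, values[9:])}


def alt_allele_count(gt: str) -> int:
    """Count non-reference alleles in a GT such as '0|1' or '0/1';
    missing alleles ('.') are not counted."""
    return sum(1 for a in gt.replace("/", "|").split("|") if a not in ("0", "."))


# Hypothetical sample IDs S1..S6; INFO replaced by "." for brevity.
header = "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
                    "INFO", "FORMAT", "S1", "S2", "S3", "S4", "S5", "S6"])
data = "\t".join(["6", "7580958", "rs2076299", "A", "G", "100", "PASS", ".",
                  "GT", "0|0", "0|0", "0|0", "0|0", "0|1", "0|1"])

g = genotypes_by_sample(header, data)
print(g)
print(sum(alt_allele_count(gt) for gt in g.values()))  # 2
```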
Next you need to know the population of each sample ID, which you can find in this Excel file:
http://www.1000genomes.org/sites/1000genomes.org/files/documents/20101214_1000genomes_samples.xls
From this file I can see that samples NA19092 to NA19266 are YRI (Yoruba in Ibadan, Nigeria).
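Keep in mind that YRI is only one of the African populations; the AFR super-population in Phase 3 also includes LWK, GWD, MSL, ESN, ASW and ACB. To keep all African samples at once, you can filter on a sample-to-population table. The sketch below assumes a tab-separated panel file with a header row containing "sample" and "super_pop" columns (I believe the 20130502 release directory has one, something like integrated_call_samples_v3.20130502.ALL.panel, but treat the file name and layout as assumptions and check the listing). The toy panel uses hypothetical sample IDs.

```python
import csv
import io


def samples_in(panel_text: str, super_pop: str = "AFR") -> set:
    """Return sample IDs whose 'super_pop' column equals super_pop.
    Assumes a tab-separated panel with 'sample' and 'super_pop' headers."""
    reader = csv.DictReader(io.StringIO(panel_text), delimiter="\t")
    return {row["sample"] for row in reader if row["super_pop"] == super_pop}


def subset(genotypes: dict, keep: set) -> dict:
    """Keep only the entries of a {sample_id: GT} dict whose ID is in keep."""
    return {s: gt for s, gt in genotypes.items() if s in keep}


# Toy panel with hypothetical sample IDs (real population codes).
panel = ("sample\tpop\tsuper_pop\tgender\n"
         "S1\tYRI\tAFR\tfemale\n"
         "S2\tCEU\tEUR\tmale\n"
         "S3\tLWK\tAFR\tmale\n")
genotypes = {"S1": "0|1", "S2": "0|0", "S3": "1|1"}

print(subset(genotypes, samples_in(panel)))
```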
You can also check ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/current.tree if you want to find other files.