Question

1000 Genomes genotypes for a list of SNPs

1

10.0 years ago

Alexander Skates ▴ 370

I'm trying to download the 1000 Genomes genotypes for a list of about 3,500 SNPs, for all individuals of European descent (CEU+GBR+FIN+TSI+IBS). Is there an easy way to do this for a given list of SNPs, or do I have to resort to scripting: downloading the genotypes for each SNP separately and merging them all together, or something similar?

Thanks!

1000 genome tabix genotype SNP vcftools • 3.7k views

updated 2.8 years ago by Ram 45k • written 10.0 years ago by Alexander Skates ▴ 370

0

Genotypes:

http://www.1000genomes.org/faq/can-i-get-genotypes-specific-individualpopulation-your-vcf-files

Convert to PLINK ped/map format:

http://www.1000genomes.org/faq/can-i-convert-vcf-files-plinkped-format

and haplotypes:

http://www.1000genomes.org/faq/can-i-get-haplotype-data-1000-genomes-individuals

10.0 years ago by Jimbou ▴ 960

0

Sorry, I should have added that I know how to do this for individual SNPs. I was just wondering whether a tool exists that would do it for me en masse.

10.0 years ago by Alexander Skates ▴ 370

Ram · Answer 1 · 2015-04-10

Okay, so I have an idea of what needs to be done. I need to download a VCF for the region corresponding to each SNP, saving these in one directory per chromosome. I then intend to merge the files within each directory using vcf-merge, and finally concatenate the per-chromosome results with vcf-concat to form one file.
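As a rough sketch of how the per-SNP downloads might be reduced to one query per chromosome: the snippet below groups a SNP list into one regions file per chromosome. The input file `snps.tsv` (chromosome and position, tab-separated), the function name `make_region_files`, and the output names are all hypothetical. The commented-out `tabix -R` call assumes an htslib-era tabix that accepts a tab-delimited CHROM/POS/POS_TO regions file, and it has not been run against the 1000 Genomes FTP site here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical input: one SNP per line as "chromosome<TAB>position".
workdir=$(mktemp -d)
printf '1\t145604788\n1\t145604791\n2\t1000\n' > "$workdir/snps.tsv"

# Write one tab-delimited regions file per chromosome (chrom, start, end),
# sorted by position, so that each chromosome needs only a single query.
make_region_files() {
    local snps=$1 outdir=$2
    sort -k1,1 -k2,2n "$snps" |
        awk -v dir="$outdir" 'BEGIN { FS = OFS = "\t" }
            { print $1, $2, $2 > (dir "/chr" $1 ".regions.txt") }'
}

make_region_files "$workdir/snps.tsv" "$workdir"
ls "$workdir"

# Assumed follow-up (needs tabix, bgzip and network access; not run here):
#   tabix -h -R "$workdir/chr1.regions.txt" <remote chr1 VCF URL> \
#       | perl vcf-subset -c EUR.samples.list | bgzip -c > chr1.EUR.vcf.gz
```

If that works as assumed, each chromosome would yield a single file that already contains every requested site, which would leave only the per-chromosome concatenation step.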

The first part is fine: I can download all the regions without a problem. However, I get an error whilst merging. For example, let's say I want two SNPs and use tabix to get them:

./tabix -f -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz 1:145604788-145604789 | perl vcf-subset -c EUR.samples.list | ./bgzip -c > A.vcf.gz
./tabix -f -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz 1:145604791-145604791 | perl vcf-subset -c EUR.samples.list | ./bgzip -c > B.vcf.gz

So I'm now left with three files: A.vcf.gz, B.vcf.gz, and an index file. I want to merge A and B, so I run:

perl vcf-merge A.vcf.gz B.vcf.gz > C.vcf

However, I get the error:

[main] fail to load the index file.
The command "tabix -l A.vcf.gz" exited with an error. Is the file tabix indexed?
 at Vcf.pm line 172.
    Vcf::throw('Vcf4_1=HASH(0x107a048)', 'The command "tabix -l A.vcf.gz" exited with an error. Is the ...') called at Vcf.pm line 2673
    VcfReader::get_chromosomes('Vcf4_1=HASH(0x107a048)') called at vcf-merge line 197
    main::init_cols('HASH(0x10b7ab0)', 'Vcf4_2=HASH(0x1079d78)') called at vcf-merge line 279
    main::merge_vcf_files('HASH(0x10b7ab0)') called at vcf-merge line 1

I see that it's complaining about a missing index; however, http://samtools.sourceforge.net/tabix.shtml says that tabix only creates an index file when no region is given on the command line.
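The error message suggests that vcf-merge expects a tabix index next to each local input, and streaming a region out of a remote file does not index the new A.vcf.gz or B.vcf.gz. A minimal sketch of indexing them first follows. The helper name `ensure_index` is hypothetical, `tabix -p vcf` is the standard way to index a bgzip-compressed VCF, and the snippet assumes tabix is on the PATH rather than at `./tabix`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Index a bgzip-compressed VCF unless a .tbi index already sits next to it.
ensure_index() {
    local vcf=$1
    if [ -e "$vcf.tbi" ]; then
        echo "indexed: $vcf"
    elif command -v tabix >/dev/null 2>&1; then
        tabix -p vcf "$vcf" && echo "created: $vcf.tbi"
    else
        echo "missing index (tabix not on PATH): $vcf"
    fi
}

# Only touch the two example files if they are actually present.
for f in A.vcf.gz B.vcf.gz; do
    if [ -e "$f" ]; then
        ensure_index "$f"
    fi
done

# Once both .tbi files exist, the original command can be retried:
#   perl vcf-merge A.vcf.gz B.vcf.gz > C.vcf
```

One thing that may be worth checking afterwards: vcf-merge is documented as combining files that hold different samples, whereas A and B here hold the same samples at different sites, so vcf-concat might be the better fit for this step as well.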

What am I doing wrong here? I feel like I'm probably missing a step or something...

Edit: Oops, I think I put this in the wrong section, this should be a comment, not an answer!