1000 Genomes genotypes for a list of SNPs
9.6 years ago

I'm trying to download the genotypes from 1000 Genomes for a list of about 3,500 SNPs, for all individuals of European descent (CEU+GBR+FIN+TSI+IBS). Is there an easy way to do this for a given list of SNPs, or will I have to resort to scripting: downloading the genotypes of each SNP and merging them all together, or something similar?

Thanks!

1000 genome tabix genotype SNP vcftools • 3.6k views

Sorry, I should have added that I know how to do this for individual SNPs. I was just wondering whether a tool exists that would do it for me en masse.

9.6 years ago

Okay, so I have an idea of what needs to be done. I need to download a VCF for the region corresponding to each SNP, saving these in one directory per chromosome. I then intend to merge the files within each chromosome using vcf-merge, and finally concatenate the chromosomes with vcf-concat to form one file.
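The per-chromosome download step can be sketched roughly as follows. This is only an illustration under several assumptions: `snps.txt` is a hypothetical tab-separated file with one `chromosome<TAB>position` pair per line, `build_regions` and `fetch_chrom` are made-up helper names, tabix and bgzip are on the PATH, and the URL pattern is simply extrapolated from the chr1 file (the sex chromosomes may be named differently in the release directory). It relies on tabix accepting several regions in a single call, so each chromosome is fetched once rather than once per SNP.

```shell
# Print one "chrom:pos-pos" region per line for a single chromosome,
# sorted by position with duplicates removed.
# $1 = chromosome name, $2 = SNP list (hypothetical "chrom<TAB>pos" file)
build_regions() {
  chrom=$1
  list=$2
  awk -v chr="$chrom" '($1 "") == chr { print $2 }' "$list" |
    sort -n -u |
    awk -v chr="$chrom" '{ print chr ":" $1 "-" $1 }'
}

# Fetch all listed SNPs on one chromosome, keep the European samples,
# then compress and index the result. Not called automatically.
fetch_chrom() {
  chrom=$1
  list=$2
  # Assumed URL pattern, copied from the chr1 file name.
  url="ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr${chrom}.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz"
  regions=$(build_regions "$chrom" "$list")
  # Nothing to do if the list has no SNPs on this chromosome.
  [ -n "$regions" ] || return 0
  # $regions is left unquoted on purpose so that each region becomes
  # its own argument to tabix.
  tabix -f -h "$url" $regions |
    perl vcf-subset -c EUR.samples.list |
    bgzip -c > "chr${chrom}.EUR.vcf.gz"
  tabix -p vcf "chr${chrom}.EUR.vcf.gz"
}

# Example (not run here):
#   for c in $(seq 1 22); do fetch_chrom "$c" snps.txt; done
```

With one file per chromosome, all containing the same samples, the only remaining step would be the vcf-concat across chromosomes.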

The first part is fine: I can download all the regions without a problem. However, I get an error whilst merging. For example, let's say I want two SNPs and use tabix to get them:

./tabix -f -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz 1:145604788-145604789 | perl vcf-subset -c EUR.samples.list | ./bgzip -c > A.vcf.gz
./tabix -f -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz 1:145604791-145604791 | perl vcf-subset -c EUR.samples.list | ./bgzip -c > B.vcf.gz

So I'm now left with 3 files: A.vcf.gz, B.vcf.gz, and an index file. I want to merge A and B, so I run:

perl vcf-merge A.vcf.gz B.vcf.gz > C.vcf

However I get the error

[main] fail to load the index file.
The command "tabix -l A.vcf.gz" exited with an error. Is the file tabix indexed?
 at Vcf.pm line 172.
    Vcf::throw('Vcf4_1=HASH(0x107a048)', 'The command "tabix -l A.vcf.gz" exited with an error. Is the ...') called at Vcf.pm line 2673
    VcfReader::get_chromosomes('Vcf4_1=HASH(0x107a048)') called at vcf-merge line 197
    main::init_cols('HASH(0x10b7ab0)', 'Vcf4_2=HASH(0x1079d78)') called at vcf-merge line 279
    main::merge_vcf_files('HASH(0x10b7ab0)') called at vcf-merge line 1

I can see that an index is missing. However, http://samtools.sourceforge.net/tabix.shtml says that tabix creates an index file only when no position is given on the command line.

What am I doing wrong here? I feel like I'm probably missing out a step or something...
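The error message itself suggests that A.vcf.gz and B.vcf.gz have no index of their own; the one index file in the directory is presumably the copy tabix fetched for the remote chr1 file. A minimal sketch of the indexing step that seems to be missing, assuming tabix is on the PATH and the files are bgzip-compressed (`needs_index` and `index_missing` are hypothetical helper names):

```shell
# Print every given file that has no .tbi index sitting next to it.
needs_index() {
  for f in "$@"; do
    [ -f "$f.tbi" ] || printf '%s\n' "$f"
  done
}

# Build a tabix index for each file that lacks one.
# Not called automatically; requires tabix.
index_missing() {
  needs_index "$@" | while IFS= read -r f; do
    tabix -p vcf "$f"
  done
}

# Example (not run here):
#   index_missing A.vcf.gz B.vcf.gz
#   perl vcf-merge A.vcf.gz B.vcf.gz > C.vcf
```

Running `tabix -p vcf` on a local file writes a `.tbi` file beside it, which is what the `tabix -l A.vcf.gz` call in the error message is looking for.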

Edit: Oops, I think I put this in the wrong section, this should be a comment, not an answer!
