1000G Query / Using Tabix With A Proxy
11.0 years ago
secretjess ▴ 210

When I run the following command the connection times out:

./tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/ALL.2of4intersection.20100804.genotypes.vcf.gz 1:57000000-57001000 > test.vcf
connect: Connection timed out
[main] fail to open the data file

But if I run this it works (it might take an estimated 4 hours to download but it does connect!):

wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/ALL.2of4intersection.20100804.genotypes.vcf.gz

So how can I query the 1000 genomes data from behind a proxy? I'm assuming that's the problem.

(P.S. What I want to know is whether there are any recorded SNPs, SVs, etc. in a specified region.)

tabix vcftools • 4.8k views

As of now, tabix does not support FTP proxies.

11.0 years ago
Ying W ★ 4.3k

First off, I really don't think this is the right way of doing things. You should run tabix after downloading the complete file, especially since the download might break halfway through, and then you would have to rerun everything. That said, to achieve what you initially set out to do, you can try:

wget -O - ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/ALL.2of4intersection.20100804.genotypes.vcf.gz | ./tabix -h - 1:57000000-57001000 > test.vcf

Downloading the entire VCF file is not necessary in most cases. If you run tabix on an FTP location directly, only the index file is downloaded, and tabix accesses the relevant part of the VCF file directly on the FTP server. Of course, if you are regularly querying the VCF file, then I'd recommend downloading it to local storage.

In the case here, the query that the OP posted took less than a second to run on my machine (a little MacBook Air over a wireless internet connection). Downloading the entire 62GB VCF file takes considerably longer.


Most importantly, if the download breaks halfway, no error message will be shown.


Thanks both! That makes sense. I should probably update the release I'm looking at too. I intended this question to resolve my issues with using tabix behind a proxy, but I hadn't considered that the download might break.

11.0 years ago

Tabix is very useful for downloading data from 1000 Genomes because its indexing method lets you retrieve only portions of a file.

To use it behind a proxy, make sure that your HTTP proxy environment variables are set correctly. For example, you can add these lines to your .bashrc file:

export PROXY=http://your.proxy.edu
export PROXYPORT=8080
export http_proxy=$PROXY:$PROXYPORT
export HTTP_PROXY=$PROXY:$PROXYPORT
export https_proxy=$PROXY:$PROXYPORT
export HTTPS_PROXY=$PROXY:$PROXYPORT

Then run source ~/.bashrc, and tabix should work correctly. If it doesn't, try the latest version of tabix; I think that some previous versions did not work correctly behind a proxy.
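To confirm that the variables above are actually exported in your current shell, a small helper along these lines may be useful. This is only a sketch: show_proxy_vars is a hypothetical name of my own, not part of tabix, and it only reports what the shell has exported. It says nothing about whether tabix honors those settings.

```shell
# Hypothetical helper: print each proxy-related environment variable
# and its current value, flagging the ones that are not exported.
show_proxy_vars() {
    for name in http_proxy HTTP_PROXY https_proxy HTTPS_PROXY; do
        # printenv exits non-zero when the variable is not in the environment
        value=$(printenv "$name") || value="(not set)"
        printf '%s=%s\n' "$name" "$value"
    done
}

show_proxy_vars
```

If any line shows "(not set)" after you have sourced ~/.bashrc, the corresponding export line was probably not picked up.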

You should be aware that if the connection fails, tabix does not return any error or warning message. Nothing will alert you if the file has not been downloaded correctly, so you will have to check it yourself.
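As a rough way of doing that check, something like the following could be run on the output of a tabix -h query. It is a minimal sketch under a few assumptions: check_vcf is a hypothetical name, the query used -h so the header should be present, and the checks are only basic sanity tests. A region with no variants legitimately gives zero records, and a record list that was cut short will not be detected.

```shell
# Hypothetical helper: basic sanity checks on a VCF written by `tabix -h`.
check_vcf() {
    vcf=$1
    # A missing or empty file means nothing was retrieved at all.
    if [ ! -s "$vcf" ]; then
        echo "$vcf: missing or empty" >&2
        return 1
    fi
    # With -h, the output should contain the #CHROM column header line.
    if ! grep -q '^#CHROM' "$vcf"; then
        echo "$vcf: no #CHROM header line" >&2
        return 1
    fi
    # Count data lines, i.e. every line that is not a header line.
    # grep exits non-zero when the count is 0, hence the "|| true".
    records=$(grep -vc '^#' "$vcf") || true
    echo "$vcf: $records record(s)"
}
```

For the query in the question this would be called as check_vcf test.vcf. The record count also answers whether any variants are recorded in the region.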

If you want to download files from 1000 Genomes, a valid alternative is the Aspera client. This allows you to download the whole 1000 Genomes dataset in less than one hour. To use the Aspera client, you can follow these instructions; if you are in Europe, consider using the EBI server instead of the NCBI server given in the example.


Downloaded the newest (0.2.6) tabix and installed it by following http://genometoolbox.blogspot.co.uk/2013/11/installing-tabix-on-unix.html. The folder is called tabix-0.2.6 but when I run tabix it claims it's "Version: 0.2.5". Bit strange, but either way - thanks for the help but I still can't get tabix to work with my proxy. I've done what I wanted on the browser but it'd still be useful for the future if I could get this working.


Check that version 0.2.5 is not in your PATH variable: echo "$PATH"


or try: which tabix


Using either of those commands shows 0.2.6, but if I just run tabix to start up the program (it displays help), it says Version: 0.2.5 (r1005).
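Since which only reports the first match, it may help to list every tabix on the PATH in lookup order and see whether an older copy is also installed. The function below is a sketch for that; find_all_on_path is a hypothetical name, and it assumes a POSIX-style shell.

```shell
# Hypothetical helper: print every executable with the given name found
# on PATH, in the order the shell would search the directories.
find_all_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        candidate="${dir:-.}/$name"
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
    IFS=$old_ifs
}

find_all_on_path tabix
```

The first line printed is the copy that runs when you type tabix. Running each listed path directly shows which version string each one reports.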
