Any fast way to download 1000 Genomes Phase 3?
5.0 years ago
b.ambrozio ▴ 30

Hello! I'm trying to download the 1000 Genomes (phase 3) data through Aspera, but the instructions in the documentation don't work. From the command line (using ascp) I get the error "ERR [ascp] SSH authentication failed", e.g.:

ascp -i /home/ibmuser/.aspera/connect/etc/asperaweb_id_dsa.putty -Tr -Q -l 100M -P33001 -L- fasp-g1k@fasp.1000genomes.ebi.ac.uk:vol1/ftp/release/20130502/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.tbi ./

Via Aspera Desktop I get: "SSH_MSG_DISCONNECT: 2 Too many authentication failures"

I have an FTP download in progress, but it's taking ages.

Thanks!

aspera fasp 1000genomes • 2.1k views

I tried the command on my computer and the file downloaded successfully. Which version of ascp are you using? Is the private-key file path correct?
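The key path and the ascp binary in use can be checked from the shell before retrying the transfer. This is only a minimal sketch: `check_key` and `find_ascp` are hypothetical helper names for this example, not part of Aspera.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check the key file passed to `ascp -i`.
# Returns 0 if the path is a readable regular file, 1 otherwise.
check_key() {
    local key="$1"
    if [ -f "$key" ] && [ -r "$key" ]; then
        echo "key ok: $key"
    else
        echo "key missing or unreadable: $key" >&2
        return 1
    fi
}

# Hypothetical helper: print the ascp binary that is first on PATH,
# or fail with a message if none is found.
find_ascp() {
    command -v ascp || { echo "ascp not found on PATH" >&2; return 1; }
}
```

For the command in the question this would be called as `check_key /home/ibmuser/.aspera/connect/etc/asperaweb_id_dsa.putty && find_ascp`.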

5.0 years ago
b.ambrozio ▴ 30

OK, I got it working. The change was essentially the -i parameter, which for the newer version of ascp had to point to asperaweb_id_dsa.openssh. That's odd, because I'm fairly sure I tried this yesterday and it didn't work (error "Too many authentication failures"); my guess is that the credentials were temporarily blocked. Anyway, here's a script I wrote to download everything at once:

echo "Start: $(date)"

# Key file, transfer options and remote release directory in one string.
FASP_ADDRESS="/home/ibmuser/.aspera/connect/etc/asperaweb_id_dsa.openssh -Tr -Q -l 100M -P33001 -L- fasp-g1k@fasp.1000genomes.ebi.ac.uk:vol1/ftp/release/20130502"

for CHR in $(seq 1 22); do
    FILE=ALL.chr$CHR.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
    echo "Downloading '$FILE'..."
    # $FASP_ADDRESS is deliberately unquoted so that it splits into separate arguments.
    ascp -i $FASP_ADDRESS/$FILE ./
done

echo "End: $(date)"

It's also available on my GitHub (along with an FTP version, if you prefer): https://github.com/bambrozio/bioinformatics/tree/master/utils
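The loop in the script fetches only the .vcf.gz files, while the command in the question fetched a .tbi index. A possible variant that lists both and can print the ascp commands before running them might look like the sketch below. It assumes every chromosome has an index named like the chr1 one in the question; `g1k_files`, `fetch_one`, `KEY`, `REMOTE` and `DRY_RUN` are hypothetical names introduced for this example.

```shell
#!/usr/bin/env bash
# Print the VCF and .tbi index file names for chromosomes 1-22.
# Assumption: each index is named "<vcf name>.tbi", as for chr1 in the question.
g1k_files() {
    local chr base
    for chr in $(seq 1 22); do
        base="ALL.chr${chr}.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz"
        printf '%s\n%s\n' "$base" "$base.tbi"
    done
}

# Fetch one file with the same ascp options as the script above.
# KEY (private key path) and REMOTE (user@host:directory) are set by the caller.
# With DRY_RUN=1 the command is only printed, not executed.
fetch_one() {
    local file="$1"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "ascp -i $KEY -Tr -Q -l 100M -P33001 -L- $REMOTE/$file ./"
    else
        ascp -i "$KEY" -Tr -Q -l 100M -P33001 -L- "$REMOTE/$file" ./
    fi
}
```

With `KEY` and `REMOTE` set to the values from the script, `for f in $(g1k_files); do fetch_one "$f" || echo "failed: $f" >&2; done` would go through all 44 files and report any that fail instead of stopping silently.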

Thanks!


The downloads from https://www.cog-genomics.org/plink/2.0/resources#1kg_phase3 are ~70% smaller and contain all the information in the VCFs ("plink2 --pfile ... --export vcf bgz" can be used to generate actual VCFs).
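As a rough sketch of that conversion step, the command can be wrapped so it is printed before being run. This assumes plink2 is installed and that the downloaded fileset has already been unpacked as the resources page describes; `export_vcf`, `DRY_RUN` and the prefix and output names are hypothetical, chosen only for this example.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the plink2 export mentioned above.
# $1: prefix of the PLINK 2 fileset; $2: output prefix for the VCF.
# With DRY_RUN=1 the command is only printed, not executed.
export_vcf() {
    local prefix="$1" out="$2"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "plink2 --pfile $prefix --export vcf bgz --out $out"
    else
        plink2 --pfile "$prefix" --export vcf bgz --out "$out"
    fi
}
```

For example, `DRY_RUN=1 export_vcf all_phase3 all_phase3_export` prints the command for a fileset with the (made-up) prefix all_phase3 without running it.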
