Question

Downloading SRA data

0

5.4 years ago

anabaena ▴ 10

Hey all, I am trying to download many read sets from a large SRA data set. In the past I collected the FTP link for each read set in a file and then ran a wget loop in a shell script to download every link. Is there a better way to do this? The set is very large (terabytes) and contains many paired reads. Can Run Selector handle this? I know it lets you download a JWT, but I'm not sure how that works.

Thanks

metagenomics SRA reads • 3.4k views

ADD COMMENT • link updated 5.4 years ago by liorglic ★ 1.5k • written 5.4 years ago by anabaena ▴ 10

0

5.4 years ago

Mensur Dlakic ★ 29k

There are literally hundreds of answers on Biostars to this question. Maybe start with this, and there are many other links on the same page where it says "Similar posts."

ADD COMMENT • link 5.4 years ago by Mensur Dlakic ★ 29k

score 3 · Accepted Answer · 2020-03-12

3

5.4 years ago

liorglic ★ 1.5k

I found that downloading from ENA is much faster than from SRA, but still rather slow. What really improved the speed for me was using this script (based on Aspera). Download speed increased about 60-fold.

ADD COMMENT • link 5.4 years ago by liorglic ★ 1.5k

1

Yes, good point. The script you link is based on the tutorial I linked in my answer.

ADD REPLY • link 5.4 years ago by ATpoint 88k

score 2 · Accepted Answer · 2020-03-12

2

5.4 years ago

ATpoint 88k

Here is a tutorial that covers downloading from both ENA and NCBI: "Fast download of FASTQ files from the European Nucleotide Archive (ENA)". For large data sets, downloading FASTQ files directly from ENA is probably the fastest way. If the data are access-restricted and you have to download from NCBI, use prefetch and parallel-fastq-dump; both are covered in the tutorial.

It also contains a link to sra-explorer, a handy tool that can provide download links for NCBI fastq files and is great to query NCBI for data.
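For anyone who wants to build the Aspera download list themselves, here is a minimal sketch. It assumes you already have a file of ENA `fastq_ftp` entries (for example from an ENA file report), that each entry starts with `ftp.sra.ebi.ac.uk/`, and that paired files are separated by `;`. The function names and the input file name are made up for illustration; the ascp options are the ones from the command quoted later in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: convert ENA "fastq_ftp" entries into Aspera sources.
# Assumed input format (one run per line, paired files joined by ';'):
#   ftp.sra.ebi.ac.uk/vol1/fastq/ERR164/ERR164407/ERR164407.fastq.gz

# Hypothetical helper: read fastq_ftp entries on stdin, print one
# Aspera source path per file on stdout.
ena_ftp_to_fasp() {
  tr ';' '\n' \
    | sed -e '/^$/d' \
          -e 's|^ftp\.sra\.ebi\.ac\.uk/|era-fasp@fasp.sra.ebi.ac.uk:/|'
}

# Hypothetical helper: print one ascp command per file, with the key
# path already expanded so no literal $HOME ends up in the output.
make_ascp_commands() {
  local key="${HOME}/.aspera/connect/etc/asperaweb_id_dsa.openssh"
  local src
  ena_ftp_to_fasp | while IFS= read -r src; do
    printf 'ascp -QT -l 300m -P33001 -i %s %s .\n' "$key" "$src"
  done
}

# Example (fastq_ftp.txt is a hypothetical file name):
#   make_ascp_commands < fastq_ftp.txt > download.txt
```

Writing the expanded key path into the file means each line can be run as-is, whichever directory the list is stored in.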

ADD COMMENT • link 5.4 years ago by ATpoint 88k

0

Awesome, thanks! I have it mostly working, but I get an error when I run the for loop. Any idea where this might come from? The error is below, followed by the command I'm running.

ascp: Source file list not specified, exiting.

$ascp -QT -l 300m -P33001 -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/ERR164/ERR164407/ERR164407.fastq.gz .

ADD REPLY • link 5.4 years ago by anabaena ▴ 10

0

What is the content of $ascp?

ADD REPLY • link 5.4 years ago by ATpoint 88k

0

$HOME/.aspera/connect/bin/ascp

I'm sorry, I'm not exactly sure what is meant by content. What I've been doing is running a single line from the download.txt file to check that it works before running the download for loop.

ADD REPLY • link 5.4 years ago by anabaena ▴ 10

0

I actually managed to fix that issue. Now, when iterating over the download files, I receive the following error:

Session Stop (Error: Private key file not found at path /global/home/users/user_name/.ssh/$HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh and path /global/home/users/user_name/.ssh/$HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh)

ADD REPLY • link 5.4 years ago by anabaena ▴ 10

0

/global/home/users/user_name/.ssh/ and $HOME must not be in the same path.

$HOME is /global/home/users/user_name/. Did you follow the tutorial without changes? This should not happen if you use the code I provided.
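One plausible cause (an assumption, I have not seen the loop that was run): when a command line is read from download.txt into a variable and executed from that variable, the shell does not expand `$HOME` a second time, so ascp receives the literal text `$HOME/...` and treats it as a relative path. A minimal sketch of a workaround, with a made-up helper name, is to substitute the literal text before running the line:

```shell
#!/usr/bin/env bash
# Sketch only: replace the literal text "$HOME" in a string with the
# real home directory. expand_home is a hypothetical helper name.
expand_home() {
  local text="$1"
  # \$HOME matches the four literal characters after '$';
  # the replacement $HOME is expanded by the shell as usual.
  local expanded="${text//\$HOME/$HOME}"
  printf '%s\n' "$expanded"
}

# A line as it might be stored in download.txt. The single quotes
# keep $HOME unexpanded here, mimicking text read from a file.
line='-i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh'

# Prints the same text with $HOME replaced by your home directory.
expand_home "$line"
```

Writing the absolute key path into download.txt in the first place avoids the problem altogether.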

ADD REPLY • link 5.4 years ago by ATpoint 88k

0

So I followed your code exactly. The only difference I can think of is that I have download.txt in my scratch folder (where I have a lot of download space on the HPC) and not in my home directory.

ADD REPLY • link 5.4 years ago by anabaena ▴ 10

0

So I've managed to troubleshoot the problem: there was something wrong with the $HOME location, so I replaced it with ~. Now I get the following error:

ascp: Failed to open TCP connection for SSH, exiting.

Session Stop (Error: Failed to open TCP connection for SSH)

Is this an issue with connecting to ENA, or with the server I am running the command from?

ADD REPLY • link 5.4 years ago by anabaena ▴ 10

0

This is most likely on ENA's side: ENA is currently moving its data center and has announced that services will be impaired or unavailable for the next week(s). That virus that is spreading around will do its part in slowing that down as well. For now it is probably best to download from SRA with prefetch and then convert with fastq-dump. My tutorial covers this as well.
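A rough sketch of the prefetch-then-convert route for a list of run accessions, in case it helps. It assumes the SRA Toolkit is on your PATH and that fastq-dump can find what prefetch downloaded when both are run from the same directory. The function name, the accession file, and the `RUN_PREFIX` preview switch are invented for this example and are not part of the toolkit.

```shell
#!/usr/bin/env bash
# Sketch only: fetch and convert every accession listed in a file,
# one accession per line.
# RUN_PREFIX is a hypothetical switch: set RUN_PREFIX=echo to print
# the commands instead of running them.
fetch_runs() {
  local list="$1"
  local outdir="${2:-fastq}"
  local acc
  # Read the list on file descriptor 3 so the tools cannot
  # accidentally consume it from stdin.
  while IFS= read -r acc <&3; do
    [ -z "$acc" ] && continue   # skip blank lines
    if ! ${RUN_PREFIX:-} prefetch "$acc"; then
      echo "prefetch failed for $acc, skipping" >&2
      continue
    fi
    # --split-files writes each read of a spot to its own file
    # (paired mates end up in _1 and _2); --gzip compresses the output.
    ${RUN_PREFIX:-} fastq-dump --split-files --gzip --outdir "$outdir" "$acc"
  done 3< "$list"
}

# Example (accessions.txt is a hypothetical file name):
#   RUN_PREFIX=echo fetch_runs accessions.txt fastq   # preview only
#   fetch_runs accessions.txt fastq                   # real download
```

For terabyte-scale sets you would probably want to swap fastq-dump for parallel-fastq-dump as the tutorial suggests; the loop structure stays the same.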

ADD REPLY • link 5.4 years ago by ATpoint 88k