SRA files bulk downloads
6.2 years ago
S AR ▴ 80

How do I use Aspera or wget to download SRA files in bulk by run, sample, or experiment? My SRA ID list contains experiment (SRX), run (SRR/ERR), and sample (SRS) IDs. I tried prefetch from sratoolkit:

prefetch --list ../XDR_169_ids.txt

XDR_169_ids.txt:

SRS551840
ERR688040
ERR688041
SRS551807
ERR688042
ERR688043
ERR688044
ERR688045
ERR688046
ERR688047
ERR688048
SRR1269497
(...)

But prefetch gave the following error:

2018-11-02T05:27:44 prefetch.2.8.2 warn: '../XDR_169_ids.txt' is invalid or not a kart file

I also converted it to a .table file, which prefetch supports as well, because I don't know what the kart format is. But it gave the same error, so I used:

prefetch $(../XDR_169_ids.txt)

It gave an error for every ID; I'm pasting a few:

../XDR_169_ids.txt: line 157: $'ERR234622\r': command not found
../XDR_169_ids.txt: line 158: $'SRS551952\r': command not found
../XDR_169_ids.txt: line 159: $'SRR671794\r': command not found
../XDR_169_ids.txt: line 163: $'SRS552331\r': command not found

I tried:

prefetch ERR688040

Again a message:

2018-11-02T05:32:10 prefetch.2.8.2: 1) 'ERR688040' is found locally

Any suggestions? I have 4000 SRA IDs and I want to download them as fast as possible. I tried Aspera, but I don't know what to write at the end of the command, where the file name goes (I don't want to run a separate command for each name).

aspera wget recursive awk linux • 12k views

What is ERR688040? Is that ID correct? And why don't you try a .sh script with fastq-dump?


@OP: I abridged the list of accessions a bit to improve readability.

6.2 years ago
ATpoint 86k

prefetch is indeed the way to go here. It uses Aspera internally if you set it up properly. Here is the manual.

IDs with the prefix SRR or ERR are run accessions and can be downloaded directly via prefetch SRR/ERR(...). An SRS (sample) accession can contain multiple experiments/runs, so you first have to get the run accessions from it.

Do it via Entrez Direct (available via conda) as suggested on Biostars previously. Example:

## Extract SRR/ERR:
esearch -db sra -query SRS551840 | efetch --format runinfo | cut -d ',' -f 1 | grep SRR 

## Output:
SRR1159129
SRR1159377
SRR1181071
SRR1181300

In your case, I would make a download list like:

## Extract SRR/ERR:
grep -E 'SRR|ERR' XDR_169_ids.txt > downloads.txt

## Find SRRs from SRS:
grep 'SRS' XDR_169_ids.txt | parallel "esearch -db sra -query {} | efetch --format runinfo | cut -d ',' -f 1 | grep SRR" >> downloads.txt

## Now make sure there are no duplicates, then download using GNU parallel to have 4 (or as many as your disk can handle) streams in parallel:
sort -u downloads.txt | parallel -j 4 "prefetch {}"
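
The list in the question also contains experiment (SRX) accessions, which the two grep calls above do not pick up. Here is a minimal sketch of splitting a list into run accessions and everything else. It assumes that run accessions always look like SRR, ERR, or DRR followed by digits, and that the list is free of stray whitespace. The function name `split_accessions` and the output file names are hypothetical:

```shell
#!/usr/bin/env bash
# split_accessions LIST RUNS_OUT OTHER_OUT
# Hypothetical helper: writes run accessions (SRR/ERR/DRR + digits) to RUNS_OUT
# and all remaining non-empty lines (SRS, SRX, ...) to OTHER_OUT.
# Assumption: one accession per line, no trailing whitespace or carriage returns.
split_accessions() {
    local list=$1 runs=$2 other=$3
    # Lines that are already run accessions can go straight to prefetch.
    grep -E '^[SED]RR[0-9]+$' "$list" | sort -u > "$runs"
    # Everything else still needs a lookup (e.g. esearch/efetch runinfo).
    grep -vE '^[SED]RR[0-9]+$' "$list" | grep -v '^$' | sort -u > "$other"
}
```

For example, `split_accessions XDR_169_ids.txt runs.txt to_resolve.txt` would leave the SRS and SRX entries in `to_resolve.txt`, which could then be fed to the esearch/efetch pipeline instead of `grep 'SRS'`.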

Once you have the sra files, convert to fastq with parallel-fastq-dump.

2018-11-02T05:32:10 prefetch.2.8.2: 1) 'ERR688040' is found locally

That means the file is already present in the download folder, so the download of this one has already finished.


ATpoint, that's great, I'll try this. Thank you.


Did it work for you?


Hi ATpoint,

Sorry, I was out of the country attending a conference and only tried it today. And yes, it worked: it extracted the runs for all those SRS IDs. But when I tried:

sort -u downloads.txt | parallel -j 4 "prefetch {}"

I'm getting the following error. Can you help me with this?

2018-11-13T04:24:10 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR067743 ' cannot be found.

2018-11-13T04:24:12 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR117453 ' cannot be found.

2018-11-13T04:24:12 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR108480 ' cannot be found.

2018-11-13T04:24:12 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR117454 ' cannot be found.

2018-11-13T04:24:14 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR133854 ' cannot be found.

2018-11-13T04:24:14 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR133900 ' cannot be found.

2018-11-13T04:24:14 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR133890 ' cannot be found.
6.1 years ago
S AR ▴ 80

I tried to fetch one ID:

prefetch ERR133900

After 15 minutes or so it gave these log messages, and I couldn't find the ERR133900 file anywhere:

2018-11-13T04:27:21 prefetch.2.8.2: 1) Downloading 'ERR133900'...
2018-11-13T04:27:21 prefetch.2.8.2:  Downloading via https...
2018-11-13T04:40:16 prefetch.2.8.2: 1) 'ERR133900' was downloaded successfully
2018-11-13T04:40:23 prefetch.2.8.2: 'ERR133900' has 1 unresolved dependency
2018-11-13T04:40:27 prefetch.2.8.2: 2) Downloading 'ncbi-acc:AL123456.2?vdb-ctx=refseq'...
2018-11-13T04:40:27 prefetch.2.8.2:  Downloading via https...
2018-11-13T04:40:30 prefetch.2.8.2: 2) 'ncbi-acc:AL123456.2?vdb-ctx=refseq' was downloaded successfully
2018-11-13T04:40:41 prefetch.2.8.2: 'ERR133900' has no remote vdbcache

This means that the download has finished successfully. Please note that SRA files are not self-contained. This particular SRA file comprises a mapping of the reads to the reference sequence AL123456.2 (isolate H37Rv). This reference sequence was also downloaded to your local disk. The following command will dump the first two read pairs:

fastq-dump --split-spot -Z ERR133900 | head -8

Oh. But my other problem was that I want to download everything in bulk in one go, and for that I was getting errors:

2018-11-13T04:24:10 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR067743 ' cannot be found.

2018-11-13T04:24:12 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR117453 ' cannot be found.

2018-11-13T04:24:12 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR108480 ' cannot be found.

2018-11-13T04:24:12 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR117454 ' cannot be found.

and so on.

For the bulk download I used the command:

sort -u downloads.txt | parallel -j 4 "prefetch {}"

But as I mentioned above, when I run it manually it does download, but I don't know where to. The file does not show up in my folder.


There is trailing whitespace after your accessions: 'ERR067743 ' instead of 'ERR067743'. Remove it.


I did remove it, but I still get the same error:

2018-11-14T08:47:00 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR067578 ' cannot be found.

2018-11-14T08:47:01 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR067621 ' cannot be found.

There is still whitespace; don't you see it?

The command itself is correct; your input file has flaws. Try:

sort -u downloads.txt | awk '{gsub(" ", "", $1);print $1}' | parallel prefetch {}

When I download the files you indicate and artificially add a whitespace after the accession number, I get the same error. Removing it solves the issue, which means your file still contains whitespace. There is also no point in reviving older posts on prefetch downloads. It is simply your input file that is wrong.
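
One possible reason the whitespace survives: the `$'ERR234622\r'` messages earlier in the thread suggest the list was saved with Windows (CRLF) line endings. A carriage return is not a space, so `gsub(" ", "", $1)` would leave it in place. The following is a minimal sketch that strips carriage returns, spaces, and tabs before de-duplicating, on the assumption that this is the cause here. The function name `clean_ids` is hypothetical:

```shell
#!/usr/bin/env bash
# clean_ids: read accessions on stdin, write cleaned, de-duplicated accessions on stdout.
# Hypothetical helper; assumes one accession per line.
clean_ids() {
    tr -d '\r' |            # drop carriage returns left over from CRLF line endings
    tr -d '[:blank:]' |     # drop spaces and tabs anywhere on the line
    grep -v '^$' |          # drop lines that are now empty
    sort -u                 # remove duplicates
}
```

Usage would be along the lines of `clean_ids < downloads.txt | parallel -j 4 "prefetch {}"`. To check what is actually in the file, `od -c downloads.txt | head` prints invisible characters such as `\r` explicitly.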


...and? Did that solve it?


Yes, kind of. I just had to break my list into 3 parts, and then it worked. It is still missing a few IDs, but I can manage those few manually.
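
To find which IDs are still missing, the download list can be compared with the files on disk. This is a minimal sketch and rests on one assumption: that prefetch wrote `<accession>.sra` either directly into the download directory or into a per-accession subdirectory. The layout differs between sratoolkit versions and the location depends on local configuration. The function name `missing_accessions` is hypothetical:

```shell
#!/usr/bin/env bash
# missing_accessions LIST DIR
# Hypothetical helper: prints every accession in LIST that has no matching
# .sra file in DIR (checked as DIR/ACC.sra and DIR/ACC/ACC.sra).
missing_accessions() {
    local list=$1 dir=$2 acc
    # "|| [ -n "$acc" ]" also handles a last line without a trailing newline.
    while IFS= read -r acc || [ -n "$acc" ]; do
        [ -z "$acc" ] && continue
        if [ ! -e "$dir/$acc.sra" ] && [ ! -e "$dir/$acc/$acc.sra" ]; then
            printf '%s\n' "$acc"
        fi
    done < "$list"
}
```

For example, `missing_accessions downloads.txt "$HOME/ncbi/public/sra" > missing.txt`, where the directory is an assumption about the default location for sratoolkit 2.8.x and may differ on your system. The resulting `missing.txt` can then be passed to prefetch again.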
