download from SRA
8.2 years ago
zh.khodadadi ▴ 20

How can I download a list of SRR accessions from SRA using the SRA Toolkit? How should the list of SRR numbers be set up?

rna-seq • 14k views

Sorry to bring up an old thread, but...

What is the difference between prefetch and fastq-dump?

From what I read, both download the SRR file, but one in SRA format and the other in FASTQ format. If so, what is the SRA format? If I have misunderstood, please elaborate.


Check out this SRA Download guide from NCBI for answers to your questions.


As I wrote in my comment ("From what I read"), I had already been reading that guide :). To me, prefetch does not make sense: why add an extra step to get the data format you want when you can fastq-dump whatever you want directly? Or am I missing something about prefetch?


The ‘prefetch’ utility in the SRA Toolkit can be used to download SRA data and any required reference sequences in a single operation.

For some datasets, the data may be uploaded as reference-compressed files. To recreate the original sequence data, one needs the exact reference used for that compression. As the line above indicates, prefetch downloads the data and the reference in one step.

If you do not use prefetch for such data, you will need to determine (1) whether your downloaded dataset is reference-compressed, (2) if so, which references are required to access the data (see vdb-dump for an example of how to determine this), and then (3) acquire the reference sequences manually.

Whenever possible, you should avoid using SRA (except for datasets that need authorization) and download data in FASTQ format directly from EBI/ENA. See: Fast download of FASTQ files from the European Nucleotide Archive (ENA)
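
For illustration, here is a rough sketch of how FASTQ links for one run might be looked up at ENA. It assumes the ENA Portal API `filereport` endpoint and a tab-separated reply with a `fastq_ftp` column whose entries are separated by `;` and carry no `ftp://` prefix. The function names are made up for this example, so check the details against the ENA documentation before relying on it.

```shell
# Sketch only: the endpoint and the "fastq_ftp" column are assumptions
# about the ENA Portal API, not something stated in this thread.

# Print the file-report URL for one run accession.
ena_report_url() {
    printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=fastq_ftp\n' "$1"
}

# Read a tab-separated report on stdin and print one FASTQ URL per line.
# The column is located by its header name rather than by position.
ena_fastq_urls() {
    awk -F '\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "fastq_ftp") col = i; next }
        col && $col != "" {
            n = split($col, parts, ";")
            for (j = 1; j <= n; j++) print "ftp://" parts[j]
        }'
}
```

Something like `curl -s "$(ena_report_url SRR925811)" | ena_fastq_urls | xargs -n 1 wget` would then fetch the files, if the assumptions above hold.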

8.2 years ago
st.ph.n ★ 2.7k

Throw your SRR numbers into a file called SRR_list.txt, one number per line.

Then add this to a file called get_SRR_data.sh:

    #!/usr/bin/env bash
    # Download one run (accession passed as the first argument) as FASTQ.
    fastq-dump --split-3 "$1"

and run on the command line with:

cat SRR_list.txt | xargs -n 1 bash get_SRR_data.sh

fastq-dump will pull the data one by one for all accession numbers in your list and convert each to FASTQ at the same time. The --split-3 option will create paired-end files if available. Provide the full path to fastq-dump in the bash script if it is not installed globally on your system.
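
The same loop can also live in one function instead of a helper script plus xargs. This is a minimal sketch, not a tested pipeline: `dump_all` and the `FASTQ_DUMP` variable are names invented for this example, and `FASTQ_DUMP` is just one way to supply the full path mentioned above.

```shell
# Sketch: convert every accession listed in a file, one per line.
# FASTQ_DUMP is a hypothetical variable for this example; set it to the
# full path of fastq-dump if the tool is not on your PATH.
dump_all() {
    local list="$1" acc
    # "|| [ -n "$acc" ]" keeps a last line that has no trailing newline.
    while IFS= read -r acc || [ -n "$acc" ]; do
        [ -z "$acc" ] && continue      # skip blank lines
        # stdin is redirected so the tool cannot consume the list itself
        "${FASTQ_DUMP:-fastq-dump}" --split-3 "$acc" < /dev/null
    done < "$list"
}
```

Call it as `dump_all SRR_list.txt`, or as `FASTQ_DUMP=/path/to/fastq-dump dump_all SRR_list.txt` when the tool is installed somewhere unusual.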

If you prefer @Satya's suggestion of using wget:

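    #!/usr/bin/env bash
    # ${1:0:6} is "SRR" plus the first three digits, the directory level from @Satya's path.
    wget "ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/${1:0:6}/$1/$1.sra"
    fastq-dump --split-3 "$1.sra"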
8.2 years ago
Mike ★ 1.9k

Have a look at @Obi Griffith's previous post:

Determine the SRR number and then download the data at the command-line with:

prefetch -v SRR925811

How to download raw sequence data from GEO/SRA

8.2 years ago
Satyajeet Khare ★ 1.6k

I use wget to download:

wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR(first three digits)/SRR(all digits)/SRR(all digits).sra

and fastq-dump to convert to fastq

fastq-dump --split-3 SRR(all digits).sra
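
The path pattern above can be filled in from an accession automatically. A small sketch, assuming the directory layout is exactly as written in this post (it is not verified here, and only SRR accessions are covered); `sra_ftp_url` is a name made up for this example.

```shell
# Sketch: build the FTP path from one SRR accession.
# The layout SRR/<SRR + first three digits>/<accession>/<accession>.sra
# is taken from the post above and is not verified here.
sra_ftp_url() {
    local acc="$1" prefix
    # First six characters, e.g. SRR925 for SRR925811.
    prefix=$(printf '%s' "$acc" | cut -c1-6)
    printf 'ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/%s/%s/%s.sra\n' \
        "$prefix" "$acc" "$acc"
}
```

For example, `wget "$(sra_ftp_url SRR925811)"` would request the SRR925/SRR925811/SRR925811.sra file under that tree.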

There's no need to pull the data, and then convert to fastq. fastq-dump will do both for you.


I agree, but wget with ftp is way faster, unless there is a way to use fastq-dump with ftp that I am not aware of.


As far as I know, SRA can block your IP if you download a lot of files with wget.


In my experience, the fastest and most reliable approach (no connection interruptions) is to use prefetch with Aspera, then convert the SRA files to FASTQ with fastq-dump. The whole thing saves a lot of time.
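
A minimal sketch of that two-step approach for a single accession. It assumes fastq-dump picks up the prefetched copy when given the same accession. Whether prefetch actually uses Aspera depends on your installation and is not shown here. `fetch_and_convert`, `PREFETCH` and `FASTQ_DUMP` are names invented for this example.

```shell
# Sketch: download first, convert second.
# PREFETCH and FASTQ_DUMP are hypothetical variables for this example;
# they default to the plain tool names.
fetch_and_convert() {
    local acc="$1"
    # Convert only if the download step succeeded.
    "${PREFETCH:-prefetch}" "$acc" &&
        "${FASTQ_DUMP:-fastq-dump}" --split-3 "$acc"
}
```

For a whole list, something like `while read -r acc; do fetch_and_convert "$acc"; done < SRR_list.txt` would run it once per line.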

6.5 years ago

You can use xargs and the SRA Toolkit's prefetch to download every SRR ID listed in a text file, like:

xargs -n1 prefetch < SRR_Acc_List.txt

I'm using this, but I get a very weird error:

2018-11-14T08:47:00 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR067578 ' cannot be found.

2018-11-14T08:47:01 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR067621 ' cannot be found.

2018-11-14T08:47:01 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR067637 ' cannot be found.

Can you please help?
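
One thing worth checking: in the messages above, each quoted accession ends with a space ('ERR067578 '), which may point to a stray character in the list file, such as a Windows carriage return. That is a guess from the log, not a confirmed diagnosis. A sketch of cleaning the list before handing it to prefetch, assuming accessions contain only letters and digits (`clean_accessions` is a name made up for this example):

```shell
# Sketch: keep only letters, digits and newlines, then drop empty lines.
# Assumes accessions are purely alphanumeric (e.g. SRR925811, ERR067578).
clean_accessions() {
    LC_ALL=C tr -cd 'A-Za-z0-9\n' < "$1" | sed '/^$/d'
}
```

The cleaned list can then be piped on, e.g. `clean_accessions SRR_Acc_List.txt | xargs -n1 prefetch`.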
