How can I download a list of SRR accessions from SRA with sratoolkit? What format should the list of SRR numbers be in?
Throw your SRR numbers into a file called SRR_list.txt, one number per line.
Then add this to a file called get_SRR_data.sh:
#!/usr/bin/bash
# $1 is the accession number handed over by xargs
fastq-dump --split-3 "$1"
and run on the command line with:
cat SRR_list.txt | xargs -n 1 bash get_SRR_data.sh
fastq-dump will pull the data one accession at a time for every number in your list, converting each to FASTQ as it goes. The --split-3 option will create paired-end files if available. If fastq-dump is not installed globally on your system, provide its full path in the bash script.
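If you would rather keep everything in one script, the same idea can be sketched as a plain loop. This is only a sketch under assumptions: dump_all and failed_accessions.txt are names I made up, and it assumes fastq-dump is on your PATH and returns a non-zero exit status when a download fails.

```bash
#!/usr/bin/env bash
# Sketch: run fastq-dump once per accession and record the ones that fail.
# dump_all and failed_accessions.txt are hypothetical names, not part of sra-toolkit.
dump_all() {
    list_file=$1
    # "|| [ -n ... ]" also picks up a last line that lacks a trailing newline
    while read -r acc || [ -n "$acc" ]; do
        [ -z "$acc" ] && continue              # skip blank lines
        if ! fastq-dump --split-3 "$acc" < /dev/null; then
            echo "$acc" >> failed_accessions.txt
        fi
    done < "$list_file"
}

# Usage: dump_all SRR_list.txt
```

Anything listed in failed_accessions.txt afterwards can be fed back in as a new list for a retry.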
If you prefer @Satya's suggestion of using wget:
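#!/usr/bin/bash
# The directory after SRR/ is the first six characters of the accession (e.g. SRR925 for SRR925811)
wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/"${1:0:6}"/"$1"/"$1".sra
fastq-dump --split-3 "$1".sra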
Have a look at @Obi Griffith's previous post:
Determine the SRR number and then download the data at the command-line with:
prefetch -v SRR925811
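Note that prefetch only downloads the run in SRA format, so a conversion step is still needed to end up with FASTQ. A minimal sketch of chaining the two, assuming both tools are on your PATH and that fastq-dump can locate the run prefetch has just stored (fetch_and_convert is a name I made up):

```bash
#!/usr/bin/env bash
# Sketch: download one run with prefetch, then convert it with fastq-dump.
# fetch_and_convert is a hypothetical helper name.
fetch_and_convert() {
    acc=$1
    # Convert only if the download step succeeded
    prefetch "$acc" && fastq-dump --split-3 "$acc"
}

# Usage: fetch_and_convert SRR925811
```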
I use wget to download:
wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR(first three digits)/SRR(all digits)/SRR(all digits).sra
and fastq-dump to convert to fastq:
fastq-dump --split-3 SRR(all digits).sra
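To spell out the path pattern above, here is a small sketch that builds the URL from an accession. sra_url is a made-up helper name, and the sketch only covers SRR accessions, following the layout exactly as written in this post.

```bash
#!/usr/bin/env bash
# Sketch: build the download URL for an SRR accession from the pattern above.
# sra_url is a hypothetical helper name.
sra_url() {
    acc=$1
    # "SRR" plus the first three digits, e.g. SRR925 for SRR925811
    prefix=$(printf '%s' "$acc" | cut -c1-6)
    printf 'ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/%s/%s/%s.sra\n' \
        "$prefix" "$acc" "$acc"
}

# Usage: wget "$(sra_url SRR925811)"
# sra_url SRR925811 prints:
# ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR925/SRR925811/SRR925811.sra
```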
You can use xargs and the sra-toolkit prefetch to download every SRR id contained in a txt file list, like:
xargs -n1 prefetch < SRR_Acc_List.txt
I'm using this but I got a very weird error:
2018-11-14T08:47:00 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR067578 ' cannot be found.
2018-11-14T08:47:01 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR067621 ' cannot be found.
2018-11-14T08:47:01 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR067637 ' cannot be found.
Can you please help?
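One thing worth checking: the accessions quoted in the log end with a space ('ERR067578 '), so the list file may contain trailing whitespace or Windows line endings. That is only a guess from the log, not a confirmed diagnosis. A sketch for cleaning the list before handing it to prefetch (clean_accessions is a made-up name):

```bash
#!/usr/bin/env bash
# Sketch: strip spaces, tabs and carriage returns from an accession list
# and drop empty lines. clean_accessions is a hypothetical helper name.
clean_accessions() {
    tr -d ' \t\r' < "$1" | sed '/^$/d'
}

# Usage: clean_accessions SRR_Acc_List.txt | xargs -n1 prefetch
```

If the error persists with a clean list, the whitespace was not the cause.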
Did you read the tutorial?
How to download raw sequence data from GEO/SRA
Sorry to bring up an old thread, but..
What is the difference between prefetch and fastq-dump?
From what I read, both will download the SRR file, but one in SRA format and the other in FASTQ format? If so, what is SRA format? And if what I understood is wrong, please elaborate.
Check out this SRA Download guide from NCBI for answers to your questions.
As I wrote in my comment, "From what I read": I was already reading there :). To me, it does not make sense to have prefetch. Why add an extra step to get the data format you want when you can just fastq-dump whatever you want directly? Or am I missing something about prefetch?
For some datasets, data may be uploaded as reference-compressed files. To recreate the original sequence data, one needs the exact reference used for that compression. As the line above indicates, prefetch facilitates downloading the data and its reference in one step. If you do not use prefetch for such data then
Whenever possible you should avoid using SRA (except for datasets that need authorization) and download data in fastq format directly from EBI/ENA: Fast download of FASTQ files from the European Nucleotide Archive (ENA)