How To Download All SRA Samples At Once?
4
16
10.8 years ago
biorepine ★ 1.5k

Dear Biostars,

As you may know, SRA is a repository for all types of sequencing data. I often have to download data manually, copying the link of every SRA dataset by hand and fetching it with wget. Is there a simpler approach than copying the links manually? Thanks in advance.

For example: how can I download all the data related to SRP026197? http://www.ncbi.nlm.nih.gov/sra?term=SRP026197

geo sra • 56k views
ADD COMMENT
1

Have you tried the SRAdb package from bioconductor? It's been a while, but I think it can be used to do that sort of thing.

ADD REPLY
0

Actually, SRA is the repository for sequence data, not GEO. There are links between the two databases, but your question is actually related to SRA.

ADD REPLY
0

Oh yeah, you are right. I will edit my question. Thanks.

ADD REPLY
0

When I run the code on my computer, I get the error below. What is wrong?

library(SRAdb)

srafile=getSRAdbFile()

trying URL 'http://gbnci.abcc.ncifcrf.gov/backup/SRAmetadb.sqlite.gz'
Content type 'application/x-gzip' length 1308358823 bytes (1247.7 Mb)
opened URL
downloaded 1247.7 Mb

Unzipping...

Error in .local(drv, ...) : Could not connect to database: unable to open database file

ADD REPLY
0

Perhaps you ran out of space in /tmp or the equivalent. Anyway, please post things like this as new questions.

ADD REPLY
33
10.8 years ago

In R:

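# Install SRAdb (biocLite was the Bioconductor installer when this was written;
# newer Bioconductor releases use BiocManager::install('SRAdb') instead)
source('http://bioconductor.org/biocLite.R')
biocLite('SRAdb')
library(SRAdb)
# Download and unpack the SRA metadata SQLite file (more than 1 GB compressed)
srafile = getSRAdbFile()
# Open a connection to the local metadata database
con = dbConnect('SQLite',srafile)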

Now we are ready to query the local SQLite database:

listSRAfile('SRP026197',con)

Results in:

        study    sample experiment       run                                                                                                           ftp
1   SRP026197 SRS449410  SRX311638 SRR913951 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311638/SRR913951/SRR913951.sra
2   SRP026197 SRS449476  SRX311704 SRR914066 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311704/SRR914066/SRR914066.sra
3   SRP026197 SRS449408  SRX311636 SRR913949 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311636/SRR913949/SRR913949.sra
....
247 SRP026197 SRS449508  SRX311735 SRR914158 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311735/SRR914158/SRR914158.sra
248 SRP026197 SRS449460  SRX311688 SRR914006 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311688/SRR914006/SRR914006.sra
249 SRP026197 SRS449509  SRX311736 SRR914160 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311736/SRR914160/SRR914160.sra
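
If you prefer to stay with wget, as in the original question, the ftp column of a table like the one above can be turned into a URL list. A minimal sketch, assuming the table has been saved as plain text; extract_ftp_urls, sra_files.txt and urls.txt are hypothetical names used only for illustration:

```shell
# Print the last whitespace-separated field of every row whose last field
# is an ftp URL. The header row and the "...." row are skipped.
extract_ftp_urls() {
    awk '$NF ~ /^ftp:\/\// { print $NF }' "$@"
}

# Usage (hypothetical file names):
#   extract_ftp_urls sra_files.txt > urls.txt
#   wget -c -i urls.txt    # -i reads URLs from a file, -c resumes partial downloads
```

This only automates the copy-and-paste step; the URLs themselves still come from the SRAdb query.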

If you simply want to have R do the downloads for you, that is also straightforward:

getSRAfile('SRP026197',con,fileType='sra')

If you have access to the Aspera command-line client, ascp, you can have R use it instead of FTP, which gives much faster downloads. See the help for getSRAfile for details.

ADD COMMENT
7

In my case, the solution above worked with some modifications - I had to install and load the DBI package first and then change the dbConnect line:

source('http://bioconductor.org/biocLite.R')
biocLite('SRAdb')
library(SRAdb)
biocLite('DBI')
library(DBI)
srafile = getSRAdbFile()
con = dbConnect(RSQLite::SQLite(), srafile)
listSRAfile('SRP026197', con)

Without these modifications I got the message "Error: unable to find an inherited method for function 'dbConnect' for signature '"character"'".

ADD REPLY
0

Hi. I used this code, but I have a problem:

source('http://bioconductor.org/biocLite.R')
biocLite('SRAdb')
library(SRAdb)
biocLite('DBI')
library(DBI)
srafile = getSRAdbFile()
con = dbConnect(RSQLite::SQLite(), srafile)
listSRAfile('SRP026197', con)

After downloading, I get this error:

Error in result_create(conn@ptr, statement) : database disk image is malformed

What should I do?

ADD REPLY
0

Hi, it is working great! However, I couldn't find a way to retrieve the information (e.g. RNA-Seq for a specific tissue) related to a specific SRA number. Samples are usually labeled with GSE IDs rather than SRA IDs. Any suggestions would be appreciated!

ADD REPLY
0

You can use GEOmetadb to access NCBI GEO information in a similar way as for SRA data and SRAdb.

ADD REPLY
0

Yes, but I have already downloaded and processed a large number of SRA samples. All I want to do is rename them with the proper GEO ID. I didn't see any information on this in either of the packages :(

ADD REPLY
0

This comes a bit late, but you might want to try something like this:

library(GEOquery)
gse <- getGEO('GSE48138')  # retrieves a list of data sets for your GEO series ID
## See what is in there:
show(gse)
## There are 2 sets of samples for that ID.
## What you want is a table with the SRR to download and some sample information.
## Let's see what the first set contains:
df <- as.data.frame(gse[[1]])
head(df)

The table above contains loads of information regarding the samples/files, IDs, etc. You will have to see what interests you, and use it to rename the files. I hope it helps.

ADD REPLY
0

Hello there!

I am trying to extract the following SRA accession numbers with Bioconductor v3.1:

"SRP041432","ERP010058","SRP032486","SRP048789","SRP016517","ERP010240","SRP042345","SRP050383","SRP039499","SRP024388","SRP039009","SRP040131","SRP010723","ERP010570","SRP045342","ERP002340","ERP003677","SRP040950".

However, by running

getSRAfile(in_acc = c("SRP041432","ERP010058","SRP032486","SRP048789","SRP016517","ERP010240","SRP042345","SRP050383","SRP039499","SRP024388","SRP039009","SRP040131","SRP010723","ERP010570","SRP045342","ERP002340","ERP003677","SRP040950"), sra_con = sra_con,
           destDir = getwd(), fileType = 'sra', srcType='ftp')

I get error messages for specific files, even though I later confirmed that they are available for download in SRAdownload. For example:

The error message:

trying URL 'ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/ERX/ERX219/ERX219608/ERR245074/ERR245074.sra'
Error in download.file(i, destfile = file.path(destDir, basename(i)),  :
  cannot open URL 'ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/ERX/ERX219/ERX219608/ERR245074/ERR245074.sra'

Am I doing anything wrong?

ADD REPLY
0
srafile = getSRAdbFile()
trying URL 'http://dl.dropbox.com/u/51653511/SRAmetadb.sqlite.gz'
Error in download.file(url_sra, destfile = localfile, mode = "wb", method = method) : 
cannot open URL 'http://dl.dropbox.com/u/51653511/SRAmetadb.sqlite.gz'
ADD REPLY
0

How can this be done in R for controlled-access data hosted at dbGaP, if we have the key file, rather than using prefetch/fastq-dump?

ADD REPLY
15
6.5 years ago

A non-R solution is to use the SRA toolkit prefetch command on a list of SRA identifiers.

First you need the accession list, which you can download in one go. In your case, go to https://www.ncbi.nlm.nih.gov/sra?term=SRP026197 and, at the top right, click "Send to", then "File", then "Accession List".

Once you have it saved in a file (default is SraAccList.txt) you can use the command (tested in SRA toolkit 2.9.0):

prefetch $(<SraAccList.txt)

The .sra files will be downloaded to the default SRA folder. You can change the location with this trick (note that the command overwrites any existing user-settings.mkfg):

echo '/repository/user/main/public/root = "/path/to/download"' > $HOME/.ncbi/user-settings.mkfg
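
prefetch accepts several accessions on one command line, so the one-liner above is enough for most lists. For very long lists, or lists saved with Windows line endings, a loop that calls the tool once per accession can be more forgiving. A minimal sketch, assuming a POSIX shell; clean_accessions and fetch_all are hypothetical helper names, not part of the SRA toolkit:

```shell
# Print one accession per line, dropping carriage returns and blank lines.
clean_accessions() {
    tr -d '\r' < "$1" | awk 'NF { print $1 }'
}

# fetch_all LIST [COMMAND [ARGS...]]
# Runs COMMAND (default: prefetch) once per accession in LIST.
# A failed accession is reported and counted, but does not stop the batch.
# The body runs in a subshell, so its variables do not leak into the caller.
fetch_all() (
    list=$1
    shift
    [ "$#" -gt 0 ] || set -- prefetch
    clean_accessions "$list" | {
        failed=0
        while IFS= read -r acc; do
            # </dev/null keeps the command from swallowing the rest of the list
            "$@" "$acc" </dev/null || { echo "failed: $acc" >&2; failed=$((failed + 1)); }
        done
        [ "$failed" -eq 0 ]
    }
)

# Usage:
#   fetch_all SraAccList.txt             # runs: prefetch <accession>
#   fetch_all SraAccList.txt echo        # dry run, prints the accessions only
```

Because the command is passed as arguments, the same loop can drive any other tool that takes an accession as its last argument.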
ADD COMMENT
4

This is brilliant! It also works for fastq-dump:

fastq-dump --split-3 --gzip $(</path_to/SRR_Acc_List.txt)
ADD REPLY
0

Hi, I tried this, but it doesn't work for me. I have 976 files to download (SRA study SRP130211). I am able to download each SRR file separately, though.

./prefetch $(/home/data/yellow/SRR_Acc_List.txt)
SRR6483251: command not found
SRR6483252: command not found

ADD REPLY
0

Hi, I tried this, but it doesn't work for me. I have 976 files to download (SRA study SRP130211).

./fastq-dump $(/home/data/yellow/SRR_Acc_List.txt)
SRR6483251: command not found
SRR6483252: command not found

ADD REPLY
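
The "command not found" messages in the two replies above are consistent with the `<` having been dropped inside `$( )`: the commands that worked use `$(<file)`, while the failing ones use `$(file)`. A small dry run shows the difference. It is a sketch under stated assumptions: the two-line list is a stand-in for SRR_Acc_List.txt, and echo replaces prefetch so nothing is downloaded.

```shell
# Hypothetical two-line accession list standing in for SRR_Acc_List.txt.
list=$(mktemp)
printf 'SRR6483251\nSRR6483252\n' > "$list"

# Correct: the shell substitutes the *contents* of the file.
# $(<file) is a bash/ksh/zsh shorthand; $(cat file) does the same in any POSIX shell.
dry_run=$(echo prefetch $(cat "$list"))
echo "$dry_run"

# Incorrect: without '<' (or cat), $(file) asks the shell to *run* the file.
# If the file is executable, every accession is treated as a command name,
# which produces messages like "SRR6483251: command not found".

rm -f "$list"
```

If that is indeed the cause, restoring the `<`, as in `./prefetch $(</home/data/yellow/SRR_Acc_List.txt)`, should be enough.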
1
6.8 years ago
vr ▴ 10

If you have a GSE accession, you can give this a try: https://github.com/pepkit/geofetch

The most important precondition is proper configuration of where you'd like the raw .sra files to be downloaded. You can also set some environment variables (that are mentioned in the command-line help for the tool geofetch.py) that will facilitate straightforward use. It can be as simple as something like:

/path/to/geofetch.py -i [GSE accession]

ADD COMMENT
0
2.7 years ago

Currently, the simplest approach is to use SRA Explorer

ADD COMMENT
