Question

How to download all SRA samples at once?

16

10.8 years ago

biorepine ★ 1.5k

Dear Biostars,

As you may know, SRA is a repository for all types of sequencing data. I often have to download datasets manually, copying the link for every SRA dataset by hand and fetching it with wget. Is there a simpler approach than copying the links manually? Thanks in advance.

For example: how can I download all the data related to SRP026197? http://www.ncbi.nlm.nih.gov/sra?term=SRP026197

geo sra • 56k views

ADD COMMENT • link updated 20 months ago by Ram 44k • written 10.8 years ago by biorepine ★ 1.5k

1

Have you tried the SRAdb package from bioconductor? It's been a while, but I think it can be used to do that sort of thing.

ADD REPLY • link 10.8 years ago by Devon Ryan 104k

0

Actually, SRA is the repository for sequence data, not GEO. There are links between the two databases, but your question is actually related to SRA.

ADD REPLY • link 10.8 years ago by Sean Davis 27k

0

Oh yes, you are right. I will edit my question. Thanks.

ADD REPLY • link 10.8 years ago by biorepine ★ 1.5k

0

Here is another solution: A: How to download raw sequence data from GEO/SRA

ADD REPLY • link 10.2 years ago by Istvan Albert 101k

0

When I run the code on my computer, I get the error below. What is wrong?

library(SRAdb)
srafile = getSRAdbFile()

trying URL 'http://gbnci.abcc.ncifcrf.gov/backup/SRAmetadb.sqlite.gz'
Content type 'application/x-gzip' length 1308358823 bytes (1247.7 Mb)
opened URL
downloaded 1247.7 Mb

Unzipping...
Error in .local(drv, ...) : Could not connect to database: unable to open database file

ADD REPLY • link 8.6 years ago by Ada • 0

0

Perhaps you ran out of space in /tmp or the equivalent. Anyway, please post things like this as new questions.

ADD REPLY • link 8.6 years ago by Devon Ryan 104k

Ram · Answer 1 · 2014-02-19

33

10.8 years ago

Sean Davis 27k

In R:

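# Install and load SRAdb (on current Bioconductor releases, biocLite() has been replaced by BiocManager::install())
source('http://bioconductor.org/biocLite.R')
biocLite('SRAdb')
library(SRAdb)
# Download the SRA metadata SQLite file and open a connection to it
srafile = getSRAdbFile()
# If this call fails, a reply below reports that dbConnect(RSQLite::SQLite(), srafile) works instead
con = dbConnect('SQLite',srafile)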

Now we are ready to query the local SQLite database:

listSRAfile('SRP026197',con)

Results in:

        study    sample experiment       run                                                                                                           ftp
1   SRP026197 SRS449410  SRX311638 SRR913951 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311638/SRR913951/SRR913951.sra
2   SRP026197 SRS449476  SRX311704 SRR914066 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311704/SRR914066/SRR914066.sra
3   SRP026197 SRS449408  SRX311636 SRR913949 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311636/SRR913949/SRR913949.sra
....
247 SRP026197 SRS449508  SRX311735 SRR914158 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311735/SRR914158/SRR914158.sra
248 SRP026197 SRS449460  SRX311688 SRR914006 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311688/SRR914006/SRR914006.sra
249 SRP026197 SRS449509  SRX311736 SRR914160 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311736/SRR914160/SRR914160.sra
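
If you would rather keep using wget, as in the question, the ftp column of a listing like this can be pulled out and handed to wget as a URL list. The following is only a minimal sketch, assuming the table has been saved as plain text. The function name extract_ftp_urls and the file names sra_listing.txt and urls.txt are hypothetical, and the URLs are only useful for as long as the server still serves those paths.

```shell
# Print the ftp URLs (last column) from a saved listSRAfile() table.
# Header lines and the "...." filler line are skipped because their
# last field does not start with ftp://.
extract_ftp_urls() {
    awk '$NF ~ /^ftp:\/\// { print $NF }' "$1"
}

# Usage (not run here; sra_listing.txt and urls.txt are hypothetical names):
#   extract_ftp_urls sra_listing.txt > urls.txt
#   wget -c -i urls.txt    # -i reads URLs from a file, -c resumes partial downloads
```

Keeping the extraction separate from the download makes it easy to inspect urls.txt before starting a large transfer.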

If you simply want to have R do the downloads for you, that is also straightforward:

getSRAfile('SRP026197',con,fileType='sra')

If you have access to the aspera client command line utility, ascp, you can have R use it instead of ftp, resulting in much greater download speeds. See the help for getSRAfile for details.

ADD COMMENT • link 10.8 years ago by Sean Davis 27k

7

In my case, the solution above worked with some modifications - I had to install and load the DBI package first and then change the dbConnect line:

source('http://bioconductor.org/biocLite.R')
biocLite('SRAdb')
library(SRAdb)
biocLite('DBI')
library(DBI)
srafile = getSRAdbFile()
con = dbConnect(RSQLite::SQLite(), srafile)
listSRAfile('SRP026197', con)

Without these modifications I got the message "Error: unable to find an inherited method for function 'dbConnect' for signature '"character"'".

ADD REPLY • link 9.4 years ago by adumitri ▴ 70

0

Hi. I used this code but ran into a problem:

source('http://bioconductor.org/biocLite.R')
biocLite('SRAdb')
library(SRAdb)
biocLite('DBI')
library(DBI)
srafile = getSRAdbFile()
con = dbConnect(RSQLite::SQLite(), srafile)
listSRAfile('SRP026197', con)

After downloading, I get this error:

Error in result_create(conn@ptr, statement) : database disk image is malformed

What should I do?

ADD REPLY • link 6.3 years ago by samane. • 0

0

Hi, it is working great! However, I couldn't find a way to retrieve the information (e.g., RNA-Seq for a specific tissue) related to a specific SRA number. Datasets are usually labeled with GSE IDs rather than SRA IDs. Any suggestions would be appreciated!

ADD REPLY • link updated 2.8 years ago by Ram 44k • written 10.1 years ago by biorepine ★ 1.5k

0

You can use GEOmetadb to access NCBI GEO information in a similar way as for SRA data and SRAdb.

ADD REPLY • link updated 2.8 years ago by Ram 44k • written 10.1 years ago by Sean Davis 27k

0

Yes, but I have already downloaded and processed a large number of SRA samples. All I want to do is rename them with the proper GEO ID. I didn't see any information on this in either of the packages :(

ADD REPLY • link updated 2.8 years ago by Ram 44k • written 10.1 years ago by biorepine ★ 1.5k

0

This comes a bit late, but you might want to try something like this:

library(GEOquery)
gse <- getGEO('GSE48138') # retrieves the GEO data sets for your ID (returned as a list)
## see what is in there:
show(gse)
## there are 2 sets of samples for that ID
## what you want is a table with the SRR runs to download and some sample information
## let's see what the first set contains:
df <- as.data.frame(gse[[1]])
head(df)

The table above contains loads of information regarding the samples/files, IDs, etc. You will have to see what interests you, and use it to rename the files. I hope it helps.
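
Once you have picked the columns you need, the renaming itself can be done outside R. Below is a minimal sketch, assuming you have exported a two-column, tab-separated mapping (run accession, then GEO sample ID) from that table by hand. The function name rename_by_mapping, the mapping file, and the GSM-prefixed naming scheme are all hypothetical choices made for illustration.

```shell
# Rename DIR/<run>.sra to DIR/<geo_id>_<run>.sra for every row of a
# tab-separated mapping file (column 1: run accession, column 2: GEO sample ID).
# Rows whose source file is missing, or whose target already exists, are skipped.
rename_by_mapping() {
    map_file=$1
    data_dir=$2
    tab=$(printf '\t')
    while IFS="$tab" read -r run geo_id || [ -n "$run" ]; do
        # Ignore blank or incomplete rows
        if [ -z "$run" ] || [ -z "$geo_id" ]; then
            continue
        fi
        src="$data_dir/$run.sra"
        dst="$data_dir/${geo_id}_$run.sra"
        if [ -f "$src" ] && [ ! -e "$dst" ]; then
            mv -- "$src" "$dst"
        fi
    done < "$map_file"
}

# Usage (hypothetical paths):
#   rename_by_mapping srr_to_gsm.tsv /path/to/download
```

Keeping the run accession in the new name means the original identifier is never lost, so the rename can be undone if the mapping turns out to be wrong.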

ADD REPLY • link updated 2.8 years ago by Ram 44k • written 9.8 years ago by A. Domingues ★ 2.7k

0

Hello there!

I am trying to extract the following SRA accession numbers with Bioconductor v3.1:

"SRP041432","ERP010058","SRP032486","SRP048789","SRP016517","ERP010240","SRP042345","SRP050383","SRP039499","SRP024388","SRP039009","SRP040131","SRP010723","ERP010570","SRP045342","ERP002340","ERP003677","SRP040950".

However, by running

getSRAfile(in_acc = c("SRP041432","ERP010058","SRP032486","SRP048789","SRP016517","ERP010240","SRP042345","SRP050383","SRP039499","SRP024388","SRP039009","SRP040131","SRP010723","ERP010570","SRP045342","ERP002340","ERP003677","SRP040950"), sra_con = sra_con,
           destDir = getwd(), fileType = 'sra', srcType = 'ftp')

I get error messages due to specific files, which I later confirm are available for download in SRAdownload, for example...

The error message:

trying URL 'ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/ERX/ERX219/ERX219608/ERR245074/ERR245074.sra'
Error in download.file(i, destfile = file.path(destDir, basename(i)),  :
  cannot open URL 'ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/ERX/ERX219/ERX219608/ERR245074/ERR245074.sra'

Am I doing anything wrong?

ADD REPLY • link updated 2.8 years ago by Ram 44k • written 9.2 years ago by massacomgrao • 0

0

srafile = getSRAdbFile()
trying URL 'http://dl.dropbox.com/u/51653511/SRAmetadb.sqlite.gz'
Error in download.file(url_sra, destfile = localfile, mode = "wb", method = method) : 
cannot open URL 'http://dl.dropbox.com/u/51653511/SRAmetadb.sqlite.gz'

ADD REPLY • link updated 2.8 years ago by Ram 44k • written 7.9 years ago by kevinchjp ▴ 10

0

How to do this in R for controlled access data hosted at dbGaP if we have the key file rather than using prefetch/fastq-dump?

ADD REPLY • link 7.2 years ago by Ömer An ▴ 260

score 15 · Answer 2 · 2018-06-18

15

6.4 years ago

Federico Giorgi ▴ 740

A non-R solution is to use the SRA toolkit prefetch command on a list of SRA identifiers.

First you need the accession list, which you can download in one batch. In your case, go to https://www.ncbi.nlm.nih.gov/sra?term=SRP026197 and, at the top right, click "Send To", then choose "File" and "Accession List".

Once you have it saved in a file (default is SraAccList.txt) you can use the command (tested in SRA toolkit 2.9.0):

prefetch $(<SraAccList.txt)

The .sra files will be downloaded to the default SRA folder. You can change the location with this trick (note that > overwrites any existing user-settings.mkfg):

echo '/repository/user/main/public/root = "/path/to/download"' > $HOME/.ncbi/user-settings.mkfg
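
As an alternative to passing the whole list to prefetch in one call, the list can be read line by line so that failures are reported per accession. This is only a minimal sketch: the function name fetch_accessions and the FETCH_CMD override are hypothetical names introduced here for illustration and are not part of the SRA toolkit. Setting FETCH_CMD=echo gives a dry run that just prints the accessions.

```shell
# Call the download command once per accession listed in a file.
# FETCH_CMD is a hypothetical override (default: prefetch) for dry runs.
fetch_accessions() {
    list_file=$1
    fetch_cmd=${FETCH_CMD:-prefetch}
    failed=0
    # "|| [ -n "$acc" ]" also handles a last line without a trailing newline
    while IFS= read -r acc || [ -n "$acc" ]; do
        # Drop carriage returns in case the list was saved with Windows line endings
        acc=$(printf '%s' "$acc" | tr -d '\r')
        # Skip blank lines
        [ -z "$acc" ] && continue
        # Redirect stdin so the command cannot consume the rest of the list
        if ! "$fetch_cmd" "$acc" < /dev/null; then
            echo "failed: $acc" >&2
            failed=$((failed + 1))
        fi
    done < "$list_file"
    # Succeed only if every accession was fetched
    [ "$failed" -eq 0 ]
}

# Usage:
#   fetch_accessions SraAccList.txt
#   FETCH_CMD=echo fetch_accessions SraAccList.txt   # dry run
```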

ADD COMMENT • link 6.4 years ago by Federico Giorgi ▴ 740

4

This is brilliant! It also works for fastq-dump:

fastq-dump --split-3 --gzip $(</path_to/SRR_Acc_List.txt)

ADD REPLY • link 5.7 years ago by ThePresident ▴ 180

0

Hi, I tried this, but it doesn't work for me. I have 976 files to download (SRA study SRP130211). I am able to download each SRR file separately, though.

./prefetch $(/home/data/yellow/SRR_Acc_List.txt)
SRR6483251: command not found
SRR6483252: command not found

ADD REPLY • link 4.5 years ago by sunnykevin97 ▴ 990

0

Hi, I tried this, but it doesn't work for me. I have 976 files to download (SRA study SRP130211).

./fastq-dump $(/home/data/yellow/SRR_Acc_List.txt)
SRR6483251: command not found
SRR6483252: command not found

ADD REPLY • link 4.5 years ago by sunnykevin97 ▴ 990
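
The "command not found" messages in the two replies above are what the shell prints when the list is written as $(/path/to/file) without the <. The shell then tries to execute the file as a script, so each accession is treated as a command name. Below is a minimal sketch of the difference, using a throwaway list that holds the two accessions from the error message. $(cat FILE) is the portable spelling, and $(<FILE) from the answer is the bash shorthand for the same thing.

```shell
# Build a small accession list in a temporary file
list=$(mktemp)
printf 'SRR6483251\nSRR6483252\n' > "$list"

# Correct: substitute the *contents* of the file as separate arguments.
# In bash, $(<"$list") does the same thing.
set -- $(cat "$list")
echo "$# accessions: $*"

# Incorrect: $("$list") would try to *run* the file as a script, and each
# line would be looked up as a command ("SRR6483251: command not found").

rm -f "$list"
```

So the command from the answer should work once the < is restored, for example prefetch $(</home/data/yellow/SRR_Acc_List.txt).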

score 1 · Answer 3 · 2018-02-23

If you have a GSE accession, you can give this a try: https://github.com/pepkit/geofetch

The most important precondition is proper configuration of where you'd like the raw .sra files to be downloaded. You can also set some environment variables (that are mentioned in the command-line help for the tool geofetch.py) that will facilitate straightforward use. It can be as simple as something like:

/path/to/geofetch.py -i [GSE accession]

score 0 · Answer 4 · 2022-03-17

0

2.7 years ago

solomoncharles77 ▴ 90

Currently, the simplest approach is to use SRA Explorer.

ADD COMMENT • link 2.7 years ago by solomoncharles77 ▴ 90