How To Get All Runs For A Sample In SRA?
12.2 years ago
Fedor Gusev ▴ 210

Suppose I have a list of samples like: SRS049712, SRS049995, etc. How do I get a list of all runs (like SRR062376 or SRR059852) for these samples?

sra • 6.8k views
12.2 years ago

You can use the SRAdb bioconductor package (or the accompanying SQLite file) to do stuff like this pretty easily:

First, set up the library and sqlite connection.

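library(SRAdb)
# Download the SRAdb SQLite file and return its path
sqlfile = getSRAdbFile()
# Pass a driver object; the bare string 'SQLite' raises an error with some DBI versions
dbcon = dbConnect(dbDriver('SQLite'),sqlfile)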

SRAdb has multiple means to convert one entity type to another, but sraConvert is meant specifically for that purpose.

sraConvert(c('SRS049712','SRS049995'),'run',dbcon)

And the output looks like:

     sample       run
1 SRS049712 SRR061165
2 SRS049712 SRR061174
3 SRS049995 SRR059984
4 SRS049995 SRR059985

If you want a list of files (of data) associated with these same accessions, you can try this:

listSRAfile(c('SRS049712','SRS049995'),dbcon)

This will result in:

     sample     study experiment       run                                                                                                                    ftp
1 SRS049712 SRP002163  SRX023993 SRR061165 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX023/SRX023993/SRR061165/SRR061165.lite.sra
2 SRS049712 SRP002163  SRX023993 SRR061174 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX023/SRX023993/SRR061174/SRR061174.lite.sra
3 SRS049995 SRP002163  SRX023365 SRR059984 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX023/SRX023365/SRR059984/SRR059984.lite.sra
4 SRS049995 SRP002163  SRX023365 SRR059985 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX023/SRX023365/SRR059985/SRR059985.lite.sra

The third argument to listSRAfile() is the file type; this can be "fastq", for example. The ascpSRA() function will use Aspera (if the command-line ascp is available) to download litesra or fastq files for a given set of accessions. Note that the SQLite file that is downloaded by getSRAdbFile() can be used by any language with a SQLite client (ruby, python, perl, java, etc.).
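
Since the SQLite file can be read outside R, the same sample-to-run lookup can be sketched in Python with the standard sqlite3 module. This is only a sketch, not part of SRAdb itself: the table name `sra`, its columns `sample_accession` and `run_accession`, and the file name `SRAmetadb.sqlite` are assumptions about the layout of the downloaded file, and `runs_for_samples` is a hypothetical helper. Check the schema of your copy before relying on it.

```python
import sqlite3


def runs_for_samples(con, samples):
    """Return (sample, run) pairs for the given sample accessions.

    Assumes a table named `sra` with `sample_accession` and
    `run_accession` columns (an assumption about the SRAdb file layout).
    """
    if not samples:
        return []
    # One "?" placeholder per accession keeps the query parameterized.
    placeholders = ",".join("?" for _ in samples)
    query = (
        "SELECT sample_accession, run_accession FROM sra "
        f"WHERE sample_accession IN ({placeholders}) "
        "ORDER BY sample_accession, run_accession"
    )
    return con.execute(query, list(samples)).fetchall()


# Example use (the file name is an assumption; use the path that
# getSRAdbFile() reported):
#   con = sqlite3.connect("SRAmetadb.sqlite")
#   for sample, run in runs_for_samples(con, ["SRS049712", "SRS049995"]):
#       print(sample, run)
```

Taking an open connection rather than a path makes it easy to try the query against a small test database first.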


I was getting an error using:

dbcon = dbConnect('SQLite',sqlfile)

So I changed it to:

dbcon = dbConnect(dbDriver('SQLite'),sqlfile)

Now it's working well. Thanks for the example.

12.2 years ago

Using NCBI ESearch with db=biosample, you can find that SRS049712 has ID 69721: http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=biosample&term=SRS049712

Then use NCBI ELink to get the SRA records linked to that sample: http://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=biosample&id=69721&db=sra

Result: http://www.ncbi.nlm.nih.gov/sra/25984 (SRX023993)
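
The two requests above can be chained in a script. The following is a rough Python sketch using only the standard library. The URLs are built exactly as in the links above; the XML element paths (`IdList/Id` in the ESearch reply, `LinkSetDb/Link/Id` in the ELink reply) are assumptions about the response layout, and all function names are hypothetical helpers.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Base URL as used in the links above.
EUTILS = "http://www.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="biosample"):
    """URL that looks up the numeric ID for an accession."""
    query = urllib.parse.urlencode({"db": db, "term": term})
    return f"{EUTILS}/esearch.fcgi?{query}"


def elink_url(uid, dbfrom="biosample", db="sra"):
    """URL that lists records in `db` linked to `uid` in `dbfrom`."""
    query = urllib.parse.urlencode({"dbfrom": dbfrom, "id": uid, "db": db})
    return f"{EUTILS}/elink.fcgi?{query}"


def esearch_ids(xml_text):
    """IDs from an ESearch reply (assumes IdList/Id elements)."""
    return [e.text for e in ET.fromstring(xml_text).findall("./IdList/Id")]


def elink_ids(xml_text):
    """Linked IDs from an ELink reply (assumes LinkSetDb/Link/Id elements)."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.findall(".//LinkSetDb/Link/Id")]


def fetch(url):
    """Download a URL and return its body as text (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


# Example use (performs two network requests):
#   uid = esearch_ids(fetch(esearch_url("SRS049712")))[0]
#   sra_ids = elink_ids(fetch(elink_url(uid)))
```

Note that, as in the result above, the linked SRA record is an experiment (SRX), so a further lookup is still needed to list its runs.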

9.1 years ago

You can do this using CGI or REST interfaces from NCBI or EBI, respectively.

From NCBI, use the SRA Run Info CGI

wget -qO- 'http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term=SRS049712' | grep SRS049712 | cut -f1 -d","
SRR061165
SRR061174

From EBI, use the ENA REST search

wget -qO- "http://www.ebi.ac.uk/ena/data/warehouse/filereport?accession=SRS049712&result=read_run" | grep -v accession | cut -f6
SRR061165
SRR061174

There are many more fields that can be returned via EBI's REST call (see the bottom of this page), and you can customize the response in the EBI solution, whereas you get what you are given from the NCBI solution.
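
Both one-liners above pick the run column by position (`cut -f1`, `cut -f6`), which gives wrong results without warning if a service reorders its columns. A possible alternative is to select the column by its header name, sketched here in Python. The `run_accession` header is the name ENA uses for its run column; the `Run` and `Sample` headers for the NCBI run info CSV are assumptions, and `column_values` is a hypothetical helper.

```python
import csv
import io


def column_values(text, column, delimiter="\t", where=None):
    """Return the values of `column` from delimited text with a header row.

    `where` is an optional {column: value} filter applied to each row.
    Raises KeyError if `column` is not in the header.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    where = where or {}
    return [
        row[column]
        for row in reader
        if all(row.get(key) == value for key, value in where.items())
    ]


# Example use, with `ena_text` and `ncbi_text` holding the downloaded reports:
#   column_values(ena_text, "run_accession")
#   column_values(ncbi_text, "Run", delimiter=",", where={"Sample": "SRS049712"})
```

The `where` filter plays the role of the `grep SRS049712` step, but matches only the named column instead of the whole line.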
