Suppose I have a list of samples like: SRS049712, SRS049995, etc. How do I get a list of all runs (like SRR062376 or SRR059852) for these samples?
You can do this fairly easily with the SRAdb Bioconductor package (or its accompanying SQLite file).
First, load the library and open the SQLite connection:
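library(SRAdb)
# Download the SRAdb SQLite file and keep its local path
sqlfile = getSRAdbFile()
# Recent DBI/RSQLite versions expect a driver object, SQLite(), rather than the string 'SQLite'
dbcon = dbConnect(SQLite(), sqlfile)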
SRAdb offers several ways to convert one entity type to another, but sraConvert() is designed specifically for that purpose:
sraConvert(c('SRS049712','SRS049995'),'run',dbcon)
And the output looks like:
     sample       run
1 SRS049712 SRR061165
2 SRS049712 SRR061174
3 SRS049995 SRR059984
4 SRS049995 SRR059985
If you want a list of files (of data) associated with these same accessions, you can try this:
listSRAfile(c('SRS049712','SRS049995'),dbcon)
This will result in:
sample study experiment run ftp
1 SRS049712 SRP002163 SRX023993 SRR061165 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX023/SRX023993/SRR061165/SRR061165.lite.sra
2 SRS049712 SRP002163 SRX023993 SRR061174 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX023/SRX023993/SRR061174/SRR061174.lite.sra
3 SRS049995 SRP002163 SRX023365 SRR059984 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX023/SRX023365/SRR059984/SRR059984.lite.sra
4 SRS049995 SRP002163 SRX023365 SRR059985 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX023/SRX023365/SRR059985/SRR059985.lite.sra
The third argument to listSRAfile() is the file type; this can be "fastq", for example. The ascpSRA() function will use Aspera (if the command-line ascp is available) to download litesra or fastq files for a given set of accessions. Note that the SQLite file downloaded by getSRAdbFile() can be used from any language with a SQLite client (Ruby, Python, Perl, Java, etc.).
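As a rough sketch of that last point, the same sample-to-run lookup can be done from the shell with the sqlite3 command-line client. This assumes the downloaded file contains a table named sra with sample_accession and run_accession columns, and that the file is saved as SRAmetadb.sqlite; check both against your copy. The function names build_run_query and runs_for_samples are made up for this example.

```shell
# Build a SQL query that lists the runs for the given sample accessions.
# Assumes (not confirmed above) a table "sra" with columns
# sample_accession and run_accession. Accessions are expected to be
# plain alphanumeric strings, so no SQL escaping is attempted.
build_run_query() {
    in_list=''
    for acc in "$@"; do
        # Append each accession as a quoted, comma-separated SQL literal
        in_list="${in_list:+$in_list,}'$acc'"
    done
    printf "SELECT sample_accession, run_accession FROM sra WHERE sample_accession IN (%s) ORDER BY 1, 2;\n" "$in_list"
}

# Run the query against a local SQLite file with the sqlite3 client.
# Usage: runs_for_samples SRAmetadb.sqlite SRS049712 SRS049995
runs_for_samples() {
    db=$1
    shift
    sqlite3 "$db" "$(build_run_query "$@")"
}
```

Calling runs_for_samples with the database path followed by the sample accessions should print one sample|run pair per line, in sqlite3's default pipe-separated format.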
Using NCBI ESearch with db=biosample, you can find that SRS049712 has ID=69721 (http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=biosample&term=SRS049712)
Then use NCBI ELink to get the SRA items linked to that sample: http://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=biosample&id=69721&db=sra
Result: http://www.ncbi.nlm.nih.gov/sra/25984 (SRX023993)
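The two steps above could be scripted roughly as follows. This is only a sketch: it assumes the E-utilities XML wraps each numeric ID in an <Id> element, and that ELink echoes the query ID back alongside the linked IDs, which is why the query ID is filtered out. The helper names (extract_ids, sample_to_biosample_id, biosample_id_to_sra_ids) are invented for illustration.

```shell
# Base URL for NCBI E-utilities (same host as in the links above)
EUTILS='http://www.ncbi.nlm.nih.gov/entrez/eutils'

# Print the number inside every <Id>...</Id> element read from stdin,
# one per line, in document order.
extract_ids() {
    grep -o '<Id>[0-9]*</Id>' | sed -e 's/<Id>//' -e 's#</Id>##'
}

# Step 1: sample accession -> BioSample ID (takes the first hit only)
sample_to_biosample_id() {
    wget -qO- "$EUTILS/esearch.fcgi?db=biosample&term=$1" | extract_ids | head -n 1
}

# Step 2: BioSample ID -> linked SRA IDs.
# Assumption: the response repeats the query ID, so drop exact matches.
biosample_id_to_sra_ids() {
    wget -qO- "$EUTILS/elink.fcgi?dbfrom=biosample&id=$1&db=sra" | extract_ids | grep -vx "$1"
}
```

For example, biosample_id_to_sra_ids "$(sample_to_biosample_id SRS049712)" chains the two lookups. Note that this yields SRA record IDs (experiments), not run accessions, so a further lookup is still needed to reach the SRR numbers.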
You can do this using CGI or REST interfaces from NCBI or EBI, respectively.
From NCBI, use the SRA Run Info CGI
wget -qO- 'http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term=SRS049712' | grep SRS049712 | cut -f1 -d","
SRR061165
SRR061174
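Since the question is about a whole list of samples, the one-liner above can be wrapped in a loop. A minimal sketch, reusing the same URL and the same grep/cut filtering; the function names runinfo_runs and fetch_runs are made up here, and it assumes (as the one-liner does) that the run accession is the first comma-separated column of the runinfo output.

```shell
# Filter runinfo CSV on stdin: keep rows mentioning the sample ($1)
# and print their first comma-separated column (the run accession).
runinfo_runs() {
    grep "$1" | cut -f1 -d","
}

# For each sample accession given, fetch its runinfo from NCBI and
# print tab-separated "sample<TAB>run" lines.
fetch_runs() {
    for s in "$@"; do
        wget -qO- "http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term=$s" |
            runinfo_runs "$s" |
            while read -r run; do
                printf '%s\t%s\n' "$s" "$run"
            done
    done
}
```

Something like fetch_runs SRS049712 SRS049995 > runs.tsv would then collect the runs for both samples into one file.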
From EBI, use the ENA REST search
wget -qO- "http://www.ebi.ac.uk/ena/data/warehouse/filereport?accession=SRS049712&result=read_run" | grep -v accession | cut -f6
SRR061165
SRR061174
There are many more fields that can be returned via EBI's REST call (see the bottom of this page), and you can customize the response in the EBI solution, whereas you get what you are given from the NCBI solution.
I was getting an error using:
So I changed it to:
Now it's working well. Thanks for the example.