SRA and BioProject IDs
15 months ago
mrashad

Dear all, I have a list of BioProject IDs and need to retrieve their corresponding SRA IDs. I tried retrieving all human records from SRA using:

library(rentrez)
# Search SRA for human records (at most retmax IDs are returned)
kywrds <- entrez_search(db = "sra", retmax = 20000,
                        term = "Homo sapiens[ORGN] AND Homo sapiens[orgn:__txid9606]")

However, the search matches more than 4 million human records, so I would need to page through them with the "retstart" and "retmax" arguments together with "web_history", but unfortunately I couldn't get that to work.
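A minimal sketch of that paging with rentrez, assuming the rentrez package is installed. Rather than all human records, it searches only the BioProjects of interest, keeps the hits on the NCBI history server (use_history = TRUE), and then fetches them page by page with retstart and retmax. It assumes that efetch with rettype "runinfo" returns CSV text containing Run, BioProject and BioSample columns; the helper names bioproject_query, page_starts and fetch_runinfo and the page size of 500 are hypothetical choices for illustration.

```r
# Sketch of paging SRA results through the NCBI history server with rentrez.
# Assumes the rentrez package is installed; helper names are hypothetical.

# Build one query matching SRA records from any of the given BioProjects,
# e.g. "PRJEB4337[bioproject] OR <other ID>[bioproject]"
bioproject_query <- function(ids) {
  paste0(ids, "[bioproject]", collapse = " OR ")
}

# 0-based retstart offsets (integers, so large values are not printed as 1e+05)
page_starts <- function(count, page_size) {
  count <- as.integer(count)
  if (count == 0L) return(integer(0))
  seq.int(0L, count - 1L, by = as.integer(page_size))
}

fetch_runinfo <- function(ids, page_size = 500) {
  # retmax = 0: store the hits on the history server without returning IDs
  hits <- rentrez::entrez_search(db = "sra", term = bioproject_query(ids),
                                 use_history = TRUE, retmax = 0)
  pages <- lapply(page_starts(hits$count, page_size), function(start) {
    # Assumption: rettype "runinfo" returns CSV with Run, BioProject, BioSample columns
    csv <- rentrez::entrez_fetch(db = "sra", web_history = hits$web_history,
                                 rettype = "runinfo", retmode = "text",
                                 retstart = start, retmax = as.integer(page_size))
    read.csv(text = csv, stringsAsFactors = FALSE)[, c("Run", "BioProject", "BioSample")]
  })
  do.call(rbind, pages)  # NULL if the search matched nothing
}
```

For example, runs <- fetch_runinfo(c("PRJEB4337")) should give one row per SRA run with its BioProject and BioSample. Restricting the query to the BioProjects first keeps the number of pages small, which is far more practical than paging through millions of human records.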

What I want is a data frame of SRA IDs with their corresponding BioProject IDs.

Could you help me do that?

Thanks

15 months ago
vkkodali_ncbi

You can search SRA directly using a BioProject ID. Below are EntrezDirect commands; you should be able to translate them into the syntax of the library you are using (e.g. rentrez in R, or BioPython).

esearch -db sra -query 'PRJEB4337[bioproject]'

You can then pass those results to esummary and extract the relevant fields from its XML output with xtract. For example,

esearch -db sra -query 'PRJEB4337[bioproject]' | esummary | xtract -pattern DocumentSummary -element Bioproject Biosample Run@acc

will give you a 3-column, tab-delimited table with BioProject, BioSample and SRA Run accessions.
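Since the question asks for an R data frame, one option is to run this pipeline from R and parse its tab-delimited output. A minimal sketch, assuming EntrezDirect (esearch, esummary, xtract) is on the PATH of the shell that R's system() uses; parse_xtract and sra_table are hypothetical helper names. It also assumes that when a record lists several runs, xtract prints the extra Run accessions as additional tab-separated fields, so any trailing fields are spread into one row per run.

```r
# Hypothetical helper: turn tab-delimited xtract output lines into a data frame.
# Assumes the last column holds one or more Run accessions (extra runs arriving
# as extra tab-separated fields); each run becomes its own row.
parse_xtract <- function(lines, col_names) {
  if (length(lines) == 0) {
    empty <- as.data.frame(matrix(character(0), ncol = length(col_names)),
                           stringsAsFactors = FALSE)
    return(setNames(empty, col_names))
  }
  n_fixed <- length(col_names) - 1  # columns before the Run accession(s)
  pieces <- lapply(strsplit(lines, "\t", fixed = TRUE), function(f) {
    fixed <- f[seq_len(n_fixed)]
    runs <- f[-seq_len(n_fixed)]
    if (length(runs) == 0) runs <- NA_character_
    # Repeat the fixed columns once per run accession
    out <- data.frame(matrix(fixed, nrow = length(runs), ncol = n_fixed, byrow = TRUE),
                      runs, stringsAsFactors = FALSE)
    names(out) <- col_names
    out
  })
  df <- do.call(rbind, pieces)
  rownames(df) <- NULL
  df
}

# Hypothetical wrapper: run the EntrezDirect pipeline for several BioProjects in one query.
# Assumes esearch, esummary and xtract are on the PATH of the shell used by system().
sra_table <- function(ids) {
  query <- paste0(ids, "[bioproject]", collapse = " OR ")
  cmd <- paste0("esearch -db sra -query '", query, "' | esummary | ",
                "xtract -pattern DocumentSummary -element Bioproject Biosample Run@acc")
  parse_xtract(system(cmd, intern = TRUE), c("Bioproject", "Biosample", "Run"))
}
```

For example, runs <- sra_table(c("PRJEB4337")) should return a data frame with Bioproject, Biosample and Run columns. The parser itself does not care which fields come first, so a variant that also prints Study@acc should work by adding "Study" to the front of col_names and Study@acc to the front of the -element list.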


Thank you for your informative answer. I got the XML output from esummary, but I also need the Study accession (the acc attribute of the Study element inside ExpXml). I tried the following:

esearch -db sra -query 'PRJEB4337[bioproject]' | esummary | xtract -pattern DocumentSummary -element Bioproject ExpXml@Study acc Run@acc

but it doesn't work.



No, the result should be a simple table, not the XML you show. Do not change the command posted by vkkodali_ncbi.

The three columns produced are the BioProject ID, the BioSample ID and the SRA Run accession.

$ esearch -db sra -query 'PRJEB4337[bioproject]' | esummary | xtract -pattern DocumentSummary -element Bioproject Biosample Run@acc
PRJEB4337       SAMEA2145774    ERR315468
PRJEB4337       SAMEA2154125    ERR315343
PRJEB4337       SAMEA2145893    ERR315339
PRJEB4337       SAMEA2156266    ERR315348

I got what I wanted with the following command:

esearch -db sra -query 'PRJEB4337[bioproject]' | esummary | xtract -pattern DocumentSummary -element Study@acc Bioproject Biosample Run@acc

which produced:

ERP003613      PRJEB4337       SAMEA2145774    ERR315468

Thank you for your help :)


Please go ahead and accept the original answer (green check mark) to provide closure to this thread.
