Dear all,
I have a group of BioProject IDs and need to retrieve their corresponding SRA IDs.
I tried to retrieve all of the data from SRA using:
kywrds <- entrez_search(db = "sra", retmax = 20000,
term = "Homo sapiens[ORGN] AND Homo sapiens[orgn:__txid9606]")
However, the full Homo sapiens result set is more than 4 million records, so I should use the "retstart" argument together with "web_history" and "retmax", but unfortunately I couldn't get that to work.
The result I want to obtain is a data frame of SRA IDs with their corresponding BioProject IDs.
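For reference, the retstart paging mentioned above can be sketched in shell against the E-utilities URL interface. This is only a rough sketch: page_offsets and fetch_all are hypothetical helper names, and WEBENV and QUERY_KEY are assumed to have been saved from an earlier esearch request made with usehistory=y.

```shell
# Print the retstart offset of each batch for a result set of $1 records
# fetched $2 records at a time.
page_offsets() {
  count=$1
  batch=$2
  start=0
  while [ "$start" -lt "$count" ]; do
    echo "$start"
    start=$((start + batch))
  done
}

# Hypothetical loop (needs curl and network access): request one batch of
# summaries per offset. WEBENV and QUERY_KEY are assumed to be set already.
fetch_all() {
  base='https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi'
  for start in $(page_offsets "$1" "$2"); do
    curl -s "${base}?db=sra&query_key=${QUERY_KEY}&WebEnv=${WEBENV}&retstart=${start}&retmax=$2"
    sleep 1  # pause between requests to stay polite to the server
  done
}
```

Paging through millions of records this way is slow, so narrowing the query first is usually the better option.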
You can search SRA directly using a BioProject ID. Shown below are EntrezDirect commands whose syntax you should be able to adapt to that of BioPython.
esearch -db sra -query 'PRJEB4337[bioproject]'
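For a group of BioProject IDs, one option is to join the per-project terms with OR into a single query. The helper below is a minimal sketch; build_query is a hypothetical name, and PRJNA000000 is a placeholder rather than a real project.

```shell
# Join each BioProject ID given as an argument into one Entrez query,
# e.g. "ID1[bioproject] OR ID2[bioproject]".
build_query() {
  query=""
  for id in "$@"; do
    if [ -z "$query" ]; then
      query="${id}[bioproject]"
    else
      query="${query} OR ${id}[bioproject]"
    fi
  done
  printf '%s\n' "$query"
}

# Usage (requires EDirect and network access; PRJNA000000 is a placeholder):
# esearch -db sra -query "$(build_query PRJEB4337 PRJNA000000)"
```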
You can then pass those results along to esummary and extract relevant information from the output XML. For example,
Thank you for your informative answer.
I got the XML result from esummary, but I need to access the study ACC in ExpXml, as attached.
I tried the following:
But it doesn't work.
No, the result should be a simple table, not the XML that you show. Do not change the command posted by vkkodali_ncbi.
The three columns produced are the BioProject ID, the BioSample ID, and the SRA accession.
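As a rough sketch of how such a three-column table could be produced and tidied up (not necessarily the exact command posted earlier in the thread): this assumes EDirect is installed and that the esummary XML exposes the fields as Bioproject, Biosample and Run@acc. The function names sra_table and to_csv are hypothetical.

```shell
# Requires NCBI EDirect (esearch, esummary, xtract) on PATH and network access.
# The element names below are assumptions about the SRA esummary XML.
sra_table() {
  esearch -db sra -query "$1[bioproject]" |
    esummary |
    xtract -pattern DocumentSummary -element Bioproject,Biosample,Run@acc
}

# Convert tab-separated rows to CSV with a header, one line per run.
# A record with several runs carries several accessions on the same row,
# so every field from the third onward is treated as a run accession.
to_csv() {
  awk -F '\t' 'BEGIN { print "bioproject,biosample,run" }
               { for (i = 3; i <= NF; i++) print $1 "," $2 "," $i }'
}

# Usage: sra_table PRJEB4337 | to_csv > PRJEB4337_runs.csv
```

The resulting CSV can then be read into R with read.csv() to get the data frame asked for in the question.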
I got what I want by the following command:
and produced:
Thank you for your help :)
Please go ahead and accept the original answer (green check mark) to provide closure to this thread.