Question

Is there a tool to extract organism names from SRA accession IDs?


5.6 years ago

nikthoma • 0

I have thousands of SRA short reads on a server labeled by their accession IDs only. I need an easy way to get their taxa without searching each ID individually on the SRA database. Does anyone have a way of doing this?

SRA taxa NCBI • 1.7k views


Thank you. I implemented your Entrez Direct approach and it worked fantastically.

Reply • 5.6 years ago by nikthoma

[Image not preserved]

Reply • 5.6 years ago by ATpoint

Thanks! This is my first time asking a question on BioStars. It wasn't immediately obvious to do that in my browser. Sorry.

Reply • 5.6 years ago by nikthoma

score 7 · Accepted Answer · 2019-12-11

Browser method

Navigate to the NCBI SRA portal and enter the query. Click the Send To link at the top right corner of the results table and download the results to a file in 'RunInfo' format. [Screenshot not preserved]

You can open this comma-delimited file with Excel or any other spreadsheet program. This table has both the TaxID and ScientificName columns.
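If you would rather stay on the server than open a spreadsheet, the same columns can be pulled out of the downloaded file on the command line. The following is only a sketch: the function name runinfo_taxa and the file name SraRunInfo.csv are assumptions, and it splits naively on commas, so it will misread any row in which a field contains an embedded comma.

```shell
# Print Run, TaxID and ScientificName (tab-separated) from a RunInfo CSV.
# Columns are located by header name, so their position does not matter.
# Assumption: no field contains an embedded comma.
runinfo_taxa() {
  awk -F',' '
    NR == 1 {
      # Record the column index of each header we care about.
      for (i = 1; i <= NF; i++) {
        if ($i == "Run") r = i
        if ($i == "TaxID") t = i
        if ($i == "ScientificName") s = i
      }
      next
    }
    # Skip blank lines and any repeated header rows.
    $r != "" && $r != "Run" { print $r "\t" $t "\t" $s }
  ' "$1"
}

# Usage (file name is an assumption):
# runinfo_taxa SraRunInfo.csv > run_to_taxon.tsv
```

Locating the columns by name rather than by number keeps the sketch working even if the column order in the table differs from what you expect.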

Command-line method

You can use Entrez Direct for this. If you pipe the esearch command (the first stage of the pipeline below) to efetch -format runinfo, you get the same comma-delimited RunInfo table, which has the TaxID and ScientificName columns. Alternatively, you can extract only selected fields with esummary and xtract, as shown below:

$ esearch -db sra -query 'SRP014739' \
  | esummary \
  | xtract -pattern DocumentSummary -element Study@acc,Sample@acc,Experiment@acc,Run@acc,Organism@taxid,Organism@ScientificName
SRP014739       SRS353575       SRX174474       SRR534566       9606    Homo sapiens
SRP014739       SRS353574       SRX174473       SRR534565       9606    Homo sapiens
SRP014739       SRS353573       SRX174472       SRR534564       9606    Homo sapiens
SRP014739       SRS353572       SRX174471       SRR534563       9606    Homo sapiens
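For the situation in the question, where the accession IDs are already sitting in a list, the same pipeline can be wrapped in a loop. This is a rough sketch rather than a tested recipe: the function name sra_taxa_for_list and the file accessions.txt (one accession per line) are assumptions, and it issues one query per accession, so it will be slow for thousands of IDs. The </dev/null redirect is there so that esearch does not consume the loop's input.

```shell
# Look up TaxID and organism name for every accession listed in a file.
# Assumption: the file holds one SRA accession per line.
sra_taxa_for_list() {
  while IFS= read -r acc; do
    [ -z "$acc" ] && continue   # skip blank lines
    # Same pipeline as above, run once per accession.
    esearch -db sra -query "$acc" </dev/null \
      | esummary \
      | xtract -pattern DocumentSummary \
          -element Run@acc,Organism@taxid,Organism@ScientificName
  done < "$1"
}

# Usage (file names are assumptions):
# sra_taxa_for_list accessions.txt > accession_taxa.tsv
```

Because SRA summaries are returned per experiment, a line of output may list other runs from the same experiment alongside the run you asked for.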