Question

Command line Taxa search within SRA / NCBI Trace

4.3 years ago

poppersrules ▴ 10

Is there a command-line operation to get trace data from the SRA? For instance, with the link below I can click on "Analysis" and get the following taxonomic breakdown of the reads. Is there any way to get the actual trace data via the command line, output as ASCII text?

https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR014739

Unidentified reads: 25.14%
Identified reads: 74.86%
cellular organisms: 74.82%
Eukaryota: 74.77%
Simiiformes: 71.4%
Catarrhini: 64.79%
Hominoidea: 54.22%
Hominidae: 45.46%
Homininae: 38.09%
Homo sapiens: 14.98%
Bacteria: 0.04%
Viruses: 0.04%

Edit: I'm looking for actual trace data, not the human-submitted biospecies project data.

SRA Taxa Taxonomic taxonomy • 2.1k views

ADD COMMENT • link updated 3.4 years ago by Hadley • 0 • written 4.3 years ago by poppersrules ▴ 10

I'm looking for actual trace data,

What do you mean by actual trace data? This is an NGS dataset; there are no traces like those from Sanger reads.

not the human submitted biospecies project data.

The taxonomic breakdown you posted above is generated by NCBI using the program mentioned by @vkkodali below. That is not from submitters.

vkkodali can correct me, but I doubt NCBI stores the per-read classification in a way that allows selective retrieval.

ADD REPLY • link 4.3 years ago by GenoMax 151k

I don't entirely follow. Do you want the actual reads instead of just the percentage values for each taxonomic group? NCBI does not store that data. Maybe if you run STAT on your own there is an option to classify and extract reads in that manner, but I am not sure how to do that.

ADD REPLY • link 4.3 years ago by vkkodali_ncbi ★ 3.8k

score 2 · Answer 1 · 2021-03-08

4.3 years ago

vkkodali_ncbi ★ 3.8k

These data are generated by a tool called STAT. You can download it here: https://github.com/ncbi/ngs-tools/tree/tax/tools/tax

If you scroll down to the bottom of the "Analysis" tab (for example, https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR014739) you will see a link named "How taxonomy analysis is done?" that will provide a few more details.

Precomputed data are available on the cloud (Google BigQuery and Amazon Athena). Some documentation is available here: https://www.ncbi.nlm.nih.gov/sra/docs/sra-cloud-based-examples/

Finally, a URL of the format https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR014739&retmode=xml can be constructed to fetch data in XML format.
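For scripting, the URL pattern above can be built and fetched with the Python standard library alone. This is only a minimal sketch: the function names `build_run_url` and `fetch_run_xml` are made up for illustration, and it assumes the endpoint still answers plain GET requests with the `run` and `retmode=xml` parameters exactly as shown in the URL above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address taken from the URL quoted in the answer.
BASE_URL = "https://trace.ncbi.nlm.nih.gov/Traces/sra/"


def build_run_url(run_accession: str) -> str:
    """Build the XML URL for one run accession (hypothetical helper)."""
    query = urlencode({"run": run_accession, "retmode": "xml"})
    return f"{BASE_URL}?{query}"


def fetch_run_xml(run_accession: str, timeout: float = 30.0) -> str:
    """Download the XML for one run as text (hypothetical helper).

    Assumes the server responds to a plain GET; undecodable bytes are
    replaced rather than raising an error.
    """
    with urlopen(build_run_url(run_accession), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only prints the URL; call fetch_run_xml("SRR014739") to download.
    print(build_run_url("SRR014739"))
```

Something like `curl` or `wget` on the same URL should work equally well from a shell.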

ADD COMMENT • link 4.3 years ago by vkkodali_ncbi ★ 3.8k

Thank you for this information. Do you have any recommendation on how to parse that table-like information from the XML file?

ADD REPLY • link 4.3 years ago by GenoMax 151k

You can use any XML parser. I use xml.etree.ElementTree (https://docs.python.org/3/library/xml.etree.elementtree.html) with Python.
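As a rough illustration of the ElementTree approach, here is a sketch that flattens any XML document into tab-separated lines of element path, attributes, and text. It deliberately makes no assumption about NCBI's actual schema: the helper names `flatten_xml` and `rows_to_text` and the `SAMPLE` document (element names, attributes, and accession included) are all invented for the example. Once you have looked at the real output you would probably filter on whichever elements carry the taxonomy values.

```python
import xml.etree.ElementTree as ET


def flatten_xml(xml_text):
    """Return one (path, attributes, text) tuple per element, in document order."""
    root = ET.fromstring(xml_text)
    rows = []

    def walk(element, parent_path):
        path = f"{parent_path}/{element.tag}"
        text = (element.text or "").strip()
        rows.append((path, dict(element.attrib), text))
        for child in element:
            walk(child, path)

    walk(root, "")
    return rows


def rows_to_text(rows):
    """Render rows as plain tab-separated lines: path, attributes, text."""
    lines = []
    for path, attrib, text in rows:
        attrs = " ".join(f"{key}={value}" for key, value in sorted(attrib.items()))
        lines.append("\t".join([path, attrs, text]))
    return "\n".join(lines)


# Made-up sample, NOT the real NCBI schema; used only to exercise the helpers.
SAMPLE = (
    '<run accession="SRR000000">'
    '<taxon name="Bacteria" percent="0.04"/>'
    "<note>made up</note>"
    "</run>"
)

if __name__ == "__main__":
    print(rows_to_text(flatten_xml(SAMPLE)))
```

Dumping everything first is a handy way to discover which tags and attributes hold the numbers you want before writing a more targeted parser.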