Command line Taxa search within SRA / NCBI Trace
3.7 years ago
poppersrules ▴ 10

Is there a command-line way to get trace data for an SRA run? For instance, at the link below I can click on "Analysis" and get the following taxonomic breakdown of the reads. Is there any way to get the actual trace data via the command line and output it as ASCII text?

https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR014739

Unidentified reads: 25.14%
Identified reads: 74.86%
cellular organisms: 74.82%
Eukaryota: 74.77%
Simiiformes: 71.4%
Catarrhini: 64.79%
Hominoidea: 54.22%
Hominidae: 45.46%
Homininae: 38.09%
Homo sapiens: 14.98%
Bacteria: 0.04%
Viruses: 0.04%

Edit: I'm looking for actual trace data, not the human-submitted biospecies project data.

Tags: SRA, Taxa, Taxonomic, taxonomy

I'm looking for actual trace data,

What do you mean by actual trace data? This is an NGS dataset; there are no traces as there would be for Sanger reads.

not the human submitted biospecies project data.

The taxonomic breakdown you posted above is generated by NCBI using the program mentioned by @vkkodali below. It does not come from the submitters.

vkkodali can correct me, but I doubt NCBI stores the per-read classification in a way that allows selective retrieval.


I don't entirely follow. Do you want the actual reads for each taxonomic group instead of just the percentages? NCBI does not store that data. Maybe if you run STAT yourself there is an option to classify and extract reads that way, but I am not sure how to do that.

3.7 years ago
vkkodali_ncbi ★ 3.8k

These data are generated by a tool called STAT. You can download it here: https://github.com/ncbi/ngs-tools/tree/tax/tools/tax

If you scroll down to the bottom of the "Analysis" tab (for example, https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR014739) you will see a link named "How taxonomy analysis is done?" that will provide a few more details.

Precomputed data are available on the cloud (Google BigQuery and Amazon Athena). Some documentation is available here: https://www.ncbi.nlm.nih.gov/sra/docs/sra-cloud-based-examples/
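A minimal sketch of querying the precomputed results for one run, assuming they are exposed in BigQuery as the table `nih-sra-datastore.sra_tax_analysis_tool.tax_analysis` with `acc`, `name`, `rank`, `tax_id`, `total_count` and `self_count` columns. That table and column naming is an assumption on my part; check it against the documentation linked above before relying on it. The code only builds the SQL text; running it needs a Google Cloud project with BigQuery access.

```python
import re

# ASSUMPTION: dataset/table name and columns of the precomputed STAT results.
# Verify against the NCBI SRA cloud documentation before use.
TAX_TABLE = "nih-sra-datastore.sra_tax_analysis_tool.tax_analysis"

# SRA run accessions: SRR (NCBI), ERR (EBI) or DRR (DDBJ) followed by digits.
RUN_ACCESSION = re.compile(r"[SED]RR\d+")


def build_tax_query(run_accession: str, table: str = TAX_TABLE) -> str:
    """Return a SQL query listing the STAT taxonomy hits for one SRA run."""
    # Validate the accession so it can be safely embedded in the SQL string.
    if not RUN_ACCESSION.fullmatch(run_accession):
        raise ValueError(f"not an SRA run accession: {run_accession!r}")
    return (
        "SELECT name, rank, tax_id, total_count, self_count\n"
        f"FROM `{table}`\n"
        f"WHERE acc = '{run_accession}'\n"
        "ORDER BY total_count DESC"
    )


if __name__ == "__main__":
    print(build_tax_query("SRR014739"))
```

With the `google-cloud-bigquery` package installed and credentials configured, `bigquery.Client().query(build_tax_query("SRR014739")).result()` should iterate over the matching rows. Amazon Athena may use a different table name.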

Finally, a URL of the format https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR014739&retmode=xml can be constructed to fetch the data in XML format.
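A minimal command-line sketch of building that URL for any run and downloading the XML with only the Python standard library. The base URL and the `run`/`retmode` parameters come from the answer above; the function names are hypothetical, and the endpoint may have changed since this answer was written.

```python
import sys
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL and query parameters as given in the answer above.
TRACE_BASE = "https://trace.ncbi.nlm.nih.gov/Traces/sra/"


def trace_xml_url(run_accession: str) -> str:
    """Build the URL that returns the SRA run page as XML (hypothetical helper)."""
    return TRACE_BASE + "?" + urlencode({"run": run_accession, "retmode": "xml"})


def fetch_trace_xml(run_accession: str, timeout: float = 30.0) -> bytes:
    """Download the XML document for one run (requires network access)."""
    with urlopen(trace_xml_url(run_accession), timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Usage: python fetch_trace.py SRR014739 > SRR014739.xml
    sys.stdout.buffer.write(fetch_trace_xml(sys.argv[1]))
```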


Thank you for this information. Do you have any recommendation on how to parse that table-like information from the XML file?


You can use any XML parser. I use Python's xml.etree.ElementTree module: https://docs.python.org/3/library/xml.etree.elementtree.html
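A minimal ElementTree sketch that turns a nested taxonomy into indented ASCII lines. The XML layout is an assumption, not NCBI's documented schema: it supposes nested `<taxon>` elements, with the top-level ones as direct children of the root and `name` and `total_count` attributes. Inspect the real file first and adjust `TAG` and the attribute names to match. Passing `total` (for example, the run's total spot count) prints percentages instead of raw counts.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple

# HYPOTHETICAL layout: nested <taxon name="..." total_count="..."> elements
# whose top level sits directly under the root. Adjust to the real XML.
TAG = "taxon"


def walk_taxa(elem: ET.Element, depth: int = 0) -> Iterator[Tuple[int, str, int]]:
    """Yield (depth, name, total_count) for each nested taxon, depth-first."""
    for child in elem.findall(TAG):
        yield depth, child.get("name", "?"), int(child.get("total_count", "0"))
        # Recurse so that child taxa are indented under their parent.
        yield from walk_taxa(child, depth + 1)


def taxa_as_text(xml_bytes: bytes, total: Optional[int] = None) -> str:
    """Render the taxonomy tree as indented plain-text lines."""
    root = ET.fromstring(xml_bytes)
    lines = []
    for depth, name, count in walk_taxa(root):
        # Show a percentage when a total is supplied, otherwise the raw count.
        value = f"{100 * count / total:.2f}%" if total else str(count)
        lines.append(f"{'  ' * depth}{name}: {value}")
    return "\n".join(lines)
```

Feed `taxa_as_text` the bytes of the document downloaded from the `retmode=xml` URL and print the result to get a breakdown similar to the one on the Analysis tab.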


I've been looking for this solution for way too long! Thank you!
