Programmatically retrieving taxon classification
3.9 years ago
schlogl ▴ 160

Hi there, I hope everyone is healthy and safe. Does anyone know of an API or another way to retrieve taxon classifications from the NCBI Taxonomy Browser (https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=28048&lvl=3&lin=f&keep=1&srchmode=1&unlock)?

I have a list of names such as:

Acidiphilium
Acidipropionibacterium
Acidithiobacillus
Acidobacterium
Acidovorax
Acinetobacter
Actinoalloteichus
Actinobacillus
Actinomadura
Actinomyces

And I would like to get something like this in return for each name:

Bacteria; Terrabacteria group; Actinobacteria; Actinobacteria; Acidothermales; Acidothermaceae

Thanks for your time.

Paulo

PS: I don't want to do it manually.

3.9 years ago

Have a look at TaxonKit (https://github.com/shenwei356/taxonkit). I think it's exactly what you need.
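
A minimal sketch of how TaxonKit could be used for this, assuming `taxonkit` is installed and the NCBI taxdump files have been unpacked into its default data directory (`~/.taxonkit`). The function name `names_to_lineage` is hypothetical and only there for illustration:

```shell
#!/usr/bin/env bash
# Sketch only: assumes taxonkit is on PATH and the NCBI taxdump is in ~/.taxonkit.
# names_to_lineage is a hypothetical helper name, not part of TaxonKit.
names_to_lineage() {
    local names_file="$1"   # one scientific name per line
    # name2taxid appends the TaxId as column 2;
    # lineage -i 2 reads the TaxId from column 2 and appends the
    # semicolon-separated lineage as column 3;
    # cut keeps only the name and the lineage.
    taxonkit name2taxid "$names_file" \
        | taxonkit lineage -i 2 \
        | cut -f 1,3
}
```

It would be called as `names_to_lineage names.txt`, where `names.txt` is a placeholder for the file holding the list. Names that map to more than one TaxId produce one output line per match, so the result may need a manual check.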

3.9 years ago
GenoMax 148k

Using EntrezDirect:

Put your list in a file named id, one name per line.

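$ for i in $(cat id); do
      printf '%s\n' "${i}"   # print the query name on its own line
      # search the taxonomy database, fetch the record as XML, and join every ScientificName with ", "
      esearch -db taxonomy -query "${i}" | efetch -format native -mode xml | grep ScientificName | awk -F ">|<" 'BEGIN{ORS=", ";}{print $3;}'
      printf "\n"
  done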
Acidiphilium
Acidiphilium, cellular organisms, Bacteria, Proteobacteria, Alphaproteobacteria, Rhodospirillales, Acetobacteraceae,
Acidipropionibacterium
Acidipropionibacterium, cellular organisms, Bacteria, Terrabacteria group, Actinobacteria, Actinobacteria, Propionibacteriales, Propionibacteriaceae,
Acidithiobacillus
Acidithiobacillus, cellular organisms, Bacteria, Proteobacteria, Acidithiobacillia, Acidithiobacillales, Acidithiobacillaceae,
Acidobacterium
Acidobacterium, cellular organisms, Bacteria, Acidobacteria, Acidobacteriia, Acidobacteriales, Acidobacteriaceae,
Acidovorax
Acidovorax, cellular organisms, Bacteria, Proteobacteria, Betaproteobacteria, Burkholderiales, Comamonadaceae,
Acinetobacter
Acinetobacter, cellular organisms, Bacteria, Proteobacteria, Gammaproteobacteria, Pseudomonadales, Moraxellaceae,
Actinoalloteichus
Actinoalloteichus, cellular organisms, Bacteria, Terrabacteria group, Actinobacteria, Actinobacteria, Pseudonocardiales, Pseudonocardiaceae,
Actinobacillus
Actinobacillus, cellular organisms, Bacteria, Proteobacteria, Gammaproteobacteria, Pasteurellales, Pasteurellaceae,
Actinomadura
Actinomadura, cellular organisms, Bacteria, Terrabacteria group, Actinobacteria, Actinobacteria, Streptosporangiales, Thermomonosporaceae,
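
The output above still carries the queried name, the `cellular organisms` root, and a trailing comma. Here is a rough sketch of a filter that turns the same efetch XML into the semicolon-separated form asked for in the question. It assumes each `<ScientificName>` element sits on its own line and that the first one is the queried taxon itself, as in the output above. The function name `lineage_from_xml` is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch only: lineage_from_xml is a hypothetical helper that reads
# efetch taxonomy XML on stdin and prints one "A; B; C" lineage line.
lineage_from_xml() {
    grep '<ScientificName>' \
        | sed -e 's/.*<ScientificName>//' -e 's/<\/ScientificName>.*//' \
        | awk 'NR > 1 && $0 != "cellular organisms" {
                   # skip the first name (the taxon itself) and the root node
                   out = out sep $0
                   sep = "; "
               }
               END { print out }'
}
```

It would slot into the loop in place of the grep and awk steps, for example `esearch -db taxonomy -query "${i}" | efetch -format native -mode xml | lineage_from_xml`. If a query matches more than one taxon, the lineages run together on one line, so ambiguous names need a separate check.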

Thanks Genomax. Awesome.

3.9 years ago
hugo.avila ▴ 530

This should do the trick.

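I took this other answer and added a loop and a little string formatting. The file your_list.txt contains your list of names, one per line. The first stage looks up the TaxId for each name, and the second fetches the lineage for that TaxId and strips the "cellular organisms" root.

Here it goes: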

cat your_list.txt | 
    xargs -I {} sh -c "esearch -db taxonomy -query '{}' | efetch -db taxonomy -format docsum | xtract -pattern DocumentSummary  -element TaxId | head -1" | 
    xargs -I {} sh -c "esearch -db taxonomy  -query \"{}[TaxId]\" | 
        efetch -format native -mode xml | 
        grep ScientificName | grep -Po '(?<=\>).+(?=\<)' | tr '\n' ';' | sed -r 's/cellular organisms;//;s/;$/\n/'"

And a shout-out to you ;)


Thanks, brother. Good to see some Brazilians around here. Send me your contact details: schlogl@hotmail.com. Thanks.
