How to find the kingdom for a taxon ID using the terminal?
2
0
4.7 years ago
Frieda ▴ 60

I need to find the kingdom for some taxon IDs. I am already using the following Entrez command to get the full taxonomy for a specific taxon ID.

esearch -db taxonomy -query "4932 [taxID]" | efetch -format native -mode xml | xtract -pattern Taxon -block "*/Taxon" -unless Rank -equals "no rank" -tab "\n" -element Rank,ScientificName

At first I thought I might be able to subset the kingdom from the taxonomy I get. But as far as I understand, there is no fixed position at which the kingdom appears. For example, the following two taxon IDs have lineages with different rank orders, and the first one does not even include a kingdom:

Eukaryota; Metamonada; Diplomonadida; Hexamitidae; Giardiinae; Giardia (superkingdom --> no rank --> order --> family --> subfamily --> genus)

cellular organisms; Eukaryota; Opisthokonta; Fungi; Dikarya; Ascomycota; saccharomyceta; Saccharomycotina; Saccharomycetes; Saccharomycetales; Saccharomycetaceae; Saccharomyces (no rank --> superkingdom --> no rank --> kingdom --> subkingdom --> phylum --> no rank --> ....)

1. What does "no rank" mean in the lineage?

2. Is there any specific order according to which I can subset the kingdom?

3. Is there a way to find the kingdom for a taxon ID directly using the terminal?

Thank you in advance :)

ncbi taxonomy kingdom terminal • 2.0k views
2
4.7 years ago
vkkodali_ncbi ★ 3.8k

You can get just the kingdom as follows:

$ esearch -db taxonomy -query "4932 [taxID]" \
    | efetch -format native -mode xml \
    | xtract -pattern TaxaSet \
      -group LineageEx \
      -block Taxon \
      -if Rank -equals "kingdom" \
      -subset Taxon -tab '\n' -def NA \
      -element TaxId,Rank,ScientificName

4751    kingdom    Fungi
0

4751 kingdom Fungi

4751 is the taxID for the kingdom.

0

Thank you very much. May I ask how I can change this command to get, for example, the phylum?

1

Change the word kingdom to phylum.

0

Interestingly, this command does not work for the IDs that don't have a kingdom in their lineage. My intention is to get the kingdom for as many taxon IDs as possible; if the kingdom is not there, I will take either the phylum or whatever rank comes after kingdom. This is an example:

esearch -db taxonomy -query "5741 [taxID]" | efetch -format native -mode xml | xtract -pattern TaxaSet -group LineageEx -block Taxon -if Rank -equals "phylum" -subset Taxon -tab '\n' -def NA -element TaxId,Rank,ScientificName

1

"4751 is the taxID for the kingdom."

That was my intention; not sure what @Freddy wants though.

When I run

esearch -db taxonomy -query "5741 [taxID]" | efetch -format native -mode xml | xtract -pattern TaxaSet -group LineageEx -block Taxon -if Rank -equals "phylum" -subset Taxon -tab '\n' -def NA -element TaxId,Rank,ScientificName

it correctly returns the phylum Fornicata. Is that not what you expect? You can get both kingdom and phylum information using this:

esearch -db taxonomy -query "4932 [taxID]" | efetch -format native -mode xml | xtract -pattern TaxaSet -group LineageEx -block Taxon -if Rank -equals "kingdom" -or Rank -equals "phylum" -subset Taxon -tab '\n' -def NA -element TaxId,Rank,ScientificName

Note that the xtract command can get quite unwieldy once you want to do fancier things such as conditionals and loops. If that becomes the case, you may want to consider using XML parsers in Python or Perl.

1

I think @Freddy is saying that the kingdom and phylum information is not available for some taxIDs, and you don't know that until you run the command. They are probably looking for a one-step solution that falls back to the phylum if the kingdom is not found, and so on.
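A fallback of that kind could be sketched as a small shell filter placed after xtract. This is only an illustration: pick_rank is a hypothetical helper name (not part of EDirect), and it assumes the input arrives as one tab-separated "TaxId, Rank, ScientificName" row per line, as in the kingdom answer's output.

```shell
# pick_rank RANK...: hypothetical helper, not an EDirect command.
# Reads "TaxId<TAB>Rank<TAB>ScientificName" rows on stdin (assumed format) and
# prints the row for the first rank in the argument list that is present.
# Prints NA when none of the requested ranks occur.
pick_rank() {
    awk -F '\t' -v prefs="$*" '
        BEGIN { n = split(prefs, want, " ") }   # preference order, e.g. kingdom phylum
        { seen[$2] = $0 }                       # remember the row for each rank
        END {
            for (i = 1; i <= n; i++)
                if (want[i] in seen) { print seen[want[i]]; exit }
            print "NA"
        }'
}

# Possible use with the kingdom-or-phylum command from this thread:
#   esearch ... | efetch ... | xtract ... -if Rank -equals "kingdom" \
#       -or Rank -equals "phylum" ... | pick_rank kingdom phylum
```

The preference list is just the order of the arguments, so a longer fallback chain (kingdom, then phylum, then class) only needs more rank names, provided the xtract filter lets those ranks through.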

I agree with you on using a proper parser. It may be more efficient to download the taxonomy database and parse that rather than using Entrez Direct (much as I like it).
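As a rough sketch of the offline route, the function below walks up the parent links in a local copy of the NCBI taxonomy dump until it reaches a node of rank kingdom. It assumes the nodes.dmp and names.dmp files from the taxdump archive, with fields delimited by tab, pipe, tab; kingdom_from_taxdump is a hypothetical name chosen for this example.

```shell
# kingdom_from_taxdump NODES NAMES TAXID...: hypothetical helper.
# Assumes NCBI taxdump layout: nodes.dmp = tax_id | parent tax_id | rank | ...
#                              names.dmp = tax_id | name | unique name | name class |
# Prints "query<TAB>kingdom TaxId<TAB>kingdom name", or NA columns if no kingdom.
kingdom_from_taxdump() {
    nodes=$1; names=$2; shift 2
    awk -F '\t[|]\t' -v ids="$*" '
        FNR == NR { parent[$1] = $2; rank[$1] = $3; next }   # first file: nodes.dmp
        $4 ~ /^scientific name/ { name[$1] = $2 }            # second file: names.dmp
        END {
            n = split(ids, q, " ")
            for (i = 1; i <= n; i++) {
                id = q[i]
                # climb until a kingdom, the root (its own parent), or an unknown ID
                while ((id in parent) && rank[id] != "kingdom" && parent[id] != id)
                    id = parent[id]
                if ((id in rank) && rank[id] == "kingdom")
                    print q[i] "\t" id "\t" name[id]
                else
                    print q[i] "\tNA\tNA"
            }
        }' "$nodes" "$names"
}

# Possible use, once the dump has been downloaded and unpacked:
#   kingdom_from_taxdump nodes.dmp names.dmp 4932 5741
```

Both files are read once per call, so it is cheaper to pass all taxon IDs in a single invocation than to call the function in a loop.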

0

I have been trying to look up the taxonomy division of multiple taxids using the following in R:

taxid <- c(7070, 5741, 658858)
taxid.as.string <- paste(taxid, collapse=", ")

batch <- system(paste0("esearch -db taxonomy -query \"",taxid.as.string, " [taxID]\" | efetch -format xml | xtract -pattern Taxon -block \"*/Taxon\" -unless Rank -equals \"no rank\" -tab \"\t\" -element Rank,ScientificName,TaxId "), intern = TRUE)

When I run this, the order of the results does not match the order of the taxids specified in the query. I was thinking that maybe I could also include the queried taxid in the result, so that I get something like:

"5741:superkingdom\tEukaryota\tphylum\tFornicata\torder\tDiplomonadida\tfamily\tHexamitidae\tsubfamily\tGiardiinae\tgenus\tGiardia\tspecies\tGiardia intestinalis"

"7070:superkingdom\tEukaryota\tkingdom\tMetazoa\tphylum\tArthropoda\tsubphylum\tHexapoda\tclass\tInsecta\tsubclass\tPterygota\tinfraclass\tNeoptera\tcohort\tHolometabola\torder\tColeoptera\tsuborder\tPolyphaga\tinfraorder\tCucujiformia\tsuperfamily\tTenebrionoidea\tfamily\tTenebrionidae\tgenus\tTribolium"

"658858:superkingdom\tEukaryota\tphylum\tFornicata\torder\tDiplomonadida\tfamily\tHexamitidae\tsubfamily\tGiardiinae\tgenus\tGiardia"

I already tried adding TaxId to -element, but this returns the taxid for every element in the division of the taxonomy. Do you have any suggestions for me?

2

My R is a bit rusty but here's the command you would use on the Unix shell. When you have a list of unique identifiers (uids) such as the taxids, you don't have to go through an esearch step. You can use epost instead as shown below:

## create a file with uids, one per line
cat taxids.txt
7070
5741
658858

## use epost 
epost -db taxonomy -input taxids.txt \
| efetch -format xml \
| xtract -pattern Taxon \
    -first TaxId \
    -block "LineageEx/Taxon" \
    -unless Rank -equals "no rank" \
    -sep ':' \
    -element Rank,ScientificName

658858  superkingdom:Eukaryota  phylum:Fornicata        order:Diplomonadida     family:Hexamitidae      subfamily:Giardiinae    genus:Giardia   species:Giardia intestinalis
7070    superkingdom:Eukaryota  kingdom:Metazoa phylum:Arthropoda       subphylum:Hexapoda      class:Insecta   subclass:Pterygota      infraclass:Neoptera     cohort:Holometabola     order:Coleoptera        suborder:Polyphaga      infraorder:Cucujiformia superfamily:Tenebrionoidea      family:Tenebrionidae    genus:Tribolium
5741    superkingdom:Eukaryota  phylum:Fornicata        order:Diplomonadida     family:Hexamitidae      subfamily:Giardiinae    genus:Giardia

The magic parameter was -first, which restricts the output to only the first TaxId. You can change the -sep parameter to something else; I figured : is appropriate here. If you want to change the separator between the divisions, you should use the -tab parameter.
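Since the rows come back in a different order than the input file, and the original goal was a single rank per taxid, two small post-processing helpers may be useful. This is a sketch only: reorder_by_ids and rank_of are hypothetical names, and both assume tab-separated rows with the TaxId in the first column followed by rank:name fields, as in the output above.

```shell
# reorder_by_ids IDS_FILE: hypothetical helper.
# Reads xtract rows on stdin and prints them in the order the IDs appear in
# IDS_FILE; an ID with no matching row is printed with NA.
reorder_by_ids() {
    awk -F '\t' '
        FNR == NR { order[++n] = $1; next }   # first input: the ID list, in wanted order
        { row[$1] = $0 }                      # stdin: rows keyed by TaxId
        END {
            for (i = 1; i <= n; i++)
                print ((order[i] in row) ? row[order[i]] : order[i] "\tNA")
        }' "$1" -
}

# rank_of RANK: hypothetical helper.
# From "TaxId<TAB>rank:name<TAB>rank:name..." rows, prints "TaxId<TAB>name" for
# the requested rank, or "TaxId<TAB>NA" when that rank is absent.
rank_of() {
    awk -F '\t' -v want="$1" '{
        hit = "NA"
        for (i = 2; i <= NF; i++)
            if (index($i, want ":") == 1) { hit = substr($i, length(want) + 2); break }
        print $1 "\t" hit
    }'
}

# Possible use:
#   epost ... | efetch ... | xtract ... | reorder_by_ids taxids.txt | rank_of kingdom
```

Matching on the field prefix means a request for kingdom does not pick up superkingdom or subkingdom fields. This relies on the -sep ':' choice from the command above.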

0

Thank you so much! I had been searching for an answer for quite a while but hadn't managed to find anything. I tried your solution and it works perfectly, thank you so much.

May I ask you where you have learned details like this?

0
4.7 years ago
GenoMax 147k

@vkkodali has an elegant solution but you could do something like:

$ esearch -db taxonomy -query "9606 [taxID]" | efetch -format native -mode xml | grep -B1 -w "kingdom"
            <ScientificName>Metazoa</ScientificName>
            <Rank>kingdom</Rank>


The "no rank" question has been answered in this Biology SE thread.
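If only the name is wanted from the grep approach, the two matched lines can be reduced with sed. This is a sketch that assumes the XML layout shown in the grep output, with each ScientificName on the line directly before its Rank; kingdom_name is a hypothetical helper name.

```shell
# kingdom_name: hypothetical helper.
# Reads taxonomy XML on stdin and prints the ScientificName found on the line
# directly before <Rank>kingdom</Rank>. Prints nothing if there is no kingdom.
kingdom_name() {
    grep -B1 '<Rank>kingdom</Rank>' \
        | sed -n 's:.*<ScientificName>\(.*\)</ScientificName>.*:\1:p'
}

# Possible use:
#   esearch -db taxonomy -query "9606 [taxID]" \
#       | efetch -format native -mode xml | kingdom_name
```

Matching the whole Rank element rather than the bare word is a slightly stricter variant of the -w match. Like any line-based approach, it breaks if the XML is not pretty-printed with one element per line.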
