I need to find the kingdom for some taxon IDs. I am already using the following Entrez command to get the full taxonomy for a specific taxon ID.
esearch -db taxonomy -query "4932 [taxID]" | efetch -format native -mode xml | xtract -pattern Taxon -block "*/Taxon" -unless Rank -equals "no rank" -tab "\n" -element Rank,ScientificName
At first I thought I could simply pick the kingdom out of the lineage I get back. But as far as I can tell, the ranks do not appear in a fixed order that would let me do that by position. For example, the following two taxon IDs return different sequences of ranks, and the first one does not include a kingdom at all:
Eukaryota; Metamonada; Diplomonadida; Hexamitidae; Giardiinae; Giardia (superkingdom --> no rank --> order --> family --> subfamily --> genus)
cellular organisms; Eukaryota; Opisthokonta; Fungi; Dikarya; Ascomycota; saccharomyceta; Saccharomycotina; Saccharomycetes; Saccharomycetales; Saccharomycetaceae; Saccharomyces (no rank --> superkingdom --> no rank --> kingdom --> subkingdom --> phylum --> no rank --> ....)
1. What does "no rank" mean in the lineage?
2. Is there any specific order according to which I can subset the kingdom?
3. Is there a way to find the kingdom for a taxon ID directly from the terminal?
Thank you in advance :)
4751 is the taxID for the kingdom.
Thank you very much. May I ask how I can change this command to get, for example, the phylum?
Change the word kingdom to phylum.
Interestingly, this command does not work for taxa that do not have a kingdom in their lineage. My intention is to get the kingdom for as many taxon IDs as possible and, if the kingdom is not there, to get either the phylum or whatever rank comes after kingdom. This is an example:
esearch -db taxonomy -query "5741 [taxID]" | efetch -format native -mode xml | xtract -pattern TaxaSet -group LineageEx -block Taxon -if Rank -equals "phylum" -subset Taxon -tab '\n' -def NA -element TaxId,Rank,ScientificName
That was my intention; not sure what @Freddy wants though.
When I run
esearch -db taxonomy -query "5741 [taxID]" | efetch -format native -mode xml | xtract -pattern TaxaSet -group LineageEx -block Taxon -if Rank -equals "phylum" -subset Taxon -tab '\n' -def NA -element TaxId,Rank,ScientificName
it correctly returns the phylum Fornicata. Is that not what you expect? You can get both kingdom and phylum information using this:
esearch -db taxonomy -query "4932 [taxID]" | efetch -format native -mode xml | xtract -pattern TaxaSet -group LineageEx -block Taxon -if Rank -equals "kingdom" -or Rank -equals "phylum" -subset Taxon -tab '\n' -def NA -element TaxId,Rank,ScientificName
Note that the xtract command can get quite unwieldy once you want to do fancier things such as conditionals and loops. If that becomes the case, you may want to consider using XML parsers in Python or Perl.
I think @Freddy is saying that the kingdom and phylum information is not available for some taxIDs, and you don't know that until you run the command. They are probably looking for a one-click solution that falls back to the phylum if the kingdom is not found, and so on.
I agree with you on using a proper parser. It may also be more efficient to download the taxonomy database and parse that rather than using Entrez Direct (much as I like it).
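The "kingdom if present, otherwise phylum" fallback discussed above is the kind of conditional that tends to be easier in a parser than in xtract. Here is a minimal Python sketch, assuming the XML follows the TaxaSet / Taxon / LineageEx layout that the xtract commands in this thread address. SAMPLE_XML is a hand-written, heavily abbreviated stand-in for real efetch output, and pick_rank and summarize are hypothetical helper names, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated stand-in for `efetch -format native -mode xml`
# output (assumption: only the elements used below are included).
SAMPLE_XML = """<TaxaSet>
  <Taxon>
    <TaxId>4932</TaxId>
    <LineageEx>
      <Taxon><ScientificName>cellular organisms</ScientificName><Rank>no rank</Rank></Taxon>
      <Taxon><ScientificName>Eukaryota</ScientificName><Rank>superkingdom</Rank></Taxon>
      <Taxon><ScientificName>Fungi</ScientificName><Rank>kingdom</Rank></Taxon>
      <Taxon><ScientificName>Ascomycota</ScientificName><Rank>phylum</Rank></Taxon>
    </LineageEx>
  </Taxon>
  <Taxon>
    <TaxId>5741</TaxId>
    <LineageEx>
      <Taxon><ScientificName>Eukaryota</ScientificName><Rank>superkingdom</Rank></Taxon>
      <Taxon><ScientificName>Fornicata</ScientificName><Rank>phylum</Rank></Taxon>
    </LineageEx>
  </Taxon>
</TaxaSet>"""


def pick_rank(taxon, preferred=("kingdom", "phylum")):
    """Return (rank, name) for the first preferred rank found in the lineage, else None."""
    by_rank = {}
    for node in taxon.findall("./LineageEx/Taxon"):
        rank = node.findtext("Rank")
        # Look ranks up by name, never by position; skip unranked nodes.
        if rank and rank != "no rank":
            by_rank.setdefault(rank, node.findtext("ScientificName"))
    for rank in preferred:
        if rank in by_rank:
            return rank, by_rank[rank]
    return None


def summarize(xml_text, preferred=("kingdom", "phylum")):
    """Map each top-level TaxId to its best available (rank, name) pair."""
    root = ET.fromstring(xml_text)
    return {t.findtext("TaxId"): pick_rank(t, preferred) for t in root.findall("./Taxon")}


for taxid, hit in summarize(SAMPLE_XML).items():
    print(taxid, hit)
```

With the sample above, 4932 resolves to the kingdom Fungi, while 5741 has no kingdom and falls back to the phylum Fornicata. Extending the preferred tuple (for example with "class") would add further fallbacks in the same way.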
I have been trying to look up the taxonomy divisions of multiple taxids using the following in R:
taxid <- c(7070, 5741, 658858)
taxid.as.string <- paste(taxid,collapse=", ")
When I run this, the order of the results does not match the order of the taxids specified in the query. I was thinking that maybe I could also include the queried taxid in each result, so I would get something like:
"5741:superkingdom\tEukaryota\tphylum\tFornicata\torder\tDiplomonadida\tfamily\tHexamitidae\tsubfamily\tGiardiinae\tgenus\tGiardia\tspecies\tGiardia intestinalis"
"7070:superkingdom\tEukaryota\tkingdom\tMetazoa\tphylum\tArthropoda\tsubphylum\tHexapoda\tclass\tInsecta\tsubclass\tPterygota\tinfraclass\tNeoptera\tcohort\tHolometabola\torder\tColeoptera\tsuborder\tPolyphaga\tinfraorder\tCucujiformia\tsuperfamily\tTenebrionoidea\tfamily\tTenebrionidae\tgenus\tTribolium"
"658858:superkingdom\tEukaryota\tphylum\tFornicata\torder\tDiplomonadida\tfamily\tHexamitidae\tsubfamily\tGiardiinae\tgenus\tGiardia"
I already tried adding TaxId to -element, but this returns the taxid for every element in the division of the taxonomy. Do you have any suggestions for me?
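One way to get lines of that shape, in the query order, is to post-process the XML rather than the xtract output. The sketch below is in Python rather than R and assumes the records sit in a TaxaSet with one top-level Taxon per taxid. SAMPLE_XML is a hand-written, abbreviated stand-in with the records deliberately out of order, and lineage_line and lines_in_query_order are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated stand-in for a multi-record efetch result
# (assumption: records may come back in a different order than requested).
SAMPLE_XML = """<TaxaSet>
  <Taxon><TaxId>5741</TaxId><LineageEx>
    <Taxon><ScientificName>cellular organisms</ScientificName><Rank>no rank</Rank></Taxon>
    <Taxon><ScientificName>Eukaryota</ScientificName><Rank>superkingdom</Rank></Taxon>
    <Taxon><ScientificName>Fornicata</ScientificName><Rank>phylum</Rank></Taxon>
  </LineageEx></Taxon>
  <Taxon><TaxId>7070</TaxId><LineageEx>
    <Taxon><ScientificName>Eukaryota</ScientificName><Rank>superkingdom</Rank></Taxon>
    <Taxon><ScientificName>Metazoa</ScientificName><Rank>kingdom</Rank></Taxon>
    <Taxon><ScientificName>Arthropoda</ScientificName><Rank>phylum</Rank></Taxon>
  </LineageEx></Taxon>
  <Taxon><TaxId>658858</TaxId><LineageEx>
    <Taxon><ScientificName>Eukaryota</ScientificName><Rank>superkingdom</Rank></Taxon>
    <Taxon><ScientificName>Fornicata</ScientificName><Rank>phylum</Rank></Taxon>
  </LineageEx></Taxon>
</TaxaSet>"""


def lineage_line(taxon):
    """Return (taxid, 'taxid:rank<TAB>name<TAB>...') for one top-level record."""
    taxid = taxon.findtext("TaxId")
    fields = []
    for node in taxon.findall("./LineageEx/Taxon"):
        rank = node.findtext("Rank")
        if rank and rank != "no rank":  # drop unranked lineage nodes
            fields += [rank, node.findtext("ScientificName")]
    return taxid, taxid + ":" + "\t".join(fields)


def lines_in_query_order(xml_text, query_ids):
    """Emit one line per queried taxid, in query order; 'NA' if a record is missing."""
    root = ET.fromstring(xml_text)
    by_id = dict(lineage_line(t) for t in root.findall("./Taxon"))
    return [by_id.get(str(i), f"{i}:NA") for i in query_ids]


for line in lines_in_query_order(SAMPLE_XML, [7070, 5741, 658858]):
    print(line)
```

Because each line is keyed on the top-level TaxId only, the taxid appears once per record instead of once per lineage element, and the final list follows the order of the query vector regardless of how the records were returned.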
My R is a bit rusty, but here's the command you would use on the Unix shell. When you have a list of unique identifiers (uids) such as the taxids, you don't have to go through an esearch step. You can use epost instead as shown below:
The magic parameter was -first, which restricts the output to only the first TaxId. You can change the -sep parameter to something else; I figured : is appropriate here. If you want to change the separator between the divisions, you should use the -tab parameter.
Thank you so much! I had been searching for an answer for quite a while but hadn't managed to find anything. I tried your solution and it works perfectly.
May I ask where you learned details like this?