Hello everyone,
I have a question regarding these two tools. I understand that eggNOG is a phylogeny-based ortholog protein finder, while Diamond is a tool used to search for homology between sequences. As you know, eggNOG also provides taxonomy information for proteins, albeit not at very low taxonomic ranks. My goal is to achieve consistency between Diamond and eggNOG results.
For instance, if Diamond identifies 100 eukaryotic proteins, eggNOG might identify more, say 200. However, some proteins assigned as eukaryotic by Diamond may not be classified as eukaryotic orthologs in eggNOG. This discrepancy also occurs in cases involving bacteria. For instance, Diamond might classify a protein as bacterial, while eggNOG identifies it as a eukaryotic ortholog.
While I understand that orthology can be established between species across different genera, families, and phyla, I'm questioning the reliability of classification at the kingdom level. Horizontal gene transfer (HGT) is a possibility, but I prefer not to make this assumption based solely on my data.
My question is: which source of taxonomic information might be more reliable for the protein taxonomy? In other words, if eggNOG identifies a protein as eukaryotic, should it belong to certain eukaryotic lineages?
Thank you.
eggNOG is a database; eggnog-mapper is the tool that is used to construct the eggNOG database (if I recall correctly). eggnog-mapper calls upon fast aligners such as Diamond internally to (first) establish homology between sequences and (then) identify/classify orthologs through additional analyses. You are unlikely to establish parity in results between Diamond and eggNOG outputs as a result of this consideration.
It would, but this need not mean functional counterparts to it cannot be found elsewhere in the tree of life, if that's what you're getting at.
Sorry for the confusion earlier. As you mentioned, eggnog-mapper is a tool used with the eggNOG database, and you can specify the search parameter as either hmmer or diamond. In my case, I used the diamond search option in eggnog-mapper against the eggNOG database. Besides that, I ran diamond blastp on its own against the nr database, and I am now comparing the diamond blastp and eggNOG results. Given this scenario, which taxonomy classification from the protein results would be more reliable? Or, if I simply use the eggNOG taxonomy to describe the protein's taxonomy, would that be correct?
Regarding your statement,
Yes, that's generally correct. However, the protein I have belongs to eukaryotes according to the eggNOG taxonomy, doesn't it? I want to be sure that I can use the ortholog taxonomy information to assign the protein's taxonomy.
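To make the comparison concrete, here is a minimal Python sketch of how the two sets of domain-level labels could be cross-tabulated per query protein. It assumes the labels have already been extracted from each tool's output into plain dictionaries; the function name compare_domains and the toy protein IDs are hypothetical and only for illustration.

```python
from collections import Counter


def compare_domains(diamond_domain, eggnog_domain):
    """Cross-tabulate domain-level labels for queries present in both results.

    Both arguments map a query protein ID to a domain label such as
    "Eukaryota" or "Bacteria". How those labels are extracted from the
    tool outputs is left out here on purpose.
    """
    shared = diamond_domain.keys() & eggnog_domain.keys()
    # Count each (Diamond label, eggNOG label) pair.
    table = Counter((diamond_domain[q], eggnog_domain[q]) for q in shared)
    # Queries on which the two sources disagree.
    conflicts = sorted(q for q in shared if diamond_domain[q] != eggnog_domain[q])
    return table, conflicts


# Hypothetical toy data, not real results.
diamond_domain = {"prot1": "Eukaryota", "prot2": "Bacteria", "prot3": "Eukaryota"}
eggnog_domain = {"prot1": "Eukaryota", "prot2": "Eukaryota", "prot4": "Bacteria"}

table, conflicts = compare_domains(diamond_domain, eggnog_domain)
print(table)
print(conflicts)
```

Only queries annotated by both tools are compared, so proteins that one tool reports and the other does not (prot3 and prot4 here) are left out of the table rather than counted as disagreements.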
The eggNOG results are bound to be more reliable in the comparison you mentioned because eggnog-mapper conducts additional analyses to infer the type of homology established between the sequences by the aligner, as indicated in Fig. 1 (C) here ( https://doi.org/10.1093/molbev/msab293 ).
The eggNOG taxonomy here is "protein" (or at least amino acid sequence based) taxonomy, so yes.
You are, I am guessing, essentially operating in a situation wherein you are attempting to establish affiliation and annotation for a sequence on the basis of the "best" knowledge and tooling available to you. If eggNOG indicates that your sequence falls within a family found (hitherto) only in eukaryotes and the host species of this sequence is also a eukaryote, I do not really see any reason to suspect mis-affiliation and/or mis-annotation here.
If you have no other information available to you (which I guess is the case here), what else can you do here?
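The check described above (orthologous groups found only in eukaryotes, host also a eukaryote) could be sketched in Python roughly as follows. This assumes the orthologous-group field is a comma-separated string of entries shaped like "OG@taxid|name"; please verify that against your own eggnog-mapper output before relying on it. The helper names and the example string are hypothetical.

```python
DOMAINS = ("Bacteria", "Archaea", "Eukaryota", "Viruses")


def domains_in_ogs(eggnog_ogs):
    """Return the domain-level names that appear in an OG string.

    Assumed entry format: "<OG id>@<taxid>|<taxon name>", comma-separated.
    """
    found = []
    for entry in eggnog_ogs.split(","):
        _, _, level = entry.partition("@")  # "<taxid>|<taxon name>"
        _, _, name = level.partition("|")   # "<taxon name>"
        if name in DOMAINS and name not in found:
            found.append(name)
    return found


def supports_host(eggnog_ogs, host_domain):
    """True if the only domain named in the OG string is the host's domain."""
    return domains_in_ogs(eggnog_ogs) == [host_domain]


# Hypothetical example string, not taken from real output.
example = "KOG0001@1|root,KOG0001@2759|Eukaryota,3A1B2@33154|Opisthokonta"
print(domains_in_ogs(example))
print(supports_host(example, "Eukaryota"))
```

A result of False does not by itself indicate HGT or mis-annotation; it only flags sequences worth a closer look.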