eggNOG versus Diamond blastp
9 months ago

Hello everyone,

I have a question regarding these two resources. I understand that eggNOG is a phylogeny-based resource for finding orthologous proteins, while Diamond is a tool used to search for homology between sequences. As you know, eggNOG also provides taxonomy information for proteins, albeit not at very low taxonomic ranks. My goal is to achieve consistency between Diamond and eggNOG results.

For instance, if Diamond identifies 100 eukaryotic proteins, eggNOG might identify more, say 200. However, some proteins assigned as eukaryotic by Diamond may not be classified as eukaryotic orthologs in eggNOG. The same discrepancy occurs with bacteria: Diamond might classify a protein as bacterial, while eggNOG identifies it as a eukaryotic ortholog.
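
To make the comparison concrete, here is a minimal Python sketch of how such a discrepancy could be tabulated. It assumes each tool's output has already been reduced to one domain-level label per protein. The dictionaries, protein IDs, and function names are hypothetical and are not taken from either tool's output format.

```python
from collections import Counter


def cross_tabulate(diamond_domain, eggnog_domain):
    """Count (Diamond label, eggNOG label) pairs for proteins present in both tables."""
    shared = diamond_domain.keys() & eggnog_domain.keys()
    return Counter((diamond_domain[p], eggnog_domain[p]) for p in shared)


def discordant(diamond_domain, eggnog_domain):
    """List the proteins whose two labels disagree, sorted by ID."""
    shared = diamond_domain.keys() & eggnog_domain.keys()
    return sorted(p for p in shared if diamond_domain[p] != eggnog_domain[p])


# Hypothetical toy data: one label per protein from each source.
diamond_domain = {
    "prot1": "Eukaryota",
    "prot2": "Bacteria",
    "prot3": "Eukaryota",
}
eggnog_domain = {
    "prot1": "Eukaryota",
    "prot2": "Eukaryota",
    "prot3": "Eukaryota",
    "prot4": "Eukaryota",  # annotated by eggNOG only, so not compared
}

table = cross_tabulate(diamond_domain, eggnog_domain)
mismatches = discordant(diamond_domain, eggnog_domain)

for (d_label, e_label), n in sorted(table.items()):
    print(f"Diamond={d_label}\teggNOG={e_label}\t{n}")
print("Discordant proteins:", mismatches)
```

Only proteins present in both tables are compared, so a difference in the total number of eukaryotic proteins (100 versus 200) is kept separate from genuine label conflicts.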

While I understand that orthology can be established between species across different genera, families, and phyla, I'm questioning the reliability of classification at the kingdom level. Horizontal gene transfer (HGT) is a possibility, but I prefer not to make this assumption based solely on my data.

My question is: which source of taxonomic information might be more reliable for the protein taxonomy? In other words, if eggNOG identifies a protein as eukaryotic, should it belong to certain eukaryotic lineages?

Thank you.

diamond eggNOG protein functional • 807 views

eggNOG is a database; eggnog-mapper is the tool that annotates query sequences against the eggNOG database (if I recall correctly). eggnog-mapper calls fast aligners such as Diamond internally to first establish homology between sequences and then identify and classify orthologs through additional analyses. Because of this extra orthology step, you are unlikely to achieve parity between Diamond and eggNOG outputs.

"In other words, if eggNOG identifies a protein as eukaryotic, should it belong to certain eukaryotic lineages?"

It would, but this need not mean that functional counterparts to it cannot be found elsewhere in the tree of life, if that's what you're getting at.


Sorry for the confusion earlier. As you mentioned, eggnog-mapper is a tool used with the eggNOG database, and you can specify the search mode as either hmmer or diamond. In my case, I used the diamond search option in eggnog-mapper against the eggNOG database. Separately, I ran diamond blastp against the nr database, and I am now comparing the diamond blastp and eggNOG results. Given this scenario, which taxonomic classification of the proteins would be more reliable? Or, if I simply use the eggNOG taxonomy to talk about the protein taxonomy, would that be correct?

Regarding your statement,

"It would, but this need not mean that functional counterparts to it cannot be found elsewhere in the tree of life, if that's what you're getting at."

Yes, that's generally correct. However, the protein I have belongs to eukaryotes according to the eggNOG taxonomy, doesn't it? I want to be sure that I can use the ortholog taxonomy information to assign the protein's taxonomy.


The eggNOG results are bound to be more reliable in the comparison you mentioned, because eggnog-mapper conducts additional analyses to infer the type of homology that the aligner established between the sequences, as indicated in Fig. 1 (C) here (https://doi.org/10.1093/molbev/msab293).

"Or, if I simply use the eggNOG taxonomy to talk about the protein taxonomy, would that be correct?"

The eggNOG taxonomy here is a "protein" (or at least amino acid sequence based) taxonomy, so yes.

"However, regarding the protein I have, it belongs to eukaryotes according to eggNOG taxonomy, doesn't it?"

You are, I am guessing, essentially operating in a situation wherein you are attempting to establish affiliation and annotation for a sequence on the basis of the "best" knowledge and tooling available to you. If eggNOG indicates that your sequence falls within a family found (hitherto) only in eukaryotes and the host species of this sequence is also a eukaryote, I do not really see any reason to suspect mis-affiliation and/or mis-annotation here.
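
The reasoning above can be written as a small decision rule. This is only a sketch under stated assumptions: the function name, the status strings, and the idea of passing one domain-level label per source are hypothetical, and none of it is part of eggnog-mapper or Diamond.

```python
def review_status(eggnog_domain, host_domain, diamond_domain=None):
    """Return a hypothetical review status for one protein's domain-level labels."""
    # eggNOG disagrees with the known host organism: worth a closer look.
    if eggnog_domain != host_domain:
        return "review"
    # eggNOG matches the host, but the Diamond hit points elsewhere (assumed policy:
    # keep the eggNOG label and record the disagreement).
    if diamond_domain is not None and diamond_domain != eggnog_domain:
        return "accept, note Diamond disagreement"
    # eggNOG matches the host and nothing contradicts it.
    return "accept"


print(review_status("Eukaryota", "Eukaryota"))
print(review_status("Eukaryota", "Eukaryota", diamond_domain="Bacteria"))
print(review_status("Bacteria", "Eukaryota"))
```

The rule only sorts proteins into those that need a second look and those that do not; it does not decide whether a flagged case is contamination, HGT, or a misassignment.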

"I want to be sure that I can use the ortholog taxonomy information to assign the protein's taxonomy."

If you have no other information available to you (which I guess is the case here), what else can you do here?
