I’m working with a list of approximately 3,000 human genes and want to identify their orthologs in C. elegans. While I understand that the distinction between homologs and orthologs isn't always straightforward, for my purposes, I'm focused on orthologs or closely related genes.
I aim to use this in a publication, so I want to rely on databases or resources backed by reputable organizations and widely cited in the field.
So far, I’ve explored EggNOG6 (http://eggnog6.embl.de/#/app/home). I downloaded the e6.og2seqs_and_species.tsv file (description here: https://github.com/eggnogdb/eggnog_docs/wiki/Description-of-download-files), which organizes species into orthologous groups (OGs). However, the relationships between human and C. elegans genes often appear as many-to-many, making it challenging to pinpoint precise matches.
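For anyone who wants to see how many groups are actually affected, here is a minimal Python sketch that pulls the human and C. elegans members out of each OG and labels the relationship. It assumes the file is tab-separated with six columns (taxonomic level, OG name, number of species, number of sequences, comma-separated species, comma-separated sequences) and that sequence IDs look like "<taxid>.<protein id>"; please check both against the download-file description before relying on it. The function names and the toy rows are made up for illustration; 9606 and 6239 are the NCBI taxonomy IDs for human and C. elegans.

```python
import io

HUMAN = "9606"  # NCBI taxonomy ID for Homo sapiens
WORM = "6239"   # NCBI taxonomy ID for Caenorhabditis elegans


def human_worm_groups(handle):
    """Yield (og_name, human_ids, worm_ids) for OGs that contain both species.

    ASSUMED column layout: level, OG name, n_species, n_seqs,
    species list, sequence list (the last two comma-separated).
    """
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or line.startswith("#"):
            continue
        og_name, members = fields[1], fields[5].split(",")
        # Split on the first dot only: protein IDs can contain dots themselves.
        human = sorted(m for m in members if m.split(".", 1)[0] == HUMAN)
        worm = sorted(m for m in members if m.split(".", 1)[0] == WORM)
        if human and worm:
            yield og_name, human, worm


def cardinality(human, worm):
    """Label a group as one-to-one, one-to-many, many-to-one or many-to-many."""
    left = "one" if len(human) == 1 else "many"
    right = "one" if len(worm) == 1 else "many"
    return f"{left}-to-{right}"


# Toy rows with made-up OG names and sequence IDs, not real EggNOG content.
TOY = (
    "LEVEL_X\tOG_A\t2\t2\t6239,9606\t6239.WORM_1,9606.HUMAN_1\n"
    "LEVEL_X\tOG_B\t2\t5\t6239,9606\t"
    "6239.WORM_2,6239.WORM_3,9606.HUMAN_2,9606.HUMAN_3,9606.HUMAN_4\n"
    "LEVEL_X\tOG_C\t1\t1\t9606\t9606.HUMAN_5\n"
)

summary = [
    (og, cardinality(human, worm))
    for og, human, worm in human_worm_groups(io.StringIO(TOY))
]
print(summary)  # [('OG_A', 'one-to-one'), ('OG_B', 'many-to-many')]
```

On the real file you would pass an open file handle instead of the toy string. If the first column lets you restrict to a narrower taxonomic level that still contains both species, the groups will usually be smaller, which tends to reduce (but not eliminate) the many-to-many cases.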
Does anyone have recommendations for the best approach or alternative resources to achieve this? Any insights or advice would be greatly appreciated!
Regarding the issue of "one-to-many" relationships in the EggNOG results:
Humans and C. elegans share a common ancestor that lived roughly 700 MYA (at least according to timetree.org). Because genes have been duplicated (and lost) independently in both lineages since then, it is basically inevitable that some genes shared by common ancestry will not map one-to-one between the two species.
So if you use one of the suggestions provided by others, don't be surprised if you continue to see a lot of one-to-many or many-to-many relationships in the results. This just stems from how evolution works, and is in some sense unavoidable.
Of course, you can focus your analysis on only the genes that have maintained a one-to-one relationship between the two species, but that could end up being a relatively small subset of your data.
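To get a feel for how much you would lose, a small Python sketch like the one below keeps only the strictly one-to-one pairs. It assumes you already have (human gene, worm gene) pairs exported from whichever resource you settle on; the function name and the gene labels are placeholders, not real identifiers.

```python
from collections import Counter


def one_to_one(pairs):
    """Keep pairs whose human gene and worm gene each occur in exactly one pair."""
    unique_pairs = set(pairs)  # ignore exact duplicate rows
    human_counts = Counter(h for h, _ in unique_pairs)
    worm_counts = Counter(w for _, w in unique_pairs)
    return sorted(
        (h, w)
        for h, w in unique_pairs
        if human_counts[h] == 1 and worm_counts[w] == 1
    )


# Placeholder pairs: H2 has two worm partners, W4 has two human partners.
pairs = [("H1", "W1"), ("H2", "W2"), ("H2", "W3"), ("H3", "W4"), ("H4", "W4")]

strict = one_to_one(pairs)
n_human = len({h for h, _ in pairs})
print(strict)  # [('H1', 'W1')]
print(f"{len(strict)} of {n_human} human genes kept")  # 1 of 4 human genes kept
```

Reporting that kept/total ratio alongside the results makes it clear to readers how restrictive the one-to-one filter was.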
I can only second this! As an alternative, you might want to compare your list against other tools. In the OMA Browser, for example, we provide a genome-pair view: https://omabrowser.org/oma/genomePW/
Thank you for your help!!
Have you checked OrthoList? http://www.greenwaldlab.org/ortholist/
Thank you for your help!!
You can construct your own database by doing a reciprocal best hit search. Use something like MMseqs2's rbh module with both sets of sequences as inputs. You'll probably still end up with some one-to-many situations (e.g., due to in-paralogs) that you'll have to resolve.
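In case the idea behind reciprocal best hits is unfamiliar, here is a rough Python illustration of the logic. This is not MMseqs2's implementation, just the concept applied to two lists of (query, target, score) hits that you would get from searching human against worm and worm against human. The function names, gene labels and scores are invented for the example.

```python
def best_hits(hits):
    """Map each query to its highest-scoring target (first one wins on ties)."""
    best = {}
    for query, target, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Invented hits: H2 and H3 both point at W2, but W2 points back only at H3.
human_vs_worm = [
    ("H1", "W1", 250.0),
    ("H1", "W2", 90.0),
    ("H2", "W2", 180.0),
    ("H3", "W2", 200.0),
]
worm_vs_human = [
    ("W1", "H1", 245.0),
    ("W2", "H3", 198.0),
    ("W2", "H2", 181.0),
]

rbh_pairs = reciprocal_best_hits(human_vs_worm, worm_vs_human)
print(rbh_pairs)  # [('H1', 'W1'), ('H3', 'W2')]
```

Note how H2 silently drops out even though it scores almost as well as H3 against W2. That is the in-paralog situation mentioned above, and it is why a plain best-hit rule can hide genes you may still care about.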
Orthologs are (one type of) homologs.