Hello all,
I am looking for suggestions on the currently accepted methodology for identifying orthologous proteins across multiple datasets. We are working with eukaryotes that are non-model organisms. Our datasets are protein sequences predicted with TransDecoder, and we have done our best to eliminate redundant sequences. I am somewhat familiar with HaMStR, OrthoFinder, OrthoDB, etc., but I am not confident about which method would be best. Our goal is to rule out paralogous genes and construct a phylogenetic tree. We then want to explore certain genes of interest that are shared between the different species. Any links to good reviews would also be appreciated.
Best,
A.B.
Hi,
Can you describe what you did to eliminate redundant sequences? How did you obtain your proteins: from genomes or from transcriptomes? If you have both genome-derived and transcriptome-derived proteins, I suggest first identifying orthologs in the genome-derived protein sets, and then using those orthologs to search for the corresponding sequences in the transcriptome-derived proteins. If you run the ortholog analysis on genome-derived and transcriptome-derived proteins together, you may not recover enough orthologous proteins (more than 50), particularly if you have data from more than 10 species.
In addition to the tools you mentioned, you could use OMA, but it requires a lot of storage space and takes longer to run than the other tools.
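On ruling out paralogs before building the tree: one common and fairly simple heuristic is to keep only orthogroups with at most one sequence per species. Below is a minimal sketch of that filter in Python. It assumes a tab-separated table with an orthogroup ID in the first column, one column per species, and comma-separated gene IDs in each cell (similar in layout to an OrthoFinder orthogroups table, but please check against your tool's actual output). The function name, the `min_species_fraction` parameter, and the example IDs are all made up for illustration.

```python
import csv
import io


def single_copy_orthogroups(tsv_text, min_species_fraction=1.0):
    """Return {orthogroup_id: {species: gene_id}} for single-copy groups.

    Assumed input layout (hypothetical, verify against your own files):
    header row = group column name followed by species names,
    each cell = comma-separated gene IDs, empty if the species is absent.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    species = header[1:]
    kept = {}
    for row in reader:
        if not row:
            continue
        group_id, cells = row[0], row[1:]
        # Pad short rows so every species has a cell.
        cells = cells + [""] * (len(species) - len(cells))
        genes = {
            sp: [g.strip() for g in cell.split(",") if g.strip()]
            for sp, cell in zip(species, cells)
        }
        # More than one copy in any species suggests possible paralogs.
        if any(len(ids) > 1 for ids in genes.values()):
            continue
        present = {sp: ids[0] for sp, ids in genes.items() if ids}
        # Require the group to be present in enough species.
        if len(present) >= min_species_fraction * len(species):
            kept[group_id] = present
    return kept


# Toy table with made-up species and gene IDs.
example = (
    "Orthogroup\tspA\tspB\tspC\n"
    "OG1\ta1\tb1\tc1\n"
    "OG2\ta2, a3\tb2\tc2\n"
    "OG3\ta4\t\tc3\n"
)

strict = single_copy_orthogroups(example)
relaxed = single_copy_orthogroups(example, min_species_fraction=0.6)
print(sorted(strict), sorted(relaxed))
```

With transcriptome-derived proteins some genes will be missing simply because they were not expressed, so lowering `min_species_fraction` below 1.0 may keep more groups. Note that a one-copy-per-species filter does not guarantee orthology (hidden paralogy is still possible), so it is worth checking the individual gene trees for the genes you care about.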