Hi everyone
I have many protein sequences from new species. The data for these species are more complete than what is available in the GenBank database. I would like to group orthologous proteins into clusters. Which tools can be used for this? I would greatly appreciate your advice. Thank you!
This may not be what you're looking for, but I like to do rough sequence clustering on my protein sets using CD-HIT. It's very fast and can give you an idea of your protein demographics, though it groups sequences by identity rather than by orthology as such. UniProt has used it to build its UniRef clusters.
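If it helps, here is a minimal sketch of how one could summarize cluster sizes from CD-HIT output. It assumes CD-HIT was run with something like `cd-hit -i proteins.fasta -o clustered -c 0.9` (the file names and the 0.9 identity threshold are placeholders), which writes a `clustered.clstr` file next to the output. The function names `parse_clstr` and `size_histogram` and the example sequence names are my own, not part of CD-HIT.

```python
import re
from collections import Counter

# Pulls the sequence name out of a member line such as:
#   "0\t350aa, >protA... *"
MEMBER = re.compile(r">(.+?)\.\.\.")


def parse_clstr(lines):
    """Return {cluster_id: [sequence names]} from the lines of a .clstr file."""
    clusters = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            # Header line, e.g. ">Cluster 12"
            current = int(line.split()[1])
            clusters[current] = []
        elif current is not None:
            match = MEMBER.search(line)
            if match:
                clusters[current].append(match.group(1))
    return clusters


def size_histogram(clusters):
    """Count how many clusters have each size (number of member sequences)."""
    return Counter(len(members) for members in clusters.values())


# Tiny made-up example in the .clstr layout (hypothetical sequence names).
example = """>Cluster 0
0\t350aa, >protA... *
1\t348aa, >protB... at 95.40%
>Cluster 1
0\t120aa, >protC... *
""".splitlines()

clusters = parse_clstr(example)
hist = size_histogram(clusters)
for size, count in sorted(hist.items()):
    print(f"{count} cluster(s) with {size} sequence(s)")
```

For a real run you would pass the lines of the file instead, for example `parse_clstr(open("clustered.clstr"))`. Keep in mind that these are identity-based clusters, so treat them as a rough first look rather than as orthologous groups.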
Thank you for your comment; it is very helpful.