Hello!
I'm working on the following problem: I am looking for enzymes with function X to clone into organisms. Currently, I run BLAST, select several similar proteins from distinct species, order and clone them, and then test each organism's ability to do X.
I'm looking for a way to improve this process, potentially through better clustering, so that I can identify groups of similar enzymes and then test one from each group in the wet lab. There is a range of tools for predicting protein function from sequence or structure, but my goal is different.
I start with a protein and want a better picture of its related proteins than a simple BLAST search provides. Right now I'm considering two options: aligning all the sequences and building a distance matrix from sequence similarity (Needleman-Wunsch), or potentially using Marie Chabbert's Bios2mds. Any advice, or obvious tools I may have missed?
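To make the first option concrete, here is a minimal sketch of what I have in mind, in plain Python. Everything in it is an assumption for illustration: the toy sequences and their names are made up, the scoring is a simple match/mismatch/linear-gap scheme rather than a substitution matrix such as BLOSUM62, the distance is 1 minus fractional identity, and the clustering is a naive average-linkage merge with a hypothetical cutoff of 0.5. It picks one representative (the medoid) per cluster as the candidate to order.

```python
from itertools import combinations


def nw_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the fraction of identical positions over the alignment length.
    The scoring values are illustrative assumptions, not tuned for proteins.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting identities and alignment columns.
    i, j, same, length = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            same += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        length += 1
    return same / length if length else 1.0


def distance_matrix(seqs):
    """Symmetric matrix of 1 - identity for every pair of sequences."""
    k = len(seqs)
    dist = [[0.0] * k for _ in range(k)]
    for i, j in combinations(range(k), 2):
        dist[i][j] = dist[j][i] = 1.0 - nw_identity(seqs[i], seqs[j])
    return dist


def cluster(dist, cutoff):
    """Naive average-linkage clustering: merge until the closest pair exceeds cutoff."""
    clusters = [[i] for i in range(len(dist))]

    def avg(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg(clusters[p[0]], clusters[p[1]]))
        if avg(clusters[x], clusters[y]) > cutoff:
            break
        clusters[x] += clusters[y]
        del clusters[y]  # safe because x < y
    return clusters


def medoid(members, dist):
    """Member with the smallest total distance to the rest of its cluster."""
    return min(members, key=lambda i: sum(dist[i][j] for j in members))


# Toy input: hypothetical names and sequences, for illustration only.
names = ["A1", "A2", "B1", "B2"]
seqs = ["MKTAYIAKQR", "MKTAYIAKQK", "GSHWLLPEVD", "GSHWLLPEVE"]

dist = distance_matrix(seqs)
for members in cluster(dist, cutoff=0.5):
    rep = medoid(members, dist)
    print([names[i] for i in members], "-> test", names[rep])
```

On the toy input this groups A1 with A2 and B1 with B2, and proposes one sequence from each group. The alignment step is quadratic in sequence length for every pair, so for a real BLAST hit list I would expect to swap the hand-written parts for an established alignment and clustering library; the sketch is only meant to show the shape of the pipeline I'm asking about.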