What would be a painless way to get a collection (say 100 or 1,000) of protein amino acid sequences with only one protein per family? I am terrible at navigating databases for this, so if anyone has a step-by-step solution (or, even better, knows of a ready-made list that I could download), I would be so, so grateful! :)
Thank you
What do you mean by family? A sequence-based classification (e.g. Pfam) or a structure-based one (e.g. SCOP)?
Wow, I did not see your comment until now! I did not even realize there were different classifications. I just do not want to use proteins that may be functionally similar or gene duplicates. I am using an HMM on the data and want to reduce any dependence between the sequences. Any ideas? And thanks!
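For what it's worth, once you have some table that maps each protein to its family label(s), the "one protein per family" step itself is easy to sketch. The snippet below is only an illustration under assumptions: the function name `pick_one_per_family`, the protein IDs, and the family labels are all made up, and it does not show how to download the mapping from any database. It greedily keeps a protein only if none of its families has been used yet, so a multi-domain protein cannot sneak in a family that is already represented.

```python
def pick_one_per_family(protein_families, limit=None):
    """Greedily choose proteins so that no family is represented twice.

    protein_families: dict mapping a protein ID to an iterable of family labels
                      (hypothetical input; build it from whatever classification you use).
    limit: optional maximum number of proteins to return.
    """
    used_families = set()
    chosen = []
    # Sort IDs so the result is reproducible from run to run.
    for protein_id in sorted(protein_families):
        families = set(protein_families[protein_id])
        # Skip unclassified proteins and any protein sharing a family
        # with one that was already chosen.
        if not families or families & used_families:
            continue
        chosen.append(protein_id)
        used_families |= families
        if limit is not None and len(chosen) >= limit:
            break
    return chosen


# Toy data: IDs and family labels are invented for illustration only.
example = {
    "P1": {"FAM_A"},
    "P2": {"FAM_A", "FAM_B"},  # shares FAM_A with P1, so it is skipped
    "P3": {"FAM_B"},
    "P4": {"FAM_C"},
    "P5": set(),               # no family assigned, so it is skipped
}

print(pick_one_per_family(example))           # ['P1', 'P3', 'P4']
print(pick_one_per_family(example, limit=2))  # ['P1', 'P3']
```

Note that this only removes redundancy at the level of whatever family labels you feed it, so how independent the resulting set is depends on the classification you pick.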