Protein Sequence (Only 1 Representative Per Family)
11.6 years ago
GreenDiamond ▴ 70

What would be a painless way to get a collection (say 100 or 1,000) of protein amino acid sequences with only one protein per family? I am terrible at navigating databases for this, so if anyone has a step-by-step solution (or, even better, knows of a ready-made list that I could download), I would be so, so grateful! :)

Thank you

protein bioinformatician protein-structure • 2.5k views

What do you mean by family? A sequence-based classification (e.g. PFAM) or structure-based (e.g. SCOP)?


Wow, I did not see your comment until now! I did not even realize there were different classifications. I just do not want to use proteins that may be functionally similar or gene duplicates. I am running an HMM on the data and want to minimize any dependence between the sequences. Any ideas? And thanks!

11.6 years ago

Sorry for the shameless plug :) - I have worked on characterizing the best-representative sequence from protein families.

Please see: 3PFDB database - a database of best-representative PSSMs (and sequences) derived from protein families using an . Manuscript is available here.

Also check this short conference report on gathering best representative sequences.

Please let me know if you specifically need any data dumps or any additional information.


Does your method take into account Pfam clans (which group together related families)? In my quick scan of your paper, I couldn't see that you'd done that.
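
For anyone who wants to collapse related families by clan, here is a rough Python sketch. It assumes a tab-separated table whose first two columns are the family accession and the clan accession, with the clan column empty when a family belongs to no clan. I believe Pfam's Pfam-A.clans.tsv is laid out this way, but check your copy. The helper name `one_family_per_clan` and the toy accessions are made up for illustration.

```python
import csv
import io

def one_family_per_clan(rows):
    """Keep the first family seen per clan; clan-less families are always kept."""
    seen_clans = set()
    kept = []
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        family, clan = row[0], row[1]
        if not clan:
            kept.append(family)      # family is not in any clan
        elif clan not in seen_clans:
            seen_clans.add(clan)     # first family from this clan
            kept.append(family)
    return kept

# Toy table (made-up accessions): FAM_A and FAM_B share clan CLAN_X.
DEMO_TABLE = "FAM_A\tCLAN_X\nFAM_B\tCLAN_X\nFAM_C\t\nFAM_D\tCLAN_Y\n"
kept = one_family_per_clan(csv.reader(io.StringIO(DEMO_TABLE), delimiter="\t"))
print(kept)
```

Keeping the first family per clan is an arbitrary choice; any other rule (largest family, best-annotated family) would slot into the same loop.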

11.6 years ago
Rm 8.3k

I remember doing this 7 years ago: go to Pfam, look at the Pfam-A seed sequences used to build the Pfam family profile, and select the longest one.
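
A minimal Python sketch of that idea, applied to every family at once rather than to one family by hand. It assumes you have a Stockholm-format file of seed alignments (such as a decompressed Pfam-A seed download) in which each family carries a `#=GF AC` accession line and ends with `//`, and in which gaps are written as `.` or `-`. The function names and the demo alignment are illustrative only.

```python
import io

def longest_seed_per_family(handle):
    """Return {family accession: (sequence name, ungapped sequence)}.

    For each family in a multi-family Stockholm file, keep the seed
    sequence with the most residues once gap characters are removed.
    """
    best = {}
    accession = None
    seqs = {}  # name -> ungapped sequence for the current family
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("#=GF AC"):
            accession = line.split()[2]
        elif line.startswith("//"):
            # End of one family: record its longest sequence.
            if accession and seqs:
                name = max(seqs, key=lambda n: len(seqs[n]))
                best[accession] = (name, seqs[name])
            accession, seqs = None, {}
        elif line and not line.startswith("#"):
            parts = line.split()
            if len(parts) == 2:
                name, aligned = parts
                residues = aligned.replace(".", "").replace("-", "")
                # Concatenate in case the alignment is interleaved.
                seqs[name] = seqs.get(name, "") + residues.upper()
    return best

def write_fasta(best, out):
    """Write one FASTA record per family, sorted by accession."""
    for acc, (name, seq) in sorted(best.items()):
        out.write(f">{acc}|{name}\n{seq}\n")

# Made-up two-family alignment, just to exercise the parser.
DEMO = """# STOCKHOLM 1.0
#=GF ID   FamA
#=GF AC   PF00001.1
seq1/1-5     AC..DEF
seq2/1-7     ACGHDEF
//
# STOCKHOLM 1.0
#=GF ID   FamB
#=GF AC   PF00002.1
seq3/1-3     M-K-L
//
"""
result = longest_seed_per_family(io.StringIO(DEMO))
fasta = io.StringIO()
write_fasta(result, fasta)
print(fasta.getvalue())
```

For a real file, pass an open file handle instead of the `io.StringIO` demo. If opening it as UTF-8 raises a decoding error, an encoding such as `latin-1` that accepts any byte may help. To get 100 or 1,000 sequences, sample that many accessions from the returned dictionary. Note that "longest" is only one possible rule for choosing a representative.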


Rm, thanks for your reply! I do not have any specific protein in mind, though. I would like to generate a list of proteins (any kind of proteins), as long as they are not from the same family. Does what you are saying mean I would need to search for a specific protein? I am not looking for a representative of any specific protein(s). Do you see what I mean?

