I have a set of proteins and I need to find homologs for them. To automate the search I wrote a Perl script. My question is: what percent identity range (both minimum and maximum) should I use when searching for homologous sequences? Also, the database I am using for blastp is PDB, since I only need hits that have structures. Can anyone help with this?
In short, I want to obtain homologous protein structures for my dataset. Since the dataset is large, I need some sort of cutoff values to identify the homologous sequences.
I recall that 30% is a commonly cited empirical cutoff for protein sequence identity.
If you use BLAST, the E-value is a better indicator of homology than percent identity, because it takes into account the alignment score, the query length and the size of the database searched. For example, a short protein is more likely to resemble an unrelated sequence simply by chance, in which case a low E-value is stronger evidence than a high identity. As far as I know there is no standard cutoff for the E-value. You can try anything from 1e-2 (IMG's protocol) to 1e-10 or even lower.
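To make the cutoff idea concrete, here is a minimal Python sketch that filters blastp results saved in the standard tabular format (-outfmt 6). The function name filter_hits, the default thresholds (E-value 1e-5, 30% identity, alignment length 50) and the example hits with their PDB-style IDs are all made up for illustration; they are not recommended values, so tune them for your own dataset.

```python
import csv
import io

# Standard column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, max_evalue=1e-5, min_identity=30.0, min_length=50):
    """Yield (query, subject, identity, length, evalue) for hits passing all cutoffs.

    The default cutoffs are illustrative assumptions, not established standards.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        evalue = float(hit["evalue"])
        pident = float(hit["pident"])
        length = int(hit["length"])
        if evalue <= max_evalue and pident >= min_identity and length >= min_length:
            yield hit["qseqid"], hit["sseqid"], pident, length, evalue


# Made-up example hits. The second one is short and has high identity
# but a poor E-value, so it is rejected despite its 62.5% identity.
example = (
    "protA\t1ABC_A\t45.2\t210\t100\t3\t1\t208\t5\t214\t3e-52\t180\n"
    "protA\t2XYZ_B\t62.5\t16\t6\t0\t40\t55\t10\t25\t4.1\t22.3\n"
    "protB\t3DEF_A\t28.0\t150\t100\t4\t1\t148\t1\t150\t2e-08\t55.1\n"
)

hits = list(filter_hits(io.StringIO(example)))
for hit in hits:
    print(hit)

# Dropping the identity cutoff keeps the 28% hit, which still has a good E-value.
relaxed = list(filter_hits(io.StringIO(example), min_identity=0.0))
```

For a real run you would pass an open file handle on your blastp output instead of the io.StringIO example. Filtering on E-value first and treating identity as a secondary criterion matches the reasoning above: the short 62.5% hit is exactly the kind of chance match that identity alone would let through.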
Alternatively, there are programs that identify orthologs using more sophisticated algorithms, such as OrthoMCL. You could try those.
Please take care with the word "homologous". It has a very specific meaning which you should understand.
The percent identity (PID) cutoff you choose depends on your purpose. Please give us more details about your problem so that we can answer your question.
You can use Dali to obtain identity scores from structural alignments.