I am new to protein structure prediction. A protein structure informatics expert at my institution advised me to check the quality of structure predictions before any downstream use. My two-step pipeline is:
- Step 1: Structure prediction with LOMETS or I-TASSER
- Step 2: Structure evaluation with ProQ or QMEAN
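To make the accept/reject decision concrete, here is a minimal Python sketch of the two-step pipeline as a filter. LOMETS, I-TASSER, ProQ and QMEAN are run separately (for example through their web servers), so the predictor and evaluator are passed in as plain functions. `run_pipeline`, `ModelResult` and the single `cutoff` parameter are hypothetical names, and choosing the cutoff is exactly what questions 3 and 4 ask about.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ModelResult:
    """One predicted model and its quality verdict (hypothetical record type)."""
    sequence_id: str
    predictor: str   # e.g. "LOMETS" or "I-TASSER"
    score: float     # quality score from the evaluator, e.g. a ProQ or QMEAN score
    accepted: bool   # True if score >= cutoff


def run_pipeline(
    sequences: Dict[str, str],
    predictors: Dict[str, Callable[[str], object]],
    evaluate: Callable[[object], float],
    cutoff: float,
) -> List[ModelResult]:
    """Predict one model per (sequence, predictor), score it, and gate on a cutoff.

    `predictors` and `evaluate` are hypothetical stand-ins for however the models
    and scores are obtained (e.g. files downloaded from the servers).
    """
    results = []
    for seq_id, seq in sequences.items():
        for name, predict in predictors.items():
            model = predict(seq)      # Step 1: structure prediction
            score = evaluate(model)   # Step 2: structure evaluation
            results.append(ModelResult(seq_id, name, score, score >= cutoff))
    return results
```

This assumes a higher score means a better model, which holds for LGscore, MaxSub and the QMEAN global score; the comparison would have to be flipped for a score where lower is better.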
I am most interested in the F-box domain. In the PDB-RCSB database, crystal structures are available for more than 10 proteins that contain this domain.
As a practice run, I predicted structures for two F-box domain sequences:
- one sequence from the PF00646 seed alignment (LALTKLPPELLVQVLSHVPPRALVTRCRPVCRAWRDLVDGPSIWLLQLA)
- one sequence from 1FQV-A in PDB-RCSB (VSWDSLPDELLLGIFSCLCLPELLKVSGVCKRWYRLASDESLWQTLD)
Given where these sequences come from, they must be bona fide F-box domains, so they serve as two positive controls for my pipeline.
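Since question 1 is about short sequences, a quick check of how long the two positive controls actually are may help. This minimal sketch only counts residues and confirms that each sequence uses the 20 standard one-letter amino acid codes; the dictionary keys and `summarize` are hypothetical names.

```python
# The two positive-control F-box sequences from the question
controls = {
    "PF00646_seed": "LALTKLPPELLVQVLSHVPPRALVTRCRPVCRAWRDLVDGPSIWLLQLA",
    "1FQV_A": "VSWDSLPDELLLGIFSCLCLPELLKVSGVCKRWYRLASDESLWQTLD",
}

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def summarize(seq: str) -> dict:
    """Return the length and whether only standard amino acid letters occur."""
    return {"length": len(seq), "valid": set(seq) <= STANDARD_AA}


for name, seq in controls.items():
    info = summarize(seq)
    print(f"{name}: {info['length']} residues, standard amino acids only: {info['valid']}")
# PF00646_seed: 49 residues, standard amino acids only: True
# 1FQV_A: 47 residues, standard amino acids only: True
```

Both controls are under 50 residues, which is the length range that question 1 is about.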
I draw the following inferences from my ProQ evaluation results (please see the screenshot of the results in the image below):
- For both LOMETS and I-TASSER, and for both sequences, the models are deemed "very good models" based on the ProQ LGscore.
- For both LOMETS and I-TASSER, the models for the seed sequence are not even "fairly good" based on the MaxSub score.
- For both LOMETS and I-TASSER, the models for the sequence from the solved PDB structure are only "fairly good" based on the MaxSub score.
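The inferences above amount to comparing each predicted score against fixed quality tiers. Here is a minimal sketch of that lookup, assuming the ProQ cutoffs are LGscore > 1.5 for "fairly good" and > 3 for "very good", and MaxSub > 0.1 for "fairly good" and > 0.5 for "very good" (my reading of the ProQ documentation; please check it against the version you ran). The two scores live on different scales (MaxSub runs from 0 to 1, while LGscore has no fixed upper bound), so their cutoffs are not interchangeable. `classify` and the cutoff tables are hypothetical names.

```python
# Cutoffs assumed from the ProQ documentation (verify against the version you ran).
# Each list is ordered from the strictest tier to the loosest.
LGSCORE_CUTOFFS = [(3.0, "very good"), (1.5, "fairly good")]
MAXSUB_CUTOFFS = [(0.5, "very good"), (0.1, "fairly good")]


def classify(score: float, cutoffs: list) -> str:
    """Return the label of the strictest tier whose threshold the score exceeds."""
    for threshold, label in cutoffs:
        if score > threshold:
            return label
    return "below fairly good"


# Made-up scores for illustration only, not the values in my screenshot.
print(classify(3.4, LGSCORE_CUTOFFS))   # very good
print(classify(0.08, MAXSUB_CUTOFFS))   # below fairly good
print(classify(0.3, MAXSUB_CUTOFFS))    # fairly good
```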
My questions are as follows:
1. Is it valid to evaluate predicted protein structures for short sequences? If so, is there still a minimum length?
2. Why are my MaxSub scores so poor?
3. Can I use the LGscore results alone to decide whether to accept or reject a predicted structure? If so, how should I set the cutoff? Note that I am using predicted secondary structures for the ProQ evaluations.
4. The same questions, but for my QMEAN results.
THANKS!