I know there have already been a few general questions about secondary structure prediction, but hopefully mine is a bit more specific.
I am currently using psipred as part of a pipeline that needs secondary-structure predictions, and I am wondering whether there is any way to speed up the process a little:
Running locally on a reasonably powerful CPU, psipred takes 5-10 minutes on average (mostly spent in the psiblast search from BLAST+) for medium-sized sequences (fewer than 300 residues), whereas the online version returns results in 1 or 2 minutes.
Furthermore, many of the queries are known proteins for which a 100% match exists, so I would expect that match to be found in the first iteration and the search to stop there (instead, I suspect the tool keeps running through all iterations, collecting weaker matches along the way).
I know I can lower the number of iterations, but that does not seem to reduce the execution time by much (and I would run the risk of not finding any homologs in the off chance that there is no exact match).
Is there any other way I could modify the parameters passed to psiblast in the runpsipred script to speed things up (even at the expense of some precision)?
For example, is there a way to make it stop immediately if an exact match is found? (I reckon this should be sufficient for the rest of psipred's algorithm.)
For anybody who is familiar with psiblast but not psipred, here is the command currently used by the psipred pipeline:
$ncbidir/psiblast -db $dbname -query $tmproot.fasta -inclusion_ethresh 0.001 -out_pssm $tmproot.chk -num_iterations 3 -num_alignments 0 >& $tmproot.blast
The PSSM file (-out_pssm) is the important output for the rest of the algorithm.
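For reference, here is a minimal bash sketch of the same psiblast call, assuming the ncbidir, dbname and tmproot values from runpsipred are passed in as arguments. The run_psiblast function, its argument order and the DRY_RUN switch are hypothetical, for illustration only. The one change is BLAST+'s -num_threads option, which the current command does not use and which may help on a multi-core CPU; how much it actually saves depends on the database and machine. Note that the tcsh `>&` redirection in the original becomes `> file 2>&1` in bash.

```shell
#!/usr/bin/env bash
# Sketch of the runpsipred psiblast step, rewritten for bash.
# Assumptions: ncbidir/dbname/tmproot mean the same as in runpsipred;
# run_psiblast, its 4th argument (thread count) and DRY_RUN are hypothetical.
set -euo pipefail

run_psiblast() {
  local ncbidir=$1 dbname=$2 tmproot=$3 nthreads=${4:-1}
  local cmd=(
    "$ncbidir/psiblast"
    -db "$dbname"
    -query "$tmproot.fasta"
    -inclusion_ethresh 0.001
    -out_pssm "$tmproot.chk"      # the PSSM the rest of psipred reads
    -num_iterations 3
    -num_alignments 0
    -num_threads "$nthreads"      # not in the original command
  )
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    # Print the command line instead of running it
    printf '%s\n' "${cmd[*]}"
    return 0
  fi
  # bash equivalent of tcsh ">&": stdout and stderr both go to the .blast log
  "${cmd[@]}" > "$tmproot.blast" 2>&1
}
```

For example, `DRY_RUN=1 run_psiblast /path/to/blast/bin uniref90 /tmp/query 4` prints the command without running it (the paths and database name here are placeholders).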
Alternative question: can anybody recommend a tool with comparable prediction performance that can be run locally (from the command line) and is faster?