I'm trying to get a webservice for protein discovery running. I would like to perform a tblastn with a PSSM from NCBI's archive (smp file). This works fine with NCBI BLAST+, but unfortunately the framework I should run it from only supports the old NCBI BLAST (2.2.21).
So I'm searching for an equivalent command to
tblastn -in_pssm matrix.smp -db database -evalue 1e-10 -out outfile -outfmt 6
and what I came up with was
blastall -p psitblastn -d database -R matrix.smp -o outfile -e 1e-10 -m 8
This command, however, has been running for hours without producing any output or error message, and without consuming any CPU time (ps -A | grep blastall yields 0:00:00).
What am I doing wrong?
Isn't the input sequence somewhat irrelevant when I already have a PSSM to search with? However, I'll try supplying the sequence as well and see if it works. Thanks!
As I recall, the old NCBI blastall binary did not support searching with a PSSM. To do that, I believe you need to use the separate blastpgp binary, which should also be part of the distribution.
To your edit2: as far as I understand, the values in a PSSM at each position are enough to define substitutions. If, e.g., a highly conserved Trp sits at position X, the matrix values will assign a high score to Trp and a low score to all other residues (without needing to know that there was indeed a Trp in a large subset of sequences). Also, the concept of one input sequence for a profile generated from multiple homologues seems a bit shaky. Then again, I might be wrong ;)
Sorry for the late accept: the program really was waiting for stdin, but I still think that in theory it should not be necessary.
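For anyone hitting the same hang: a process that sits at 0:00:00 CPU time with no output is typically blocked reading stdin. A minimal sketch of the effect, using cat as a stand-in for blastall (an assumption for illustration, since cat also reads stdin when given no input file). The commented blastall line shows how a query could be passed with -i; query.fasta is a hypothetical file name, and whether -R accepts an .smp file in 2.2.21 is not something I have verified.

```shell
#!/bin/sh
# Stand-in demo: `cat` with no file argument reads from stdin, just as the
# blastall call in the question did when no query file was supplied.

# Redirecting stdin from /dev/null hands the reader an immediate end-of-file,
# so the command returns at once instead of idling with zero CPU time.
out=$(cat < /dev/null)
status=$?
echo "exit status: $status, bytes read: ${#out}"

# Hypothetical real-world equivalent (not executed here): supply the query
# explicitly with -i so blastall does not wait on stdin.
#   blastall -p psitblastn -i query.fasta -d database -R matrix.smp \
#            -o outfile -e 1e-10 -m 8
```

Running the original command with < /dev/null appended is a quick way to confirm the diagnosis: if it exits immediately (probably complaining about an empty query), it was waiting for input all along.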