Question

NCBI Legacy BLAST Usage With tblastn/PSSM

3

14.4 years ago

Michael Schubert ★ 7.1k

I'm trying to get a web service for protein discovery running. I would like to perform a tblastn search with a PSSM from NCBI's archive (an .smp file). This works fine with NCBI BLAST+, but unfortunately the framework I am supposed to run it from only supports the old NCBI BLAST (2.2.21).

So I'm looking for an equivalent command to

tblastn -in_pssm matrix.smp -db database -evalue 1e-10 -out outfile -outfmt 6

and what I came up with was

blastall -p psitblastn -d database -R matrix.smp -o outfile -e 1e-10 -m 8

This command, however, has been running for hours without producing any output or error message, and without consuming any CPU time (ps -A | grep blastall yields 0:00:00).

What am I doing wrong?

blast ncbi pssm • 5.2k views

ADD COMMENT • link updated 14.4 years ago by Science_Robot ★ 1.1k • written 14.4 years ago by Michael Schubert ★ 7.1k

score 1 · Answer 1 · 2010-08-17

14.4 years ago

Science_Robot ★ 1.1k

The query is specified with -i <queryfile>. The program is sitting idle because it is waiting for input on STDIN.
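As a rough illustration of the point above, here is a minimal Python sketch that assembles the legacy command from the question with the -i option added. The file name query.fasta and the helper build_psitblastn_command are hypothetical, and the sketch only prints the command rather than running it. Whether the legacy -R option accepts the .smp file as-is is not confirmed in this thread.

```python
import shlex


def build_psitblastn_command(query, database, checkpoint, outfile, evalue="1e-10"):
    """Assemble a legacy blastall argument list for a PSI-TBLASTN search.

    Hypothetical helper: it only builds the argument list and does not run BLAST.
    """
    return [
        "blastall",
        "-p", "psitblastn",
        "-i", query,        # query sequence file; without -i, blastall waits on STDIN
        "-d", database,
        "-R", checkpoint,   # PSSM file, passed the same way as in the question
        "-o", outfile,
        "-e", evalue,
        "-m", "8",          # tabular output, as in the question
    ]


# "query.fasta" is an assumed file name; the other values come from the question.
cmd = build_psitblastn_command("query.fasta", "database", "matrix.smp", "outfile")
command_line = " ".join(shlex.quote(arg) for arg in cmd)
print(command_line)
# blastall -p psitblastn -i query.fasta -d database -R matrix.smp -o outfile -e 1e-10 -m 8
```

Building the command as a list keeps each option next to its value, which makes a missing -i easy to spot before the job is submitted.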

EDIT: I'm not sure this is the answer, since I do not know what your web service requires. Are you providing a query through the web service?

EDIT 2: Definition of PSSM from NCBI:

A PSSM, or Position-Specific Scoring Matrix, is a type of scoring matrix used in protein BLAST searches in which amino acid substitution scores are given separately for each position in a protein multiple sequence alignment. Thus, a Tyr-Trp substitution at position A of an alignment may receive a very different score than the same substitution at position B. This is in contrast to position-independent matrices such as the PAM and BLOSUM matrices, in which the Tyr-Trp substitution receives the same score no matter at what position it occurs.

The PSSM is just a scoring matrix to be used in conjunction with a query.
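To make the difference described in the definition concrete, here is a small Python sketch that compares a position-independent lookup with a position-specific one. All scores are made-up toy values, not real BLOSUM or PSSM entries, and the names FLAT_MATRIX, PSSM, flat_score and pssm_score are purely illustrative.

```python
# Toy position-independent scores (made-up values, not real BLOSUM entries).
FLAT_MATRIX = {("Y", "W"): 2, ("Y", "Y"): 7, ("W", "W"): 11}

# Toy PSSM: one row of residue scores per alignment position (made-up values).
PSSM = [
    {"Y": 6, "W": 3},   # position A: Trp is tolerated here
    {"Y": 8, "W": -4},  # position B: Trp is strongly penalised here
]


def flat_score(query_residue, subject_residue):
    """Score a residue pair with the position-independent matrix."""
    return FLAT_MATRIX[(query_residue, subject_residue)]


def pssm_score(position, subject_residue):
    """Score a subject residue against one position (row) of the PSSM."""
    return PSSM[position][subject_residue]


# The Tyr-Trp substitution scores the same wherever it occurs in the flat matrix...
print(flat_score("Y", "W"), flat_score("Y", "W"))  # 2 2
# ...but differently at positions A and B in the PSSM.
print(pssm_score(0, "W"), pssm_score(1, "W"))      # 3 -4
```

Note that the PSSM lookup is indexed by alignment position and subject residue, while the flat matrix is indexed by the residue pair.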

1

Isn't the input sequence somewhat irrelevant when I already have a PSSM to search with? However, I'll try supplying the sequence as well and see if it works. Thanks!

ADD REPLY • link 14.4 years ago by Michael Schubert ★ 7.1k

0

Regarding your second edit: as far as I understand, the values in a PSSM at each position are enough to define substitutions. If, e.g., a Trp is at a highly conserved position X, the matrix values will assign a high score to Trp and a low score to all other residues (without needing to know that there was indeed a Trp in a large subset of sequences). Also, the concept of one input sequence for a profile generated from multiple homologues seems a bit shaky. Then again, I might be wrong ;)

ADD REPLY • link 14.4 years ago by Michael Schubert ★ 7.1k

0

Sorry for the late accept: the program really waits for stdin, but I still think that in theory it should not be necessary.

ADD REPLY • link 14.3 years ago by Michael Schubert ★ 7.1k

score 1 · Answer 2 · 2010-08-17

14.4 years ago

Lars Juhl Jensen 11k

As I recall, the old NCBI blastall binary did not support searching with a PSSM. To do that, I believe you need to use the separate blastpgp binary, which should also be part of the distribution.

0

According to the documentation for PSI-TBLASTN at http://www.csc.fi/english/research/sciences/bioscience/programs/blast/blastall, it should work with blastall.