As Michael mentions, the NCBI BLAST+ distribution does not currently (as of NCBI BLAST+ 2.2.25) contain a tool for creating RPS-BLAST databases. The section of the NCBI BLAST documentation that covers "Legacy" BLAST usage lists 'formatrpsdb' among the programs described as follows: "Those programs have no blast+ counterpart at this time." Given that "Legacy" NCBI BLAST has been deprecated in favour of NCBI BLAST+, my guess is that a replacement tool will appear in the next couple of releases.
In the meantime it looks like the only options for building databases for use with 'rpsblast' are:
- Use 'formatrpsdb' from the "Legacy" NCBI BLAST distribution.
- Use the old 'makemat'/'copymat'/'formatdb' method, which was replaced by 'formatrpsdb', but is described in the "Legacy" NCBI BLAST 'rpsblast' documentation.
If you are only interested in the databases which NCBI provide in their CD-search service, then there are a couple of additional options:
- Download the pre-formatted databases from the NCBI FTP site. See the README file for details of the available databases and the appropriate versions for your environment. As mentioned by Niek, these files are also available from third parties.
- Use the CD-search web service to access the NCBI CD-search service remotely. This has the advantage of NCBI doing all the database and software maintenance.
Since RPS-BLAST is a method for searching a database of protein signatures (PSI-BLAST derived PSSM profiles in this case) with a sequence, it may be worth looking at alternative tools which provide sequence searches against protein signature databases. Given that your example used Pfam, a couple of Hidden Markov Model (HMM) based methods come to mind:
- HMMER which is used both during generation of Pfam and as their search tool.
- HH-suite, which features the HHblits search tool, which performs sequence searches against an HMM database derived from alignments.
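As a rough illustration of the HMMER route, here is a minimal sketch assuming HMMER 3 is installed and that a Pfam HMM library has been downloaded as 'Pfam-A.hmm'. The file names, the 'HMM_DB' and 'QUERY' variables and the E-value cut-off are placeholders of mine, not anything from your question:

```shell
#!/bin/sh
# Sketch only: file names are hypothetical and HMMER 3 is assumed to be on PATH.
set -eu

hmm_db="${HMM_DB:-Pfam-A.hmm}"   # assumed: Pfam HMM library downloaded beforehand
query="${QUERY:-query.fasta}"    # hypothetical protein query in FASTA format

if ! command -v hmmscan >/dev/null 2>&1; then
    echo "hmmscan not found on PATH; skipping"
elif [ ! -f "$hmm_db" ] || [ ! -f "$query" ]; then
    echo "missing $hmm_db or $query; skipping"
else
    # hmmpress builds the binary index files that hmmscan needs;
    # only run it if they are not already present.
    [ -f "$hmm_db.h3i" ] || hmmpress "$hmm_db"
    # Search the query against the HMM library and also write a
    # per-domain table that is easier to parse than the main report.
    hmmscan --domtblout query.domtbl -E 0.01 "$hmm_db" "$query" > query.hmmscan.txt
fi
```

The script does nothing beyond printing a message if the tools or input files are missing, so it is safe to try as-is before pointing it at real data.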
There are many protein signature methods and associated databases of protein signatures. If you want to look into alternative methods and databases, a good place to start is InterPro, which integrates information from a number of protein signature databases. To search InterPro with a sequence, use InterProScan, which is available on the web, via web services (both SOAP and REST) and as a software download.
Edit: 5-MAR-2012
With the release of NCBI BLAST+ 2.2.26 'makeprofiledb' has been introduced to replace 'formatrpsdb' from "Legacy" NCBI BLAST:
$ ./makeprofiledb -help
USAGE
makeprofiledb [-h] [-help] -in in_pssm_list [-binary]
[-title database_title] [-threshold word_score_threshold]
[-out database_name] [-max_file_sz max_file_size_in_bytes]
[-dbtype output_db_type] [-index create_index_files]
[-gapopen gap_open_penalty] [-gapextend gap_extend_penalty]
[-scale pssm_scale_factor] [-matrix matrix_name]
[-obsr_threshold observations_threshold]
[-exclude_invalid exclude_invalid] [-logfile File_Name] [-version]
DESCRIPTION
Application to create databases for rpsblast, cobalt and deltablast,
version 2.2.26+
REQUIRED ARGUMENTS
-in <File_In>
Input file that contains a list of smp files (delimited by space, tab or
newline)
...
I suspect the NCBI BLAST News and the NCBI BLAST documentation will be updated shortly to reflect the new release.
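Going by the usage text above, building a database might look something like the sketch below. I have only used options that appear in that usage text ('-in', '-out', '-title'). The directory name, database name and the 'SMP_DIR'/'DB_NAME' variables are hypothetical, and I have not tested this against a real set of profiles:

```shell
#!/bin/sh
# Sketch only: directory and database names are hypothetical.
set -eu

smp_dir="${SMP_DIR:-./smp}"       # hypothetical directory holding *.smp PSSM files
db_name="${DB_NAME:-my_rps_db}"   # hypothetical name for the output database
list_file="${db_name}.pn"

# makeprofiledb's -in option expects a file listing the .smp files
# (delimited by space, tab or newline), so build that list first.
: > "$list_file"
for f in "$smp_dir"/*.smp; do
    if [ -e "$f" ]; then
        printf '%s\n' "$f" >> "$list_file"
    fi
done

if [ ! -s "$list_file" ]; then
    echo "No .smp files found in $smp_dir"
elif command -v makeprofiledb >/dev/null 2>&1; then
    makeprofiledb -in "$list_file" -out "$db_name" -title "$db_name"
else
    echo "makeprofiledb not found on PATH; wrote $list_file only"
fi
```

The resulting database name can then be passed to 'rpsblast' via its '-db' option.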
I've found that the rpsblast binaries that come with the BLAST+ package core dump when I use -cpu > 1.
"Help with..." is not a question. It would make it easier for others to find your question and the answers if it is phrased as a question.
Why do you prefer it to HMMER? (It's a real question.)