Custom Matrices With NCBI BLAST
14.8 years ago
hadasa ★ 1.0k

Does anyone know how to make NCBI BLAST work with custom matrices, i.e. matrices other than the ones (such as the BLOSUM series) that ship with NCBI BLAST by default?

blast matrix • 9.3k views

I really don't think it is wise to hard code the matrices within BLAST. Maybe there should be an option to use another matrix.


Because NCBI BLAST (and NCBI BLAST+) uses precalculated values for some of its statistics, it supports only certain combinations of matrix and gap penalties (see http://web.archive.org/web/20070121032949/http://www.ncbi.nlm.nih.gov/blast/blast_whatsnew.shtml#20051206.2). If you want to use arbitrary values for these, look at other BLAST implementations (WU-BLAST/AB-BLAST supports more matrices) or other sequence similarity search tools. For example, the FASTA suite programs derive the statistics themselves and so don't have these constraints (although this does mean it is possible to perform meaningless searches).

14.8 years ago

There is a very dirty trick to do that. You just need to give your custom matrix the name of a currently supported matrix and put it in the matrices dir. That's the solution you can find at NCBI. It works. But beware! Defaults are now a problem.

You can check additional details in:

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall.html#5

I'm not sure which BLAST version that page refers to. On my box, I have ncbi-tools (Ubuntu pkg) installed and, for example, PAM30 is in /usr/share/ncbi/data/PAM30. Got the idea?


Thanks! Someone mentioned to me a -V T parameter that uses an 'old blast engine'; I have not tried it yet.


yeah that's really dirty lol!


Unfortunately, it does not work when using nucleotide blast (-p blastn).


It should work for every ncbi app. You just need to know which matrices it's using and where they are. I've tested with nucleotides and protein. You should recheck your paths.


Sorry, but you might be wrong. The nucleotide matrix is hard-coded in the source files with no option to use or replace any file.


While the template matrix for nucleotide searches is hard-coded, the scaling is not, and is controlled via the match/mismatch parameters.


In later versions of BLAST (2.2.20 onwards) this is true. However, we successfully applied this trick using version 2.2.13, which allowed us to replace the required matrices.


Anyway, even in BLAST+ > 2.2.13 the trick is also applicable in its source code form. You can add any matrices you like; you just need to modify some header files. I tested it here with blast+-2.2.24 and it works fine! I'll update my answer ASAP.

14.8 years ago
Neilfws 49k

I can tell you what does not work and suggest a possible solution.

When BLAST is installed locally on a Linux system from an NCBI package, the matrices are stored in /usr/share/ncbi/data, as plain text files. So, I tried copying the BLOSUM62 matrix to a new file named "BLOSUM00", then running blastall as:

blastall -p blastp -d nr -i myseq.fa -M BLOSUM00

And I got this error message:

Searching[blastall] ERROR: Q02066.1: BlastKarlinBlkGappedCalc: BLOSUM00 is not a supported matrix
[blastall] ERROR: Q02066.1: BlastKarlinBlkGappedCalc: BLOSUM80 is a supported matrix
[blastall] ERROR: Q02066.1: BlastKarlinBlkGappedCalc: BLOSUM62 is a supported matrix
[blastall] ERROR: Q02066.1: BlastKarlinBlkGappedCalc: BLOSUM50 is a supported matrix
[blastall] ERROR: Q02066.1: BlastKarlinBlkGappedCalc: BLOSUM45 is a supported matrix
[blastall] ERROR: Q02066.1: BlastKarlinBlkGappedCalc: PAM250 is a supported matrix
[blastall] ERROR: Q02066.1: BlastKarlinBlkGappedCalc: BLOSUM62_20 is a supported matrix
[blastall] ERROR: Q02066.1: BlastKarlinBlkGappedCalc: BLOSUM90 is a supported matrix
[blastall] ERROR: Q02066.1: BlastKarlinBlkGappedCalc: PAM30 is a supported matrix
[blastall] ERROR: Q02066.1: BlastKarlinBlkGappedCalc: PAM70 is a supported matrix

This indicates to me that the BLAST matrices are hard-coded in the BLAST source code. One solution might be to download the BLAST source code, find the file related to "BlastKarlinBlkGappedCalc", edit the source and see if you can compile BLAST.
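To illustrate why a renamed file such as "BLOSUM00" is rejected, here is a minimal sketch of a name check against a fixed list. The list is copied from the error output above; the helper name `is_supported_matrix` and the exact, case-sensitive comparison are assumptions made for this sketch and are not taken from the BLAST sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Names copied from the blastall error output quoted above.
 * The real list lives in the BLAST sources and may differ by version. */
static const char *const kSupportedMatrices[] = {
    "BLOSUM80", "BLOSUM62", "BLOSUM50", "BLOSUM45", "PAM250",
    "BLOSUM62_20", "BLOSUM90", "PAM30", "PAM70"
};

/* Hypothetical helper: returns 1 if name is in the fixed list, else 0.
 * Exact, case-sensitive matching is an assumption of this sketch. */
int is_supported_matrix(const char *name)
{
    size_t n = sizeof kSupportedMatrices / sizeof kSupportedMatrices[0];
    size_t i;

    assert(name != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(name, kSupportedMatrices[i]) == 0)
            return 1;
    }
    return 0;
}
```

With a check of this kind, the name passed to -M matters rather than the file contents, which is consistent with a copy of BLOSUM62 saved as "BLOSUM00" being refused.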


That's the point. I'm looking for simple ways to do it, and have not come across an easy way of recompiling BLAST.

13.9 years ago

Hi, I am having this same problem. First I tried just renaming my matrix as a default BLAST matrix (e.g. BLOSUM45). It ran fine, but the results I got were identical to those produced by using the actual BLOSUM45 matrix. I then tried downloading the BLAST source code, editing the BLOSUM45 matrix and recompiling, but it still produced the same output as the regular BLOSUM45 matrix. Is there anyone here who has actually got a custom matrix working for BLAST? I'd really appreciate any help you could give me.

11.9 years ago
colinDotAIBN ▴ 20

If anyone comes across this: you can add additional matrices to BLAST by adding them to blast_stat.c (src\algo\blast\core).

How to add a new matrix to blast_stat.c:

To add a new matrix to blast_stat.c it is necessary to complete four steps. As an example, consider adding a matrix called TESTMATRIX.

1.) add a define specifying how many different gap existence and extension penalty combinations are allowed, so it would be necessary to add the line:

#define TESTMATRIX_VALUES_MAX 14

if 14 values were to be allowed.

2.) add a two-dimensional array to contain the statistical parameters:

static array_of_8 testmatrix_values[TESTMATRIX_VALUES_MAX] ={ ...

3.) add a "prefs" array that should hint about the "optimal" gap existence and extension penalties:

static Int4 testmatrix_prefs[TESTMATRIX_VALUES_MAX] = {
  BLAST_MATRIX_NOMINAL, ...
};

4.) Go to the function BlastLoadMatrixValues (in this file) and add two lines before the return at the end of the function:

matrix_info = MatrixInfoNew("TESTMATRIX", testmatrix_values, testmatrix_prefs, TESTMATRIX_VALUES_MAX);
ListNodeAddPointer(&retval, 0, matrix_info);
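As a rough illustration of how the four pieces fit together, the following self-contained sketch mimics them outside the BLAST source tree. Everything here is a stand-in: the `MatrixInfo` struct layout, the width of `array_of_8`, the numeric values of the `BLAST_MATRIX_*` constants, the helper `make_testmatrix_info`, and all numbers in the table are assumptions for illustration, not real statistical parameters. In the real file, the registration is done with MatrixInfoNew and ListNodeAddPointer as shown in step 4.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for BLAST internals (hypothetical). */
typedef int Int4;
typedef double array_of_8[8];       /* the real type may hold more columns */
#define BLAST_MATRIX_NOMINAL 0      /* assumed value */
#define BLAST_MATRIX_BEST    2      /* assumed name and value */

typedef struct MatrixInfo {         /* assumed layout, for illustration */
    const char *name;
    array_of_8 *values;
    Int4       *prefs;
    Int4        max_number_values;
} MatrixInfo;

/* Step 1: number of gap-cost combinations allowed (3 in this sketch). */
#define TESTMATRIX_VALUES_MAX 3

/* Step 2: one row per combination. The first two columns are taken to be
 * gap existence and gap extension; the zero-filled rest are placeholders,
 * NOT real statistical parameters. */
static array_of_8 testmatrix_values[TESTMATRIX_VALUES_MAX] = {
    {11.0, 1.0},
    {10.0, 2.0},
    { 9.0, 2.0}
};

/* Step 3: one preference hint per row. */
static Int4 testmatrix_prefs[TESTMATRIX_VALUES_MAX] = {
    BLAST_MATRIX_BEST,
    BLAST_MATRIX_NOMINAL,
    BLAST_MATRIX_NOMINAL
};

/* Step 4 (stand-in): bundle name, table, hints and row count together. */
MatrixInfo make_testmatrix_info(void)
{
    MatrixInfo info;
    info.name = "TESTMATRIX";
    info.values = testmatrix_values;
    info.prefs = testmatrix_prefs;
    info.max_number_values = TESTMATRIX_VALUES_MAX;
    assert(info.max_number_values > 0);
    return info;
}
```

The point of the sketch is only that the define, the values table and the prefs array must all agree on the same row count; meaningful statistical parameters for a custom matrix still have to come from elsewhere.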

Any idea what 14 means in this case? Is it the total number of unique values within the matrix, or the maximum substitution value for a pair of amino acids in that matrix? The comment next to the predefined matrices says, for example for BLOSUM45:

#define BLOSUM45_VALUES_MAX 14 /**< Number of different combinations supported for BLOSUM45. */
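Going by that comment, the define looks like a row count: the number of gap existence/extension combinations for which parameters are stored, not anything about the substitution scores themselves. A minimal sketch of that reading follows, using a made-up three-row table. The column layout (gap existence first, gap extension second), the helper `find_gap_row` and all numbers are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef double array_of_8[8];   /* stand-in; the real type may be wider */

/* Row count of the table below, playing the role of *_VALUES_MAX. */
#define TESTMATRIX_VALUES_MAX 3

/* Made-up rows: {gap existence, gap extension, placeholders...}. */
static const array_of_8 testmatrix_values[TESTMATRIX_VALUES_MAX] = {
    {11.0, 1.0},
    {10.0, 2.0},
    { 9.0, 2.0}
};

/* Hypothetical lookup: index of the row matching the requested gap
 * costs, or -1 if that combination is not in the table. */
int find_gap_row(const array_of_8 *values, size_t n,
                 int gap_open, int gap_extend)
{
    size_t i;

    assert(values != NULL);
    for (i = 0; i < n; i++) {
        if ((int)values[i][0] == gap_open &&
            (int)values[i][1] == gap_extend)
            return (int)i;
    }
    return -1;
}
```

Under this reading, a value of 14 for BLOSUM45 would mean 14 rows in its table, and a gap-cost pair that is not among them has no stored parameters to use.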