Does anyone know how to make NCBI BLAST work with custom scoring matrices, i.e. ones that are not among the default matrices (such as the BLOSUM series) that ship with NCBI BLAST?
There is a very dirty trick for this: give your custom matrix the name of a currently supported matrix and put it in the matrices directory. That's the solution you can find at NCBI. It works, but beware: the defaults are now a problem.
You can find additional details at:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall.html#5
I'm not sure which BLAST version that page covers. On my box I have ncbi-tools (Ubuntu package) installed and, for example, PAM30 is at /usr/share/ncbi/data/PAM30. Got the idea?
I can tell you what does not work and suggest a possible solution.
When BLAST is installed locally on a Linux system from an NCBI package, the matrices are stored in /usr/share/ncbi/data, as plain text files. So, I tried copying the BLOSUM62 matrix to a new file named "BLOSUM00", then running blastall as:
blastall -p blastp -d nr -i myseq.fa -M BLOSUM00
And I got this error message:
Searching[blastall] ERROR: Q02066.1: BlastKarlinBlkGappedCalc: BLOSUM00 is not a supported matrix
[blastall] ERROR: Q02066.1: BlastKarlinBlkGappedCalc: BLOSUM80 is a supported matrix
[blastall] ERROR: Q02066.1: BlastKarlinBlkGappedCalc: BLOSUM62 is a supported matrix
[blastall] ERROR: Q02066.1: BlastKarlinBlkGappedCalc: BLOSUM50 is a supported matrix
[blastall] ERROR: Q02066.1: BlastKarlinBlkGappedCalc: BLOSUM45 is a supported matrix
[blastall] ERROR: Q02066.1: BlastKarlinBlkGappedCalc: PAM250 is a supported matrix
[blastall] ERROR: Q02066.1: BlastKarlinBlkGappedCalc: BLOSUM62_20 is a supported matrix
[blastall] ERROR: Q02066.1: BlastKarlinBlkGappedCalc: BLOSUM90 is a supported matrix
[blastall] ERROR: Q02066.1: BlastKarlinBlkGappedCalc: PAM30 is a supported matrix
[blastall] ERROR: Q02066.1: BlastKarlinBlkGappedCalc: PAM70 is a supported matrix
This indicates to me that the BLAST matrices are hard-coded in the BLAST source code. One solution might be to download the BLAST source code, find the file related to "BlastKarlinBlkGappedCalc", edit the source and see if you can compile BLAST.
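For what it's worth, the error output above already lists the matrix names this build accepts, so you can pull them out and check a name before launching a long search. A minimal Python sketch, assuming the messages keep the format shown above (the function name supported_matrices is just mine for illustration, and the sample text is a shortened copy of the output above):

```python
import re

# Matches lines such as
#   [blastall] ERROR: Q02066.1: BlastKarlinBlkGappedCalc: BLOSUM80 is a supported matrix
# but not "... BLOSUM00 is not a supported matrix".
SUPPORTED_RE = re.compile(r"(\S+) is a supported matrix\s*$")


def supported_matrices(stderr_text):
    """Return the matrix names reported as supported, in order of appearance."""
    names = []
    for line in stderr_text.splitlines():
        match = SUPPORTED_RE.search(line)
        if match:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    sample = (
        "Searching[blastall] ERROR: Q02066.1: BlastKarlinBlkGappedCalc: "
        "BLOSUM00 is not a supported matrix\n"
        "[blastall] ERROR: Q02066.1: BlastKarlinBlkGappedCalc: "
        "BLOSUM80 is a supported matrix\n"
        "[blastall] ERROR: Q02066.1: BlastKarlinBlkGappedCalc: "
        "PAM30 is a supported matrix\n"
    )
    names = supported_matrices(sample)
    print(names)
    print("BLOSUM00" in names)
```

This only tells you which names the binary will accept; it says nothing about whether a renamed file is actually read.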
Hi, I am having the same problem. First I tried just renaming my matrix to a default BLAST matrix name (e.g. BLOSUM45). It ran fine, but the results I got were identical to those produced by the actual BLOSUM45 matrix. I then tried downloading the BLAST source code, editing the BLOSUM45 matrix and recompiling, but it still produced the same output as the regular BLOSUM45 matrix. Has anyone here actually got a custom matrix working with BLAST? I'd really appreciate any help you could give me.
In case anyone comes across this: you can add additional matrices to BLAST by adding them to blast_stat.c (in src\algo\blast\core).
To add a new matrix to blast_stat.c it is necessary to complete four steps. As an example, consider adding a matrix called TESTMATRIX:
1.) add a define specifying how many different gap existence and extension penalty combinations are allowed, so it would be necessary to add the line:
#define TESTMATRIX_VALUES_MAX 14
if 14 values were to be allowed.
2.) add a two-dimensional array to contain the statistical parameters:
static array_of_8 testmatrix_values[TESTMATRIX_VALUES_MAX] ={ ...
3.) add a "prefs" array that should hint about the "optimal" gap existence and extension penalties:
static Int4 testmatrix_prefs[TESTMATRIX_VALUES_MAX] = {
BLAST_MATRIX_NOMINAL, ...
};
4.) Go to the function BlastLoadMatrixValues (in this file) and add two lines before the return at the end of the function:
matrix_info = MatrixInfoNew("TESTMATRIX", testmatrix_values, testmatrix_prefs, TESTMATRIX_VALUES_MAX);
ListNodeAddPointer(&retval, 0, matrix_info);
Any idea what the 14 means in this case? Is it the total number of unique values within the matrix, or the maximum substitution score for a pair of amino acids in that matrix? The comment next to the predefined matrices says, for example for BLOSUM45:
#define BLOSUM45_VALUES_MAX 14 /**< Number of different combinations supported for BLOSUM45. */
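As far as I can tell from that comment and from the way the define is used as the array size in step 2, the 14 is simply the number of rows in the values table, i.e. the number of gap existence/extension combinations for which statistical parameters were precomputed. It has nothing to do with the scores inside the matrix. Here is a rough Python model of that lookup, using a hypothetical TESTMATRIX with made-up placeholder numbers (not real Karlin-Altschul parameters). The column order is what I recall from the comments in blast_stat.c, so check it against your copy:

```python
from collections import namedtuple

# One row of a *_values table. Column order is an assumption based on the
# comments in blast_stat.c: gap existence, gap extension, decline-to-align
# (ignored), Lambda, K, H, alpha, beta.
Row = namedtuple("Row", "gap_open gap_extend decline_align lam K H alpha beta")

# Hypothetical TESTMATRIX table. Every number below is a placeholder chosen
# only to show the structure, not a real statistical parameter.
TESTMATRIX_VALUES = [
    Row(13, 3, None, 0.20, 0.05, 0.10, 1.0, -10.0),
    Row(12, 3, None, 0.19, 0.04, 0.09, 1.1, -12.0),
    Row(11, 3, None, 0.18, 0.03, 0.08, 1.2, -14.0),
]

# Plays the role of "#define TESTMATRIX_VALUES_MAX": the number of rows.
TESTMATRIX_VALUES_MAX = len(TESTMATRIX_VALUES)


def lookup(values, gap_open, gap_extend):
    """Return the precomputed row for a gap-penalty pair, or raise if absent."""
    for row in values:
        if (row.gap_open, row.gap_extend) == (gap_open, gap_extend):
            return row
    raise ValueError(
        "gap penalties %d/%d are not supported for this matrix"
        % (gap_open, gap_extend)
    )


if __name__ == "__main__":
    print(TESTMATRIX_VALUES_MAX)
    print(lookup(TESTMATRIX_VALUES, 12, 3).lam)
```

So with 14 rows you can run that matrix with 14 different gap-penalty pairs and nothing else.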
I really don't think it is wise to hard code the matrices within BLAST. Maybe there should be an option to use another matrix.
Due to the use of precalculated values for some of the statistics, NCBI BLAST (and NCBI BLAST+) only supports some combinations of matrix and gap penalties (see http://web.archive.org/web/20070121032949/http://www.ncbi.nlm.nih.gov/blast/blast_whatsnew.shtml#20051206.2). If you want to use arbitrary values for these, look at other BLAST implementations (WU-BLAST/AB-BLAST supports more matrices) or other sequence similarity search tools. For example, the FASTA suite programs derive the statistics and so don't have these constraints (although this does mean it is possible to perform meaningless searches).
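To make the point about precalculated values concrete: a raw alignment score S is turned into a bit score and an E-value using two parameters, lambda and K, which depend on both the matrix and the gap penalties. Below is a minimal Python sketch of the standard Karlin-Altschul formulas. The lambda and K in the example are the values I recall blastp reports showing for BLOSUM62 with gap penalties 11/1, so treat them as an assumption; the sequence lengths are arbitrary, and the effective-length corrections that real BLAST applies are left out:

```python
import math


def bit_score(raw_score, lam, K):
    """Normalized score in bits: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def e_value(raw_score, lam, K, query_len, db_len):
    """Expected number of chance hits: E = K * m * n * exp(-lambda * S)."""
    return K * query_len * db_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    # Assumed parameters (BLOSUM62, gap open 11, gap extend 1), from memory.
    lam, K = 0.267, 0.041
    # Arbitrary example sizes: a 250-residue query against 1e6 residues.
    m, n = 250, 1000000
    for raw in (50, 100, 150):
        print(raw, bit_score(raw, lam, K), e_value(raw, lam, K, m, n))
```

Without a lambda and K appropriate to your custom matrix and gap penalties, the E-values reported are not meaningful, which is presumably why unknown matrix names are rejected outright.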