Hi again,
I'm trying to do a local NCBI BLAST search using the PSSM of a conserved CDD domain (a .smp file in the CDD database; beware, LARGE file).
My database contains nucleotide sequences and the PSSM is for proteins, so I would like to use tblastn for this search. The tblastn executable accepts PSSMs, so no problem there.
But: the matrix in the .smp file is scaled by a factor of 100; according to the specs, the BLAST program should be able to downscale the matrices back to a factor of 1 automatically.
However, when I run tblastn, I get the following error:
BLAST engine error: PSSM has a scaling factor of 100. PSI-BLAST does not accept scaled PSSMs
Now, is there either a way to run tblastn with scaled matrices, or a tool to rescale them? And if not, do I just scale the matrix values by 1/100, or do I have to scale the lambda, kappa and h factors (whatever these are) as well?
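For what it's worth, here is a rough sketch of the manual downscaling route in Python. Everything in it is hypothetical (the helper names, the made-up score row, the made-up lambda) and it is not part of BLAST+. It divides each scaled score by the factor and rounds to an integer, and reports how far the round-tripped scores drift. For lambda, the one thing that is certain is that the Karlin-Altschul lambda is defined so that lambda * score stays unchanged. So *if* the stored lambda refers to the scaled scores, dividing the scores by 100 means multiplying lambda by 100. Whether a given .smp file stores lambda in scaled or unscaled units is something to check, not assume. The sketch leaves kappa and h alone.

```python
# Hypothetical helpers for illustration only; these are not part of BLAST+.


def downscale_scores(scaled_scores: list[int], factor: int) -> list[int]:
    """Divide each scaled PSSM score by `factor` and round to the nearest integer.

    Python's round() uses round-half-to-even, so exact .5 cases go to the even integer.
    """
    return [round(s / factor) for s in scaled_scores]


def rescale_lambda(lambda_scaled: float, factor: int) -> float:
    """Convert a lambda expressed in scaled score units to unscaled units.

    lambda * score must stay unchanged, so dividing the scores by `factor`
    means multiplying lambda by `factor`.
    """
    return lambda_scaled * factor


def max_rounding_error(scaled_scores: list[int], factor: int) -> int:
    """Largest difference (in scaled units) between the original and round-tripped scores."""
    downscaled = downscale_scores(scaled_scores, factor)
    return max(abs(s - d * factor) for s, d in zip(scaled_scores, downscaled))


if __name__ == "__main__":
    row = [-267, 151, 349, 1000, -49]  # made-up scores for one PSSM column, scaled by 100
    print(downscale_scores(row, 100))    # [-3, 2, 3, 10, 0]
    print(max_rounding_error(row, 100))  # 49
    print(rescale_lambda(0.00267, 100))  # made-up lambda in scaled units
```

The rounding error can reach just under half the scaling factor per score (49 of 100 here). Those are exactly the two extra decimal digits the scaled matrix was carrying.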
Thanks for the answer! I have some things to add that I found out myself in the meantime: lambda, kappa, and h values are stored as triplets of numbers in the NCBI .smp files, e.g. lambda { 267, 10, -3 }. I think that means a value of 267 * 10^-3 = 0.267, stored this way because the format only supports integers as values. Thus dividing the weights by 100 will result in a loss of accuracy, which in my case will not matter much because I only use this search as a pre-filtering step.
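That reading matches the ASN.1 value notation for REAL numbers, which writes a real as { mantissa, base, exponent }. A small Python sketch to decode such triplets out of a text snippet is below. The regex and helper names are hypothetical and not an NCBI tool. The regex is rough and would also match any other brace block holding exactly three integers, so it is only meant for pulling out fields like lambda, kappa and h by name.

```python
import re

# Matches ASN.1 value-notation REALs such as "lambda { 267, 10, -3 }".
# Hypothetical helper for illustration; not an NCBI tool.
REAL_TRIPLET = re.compile(r"(\w+)\s*\{\s*(-?\d+)\s*,\s*(-?\d+)\s*,\s*(-?\d+)\s*\}")


def decode_real(mantissa: int, base: int, exponent: int) -> float:
    """Return mantissa * base ** exponent as a float."""
    if exponent >= 0:
        return float(mantissa * base ** exponent)
    # Divide by a positive integer power so the result is correctly rounded.
    return mantissa / base ** (-exponent)


def parse_real_fields(text: str) -> dict[str, float]:
    """Collect every `name { mantissa, base, exponent }` triplet found in `text`."""
    return {
        name: decode_real(int(m), int(b), int(e))
        for name, m, b, e in REAL_TRIPLET.findall(text)
    }


if __name__ == "__main__":
    snippet = "lambda { 267, 10, -3 }"
    print(parse_real_fields(snippet))  # {'lambda': 0.267}
```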
Yes, no surprise that dividing the weights by 100 will result in a loss of accuracy. As far as I understand, the whole purpose of initially scaling them up by a factor of 100 was to increase accuracy. Great that you figured out what the triplets mean - it was not clear to me at all :-)