From what I can tell, there are at least two things that you are doing wrong.
First, it seems that your query file has multiple sequences. If that's indeed the case, psiblast will search with each sequence individually (and sequentially), and each search will overwrite the results of the previous one. That is extremely wasteful: it takes a lot of time, and in the end you get results only for your last sequence. If you have multiple sequences, split them into individual files and store the results in different files. The search will take the same amount of time, but you will end up with results for all sequences rather than just the last one.
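One way to do the splitting is sketched below in Python. This is only an illustration under a few assumptions: the numbered file names (query_0001.fasta and so on), the helper names split_fasta and psiblast_command, and the default database name mydb are all made up for the example, and the iteration count should be set to whatever you actually need. The psiblast options used (-query, -db, -num_iterations, -out, -out_pssm, -save_pssm_after_last_round) are the standard ones.

```python
from pathlib import Path


def split_fasta(fasta_path, out_dir):
    """Write each record of a multi-FASTA file to its own file; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    handle = None
    with open(fasta_path) as src:
        for line in src:
            if line.startswith(">"):
                if handle is not None:
                    handle.close()
                # Number the files instead of reusing headers, which may
                # contain characters that are unsafe in file names.
                path = out_dir / f"query_{len(paths) + 1:04d}.fasta"
                paths.append(path)
                handle = open(path, "w")
            if handle is not None:
                handle.write(line)
    if handle is not None:
        handle.close()
    return paths


def psiblast_command(query_path, db="mydb", iterations=3):
    """Build a psiblast argument list whose outputs are named after the query file."""
    stem = Path(query_path).with_suffix("")
    return [
        "psiblast",
        "-query", str(query_path),
        "-db", db,
        "-num_iterations", str(iterations),
        "-out", f"{stem}.out",        # hit report for this query only
        "-out_pssm", f"{stem}.smp",   # PSSM for this query only
        "-save_pssm_after_last_round",
    ]
```

Each command can then be run with something like subprocess.run(psiblast_command(path), check=True) in a loop over the paths returned by split_fasta, so every query ends up with its own report and its own PSSM.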
Second, the way you formulated the command will save the PSSM after the second iteration. Yes, the PSSM file contains the converted multiple alignment of BLAST hits rather than your starting sequences. However, there is a -save_pssm_after_last_round switch that does exactly what it sounds like. If you don't invoke it, the PSSM will be from the penultimate iteration, which is again wasteful because the results of the last iteration will not count for anything. In fact, the following command will produce exactly the same PSSM as yours while running one fewer iteration:
psiblast -query myfasta.fasta -db mydb -num_iterations 2 -out_pssm mypssm.smp -save_pssm_after_last_round
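If you want to confirm which query a saved PSSM was built from, the id and title near the top of the file tell you. A minimal Python sketch, assuming the PSSM was written as text ASN.1 with `local str "..."` and `title "..."` fields in its header (the function name pssm_query_info is just for illustration):

```python
import re


def pssm_query_info(pssm_text):
    """Return (query id, query title) from the head of a text ASN.1 PSSM.

    Either element is None if the corresponding field is not found.
    """
    head = pssm_text[:2000]  # the query description sits near the top
    id_match = re.search(r'local\s+str\s+"([^"]*)"', head)
    title_match = re.search(r'title\s+"([^"]*)"', head)
    return (
        id_match.group(1) if id_match else None,
        title_match.group(1) if title_match else None,
    )
```

Calling pssm_query_info(open("mypssm.smp").read()) returns the id and title of the single query that the matrix describes.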
By the way, what you posted in the other thread:
Warning: [psiblast] Query_1: Composition-based score adjustment conditioned on sequence properties and unconditional composition-based score adjustment is not supported with PSSMs, resetting to default value of standard composition-based statistics
As it says, it is only a warning and you can safely ignore it. When running more than one iteration, psiblast will use the newly created PSSM and therefore can't apply composition-based statistics because those are pre-calculated only for single-iteration searches that use fixed substitution matrices. The warning will not appear if you run the same command as you did but with a single iteration.
When I run it I see this at the top:
PssmWithParameters ::= { pssm { isProtein TRUE, numRows 28, numColumns 2291, byRow FALSE, query seq { id { local str "Query_147" }, descr { title "419612_0:004b79" },
And "419612_0:004b79" is the name of my last sequence, out of 147 queries. What does this mean? It doesn't mean that the matrix is based only on the last sequence, does it?