Hello folks,
I have heard that blastp truncates the query sequence even when there is an exact match. Let me illustrate. Let's say my query is "ABCDEFG" and it matches the subject from D to G. The preceding letters do not meet the threshold and get dropped, so my result becomes:
query: DEFG
subject: DEFG
But here is the thing: I was told that when the subject sequence is inspected, it is actually CDEFG, but somehow the C gets dropped.
So I was trying to simulate a situation like this, and I came across something that I am not able to understand. My input query is LNRNQPAATALANTIE, searched against the pdbaa database, and the command I am using is
blastp -query query.txt -db pdbaa -taxidlist negative.list -matrix PAM30 -word_size 2 -threshold 21
and this command gives me the output below.
Query  3   RNQPAATALANTI  15
           R QP AT    TI
Sbjct  45  RSQPEATNASQTI  57
I even set the threshold to 2k, but I still keep receiving this. I was wondering if any of you could enlighten me?
Thank you
Thanks for the reply.
Doesn't the score have to go above the threshold before the alignment gets extended? To my knowledge, with a word size of 2, each pair of amino acids is scored, and if the score is not above the threshold, the word gets dropped and the search moves to the next one. In this case,
the RN - RS score is 8, which does not pass the threshold, so it moves to the next one, NQ - SQ, and so on. None of the 2-letter word scores is above the threshold, so the above result should not be reported, no?
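To make the word-by-word scoring described above concrete, here is a minimal sketch that scores every aligned 2-letter word pair from the reported alignment and keeps the ones that reach a threshold. The function name `word_pair_scores` and the scoring function `toy_score` are hypothetical, and the toy scores (+5 for a match, -1 for a mismatch) are NOT PAM30 values, so the numbers will not match the 8 quoted above. To get real scores, swap in a real matrix, for example one loaded with Biopython's `substitution_matrices.load("PAM30")`.

```python
def word_pair_scores(query, subject, word_size, score):
    """Score each aligned word pair along one ungapped diagonal.

    `score` is any callable taking two residues and returning a number.
    """
    if len(query) != len(subject):
        raise ValueError("sequences must have equal length (ungapped diagonal)")
    results = []
    for i in range(len(query) - word_size + 1):
        q_word = query[i:i + word_size]
        s_word = subject[i:i + word_size]
        total = sum(score(a, b) for a, b in zip(q_word, s_word))
        results.append((q_word, s_word, total))
    return results


def toy_score(a, b):
    # Hypothetical scores for illustration only; these are not PAM30 values.
    return 5 if a == b else -1


# Aligned segments copied from the blastp output in the original post.
pairs = word_pair_scores("RNQPAATALANTI", "RSQPEATNASQTI", 2, toy_score)

toy_threshold = 10  # hypothetical threshold, chosen to suit the toy scores
seeds = [(q, s, sc) for q, s, sc in pairs if sc >= toy_threshold]

for q_word, s_word, sc in pairs:
    flag = "seed" if sc >= toy_threshold else "-"
    print(q_word, s_word, sc, flag)
```

With these toy scores only the identical word pairs reach the threshold. Whether a given pair passes under PAM30 depends on the real matrix values.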
It is not the alignment score, it is the word extension scoring.
Just because an algorithm starts the search in one location rather than another does not mean that it won't find certain alignments. It still finds them, just by a different path.
As I said before, it is not a parameter that we usually set to control the resulting alignments. It is a parameter that controls the speed of the search.
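A toy sketch may help show why an alignment can contain word pairs that never passed the threshold themselves: once any one word seeds a hit, the extension step can grow over the neighboring residues. The code below is a simplified ungapped X-drop extension, not NCBI BLAST's actual implementation. The names `extend_ungapped` and `toy_score`, the toy scores (+5 match, -1 mismatch, not PAM30), and the `x_drop` value are all assumptions made for illustration.

```python
def toy_score(a, b):
    # Hypothetical scores for illustration only; these are not PAM30 values.
    return 5 if a == b else -1


def extend_ungapped(query, subject, q_start, s_start, word_size, score, x_drop):
    """Extend a seed word left and right without gaps.

    Returns (q_begin, q_end, best_score) with q_end exclusive. Extension in
    a direction stops once the running score falls more than `x_drop`
    below the best score seen so far.
    """
    best = running = sum(
        score(query[q_start + k], subject[s_start + k]) for k in range(word_size)
    )

    # Extend to the right of the seed.
    right = 0
    k = 0
    while q_start + word_size + k < len(query) and s_start + word_size + k < len(subject):
        running += score(query[q_start + word_size + k], subject[s_start + word_size + k])
        k += 1
        if running > best:
            best, right = running, k
        elif best - running > x_drop:
            break

    # Extend to the left of the seed, starting from the best score so far.
    running = best
    left = 0
    k = 0
    while q_start - 1 - k >= 0 and s_start - 1 - k >= 0:
        running += score(query[q_start - 1 - k], subject[s_start - 1 - k])
        k += 1
        if running > best:
            best, left = running, k
        elif best - running > x_drop:
            break

    return q_start - left, q_start + word_size + right, best


# Aligned segments from the original post; the seed is the "QP" word at index 2.
query_seg = "RNQPAATALANTI"
subject_seg = "RSQPEATNASQTI"
result = extend_ungapped(query_seg, subject_seg, 2, 2, 2, toy_score, x_drop=7)
print(result)  # (0, 13, 29)
```

In this toy run the extension from the single "QP" seed covers the whole 13-residue segment, including the "RN"/"RS" pair, which would not have seeded a hit on its own.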
Hello again Istvan,
I have been trying to understand what you have explained. Thank you for that.
There is still something that is not clear to me. I ran blastp with the parameters
-matrix PAM30 -word_size 2 -threshold 21
I understand that this is a parameter that controls the speed of the search, as you said. However, shouldn't the matching alignment (given in the original post) be skipped, since it does not meet my search criteria? As far as I know, BLAST scores each word of length word_size using a substitution matrix, and only when that score is above the defined threshold does it extend the alignment, for as long as the score stays positive. As explained here
Additionally, when I add
-window_size 3
and set the threshold to 9, the given alignment disappears, and it comes back when the threshold is below 9. On the other hand, with
-window_size 4
the threshold does not make any difference: no matter how low or high I set it, the alignment always appears.
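For anyone who wants to reproduce these observations, here is a small sketch that only builds the blastp command lines for each combination of window size and threshold. It does not run them. The file names and flags are taken from the posts above; the helper name `blastp_commands` and the particular values swept are my own assumptions.

```python
import itertools


def blastp_commands(window_sizes, thresholds):
    """Yield one blastp argument list per (window_size, threshold) pair."""
    base = [
        "blastp",
        "-query", "query.txt",
        "-db", "pdbaa",
        "-taxidlist", "negative.list",
        "-matrix", "PAM30",
        "-word_size", "2",
    ]
    for window, threshold in itertools.product(window_sizes, thresholds):
        yield base + ["-window_size", str(window), "-threshold", str(threshold)]


# Hypothetical sweep around the values mentioned above.
commands = list(blastp_commands([3, 4], [8, 9, 21]))
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run` on a machine where blastp and the pdbaa database are installed, so the outputs can be compared side by side.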