When I compare a 331bp sequence (JQ749729.1) to nt using megablast (via the 'Run BLAST' button on the right-hand side), I get only one match, with 98.43% identity for the query region.
However, when I do the reverse and compare the matching 11,612 bp sequence (MN733821.1) to nt using megablast, I get many hits, but none of them is the highly similar JQ749729.1 sequence. They are all long sequences with only 70-80% identity.
I assume this is because BLAST scores longer, less similar matches higher than shorter, more similar ones. I have tried changing the settings (a higher reward for matches and a higher penalty for mismatches, a larger word size, higher gap costs, etc.), but I cannot get BLAST to find that short, highly similar match. I also tried running it from the command line so I could use other parameters, such as -perc_identity, which I set to 95, but that returned 0 matches against nt.
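To illustrate the ranking intuition above, here is a rough sketch that compares approximate raw scores for two hypothetical ungapped alignments. It assumes megablast's default reward/penalty of +1/-2 and ignores gaps and the statistics that turn raw scores into bit scores, so the numbers are only indicative. The 5,000 bp / 80% "long hit" is made up for illustration.

```python
# Rough sketch: approximate raw score of an ungapped alignment, assuming
# megablast defaults (reward +1 per match, penalty -2 per mismatch).
# Gaps and Karlin-Altschul statistics are ignored, so this is illustrative only.

def raw_score(length: int, identity: float, reward: int = 1, penalty: int = -2) -> int:
    """Approximate raw score for an ungapped alignment of a given length and identity."""
    matches = round(length * identity)
    mismatches = length - matches
    return matches * reward + mismatches * penalty


# Short, highly similar hit (roughly the 331 bp region at 98.43% identity).
short_hit = raw_score(331, 0.9843)
# Hypothetical long, less similar hit spanning 5,000 bp at 80% identity.
long_hit = raw_score(5000, 0.80)

print(short_hit, long_hit)
```

Under these assumptions the long, 80%-identity alignment outscores the short, near-identical one simply because it accumulates many more matches.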
Is there a way to adjust blast's parameters so that it will find that short, highly similar sequence when using the long sequence as a query and nt as the database? Or is there a different method more suited to this task? Thank you for your help.
Have you tried changing the e-value threshold? It might be that the hit you are talking about does not pass the e-value threshold.
Otherwise, you could also increase the number of hits returned: if the query sequence has lots of hits, yours might not be among the top 250 (?) that are reported.
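Both suggestions can be tried from the command line. The sketch below builds a BLAST+ blastn call with an explicit e-value cutoff and a larger target limit, then filters tabular (-outfmt 6) output for short, highly similar HSPs after the search. The flags used (-task, -query, -db, -remote, -evalue, -max_target_seqs, -outfmt) exist in BLAST+, but the file name query.fasta, the 5000-target limit and the 95% / 100 bp filter are illustrative assumptions, and the sample lines are invented, not real results.

```python
# Sketch: build a blastn command with more targets and an explicit e-value,
# then filter default -outfmt 6 output for short, high-identity HSPs.
# The query file name, limits and sample rows below are hypothetical.
from typing import List, NamedTuple


def build_blastn_cmd(query: str, max_targets: int = 5000, evalue: float = 10.0) -> List[str]:
    """Return a blastn argument list; raise evalue to loosen the threshold."""
    return [
        "blastn", "-task", "megablast",
        "-query", query, "-db", "nt", "-remote",
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_targets),
        "-outfmt", "6",
    ]


class Hsp(NamedTuple):
    subject: str
    pident: float
    length: int
    bitscore: float


def parse_outfmt6(text: str) -> List[Hsp]:
    """Parse default outfmt 6 columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore."""
    hsps = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        hsps.append(Hsp(f[1], float(f[2]), int(f[3]), float(f[11])))
    return hsps


def high_identity(hsps: List[Hsp], min_pident: float = 95.0, min_length: int = 100) -> List[Hsp]:
    """Keep HSPs that are both highly identical and reasonably long."""
    return [h for h in hsps if h.pident >= min_pident and h.length >= min_length]


# Invented example rows: one long low-identity hit, one short high-identity hit.
SAMPLE = "\n".join([
    "MN733821.1\thit_long\t78.50\t9000\t1800\t40\t1\t9000\t1\t9000\t0.0\t5000",
    "MN733821.1\thit_short\t98.40\t320\t5\t0\t100\t419\t1\t320\t1e-150\t560",
])

cmd = build_blastn_cmd("query.fasta")
hits = high_identity(parse_outfmt6(SAMPLE))
print(" ".join(cmd))
print([h.subject for h in hits])
```

If BLAST+ is installed, the argument list could be passed to subprocess.run and the output saved. The filtering step works on any saved -outfmt 6 file, so the short hit can be picked out even when it ranks far below the long ones.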