I am completely mystified by this very basic asymmetry in blastn results. I have two FASTA files that share a 15-mer. However, that common 15-mer only produces a hit when one file is the query and the other is the subject; if I swap the two files, I get no hit:
$ blastn -task blastn-short -outfmt 6 -ungapped -strand plus -perc_identity 100 -word_size 15 \
-query Phvul.007G125800.5.0kb.upstream.fasta \
-subject Phvul.009G200800.5.0kb.upstream.fasta
Phvul.007G125800 Phvul.009G200800 100.00 15 0 0 3698 3712 1965 1979 0.020 30.2
$ blastn -task blastn-short -outfmt 6 -ungapped -strand plus -perc_identity 100 -word_size 15 \
-query Phvul.009G200800.5.0kb.upstream.fasta \
-subject Phvul.007G125800.5.0kb.upstream.fasta
$
This is highly reproducible and has some consistency. I've got five 5000-nt sequences, all of which share this same 15-mer. When I use two of them as query sequences, I get the 15-mer hit in all cases. When I use the other three as query sequences, I never get the 15-mer hit. If I blast just the 15-mer by itself against the sequences, I get the hit on all sequences.
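As a check that does not involve BLAST at all, the 15-mer can be pulled out of each file by the coordinates reported in the hit above (3698-3712 in Phvul.007G125800, 1965-1979 in Phvul.009G200800) and compared directly. This is only a minimal sketch: `fasta_slice` is a hypothetical helper name, and it assumes each file holds a single FASTA record.

```shell
# Hypothetical helper: print bases START..END (1-based, inclusive)
# of a FASTA file that is assumed to contain exactly one record.
fasta_slice() {
  # $1 = FASTA file, $2 = start, $3 = end
  # Drop the header line(s), join the sequence lines, then cut the range.
  grep -v '^>' "$1" | tr -d '\n\r' | cut -c"$2"-"$3"
}
```

Running `fasta_slice Phvul.007G125800.5.0kb.upstream.fasta 3698 3712` and `fasta_slice Phvul.009G200800.5.0kb.upstream.fasta 1965 1979` should print the same 15 letters if the plus-strand match is exact. Case is left untouched, so any difference in upper/lower case between the files stays visible.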
Any ideas what's going on? This behavior is independent of word_size, by the way, all the way down to 8. I find this very disconcerting, since I thought that blastn would be symmetric w.r.t. query and subject, at least when they're the same size.
And since you were about to ask, this behavior is reproduced in versions 2.2.31+ (commonly found in distros) and the latest from NCBI, 2.5.0+.
Quite interesting. I have not tested this and am only guessing, but the only thing that differs between the two runs is the database size, and thereby the E-value. The default cutoff is 10, which is high to begin with, so this should not be the reason. Still, could you please rerun with an E-value cutoff above 10 specified explicitly? Bear in mind that I still use legacy BLAST, not the new BLAST+.
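One way to run that test is to wrap the failing command from the question in a small function and pass the cutoff through blastn's `-evalue` option. This is a sketch, not a verified fix: the function name `rerun_with_evalue` is made up for illustration, and all other options are copied from the commands in the question.

```shell
# Hypothetical wrapper: same options as in the question, plus an explicit
# E-value cutoff. Arguments: $1 = cutoff, $2 = query FASTA, $3 = subject FASTA.
rerun_with_evalue() {
  blastn -task blastn-short -outfmt 6 -ungapped -strand plus \
    -perc_identity 100 -word_size 15 \
    -evalue "$1" -query "$2" -subject "$3"
}
```

For example, `rerun_with_evalue 1000 Phvul.009G200800.5.0kb.upstream.fasta Phvul.007G125800.5.0kb.upstream.fasta` repeats the direction that returned nothing; the value 1000 is arbitrary, chosen only to be well above the default of 10. If the 15-mer hit shows up with the looser cutoff, the E-value column of that line would show how far above 10 it was scored.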