I'm using blastn (https://anaconda.org/bioconda/blast) to find sequences similar to a target sequence in a FASTA file. My read is quite short (68 bases), and I realised that blastn doesn't report any hit, even though there is actually a very good one in the FASTA file, as I confirmed by checking manually.
Here is the query sequence:
$ cat query.fa
>query read
GGAGTAGGCGCGAGCGGCAGGAGGCGGGCAGGCGGAGGGCGAGGCAGGGAGGCGCCGCCTGGAGCGCA
And this is the "good one" that I found manually (I picked it out of the original FASTA file, saved it as a single FASTA file, and created the BLAST database from it using makeblastdb):
$ cat db.fa
>database read
GAGTAGGCGCGAGCTAAGCAGGAGGCGGAGGCGGAGGCGGAGGGCGAGGGGCGGGGAGCGCCGCCTGGAGCGCGGCAG
And the command that I used is:
$ blastn -db db.fa -query query.fa -outfmt "6 qseqid sseqid evalue length pident bitscore ppos"
(No hit was reported.)
But the two sequences actually align very well:
database           1 -GAGTAGGCGCGAGCTAAGCAGGAGGCGGAGGCGGAGGCGGAGGGCGAGG     49
                      ||||||||||||||  .|||||||||||    |.|||||||||||||
query              1 GGAGTAGGCGCGAGC--GGCAGGAGGCGG----GCAGGCGGAGGGCGA--     42

database          50 GGCGGGGA-GCGCCGCCTGGAGCGCGGCAG     78
                     |||.|||| ||||||||||||||||.
query             43 GGCAGGGAGGCGCCGCCTGGAGCGCA----     68
(The alignment above was produced by needle.)
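To put a number on "align very well", here is a small tally of that needle alignment: a sketch that joins the two aligned blocks shown above and counts identical, mismatched and gap columns. It is a plain column count (gap columns, not gap openings) and not needle's own scoring; the function name is hypothetical.

```python
# Aligned rows copied from the needle output, with both blocks joined.
DB_ROW = ("-GAGTAGGCGCGAGCTAAGCAGGAGGCGGAGGCGGAGGCGGAGGGCGAGG"
          "GGCGGGGA-GCGCCGCCTGGAGCGCGGCAG")
QUERY_ROW = ("GGAGTAGGCGCGAGC--GGCAGGAGGCGG----GCAGGCGGAGGGCGA--"
             "GGCAGGGAGGCGCCGCCTGGAGCGCA----")


def alignment_stats(top, bottom):
    """Count identical, mismatched and gapped columns of a pairwise alignment."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    stats = {"identities": 0, "mismatches": 0, "gaps": 0, "columns": len(top)}
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            stats["gaps"] += 1          # a gap in either row
        elif a == b:
            stats["identities"] += 1    # same base in both rows
        else:
            stats["mismatches"] += 1    # different bases, no gap
    return stats


if __name__ == "__main__":
    s = alignment_stats(DB_ROW, QUERY_ROW)
    print(s)
    print(f"identity over the alignment: {s['identities'] / s['columns']:.1%}")
```

Most of the disagreement is in gap columns rather than mismatches, which is worth keeping in mind when comparing against a local, gap-penalised BLAST search.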
Is there a blastn parameter that I can adjust to relax BLAST's reporting cutoff?
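A quick way to see why the command above reports nothing: blastn only starts an alignment from an exact word match of at least word_size bases, and when no -task is given it runs the megablast task, whose default word size is 28. The sketch below finds the longest exact run shared by the two sequences and compares it with the default word sizes of the megablast, blastn and blastn-short tasks. It only checks the plus strand and ignores scoring and E-values, so passing it is necessary but not sufficient for a hit; the helper name is hypothetical.

```python
QUERY = "GGAGTAGGCGCGAGCGGCAGGAGGCGGGCAGGCGGAGGGCGAGGCAGGGAGGCGCCGCCTGGAGCGCA"
SUBJECT = "GAGTAGGCGCGAGCTAAGCAGGAGGCGGAGGCGGAGGCGGAGGGCGAGGGGCGGGGAGCGCCGCCTGGAGCGCGGCAG"

# Default word sizes of the BLAST+ blastn tasks (megablast is used when -task is omitted).
DEFAULT_WORD_SIZE = {"megablast": 28, "blastn": 11, "blastn-short": 7}


def longest_exact_match(a, b):
    """Longest common substring of a and b, by O(len(a) * len(b)) dynamic programming."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # prev[j]: length of the common run ending at a[i-2], b[j-1]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


if __name__ == "__main__":
    seed = longest_exact_match(QUERY, SUBJECT)
    print(f"longest exact match: {seed} ({len(seed)} nt)")
    for task, word_size in DEFAULT_WORD_SIZE.items():
        verdict = "can seed" if len(seed) >= word_size else "cannot seed"
        print(f"{task:>12} (word_size {word_size}): {verdict}")
```

The longest shared run is well short of 28 bases, so the default megablast search has no seed to extend, while a smaller word size gives BLAST something to start from.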
It is also worth noting, I think, that the -word_size parameter can be used. With -task blastn the default word size is 11, but it can go as low as 4 nucleotides; blastn-short has a default of 7 and can also go as low as 4. (If no -task is given, blastn runs the megablast task, whose default word size is 28.) The E-value cutoff can also be relaxed, although the default is already 10, so it can be left unset.
Yes, that is very useful. Reducing the word_size will also increase the run time slightly. Alternatively, other aligners could be tried, such as gmap, FASTA (much slower, but still a bit faster than needle) or exonerate (much slower but very accurate).
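A minimal sketch of the relaxed search, building the command from the question with an explicit -task, -word_size and -evalue. The flags are the ones used in this thread; the function names are hypothetical, and the word_size check only mirrors the lower limit of 4 mentioned in the comment. Passing -task explicitly matters, because without it blastn runs megablast.

```python
import shlex
import shutil
import subprocess
from typing import List, Optional

# Tabular columns used in the question.
OUTFMT = "6 qseqid sseqid evalue length pident bitscore ppos"


def blastn_args(query, db, task="blastn", word_size=7, evalue=10.0) -> List[str]:
    """Build a blastn argument list with an explicit task, seed length and E-value cutoff."""
    if word_size < 4:
        # Lower limit reported in the thread for blastn word sizes.
        raise ValueError("word_size must be at least 4")
    return ["blastn", "-task", task, "-db", db, "-query", query,
            "-word_size", str(word_size), "-evalue", str(evalue),
            "-outfmt", OUTFMT]


def run_blastn(args: List[str]) -> Optional[str]:
    """Run blastn if it is on PATH and return its tabular output; None if it is missing."""
    if shutil.which(args[0]) is None:
        return None
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    # Print the shell-quoted command rather than running it.
    print(shlex.join(blastn_args("query.fa", "db.fa", task="blastn", word_size=7)))
```

Calling run_blastn(blastn_args("query.fa", "db.fa")) would actually run the search, assuming blastn is installed and db.fa was built with makeblastdb as in the question; a failing blastn call raises subprocess.CalledProcessError.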
-word_size is awesome! But blastn-short doesn't seem to have a default of 7, since -word_size 7 gave me many hits while -task blastn-short gave me nothing (with the same -evalue). I only got hits with -task blastn-short when I set -evalue to a very high value, but then, because of the high E-value, I got many noisy hits.
Thank you Michi! I found -word_size more flexible than blastn-short, though.
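When -evalue has to be raised a lot (as with -task blastn-short here), the noisy hits can also be filtered afterwards. One likely reason for the difference, stated as an assumption rather than verified on this data, is that -task blastn-short also changes the match/mismatch and gap scores, so the same E-value cutoff does not admit the same alignments as a plain -word_size 7 run. The sketch below parses tabular output in the column order of the -outfmt string from the question and keeps hits above length and identity cutoffs; the names and the min_length=50 / min_pident=70.0 defaults are hypothetical and would need tuning.

```python
import sys
from dataclasses import dataclass
from typing import List

# Column order of: -outfmt "6 qseqid sseqid evalue length pident bitscore ppos"
COLUMNS = ("qseqid", "sseqid", "evalue", "length", "pident", "bitscore", "ppos")


@dataclass
class Hit:
    qseqid: str
    sseqid: str
    evalue: float
    length: int
    pident: float
    bitscore: float
    ppos: float


def parse_hits(tsv: str) -> List[Hit]:
    """Parse tab-separated blastn output written with the column order above."""
    hits = []
    for line in tsv.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        if len(f) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(f)}: {line!r}")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]),
                        float(f[4]), float(f[5]), float(f[6])))
    return hits


def filter_hits(hits: List[Hit], min_length: int = 50, min_pident: float = 70.0) -> List[Hit]:
    """Keep hits that are long enough and identical enough (hypothetical cutoffs)."""
    return [h for h in hits if h.length >= min_length and h.pident >= min_pident]


if __name__ == "__main__":
    # Usage: python filter_hits.py blast_output.tsv
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            for hit in filter_hits(parse_hits(fh.read())):
                print(hit)
```

Filtering on alignment length and identity rather than E-value alone keeps the relaxed cutoff from flooding the results, at the cost of choosing thresholds by hand.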