Hello folks, I am currently using blastn on the command line to BLAST a complete genome against a database. When I do this, I only get hits above 90% identity, but I would also like to see hits with identity as low as 40%. Could someone help me out with this? Thanks in advance.
The command I am currently using is:
"blastn -db db.fasta -query query.fasta -out queryout.tsv -outfmt 6"
Interesting, any idea why that is? That is, why do you have to add -task blastn when you are already running blastn?
blastn does not align your query against every sequence in the database; that would take too long. Before the actual alignments there is a pre-"filter" step that selects candidates from the reference database based on exact matches. In other words, it takes short substrings from your query and looks for database sequences that contain exactly that string, much like k-mers. This is really fast: if a reference sequence does not even contain one such short exact match, there is no point in aligning it with a heavy algorithm. By default, megablast searches with substrings (k-mers) of 28 bases and blastn with 11. So if you use megablast, every database sequence that does not share an exact match of at least 28 bases with the query is discarded. Megablast is the default task of the blastn program, which is why you have to ask for -task blastn explicitly.
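Putting that together, here is a minimal sketch of the command from the question with the task switched from the default megablast to blastn. The file names are the ones from the question; the shell variables and the check that blastn is installed are only there for illustration, and it assumes db.fasta has already been formatted as a BLAST database (for example with makeblastdb).

```shell
# File names taken from the question; adjust them to your own data.
task=blastn          # default task would be megablast (word size 28); blastn uses 11
db=db.fasta          # assumed to be an existing BLAST nucleotide database
query=query.fasta
out=queryout.tsv

# Only run the search when blastn is on the PATH and the query file exists.
if command -v blastn >/dev/null 2>&1 && [ -f "$query" ]; then
    blastn -task "$task" -db "$db" -query "$query" -out "$out" -outfmt 6
else
    echo "blastn or $query not found; search not run" >&2
fi
```

If distant hits are still missing, a smaller -word_size (for example 7) trades speed for more sensitivity, at the cost of a longer run on a complete genome.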