Hi,
I am using BLAST command line tool version 2.6.0.
I have a short-sequence database, db.fa, which consists of 32 sequences of 22 bp each. I ran query.fa against db.fa and it shows no hits. However, when I reduced the database to 26 random sequences and created db.partial.fa, it shows hits. Is there a size limit for a short-sequence database?
Please find below the execution:
(1) DB with 32 sequences
This does not produce any hits.
makeblastdb -in db.fa -dbtype nucl -parse_seqids
blastn -task blastn-short -query query.fa -db db.fa -out query.db.blastn
(2) DB with 26 sequences
This produces hits.
makeblastdb -in db.partial.fa -dbtype nucl -parse_seqids
blastn -task blastn-short -query query.fa -db db.partial.fa -out query.db.partial.blastn
Thanks for your help. I have some test files but there is no option to attach the files (I hope Biostars would allow this in the future).
[Edit]
Since I couldn't attach any files, I will explain what I used. I used AB011515 from NCBI as query.fa. I generated db.fa as 32 sequences, each with exactly the same bases, CTTGGTCATTTAGAGGAAGTAA, and db.partial.fa as 26 sequences with those same bases. I hope that explains it.
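For anyone who wants to reproduce this, the two database files described above could be generated with something like the shell sketch below. The file names, the 22 bp sequence, and the DB_01 to DB_NN record IDs are taken from this thread; the `make_db` helper is a hypothetical name used only for illustration, and this is not necessarily how the original files were made.

```shell
#!/bin/sh
# Sequence quoted in the post; every database record is an identical copy of it.
seq="CTTGGTCATTTAGAGGAAGTAA"

# make_db N FILE (hypothetical helper): write N identical FASTA records,
# with IDs DB_01 .. DB_N, to FILE.
make_db() {
    n="$1"
    out="$2"
    : > "$out"                      # create or truncate the output file
    i=1
    while [ "$i" -le "$n" ]; do
        printf '>DB_%02d\n%s\n' "$i" "$seq" >> "$out"
        i=$((i + 1))
    done
}

make_db 32 db.fa            # full database: DB_01 .. DB_32
make_db 26 db.partial.fa    # reduced database: DB_01 .. DB_26
```

The resulting files can then be passed to the makeblastdb and blastn commands shown above.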
Let me first point out that this is a very old version of BLAST you're using (we're at 2.10.0 at the moment).
Do you get any output when running the 'normal' blastn? (blastn-short is actually meant for short input queries, not for short sequences in the DB.)
Can you post the output of the makeblastdb cmd? Moreover, for version 2.2.18 the command to use to format a blastdb was formatdb (and not makeblastdb).
My apologies. It was version 2.6.0, not 2.2.18. I will correct it. It produces no hit using normal blastn as well. The makeblastdb output is as below:
OK, looks all fine.
Can you post what the hits look like (in the second try you mention)?
Are the 26 sequences a subset of the 32, or are they different sequences? Could it be that there simply are no hits?
The original 32 sequences are different. For simplicity, I created a database of 32 duplicates as db.fa and 26 duplicates for db.partial.fa. Therefore, for db.fa, it looks like this (as you can see, they are all the same):
and db.partial.fa is from DB_01 to DB_26 of db.fa.
Please find below the blast output:
using db.fa
using db.partial.fa
Can I, on the side, ask what the goal of all this is? So far it is making little (biological) sense to me :/
@lieven.sterck. Nothing very special. I was trying to match sequences against different combinations of primers, and realised that it stopped working once I added more primer sequences, and I had no clue why.