Hello all!
I'm trying to blast a single sequence against a custom blastdb (7 Gb in size) and the number of results is lower than what I expected.
Also, if I remove the sequences that were correctly aligned from the database and launch a second blastn against the reduced blastdb, I get new results.
I built the custom blastdb with the following command:
makeblastdb -in db.fna -dbtype nucl
Then I launched blastn with the following command:
blastn -db db.fna -max_target_seqs 1000000 -word_size 11 -outfmt 6 -query query.fna > blast.out
I tried these commands with BLAST v2.4.0 and v2.7.1, the versions currently available on our servers.
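A note on counting: with -outfmt 6 each output line is one HSP (local alignment), not one subject, so a genome with several local alignments contributes several lines. A minimal Python sketch that reports both numbers for a file like blast.out, assuming the default 12-column -outfmt 6 layout with sseqid in column 2 (summarize_outfmt6 is a hypothetical helper name):

```python
import sys
from typing import Iterable, Tuple


def summarize_outfmt6(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of HSP rows, number of distinct subject IDs).

    Assumes the default -outfmt 6 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    so the subject ID is the second tab-separated field.
    """
    n_rows = 0
    subjects = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (as written by -outfmt 7)
        fields = line.split("\t")
        n_rows += 1
        subjects.add(fields[1])
    return n_rows, len(subjects)


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python summarize.py blast.out   (script name is hypothetical)
    with open(sys.argv[1]) as fh:
        rows, distinct = summarize_outfmt6(fh)
    print(f"{rows} HSP rows, {distinct} distinct subjects")
```

Comparing the two numbers between rounds avoids mixing up "number of lines" with "number of genomes hit".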
Am I missing something?
Thanks!
Charles.
How many results do you expect? How do you know that the number of results you expect actually exist?
We are looking at the complete genomes of roughly 20,000 bacterial species, and we expect to find the queried gene in most of them.
It is possible that the gene in question is not present in as many genomes as expected (although the biologist I'm working with considers this unlikely). The main problem is that the first round of blastn finds ~8000 hits, and the second round (after removing those ~8000 hits from the database) finds ~9000 new hits.
I should also have mentioned that I get some perfect alignments in both rounds of blastn (pident 100 over the full query length).
Perhaps I can rephrase my question as:
Why aren't all the perfect alignments returned in the first round of blastn?
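To apply the "perfect alignment" criterion (pident 100 over the full query length) the same way in both rounds, a sketch like the following could be used. It assumes the default -outfmt 6 columns and that the query length is known, since the default columns do not include it. perfect_full_length_subjects is a hypothetical helper; it counts a hit as perfect only if it has no mismatches or gaps and spans query positions 1 to the query length.

```python
from typing import Iterable, Set


def perfect_full_length_subjects(lines: Iterable[str], query_length: int) -> Set[str]:
    """Return subject IDs with at least one HSP covering the whole query at
    100% identity with no gaps (default -outfmt 6 column order assumed)."""
    perfect = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        f = line.split("\t")
        sseqid = f[1]
        pident = float(f[2])
        length, mismatch, gapopen = int(f[3]), int(f[4]), int(f[5])
        qstart, qend = int(f[6]), int(f[7])
        if (pident == 100.0 and mismatch == 0 and gapopen == 0
                and length == query_length
                and min(qstart, qend) == 1
                and max(qstart, qend) == query_length):
            perfect.add(sseqid)
    return perfect
```

Running it on the output of each round gives two sets of IDs that can be compared directly, e.g. to confirm that the second-round perfect hits really are absent from the first-round output.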
Do you remove the 8000 hits from your query set or from the DB set?
The query set is a single sequence corresponding to a gene of interest.
I removed the ~8000 first-round hits from the DB set to see whether blastn would find new results in a second round.
The fact that the second blastn returned multiple perfect alignments was unexpected: I expected all the perfect hits to be found in the first round.
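The removal step described above can be sketched as follows. This assumes the sseqid values in blast.out match the first word of each FASTA header in db.fna; that depends on the header format and the makeblastdb options used, so it is worth checking on a few IDs first. hit_ids and drop_records are hypothetical helper names, and db_round2.fna is a hypothetical output file.

```python
import sys
from typing import Iterable, Iterator, Set


def hit_ids(blast_lines: Iterable[str]) -> Set[str]:
    """Collect subject IDs (2nd column) from -outfmt 6 output."""
    ids = set()
    for line in blast_lines:
        line = line.strip()
        if line and not line.startswith("#"):
            ids.add(line.split("\t")[1])
    return ids


def drop_records(fasta_lines: Iterable[str], exclude: Set[str]) -> Iterator[str]:
    """Yield FASTA lines, skipping every record whose ID (first word of the
    header line, without the '>') is in `exclude`."""
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            record_id = parts[0] if parts else ""
            keep = record_id not in exclude
        if keep:
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python drop_hits.py blast.out db.fna > db_round2.fna
    with open(sys.argv[1]) as fh:
        exclude = hit_ids(fh)
    with open(sys.argv[2]) as fh:
        sys.stdout.writelines(drop_records(fh, exclude))
```

The filtered file would then be turned into a new database with makeblastdb -in db_round2.fna -dbtype nucl before running the second blastn.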
BLAST cannot guarantee that: it is a heuristic and does not solve the alignment problem optimally, so there is always a compromise. Exhaustive (100% sensitive) results are computationally expensive to guarantee and are seldom what BLAST is used for.
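To illustrate the heuristic part: blastn only extends alignments from seeds, i.e. exact matches of length word_size (11 here) shared by query and subject. A toy Python sketch of that seeding condition (a simplification: one strand only, no masking, no scoring or extension; kmers and shares_seed are hypothetical names) shows how a subject that is 90% identical can still produce no seed at all:

```python
from typing import Set


def kmers(seq: str, k: int) -> Set[str]:
    """All exact substrings of length k in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_seed(query: str, subject: str, word_size: int = 11) -> bool:
    """Simplified seeding test: True if query and subject share at least one
    exact word of length word_size. Only the given strand is checked, and
    masking, scoring and extension are ignored."""
    return not kmers(query, word_size).isdisjoint(kmers(subject, word_size))


if __name__ == "__main__":
    query = "A" * 100  # toy sequence, not a realistic gene
    # subject with a mismatch at every 10th position: 90% identity, but no
    # stretch of 11 consecutive bases that matches the query, so no 11-mer seed
    subject = "".join("C" if i % 10 == 9 else "A" for i in range(100))
    identity = sum(a == b for a, b in zip(query, subject)) / len(query)
    print(f"identity = {identity:.0%}, seed found: {shares_seed(query, subject)}")
    print(f"identical copy, seed found: {shares_seed(query, query)}")
```

Under this simplified model, a 100% identical, full-length subject always shares seeds with the query, so the seeding step on its own would not explain missed perfect hits.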
Try:
It returns ~8000 results.
What about this one: