Hi Biostar,
I am trying to run Blastn on some data I collected. However, as there are too many possible hits, I tried to limit the number of alignments shown in the result file by using the -max_target_seqs option.
The following is my command:
blastn -db ./db/nt -query ./test/250.fasta -out ./result/testing -outfmt 6 -num_threads 4 -max_target_seqs 10
I assumed this means that only the TOP ten alignment results would be stored in the resulting testing file.
However, some alignment results are missing when compared to the complete alignment run without the -max_target_seqs option.
Here is the result from the complete alignment without the -max_target_seqs option:
template_a LM000940.1 88.407 6478 290 342 32 6134 8360 1969 0.0 7376
template_a HQ724614.1 88.390 6477 293 343 32 6134 7911 1520 0.0 7371
template_a HQ724617.1 88.342 6476 298 343 32 6134 7911 1520 0.0 7354
template_a GU188856.2 88.314 6478 296 342 32 6134 7911 1520 0.0 7343
template_a HQ724616.1 88.297 6477 299 343 32 6134 7911 1520 0.0 7337
template_a KT851543.1 87.064 6501 358 356 32 6134 7938 1523 0.0 6900
And this is the result from the run using the -max_target_seqs option:
template_a HQ724614.1 88.391 6478 291 342 32 6134 7911 1520 0.0 7371
template_a AF425847.1 84.359 2193 212 107 3987 6075 2306 141 0.0 2028
template_a FN395201.1 87.871 1484 82 74 4729 6134 1538 75 0.0 1653
template_a FN395183.1 87.736 1484 84 75 4729 6134 1538 75 0.0 1642
template_a FN395186.1 87.668 1484 85 75 4729 6134 1538 75 0.0 1637
template_a FN395181.1 87.668 1484 85 75 4729 6134 1538 75 0.0 1637
template_a DQ360492.1 87.677 1485 81 77 4730 6134 1514 52 0.0 1635
As you can see, quite a lot of results are missing after using the option... Especially the most important one, the highest-scoring alignment to LM000940.1.
Is -max_target_seqs suitable for my case?
Or does it in fact not ensure that the TOP ten results are recorded, but only that the FIRST ten results are recorded?
I am a bit new to Blasting on a computer so... can anyone help me out?
Thank you very much!!
"I assumed this means that only the TOP ten alignment results would be stored in the resulting testing file."
Note that -max_target_seqs sets the maximum number of aligned sequences, not the maximum number of results. You can have a thousand results from just one sequence. I have no clue why it's removing your best hit though. AFAIK that shouldn't happen.
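To make the rows-versus-sequences distinction concrete, here is a minimal Python sketch that counts both in tabular output. It assumes the default -outfmt 6 column order (query ID first, subject ID second). The first two rows are copied from your output; the third row is hypothetical, made up only to show a second alignment to the same subject.

```python
from collections import Counter

# Default -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore
rows = [
    "template_a HQ724614.1 88.391 6478 291 342 32 6134 7911 1520 0.0 7371",
    "template_a AF425847.1 84.359 2193 212 107 3987 6075 2306 141 0.0 2028",
    # Hypothetical extra row: a second, shorter alignment to AF425847.1.
    "template_a AF425847.1 90.000 100 10 0 100 199 5000 5099 1e-30 130",
]

# Column 2 is the subject ID; count how many rows each subject contributes.
hsps_per_subject = Counter(line.split()[1] for line in rows)

print(len(rows), "rows for", len(hsps_per_subject), "subject sequences")
for subject, count in hsps_per_subject.items():
    print(subject, count)
```

So a limit of 10 target sequences can still give you many more than 10 lines in the file.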
Edit: this reply from NCBI to a report of the same behaviour explains it:
Hello, Thank you for the report. We don't consider this a bug, but I agree that we should document this possibility better. This can happen because limits, including max target sequences, are applied in an early ungapped phase of the algorithm, as well as later. In some cases a final HSP will improve enough in the later gapped phase to rise to the top hits. In your case, relaxing the limit to 200 appears to have allowed hits that would have been excluded in the ungapped phase at 100 max target sequences to rise.
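Given that explanation, one possible workaround is to run with a relaxed limit and trim the table yourself afterwards. Below is a rough Python sketch of such a filter, not an official tool. It assumes default -outfmt 6 columns (subject ID in column 2, bit score in column 12) and that all rows of one query are written together. The function name top_subjects is my own, and the demo rows are the ones from your complete run.

```python
from itertools import groupby

def top_subjects(lines, n=10):
    """Yield the rows of the n best-scoring subject sequences for each query."""
    rows = (
        line.rstrip("\n")
        for line in lines
        if line.strip() and not line.startswith("#")
    )
    # Assumes rows of the same query are contiguous, so only one query's
    # rows are held in memory at a time.
    for _, group in groupby(rows, key=lambda r: r.split()[0]):
        group = list(group)
        best = {}  # subject ID -> best bit score seen for that subject
        for row in group:
            fields = row.split()
            best[fields[1]] = max(best.get(fields[1], 0.0), float(fields[11]))
        keep = set(sorted(best, key=best.get, reverse=True)[:n])
        for row in group:
            if row.split()[1] in keep:
                yield row

# Rows from the complete run posted in the question.
full_run = [
    "template_a LM000940.1 88.407 6478 290 342 32 6134 8360 1969 0.0 7376",
    "template_a HQ724614.1 88.390 6477 293 343 32 6134 7911 1520 0.0 7371",
    "template_a HQ724617.1 88.342 6476 298 343 32 6134 7911 1520 0.0 7354",
    "template_a GU188856.2 88.314 6478 296 342 32 6134 7911 1520 0.0 7343",
    "template_a HQ724616.1 88.297 6477 299 343 32 6134 7911 1520 0.0 7337",
    "template_a KT851543.1 87.064 6501 358 356 32 6134 7938 1523 0.0 6900",
]

kept = list(top_subjects(full_run, n=3))
for row in kept:
    print(row)
```

The function accepts any iterable of lines, so in a script you could pass it sys.stdin and pipe blastn into it (without -out, blastn writes to standard output), which would avoid storing the full table on disk. Keep in mind it can only rank what BLAST actually reported, so the limit given to blastn still has to be generous enough.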
I see, so NCBI does not regard it as a bug...
Which means there is actually no good way to limit the number of alignments shown except filtering after running Blastn... But my file will be super large if I do so...
It seems there is not yet a faster and better program for performing Blastn... DIAMOND is only suitable for Blastx or Blastp...
Thank you Heikki for your answer!
You could try PLAST. It's many times faster than BLAST and includes a 'blastn'. I found it to be a little faster and a little more sensitive than Diamond when I tested them head to head (at least for the blastx-like variants).
PLAST! First time hearing of it! I will give it a try! Thanks Jacob!