Hello,
I am trying to set the number of maximum hits to 5, so that the procedure can finish sooner, but I still get 100s of hits found.
# TBLASTX 2.2.29+
# Query: Locus_40_Transcript_185/186_Confidence_0.224_Length_4778
# Database: ../../Genome/Genome
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 714 hits found
I am running:
tblastx -db ../../Genome/Genome -query all_merged_k125.fa -evalue 1e-10 -outfmt 7 -out tblastx/all_merged_k125.fmt7 -num_threads 16 -max_target_seqs 5
Any idea why it's still reporting so many hits?
Adrian
Ok that makes sense. How do I limit the amount of hits?
I would do it post blast (with -outfmt 6 output):
Make sure the file is sorted based on query and best hits (here bitscore > evalue > perc identity):
Then get the top 5 hits for every query:
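One way to do the two steps above, as a rough sketch rather than the definitive commands. It assumes bash, GNU sort (the g key flag does general-numeric comparison, so e-values like 1e-30 sort correctly) and the default 12-column -outfmt 6 layout, where column 1 is the query id, 3 is % identity, 11 is the e-value and 12 is the bit score. The function names sort_hits and top_n_per_query are made up for illustration.

```shell
#!/usr/bin/env bash

# Sort tabular BLAST output (-outfmt 6) by query id, then best hit first:
# bit score descending, e-value ascending, % identity descending.
# Reads the files given as arguments, or stdin if there are none.
sort_hits() {
    LC_ALL=C sort -t $'\t' -k1,1 -k12,12gr -k11,11g -k3,3gr "$@"
}

# Keep only the first N rows seen for each query id (default N = 5).
# Expects input that is already sorted with sort_hits.
top_n_per_query() {
    awk -F'\t' -v n="${1:-5}" '++seen[$1] <= n'
}
```

Usage would look like `sort_hits all_merged_k125.fmt6 | top_n_per_query 5 > top5.fmt6` (both file names are placeholders). Note that every row in the tabular output is one HSP, so this keeps the 5 best alignment rows per query, which may all come from the same subject.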
Yes, the above command made my day!!!!!!!!!
Thanks
Setting -max_target_seqns to 1 will give only 1 subject/hit but several HSPs if they are present.
Setting -max_hsps to 1 will give only 1 HSP per subject but for all subject/hits in the database.
If you really want only 5 HSPs per subject, set the -max_target_seqns to 1 and -max_hsps to 5.
Makes sense, thank you!
(typo: should be -max_target_seqs instead of -max_target_seqns)
I guess you can always give a relatively stringent e-value and filter the resulting hits later.
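Filtering later could look something like the sketch below. It assumes tabular output (-outfmt 6 or 7) with the e-value in column 11, as in the default field list; filter_by_evalue is a hypothetical helper name and the 1e-20 default is only an example threshold.

```shell
#!/usr/bin/env bash

# Keep only rows whose e-value (column 11) is at or below the given cutoff.
# Comment lines starting with '#' (present in -outfmt 7) are dropped.
# Adding 0 forces awk to compare the values as numbers, not strings.
filter_by_evalue() {
    awk -F'\t' -v max="${1:-1e-20}" '!/^#/ && ($11 + 0) <= (max + 0)'
}
```

For example, `filter_by_evalue 1e-30 < hits.fmt6 > strict.fmt6`, where both file names are placeholders.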
What I wanted is to speed up the blasting.
I doubt limiting the number of hits like that would speed up your blasting significantly. It still has to go through the whole db for every query, so the only difference would be in how long it takes to write 5 or 10 lines (or whatever) to the output file. Instead, if your db is small (or you have a ton of RAM), you should parallelize blast (e.g. with GNU Parallel) by running multiple single-threaded blasts on split input instead of using -num_threads X.
^ True. You will benefit from multi-threading, and from trying both tblastx and blastall -p tblastx before choosing one of them. For shorter query sequences, I've seen the latter be significantly faster than the former.
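For the split-input suggestion above, a minimal sketch might look like this. It assumes bash, awk and GNU Parallel are installed and that the query file is well-formed FASTA (first line starts with '>'). The helper names split_fasta and blast_chunks, the chunk file naming and the chunk count are all invented for illustration; the tblastx options are the ones from the original command, with -num_threads set to 1.

```shell
#!/usr/bin/env bash

# split_fasta FILE N PREFIX
# Distribute the FASTA records of FILE round-robin over N files named
# PREFIX.000.fa, PREFIX.001.fa, ... so each chunk gets a similar record count.
split_fasta() {
    awk -v n="$2" -v prefix="$3" '
        /^>/ { out = sprintf("%s.%03d.fa", prefix, rec++ % n) }  # new record: pick the next chunk
        { print > out }                                          # header and sequence lines follow it
    ' "$1"
}

# blast_chunks PREFIX DB JOBS
# Run one single-threaded tblastx per chunk, JOBS at a time.
# {} is the chunk file and {.} is the same name without ".fa".
blast_chunks() {
    parallel -j "$3" \
        "tblastx -db '$2' -query {} -evalue 1e-10 -outfmt 6 -num_threads 1 -out {.}.fmt6" \
        ::: "$1".*.fa
}
```

A run could then be `split_fasta all_merged_k125.fa 16 chunk && blast_chunks chunk ../../Genome/Genome 16 && cat chunk.*.fmt6 > all_merged_k125.fmt6`. Each job loads the database on its own, which is why this only pays off when the db is small or there is plenty of RAM.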