The additional alignments reported for each hit are high-scoring segment pairs (HSPs). In NCBI BLAST+ the number of HSPs reported per hit (i.e. per sequence in the database) can be limited using the -max_hsps option (NCBI BLAST 2.2.29+):
-max_hsps <Integer, >=0>
Set maximum number of HSPs per subject sequence to save (0 means no limit)
Default = `0'
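As a minimal sketch, here is the option added to a BLAST+ blastp command line built from Python. It assumes Python 3.8+ (for shlex.join), the hypothetical file names seq1.fasta and seq1.tsv, and a database named swissprot. -outfmt 6 requests the tabular format used in the listings below. The helper name blastp_command is illustrative and is not part of BLAST+.

```python
import shlex


def blastp_command(query, db, out, max_hsps=1):
    """Build a blastp argument list (hypothetical helper, not part of BLAST+).

    -max_hsps needs NCBI BLAST+ 2.2.29 or later; 0 means no limit.
    """
    if max_hsps < 0:
        raise ValueError("max_hsps must be >= 0")
    return [
        "blastp",
        "-query", query,
        "-db", db,
        "-outfmt", "6",             # tabular output, one HSP per line
        "-max_hsps", str(max_hsps),  # keep at most this many HSPs per subject
        "-out", out,
    ]


if __name__ == "__main__":
    cmd = blastp_command("seq1.fasta", "swissprot", "seq1.tsv")
    # prints: blastp -query seq1.fasta -db swissprot -outfmt 6 -max_hsps 1 -out seq1.tsv
    print(shlex.join(cmd))
```

On a machine with BLAST+ 2.2.29 or later installed, the list can be passed directly to subprocess.run(cmd, check=True).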
Legacy NCBI BLAST (blastall) does not allow the number of HSPs per hit to be controlled, so post-processing is required to limit the number of HSPs appearing in 'blastall' output.
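One possible post-processing step is sketched below. It assumes tabular output (blastall -m 8, or BLAST+ -outfmt 6) with the query ID in column 1 and the subject ID in column 2. It keeps at most N HSPs per query/subject pair in file order, with 0 meaning no limit to mirror the -max_hsps semantics. The function name limit_hsps is hypothetical. Columns are split on any whitespace, which assumes the IDs contain no spaces.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def limit_hsps(lines: Iterable[str], max_hsps: int = 1) -> Iterator[str]:
    """Yield at most max_hsps tabular lines per (query, subject) pair.

    max_hsps = 0 means no limit. Lines keep their original order, so the
    first HSPs listed for each subject are the ones retained.
    """
    seen = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            yield line  # pass blank and comment lines through unchanged
            continue
        key = (fields[0], fields[1])
        seen[key] += 1
        if max_hsps == 0 or seen[key] <= max_hsps:
            yield line
```

For example, `sys.stdout.writelines(limit_hsps(open("hits.m8"), 1))` would write a filtered copy of a (hypothetical) blastall -m 8 file to standard output.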
For example:
Using your example query sequence to search UniProtKB/Swiss-Prot with NCBI BLAST+ finds some hits that have more than one HSP and thus appear more than once in the output:
seq1 SP:SPE26_CAEEL 29.55 132 80 6 1 127 293 416 0.004 39.7
seq1 SP:MKLN1_RAT 25.00 136 83 6 23 142 250 382 0.061 36.2
seq1 SP:MKLN1_PONAB 25.00 136 83 6 23 142 250 382 0.061 36.2
seq1 SP:MKLN1_MOUSE 25.00 136 83 6 23 142 250 382 0.061 36.2
seq1 SP:MKLN1_HUMAN 25.00 136 83 6 23 142 250 382 0.061 36.2
seq1 SP:MDE6_SCHPO 26.40 125 72 7 1 110 277 396 0.48 33.5
seq1 SP:MDE6_SCHPO 25.00 72 49 2 42 111 266 334 2.6 31.6
seq1 SP:TYW4_YEAST 27.35 117 78 4 1 114 400 512 0.56 33.5
seq1 SP:GAN_HUMAN 26.73 101 63 6 35 131 310 403 2.0 32.0
seq1 SP:CC135_CHLRE 35.00 40 26 0 91 130 350 389 2.2 32.0
seq1 SP:CC135_CHLRE 28.46 123 71 6 1 114 310 424 6.9 30.4
seq1 SP:TAG53_CAEEL 24.81 129 82 8 15 132 377 501 4.3 30.8
seq1 SP:UTP12_YEAST 22.56 133 64 5 17 120 336 458 5.7 30.4
seq1 SP:SAHH_BACFR 24.76 105 70 4 9 113 184 279 6.1 30.4
seq1 SP:METN_BIFLO 35.48 62 31 1 51 103 307 368 7.9 30.0
seq1 SP:ATRN_MOUSE 25.00 96 56 5 42 125 502 593 9.4 30.0
seq1 SP:ANMK_PELUB 32.43 37 25 0 97 133 15 51 9.6 29.6
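To spot the subjects that appear more than once, a short sketch can count HSPs per query/subject pair in the tabular output. The sample rows are copied from the listing above. The function name is hypothetical, and the sketch assumes the query and subject IDs are the first two whitespace-separated columns.

```python
from collections import Counter

SAMPLE = """\
seq1 SP:SPE26_CAEEL 29.55 132 80 6 1 127 293 416 0.004 39.7
seq1 SP:MDE6_SCHPO 26.40 125 72 7 1 110 277 396 0.48 33.5
seq1 SP:MDE6_SCHPO 25.00 72 49 2 42 111 266 334 2.6 31.6
seq1 SP:CC135_CHLRE 35.00 40 26 0 91 130 350 389 2.2 32.0
seq1 SP:CC135_CHLRE 28.46 123 71 6 1 114 310 424 6.9 30.4
"""


def subjects_with_multiple_hsps(lines):
    """Return {(query, subject): hsp_count} for pairs with more than one HSP."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and not fields[0].startswith("#"):
            counts[(fields[0], fields[1])] += 1
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    for (query, subject), n in subjects_with_multiple_hsps(SAMPLE.splitlines()).items():
        print(f"{query}\t{subject}\t{n} HSPs")
```

On the sample rows this reports SP:MDE6_SCHPO and SP:CC135_CHLRE, each with 2 HSPs.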
Adding -max_hsps 1 to the command line limits each hit to reporting only one HSP, which eliminates the repeated entries:
seq1 SP:SPE26_CAEEL 29.55 132 80 6 1 127 293 416 0.004 39.7
seq1 SP:MKLN1_RAT 25.00 136 83 6 23 142 250 382 0.061 36.2
seq1 SP:MKLN1_PONAB 25.00 136 83 6 23 142 250 382 0.061 36.2
seq1 SP:MKLN1_MOUSE 25.00 136 83 6 23 142 250 382 0.061 36.2
seq1 SP:MKLN1_HUMAN 25.00 136 83 6 23 142 250 382 0.061 36.2
seq1 SP:MDE6_SCHPO 26.40 125 72 7 1 110 277 396 0.48 33.5
seq1 SP:TYW4_YEAST 27.35 117 78 4 1 114 400 512 0.56 33.5
seq1 SP:GAN_HUMAN 26.73 101 63 6 35 131 310 403 2.0 32.0
seq1 SP:TAG53_CAEEL 24.81 129 82 8 15 132 377 501 4.3 30.8
seq1 SP:UTP12_YEAST 22.56 133 64 5 17 120 336 458 5.7 30.4
seq1 SP:SAHH_BACFR 24.76 105 70 4 9 113 184 279 6.1 30.4
seq1 SP:CC135_CHLRE 28.46 123 71 6 1 114 310 424 6.9 30.4
seq1 SP:METN_BIFLO 35.48 62 31 1 51 103 307 368 7.9 30.0
seq1 SP:ATRN_MOUSE 25.00 96 56 5 42 125 502 593 9.4 30.0
seq1 SP:ANMK_PELUB 32.43 37 25 0 97 133 15 51 9.6 29.6
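Comparing the two listings, -max_hsps 1 kept the SP:CC135_CHLRE HSP with bit score 30.4 (E-value 6.9) rather than the one with 32.0 (E-value 2.2). It is therefore not safe to assume the retained HSP is always the highest-scoring one. If that is what you want, one alternative is to select the best HSP per subject yourself. The sketch below assumes 12-column tabular output with the bit score in the last column. The function name best_hsp_per_subject is hypothetical.

```python
def best_hsp_per_subject(lines):
    """Return the highest-bit-score line for each (query, subject) pair.

    Assumes 12-column tabular output (blastall -m 8 / BLAST+ -outfmt 6),
    where column 12 is the bit score. Ties keep the earlier line.
    The result is sorted by descending bit score.
    """
    best = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 12 or fields[0].startswith("#"):
            continue  # skip comments and malformed rows
        key = (fields[0], fields[1])
        score = float(fields[11])
        if key not in best or score > best[key][0]:
            best[key] = (score, line)
    # sorted() is stable, so equal scores stay in input order
    ranked = sorted(best.values(), key=lambda item: item[0], reverse=True)
    return [line for _, line in ranked]
```

Ranking by bit score rather than E-value is a design choice here. Within a single search the two normally order hits the same way.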
Update: it appears that -max_hsps replaced the -max_hsps_per_subject option in NCBI BLAST 2.2.29+ (Jan 2014). In earlier NCBI BLAST+ versions -max_hsps_per_subject should have the same effect, but it appears to be bugged. This change is not mentioned in the BLAST+ Release Notes for some reason, so I am guessing it was bundled with some other fix or feature. For earlier versions, the options appear to be either tuning other parameters to get the desired result (see Obtaining the top matches from blast) or using post-processing steps to filter out the additional alignments.
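Because the choice depends on the installed version, a script could check it before deciding between -max_hsps and post-processing. The sketch below assumes the version string contains a "Package: blast X.Y.Z" line like the one quoted later in this thread. The 2.2.29 threshold comes from the update above. Function and constant names are hypothetical.

```python
import re

# First release said to provide -max_hsps (per the update above).
MIN_MAX_HSPS = (2, 2, 29)


def parse_blast_version(version_output: str):
    """Extract (major, minor, patch) from a 'Package: blast X.Y.Z, build ...' line."""
    match = re.search(r"Package:\s*blast\s+(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no 'Package: blast X.Y.Z' line found")
    return tuple(int(part) for part in match.groups())


def hsp_limit_strategy(version_output: str) -> str:
    """Return 'max_hsps' if the option should be available, else 'post-process'."""
    # Tuple comparison orders versions numerically, e.g. (2, 10, 0) > (2, 2, 29).
    if parse_blast_version(version_output) >= MIN_MAX_HSPS:
        return "max_hsps"
    return "post-process"
```

The version text could be obtained with something like `subprocess.run(["blastp", "-version"], capture_output=True, text=True).stdout`.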
First of all, thanks for taking the time... the blast is obviously strong with you :-)
Using Package: blast 2.2.27, build Mar 23 2013 16:48:04, I tried
I then tried
but alas, even though I specify -max_target_seqs 20 -max_hsps_per_subject 1, I still get 95 hits?!?
Looks like the -max_hsps_per_subject option was bugged... it was replaced with -max_hsps in NCBI BLAST 2.2.29+, so you might want to upgrade to the current version.
Will talk to sysadm and report back, thanks again!