Hi! I don't completely understand the Best-Hit filtering algorithm implemented in the BLAST+ applications. As far as I can tell, the basic idea is to filter out short HSPs with worse (higher) E-values and keep only relatively long HSPs. Am I right? From the BLAST documentation I have:
"The Best-Hit filtering algorithm is designed for use in applications that are searching for only the best matches for each query region reporting matches. Its -best_hit_overhang parameter, H, controls when an HSP is considered short enough to be filtered due to presence of another HSP. For each HSP A that is filtered, there exists another HSP B such that the query region of HSP A extends each end of the query region of HSP B by at most H times the length of the query region for B. Additional requirements that must also be met in order to filter A on account of B are:
i. evalue(A) >= evalue(B)
ii. score(A)/length(A) < (1.0 – score_edge) * score(B)/length(B)
We consider 0.1 to 0.25 to be an acceptable range for the -best_hit_overhang parameter and 0.05 to 0.25 to be an acceptable range for the -best_hit_score_edge parameter."
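If I read the quoted rule correctly, it amounts to something like the following Python sketch. This is my own interpretation, not BLAST+'s code. The `HSP` record and its field names are hypothetical. I assume length() means the length of the query region and that score() is whatever score the documentation refers to (it does not say raw or bit score). The sketch ignores which subject each HSP comes from, because the quoted text does not say. The defaults `overhang=0.1` and `score_edge=0.1` are simply values inside the "acceptable ranges" from the quote, not necessarily BLAST's defaults.

```python
from dataclasses import dataclass


@dataclass
class HSP:
    """Hypothetical minimal HSP record (field names are illustrative, not BLAST's)."""
    qstart: int    # 0-based, inclusive start of the query region
    qend: int      # 0-based, exclusive end of the query region (assumed > qstart)
    evalue: float
    score: float   # assumption: the score the documentation calls score()

    @property
    def qlen(self) -> int:
        # Assumption: length() in the documentation = length of the query region
        return self.qend - self.qstart


def within_overhang(a: HSP, b: HSP, overhang: float) -> bool:
    """A's query region may stick out past each end of B's by at most H * len(B)."""
    slack = overhang * b.qlen
    return a.qstart >= b.qstart - slack and a.qend <= b.qend + slack


def is_filtered_by(a: HSP, b: HSP, overhang: float, score_edge: float) -> bool:
    """One reading of the quoted rule: is A filtered on account of B?"""
    if not within_overhang(a, b, overhang):
        return False
    # Condition i: evalue(A) >= evalue(B)
    if a.evalue < b.evalue:
        return False
    # Condition ii: score(A)/length(A) < (1.0 - score_edge) * score(B)/length(B)
    return a.score / a.qlen < (1.0 - score_edge) * b.score / b.qlen


def best_hit_filter(hsps, overhang=0.1, score_edge=0.1):
    """Keep every HSP that no other HSP filters (naive O(n^2) sketch)."""
    return [
        a for i, a in enumerate(hsps)
        if not any(i != j and is_filtered_by(a, b, overhang, score_edge)
                   for j, b in enumerate(hsps))
    ]
```

For example, with H = 0.1, an HSP B covering query positions 100..300 (length 200) gets a slack of 20, so any A lying within 80..320 is a candidate for removal. A short HSP at 90..150 falls inside that window, while one at 50..150 sticks out too far on the left and is kept even if its score density is poor.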
But I have several questions:
About HSP A and HSP B: obviously they must relate to the same query. But must they belong to two different hits? In other words, can HSP A and HSP B originate from the same hit (subject) sequence?
The E-value of HSP A must be greater than or equal to that of HSP B, and the opposite holds for the bit score (bit score of HSP A < bit score of HSP B). Right? And what about the lengths of HSP A and HSP B?
I cannot understand the following sentence from the documentation: "For each HSP A that is filtered, there exists another HSP B such that the query region of HSP A extends each end of the query region of HSP B by at most H times the length of the query region for B." Is there another explanation (maybe a graphical one)?