Best match Blast
3.3 years ago
aka ▴ 10

Hi!!

I have results from a tblastx search, but I would like to keep only the best result.

I have this extract:

Query= AT3G23790.1 | Symbols: AAE16 | AMP-dependent synthetase and ligase
family protein | chr3:8575268-8581001 FORWARD LENGTH=2169

Length=2169
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value  N

Qrob_P0041780.2 528                                                   193     5e-49  1
Qrob_P0179090.2 2106                                                  71.2    1e-38  5
Qrob_P0041760.2 606                                                   150     6e-36  1
Qrob_P0279400.2 1896                                                  72.1    4e-34  4
Qrob_P0041770.2 387                                                   84.5    1e-30  2
Qrob_P0278330.2 1632                                                  85.4    2e-25  3
Qrob_P0388410.2 1629                                                  78.5    3e-21  3

My understanding is that the best result is the first row, Qrob_P0041780.2 528, because it has a high score and a good E-value. Is that correct, or is the best hit not necessarily the first row?

Some queries have an E-value of 0.0. From comments I have read, this means the match is highly significant. Is that right?

Thank you for your help.

Aka

tblastx • 1.7k views

Hmmm

I would say that relying only on an E-value cut-off is not a very good measure. Rather, I would consider multiple criteria such as percent identity, E-value, query coverage, bit score, etc.

Sometimes only 1% of a query matches the subject, with 100% identity. The E-value of that hit can still be close to zero (very significant), but the query coverage is only 1%. Would that be the best hit for your query? If another hit covers 90% of the query with 80% identity, I would pick that hit as the best match.

BLAST has an option, -max_target_seqs. Did you try it?
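To make the comparison concrete, here is a minimal sketch of how query coverage can be computed for a single alignment. The function name `query_coverage` and both hits are made up for illustration; the query length 2169 is taken from the extract in the question. Coordinates are assumed to be 1-based and inclusive, and may be reversed for hits on the opposite strand.

```python
def query_coverage(qstart, qend, qlen):
    """Percent of the query spanned by one alignment (1-based, inclusive)."""
    lo, hi = sorted((qstart, qend))  # qstart > qend is possible on reverse frames
    return 100.0 * (hi - lo + 1) / qlen

QUERY_LENGTH = 2169  # from the question's extract

# Two hypothetical hits: a tiny perfect match and a long, less identical one
short_hit = {"pident": 100.0, "qstart": 1, "qend": 21}
long_hit = {"pident": 80.0, "qstart": 101, "qend": 2052}

cov_short = query_coverage(short_hit["qstart"], short_hit["qend"], QUERY_LENGTH)
cov_long = query_coverage(long_hit["qstart"], long_hit["qend"], QUERY_LENGTH)

print(f"short hit: {cov_short:.1f}% coverage at {short_hit['pident']}% identity")
print(f"long hit:  {cov_long:.1f}% coverage at {long_hit['pident']}% identity")
```

The first hit covers about 1% of the query and the second about 90%, which is why identity or E-value alone can be misleading.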


Thank you for your help. I think I understand: even keeping the hit with the best E-value and score is not enough. No, I have not tried it. Does it select the best match?

I found this, but I don't really understand it:

    -max_target_seqs <Integer, >=1>
      Maximum number of aligned sequences to keep
      (value of 5 or more is recommended)
      Default = `500'
       * Incompatible with:  num_descriptions, num_alignments

I think it is fairly straightforward: if you set this option to 1 (-max_target_seqs 1), BLAST will keep only one hit per query. However, the help text recommends a value of 5 or more, and the default is 500.

As I said above, I would consider multiple criteria such as percent identity, E-value, query coverage, bit score, etc.

So if we follow that suggestion, the hits have to be processed with the strategy discussed in my previous comment: rank the hits for each query on multiple criteria and select the top one.


Thank you. Maybe it's a dumb question, but how can I process my data with all these parameters? Is there a way to do this sorting in BLAST?
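One possible approach is to do the ranking outside BLAST on tabular output. The sketch below assumes the search was rerun with `-outfmt "6 qseqid sseqid pident length evalue bitscore qstart qend qlen"`, so each line is one alignment with those nine tab-separated columns. The function name `best_hits`, the thresholds (30% identity, 50% coverage) and the example rows are assumptions chosen only for illustration, not recommended values. Coverage is computed per alignment, which is a simplification for tblastx, where one subject can produce several alignments.

```python
import csv
import io

# Column order must match the -outfmt string used for the search (assumed here)
COLUMNS = ["qseqid", "sseqid", "pident", "length", "evalue",
           "bitscore", "qstart", "qend", "qlen"]

def best_hits(handle, min_pident=30.0, min_qcov=50.0):
    """Return {query: subject} keeping one top-ranked hit per query.

    Hits below the identity or coverage thresholds are dropped; the rest are
    ranked by bit score, then E-value, then coverage, then identity.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        pident = float(hit["pident"])
        evalue = float(hit["evalue"])
        bitscore = float(hit["bitscore"])
        qstart, qend, qlen = int(hit["qstart"]), int(hit["qend"]), int(hit["qlen"])
        qcov = 100.0 * (abs(qend - qstart) + 1) / qlen
        if pident < min_pident or qcov < min_qcov:
            continue
        # Larger tuple is better; the E-value is negated because smaller is better
        key = (bitscore, -evalue, qcov, pident)
        current = best.get(hit["qseqid"])
        if current is None or key > current[0]:
            best[hit["qseqid"]] = (key, hit["sseqid"])
    return {query: subject for query, (_, subject) in best.items()}

# Made-up rows: sA is a tiny perfect match, sB a long match at 80% identity
example = (
    "q1\tsA\t100.0\t7\t1e-50\t200\t1\t21\t2169\n"
    "q1\tsB\t80.0\t650\t1e-40\t150\t101\t2052\t2169\n"
)
print(best_hits(io.StringIO(example)))  # {'q1': 'sB'}
```

With a real file you would pass `open("results.tsv")` (a hypothetical file name) instead of the `io.StringIO` example. The ranking order is a design choice; if coverage matters more than bit score for your analysis, reorder the tuple in `key`.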
