Query coverage high, E-value low, and identity low - Blastp results interpretation and article recommended?
14 days ago
Luana JBC • 0

Hi!

I was reading about three statistics in blastp results (query coverage, E-value, and identity). Most of the protein hits I found have high query coverage (>80%) and low E-values (as low as 1e-118), but low identity (<50%).

I found lessons on YouTube explaining these statistics, and I have already gone through the NIH/NCBI tutorials. My first question: is there a publication that gives numerical thresholds for deciding whether proteins are homologues, or can the results only be interpreted by describing these statistics?

My second question is about the results themselves. Do they tell me that these proteins are very likely homologues, but only in a topological sense, because the E-value is low and the query coverage is high, while their functions could be very different because of the low identity?

Thanks in advance!

sequence-analysis blastp protein • 977 views
14 days ago
Mensur Dlakic ★ 29k

Either you are not clear on what homologous sequences are, or you are using some internal definition that makes sense to you. Let's pick an E-value of 1e-20. Any two protein sequences that align with an E-value lower than that are pretty much guaranteed to be homologous, and that would be true even if we picked an E-value of 1e-10. Now, any two homologous sequences are likely to perform a broadly similar function, but not necessarily an identical one. We could have an isocitrate dehydrogenase and an alcohol dehydrogenase that perform chemically identical reactions (removing hydrogens), but on different substrates.
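To give a feel for what an E-value like 1e-20 means numerically, here is a minimal sketch of the standard Karlin-Altschul relation between bit score and E-value, E = m * n * 2^(-S'), where m is the query length, n is the database length in residues, and S' is the bit score. The function names and the example sizes are made up for illustration, and the sketch ignores the effective-length corrections that BLAST itself applies, so treat the numbers as approximate.

```python
import math


def expected_hits(bit_score, query_len, db_len):
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    Ignores BLAST's effective-length (edge-effect) corrections.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def bit_score_for_evalue(evalue, query_len, db_len):
    """Invert the same relation: bit score needed to reach a given E-value."""
    return math.log2(query_len * db_len / evalue)


# Hypothetical sizes: a 300-residue query against 1e8 database residues.
needed = bit_score_for_evalue(1e-20, 300, 1e8)
print(f"bit score needed for E = 1e-20: about {needed:.0f}")
```

With these assumed sizes, an E-value of 1e-20 corresponds to a bit score of roughly 101. The point is that the E-value depends on the alignment score and the search space, not directly on percent identity, which is why a long alignment can have a very low E-value at modest identity.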

High coverage and low E-values, especially for sequences longer than 200-300 amino acids, tell us that the two sequences are unquestionably related. Low sequence identity (<50%) could mean one of two things: that they have functionally diverged to work on different substrates as I described above, or that they perform an identical function but their sequences have diverged because of the large evolutionary distance between them. If you got that result between two related yeast species, it would be more likely that the sequences are homologous but don't necessarily perform identical functions. If you got that result from a bacterial and a human sequence, it would be perfectly normal that they are <50% identical but could still perform the same function. That would make them not only homologs, but also orthologs.
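The reasoning above can be sketched as a small triage over tabular blastp hits. This is only an illustration, not an established rule set: the 1e-10 cutoff comes from the answer above, the 80% coverage and 50% identity figures are the ones mentioned in the question, and the column order (subject ID, percent identity, query coverage, E-value), the `Hit` class, and the function names are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject: str
    pident: float   # percent identity
    qcov: float     # percent query coverage
    evalue: float


def parse_hit(line):
    # Assumed tab-separated column order: subject, pident, qcov, evalue.
    subject, pident, qcov, evalue = line.rstrip("\n").split("\t")
    return Hit(subject, float(pident), float(qcov), float(evalue))


def describe(hit, max_evalue=1e-10, min_qcov=80.0, high_identity=50.0):
    """Return a rough, illustrative label for a single hit."""
    if hit.evalue > max_evalue:
        return "homology not established by this hit alone"
    if hit.qcov < min_qcov:
        return "homologous over part of the query"
    if hit.pident < high_identity:
        return "homologous, diverged; function may or may not be identical"
    return "homologous, closely related"


# A made-up hit resembling the ones described in the question.
hit = parse_hit("example_subject\t42.5\t91\t1e-118")
print(hit.subject, "->", describe(hit))
```

Note that the label for the low-identity case deliberately stays open about function: as described above, telling functional divergence apart from plain evolutionary distance depends on which organisms the sequences come from, which no threshold on these three numbers can capture.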


Hi, Mensur! Thank you very much for your explanation.

Best regards!


Could you recommend an article or a book?


I didn't learn this by reading a single book or an article, so nothing like that comes to mind. Some random resources:
