BLAST: How much of the query is aligned?
10.1 years ago
bongbang ▴ 90

At first, I thought this question would be answered by the "qcovs" field, but a glance at the results showed that this isn't the case. To begin with, each qcovs value relates not to the original query but to a smaller query partitioned from it. And I don't even know what this number actually means for those mini-queries. "Query Coverage Per Subject" is what the manual says, but apparently they use it in a different sense from what I would normally understand.

Second, "length" is supposed to be "length of alignment," but I'm not sure what that means, either. It's neither the length of the mini-query (qend-qstart+1) nor that of the corresponding subject, although there's a strong correlation among the three.

My purpose is to see whether the genome assembler succeeded in putting together a conserved gene of interest. As a measure of how well each original (unpartitioned) gene query is assembled, I'm thinking of either:

max([set of "nident" from all mini-queries based on the same original query])/original query length

or

max([set of "length" from all mini-queries based on the same original query])/original query length

Which one, if any, is the right approach? Please feel free to suggest your own, although I would appreciate an explanation of what I got wrong. An elucidation of "qcovs" and "length" would be nice, too. Thank you.
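
For concreteness, the two candidate ratios above can be written out as follows. This is only a sketch: it assumes the tabular rows have already been parsed into dictionaries, and the `orig` key (naming the original, unpartitioned query) and the `orig_len` mapping are hypothetical bookkeeping, not BLAST output fields. The rows are made up.

```python
from collections import defaultdict

def best_fraction(rows, orig_len, field):
    """For each original query, return max(field over its rows) / original length."""
    best = defaultdict(int)
    for row in rows:
        best[row["orig"]] = max(best[row["orig"]], row[field])
    return {name: value / orig_len[name] for name, value in best.items()}

# Made-up rows for illustration only; "orig" is a hypothetical label
# tying each mini-query row back to its original query.
rows = [
    {"orig": "geneA", "nident": 180, "length": 200},
    {"orig": "geneA", "nident": 95, "length": 100},
    {"orig": "geneB", "nident": 40, "length": 50},
]
orig_len = {"geneA": 400, "geneB": 100}

by_nident = best_fraction(rows, orig_len, "nident")
by_length = best_fraction(rows, orig_len, "length")
```

Both ratios look only at the single best row per original query, so neither reflects pieces of the gene that align in separate rows.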


Please check my recent comment in another thread.

C: BLAST definition and difference between 'qcovs' and 'qcovhsp'

10.1 years ago

It is indeed surprisingly difficult to find definitions even for seemingly simple concepts. I wanted to double-check what I am about to say but couldn't find any source of information. Here goes anyway: I believe that alignment length refers to the number of matched or mismatched bases of the query.
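
One way to check what `length` counts is to compare it with the other columns of the same row. In BLAST+ tabular output the alignment length is generally understood to be the number of alignment columns, so gap positions would be included as well, which would explain why it matches neither the query span nor the subject span exactly. The sketch below assumes a custom `-outfmt 6` column order that includes `nident`, `mismatch` and `gaps`; the helper names and the example line are made up.

```python
# Column order assumed to match:
# -outfmt "6 qseqid sseqid qstart qend sstart send length nident mismatch gaps"
FIELDS = ["qseqid", "sseqid", "qstart", "qend", "sstart", "send",
          "length", "nident", "mismatch", "gaps"]

def parse_row(line):
    """Split one tab-separated line and convert the numeric columns to int."""
    values = line.rstrip("\n").split("\t")
    row = dict(zip(FIELDS, values))
    for key in FIELDS[2:]:
        row[key] = int(row[key])
    return row

def describe(row):
    """Compare the reported length with the query span, subject span and column count."""
    q_span = abs(row["qend"] - row["qstart"]) + 1
    s_span = abs(row["send"] - row["sstart"]) + 1
    columns = row["nident"] + row["mismatch"] + row["gaps"]
    return {
        "q_span": q_span,
        "s_span": s_span,
        "columns": columns,
        "consistent": columns == row["length"],
    }

# Made-up example line, not real BLAST output
line = "q1\tcontig7\t1\t98\t501\t599\t100\t90\t7\t3"
info = describe(parse_row(line))
```

In this made-up row the query span is 98, the subject span is 99 and the length is 100; the differences are the gap columns on each side.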

In general, it is difficult to characterize how similar two sequences are with a single measure. It works OK when they are very similar and kind of breaks down as the sequences become more dissimilar.

10.1 years ago
Siva ★ 1.9k

Are you sure qcovs is not giving the query coverage per subject? There is also another option, qcovhsp, which gives Query Coverage Per HSP.

If you do want to calculate the query coverage yourself, take into account that there could be overlaps.
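
A minimal sketch of handling those overlaps: merge the query intervals of all HSPs before summing, so that a base covered by two HSPs is counted only once. The function name and the example coordinates are made up; `qlen` stands for the length of the original query.

```python
def covered_fraction(intervals, qlen):
    """Fraction of a query of length qlen covered by 1-based inclusive intervals."""
    merged = []
    # Normalize each interval so start <= end, then sweep in sorted order.
    for start, end in sorted((min(a, b), max(a, b)) for a, b in intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or directly abuts the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    covered = sum(end - start + 1 for start, end in merged)
    return covered / qlen

# Made-up (qstart, qend) pairs; the first two overlap on positions 80-100.
hsps = [(1, 100), (80, 150), (300, 350)]
frac = covered_fraction(hsps, 400)
```

If the query was split into pieces, each piece's qstart and qend would first need to be shifted by that piece's offset in the original sequence before merging.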
