Question

BLAST: How much of the query is aligned?


10.7 years ago

bongbang ▴ 90

At first, I thought this question would be answered by the "qcovs" field, but a glance at the results showed that this isn't the case. To begin with, each qcovs value relates not to the original query but to a smaller query partitioned from it. And I don't even know what this number actually means for those mini-queries. "Query Coverage Per Subject" is what the manual says, but apparently they use it in a different sense from what I would normally understand.

Second, "length" is supposed to be "length of alignment," but I'm not sure what that means, either. It's neither the length of the mini-query (qend-qstart+1) nor that of the corresponding subject, although there's a strong correlation among the three.

My purpose is to see whether the genome assembler succeeded in putting together a conserved gene of interest. As a measure of how well each original (unpartitioned) gene query is assembled, I'm thinking of either:

max([set of "nident" from all mini-queries based on the same original query])/original query length

or

max([set of "length" from all mini-queries based on the same original query])/original query length

Which one, if any, is the right approach? Please feel free to suggest your own, although I would appreciate an explanation of what I got wrong. An elucidation of "qcovs" and "length" would be nice, too. Thank you.

blast • 5.0k views

ADD COMMENT • link updated 3.5 years ago by Ram 45k • written 10.7 years ago by bongbang ▴ 90


Please check my recent comment in another thread.

C: BLAST definition and difference between 'qcovs' and 'qcovhsp'

ADD REPLY • link 10.7 years ago by Siva ★ 1.9k

Ram · Answer 1 · 2014-11-09

It is indeed surprisingly difficult to find definitions even for seemingly simple concepts. I wanted to double-check what I am about to say but couldn't find any source of information. Here it goes anyway: I believe that alignment length refers to the number of matched or mismatched bases of the query.
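One reading that would also explain why "length" matches neither the query span (qend-qstart+1) nor the subject span is that it counts alignment columns, gap columns included. That is an assumption on my part, not something confirmed in this thread. A minimal sketch with made-up aligned sequences (the function name `alignment_stats` and the example strings are hypothetical):

```python
def alignment_stats(aligned_query, aligned_subject):
    """Summarize one pairwise alignment given as two gapped strings.

    Both strings must have the same number of columns; '-' marks a gap.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")

    # Number of alignment columns, gap columns included (assumed meaning of "length").
    length = len(aligned_query)
    # Query bases consumed by the alignment, i.e. qend - qstart + 1.
    query_span = sum(1 for c in aligned_query if c != "-")
    # Subject bases consumed by the alignment.
    subject_span = sum(1 for c in aligned_subject if c != "-")
    # Columns where both sequences carry the same base.
    nident = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    # Total gap characters across both sequences.
    gaps = aligned_query.count("-") + aligned_subject.count("-")

    return {
        "length": length,
        "query_span": query_span,
        "subject_span": subject_span,
        "nident": nident,
        "gaps": gaps,
    }


if __name__ == "__main__":
    # Toy alignment: one gap in the query, two gaps in the subject.
    print(alignment_stats("ACG-TACGTAC", "ACGTTA--TAC"))
```

Under this reading, the three numbers are close for near-identical sequences and drift apart as gaps accumulate, which would fit the strong correlation noted in the question.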

In general, it is difficult to characterize how similar sequences are with a single measure. It works OK when the sequences are very similar and kind of breaks down as they become more dissimilar.

Ram · Answer 2 · 2014-11-21


10.7 years ago

Siva ★ 1.9k

Are you sure qcovs is not giving the query coverage per subject? I ask because there is also another option, qcovhsp, which gives Query Coverage Per HSP.

If you do want to calculate the query coverage yourself, take into account that there could be overlaps between HSPs.
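A rough sketch of one way to handle the overlaps: merge the query intervals (qstart, qend) of all HSPs belonging to the same query before summing, so that no base is counted twice. The function name `query_coverage` and the example coordinates are made up for illustration, and coordinates are assumed to be 1-based and inclusive, as in BLAST tabular output:

```python
def query_coverage(hsps, query_length):
    """Fraction of the query covered by at least one HSP.

    hsps: iterable of (qstart, qend) pairs, 1-based and inclusive.
    query_length: length of the original (unpartitioned) query.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")

    # Normalize each pair so start <= end, then sort by start position.
    intervals = sorted((min(s, e), max(s, e)) for s, e in hsps)

    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            # No overlap with the current merged block: close it and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlap: extend the current merged block if needed.
            cur_end = max(cur_end, end)

    # Close the last merged block, if any.
    if cur_end is not None:
        covered += cur_end - cur_start + 1

    return covered / query_length


if __name__ == "__main__":
    # Two overlapping HSPs (1-100, 51-150) and one separate HSP (301-400)
    # on a hypothetical 500-base query.
    print(query_coverage([(1, 100), (51, 150), (301, 400)], 500))
```

If the query was split into mini-queries, the coordinates would first have to be shifted back to positions on the original query; that offset is not shown here because the partitioning scheme is not described in the question.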

ADD COMMENT • link updated 3.5 years ago by Ram 45k • written 10.7 years ago by Siva ★ 1.9k