Why Are There A Lot Of MQ0 Reads In Some Particular Regions?
11.9 years ago
lyz10302012 ▴ 470

Can anyone explain why there are a lot of MQ0 reads in some particular regions (almost all reads there are MQ0)? What characterizes these regions? How should SNPs and indels be called in these regions: should the MQ0 reads be considered, or just ignored?

variant calling • 6.5k views
11.9 years ago

I just happen to know that by MQ0 you mean mapping quality 0; this is not obvious from the question.

The important thing to note is that the values for mapping quality are not standardized, so they need to be discussed in the context of the mapping software (see comments for corrections). You seem to be referring to the BWA behavior that assigns a mapping quality of 0 to a read that maps equally well to multiple locations in the genome.

In that case the explanation for your question is that this is caused by having identical (duplicated) regions in the genome.
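
To see where such regions are, one option is to tabulate the fraction of MQ0 reads per window. This is only a minimal sketch: it reads plain SAM text lines (MAPQ is the fifth column), and the function name `mq0_fraction_by_window`, the 1000 bp window size, and the toy records are all assumptions made for illustration.

```python
from collections import defaultdict

def mq0_fraction_by_window(sam_lines, window=1000):
    """Return {(reference, window_start): fraction of mapped reads with MAPQ 0}."""
    total = defaultdict(int)
    mq0 = defaultdict(int)
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:                    # skip unmapped reads
            continue
        rname, pos, mapq = fields[2], int(fields[3]), int(fields[4])
        # 1-based start coordinate of the window containing this read
        key = (rname, (pos - 1) // window * window + 1)
        total[key] += 1
        if mapq == 0:
            mq0[key] += 1
    return {key: mq0[key] / total[key] for key in total}

# Hypothetical toy records, not real data.
example = [
    "@SQ\tSN:chr1\tLN:5000",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t150\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tchr1\t1200\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t16\tchr1\t1300\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r5\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
fractions = mq0_fraction_by_window(example)
for key in sorted(fractions):
    print(key, fractions[key])
```

Windows where the fraction is close to 1 are candidates for duplicated sequence.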

Mapping quality by itself is clearly defined. The computation of mapping quality depends on the algorithm, but every aligner is supposed to output mapping quality with the same meaning. Mapping qualities from different mappers ARE comparable. That is why most of the time we can switch mappers without greatly impacting downstream analysis.
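
For reference, the SAM specification defines MAPQ as -10 log10 of the probability that the mapping position is wrong, rounded to the nearest integer. A small sketch of that conversion follows; the function names and the cap of 60 are assumptions for illustration, since different aligners cap at different values.

```python
import math

def mapq_to_error_prob(mapq):
    """Probability that the reported position is wrong, given a Phred-scaled MAPQ."""
    return 10 ** (-mapq / 10)

def error_prob_to_mapq(p_wrong, cap=60):
    """Phred-scale an error probability; `cap` is an assumed upper bound."""
    if p_wrong <= 0:
        return cap
    return min(cap, int(round(-10 * math.log10(p_wrong))))

print(mapq_to_error_prob(0))      # a Q0 alignment carries no confidence in the position
print(error_prob_to_mapq(0.5))    # two equally good placements: error probability 0.5
print(error_prob_to_mapq(0.001))
```

A read with two equally good placements has an error probability of at least 0.5, which is why it ends up at or near the bottom of the scale whatever the mapper.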

Perhaps this has changed. I was basing my comment on empirical experience: I have noticed that most aligners do not produce a continuous range of mapping qualities. Instead, certain values appear far more frequently. I always thought that this was due to assumptions about how this score is computed in each program, assumptions that may not be well documented and thus would not be expected to be the same between software packages.

No, mapping quality has always been this way. Whether the mapping quality is continuous or not is entirely irrelevant. In fact, the base qualities reported by Phred are not continuous either. Note that there are multiple ways to compute mapping quality (as well as base quality). For example, here are three ways: 1) give Q10 to every alignment; 2) give Q0 to exact repeats and Q20 to everything else; 3) give Q0 to exact repeats, Q40 to an alignment whose second-best alignment contains at least two more mismatches, and Q10 to everything else. All of these give discrete mapping qualities, and all are approximately correct. Schemes 2) and 3) are also comparable: the vast majority of non-Q0 alignments are the same under both.
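
The three schemes can be written down as a rough sketch. The function names, the two-argument signature, and the toy alignments are assumptions for illustration; the case of an alignment with no second-best hit at all is not covered by the description above, so it is left out here.

```python
def scheme_1(is_exact_repeat, extra_mismatches_in_second_best):
    """Scheme 1: give Q10 to every alignment."""
    return 10

def scheme_2(is_exact_repeat, extra_mismatches_in_second_best):
    """Scheme 2: Q0 for exact repeats, Q20 for everything else."""
    return 0 if is_exact_repeat else 20

def scheme_3(is_exact_repeat, extra_mismatches_in_second_best):
    """Scheme 3: Q0 for exact repeats, Q40 when the second-best hit has
    at least two more mismatches, Q10 for everything else."""
    if is_exact_repeat:
        return 0
    if extra_mismatches_in_second_best >= 2:
        return 40
    return 10

# Hypothetical alignments: (is_exact_repeat, extra mismatches in the second-best hit)
alignments = {
    "read_a": (True, 0),
    "read_b": (False, 1),
    "read_c": (False, 3),
}

nonzero_2 = {name for name, a in alignments.items() if scheme_2(*a) > 0}
nonzero_3 = {name for name, a in alignments.items() if scheme_3(*a) > 0}
print(sorted(nonzero_2), sorted(nonzero_3))
```

In this toy case the non-Q0 sets are identical by construction. With a real aligner the agreement would be a vast majority rather than exact, but a downstream filter such as "drop Q0" keeps largely the same reads under either scheme.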

I see, thanks for the clarifications. I will amend some of my previous posts and link to this.

11.9 years ago
lyz10302012 ▴ 470

Thanks Istvan, I did mean BWA. I guess so, but it seems difficult to call SNPs and indels in such regions, because only a very small fraction of supporting reads remains if we ignore MQ0 reads.

11.9 years ago

but it seems difficult to call SNPs and indels in such regions,

Yup. You would need longer reads (or paired-end reads where one end is in unambiguous sequence). That's just how genomes are, and there's not a lot you can do. You would just have to report that, due to the underlying genome structure, you have poor data in that region.
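
If you do have paired-end data, a quick way to see how many MQ0 reads have a confidently placed mate is to group records by read name. This is just a sketch over plain SAM text lines; the function name `anchored_mq0_pairs`, the MAPQ threshold of 20, and the toy records are assumptions for illustration.

```python
from collections import defaultdict

def anchored_mq0_pairs(sam_lines, min_anchor_mapq=20):
    """Names of pairs with one end at MAPQ 0 and the other at MAPQ >= min_anchor_mapq."""
    mapqs = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):                  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # keep only paired (0x1), mapped (not 0x4), primary (not 0x100/0x800) records
        if not flag & 0x1 or flag & 0x4 or flag & 0x900:
            continue
        mapqs[fields[0]].append(int(fields[4]))
    return sorted(
        name for name, qs in mapqs.items()
        if len(qs) == 2 and min(qs) == 0 and max(qs) >= min_anchor_mapq
    )

# Hypothetical toy records, not real data.
example_pairs = [
    "pair_1\t99\tchr1\t100\t0\t4M\t=\t300\t204\tACGT\tIIII",
    "pair_1\t147\tchr1\t300\t60\t4M\t=\t100\t-204\tACGT\tIIII",
    "pair_2\t99\tchr1\t500\t0\t4M\t=\t700\t204\tACGT\tIIII",
    "pair_2\t147\tchr1\t700\t0\t4M\t=\t500\t-204\tACGT\tIIII",
    "pair_3\t99\tchr1\t900\t60\t4M\t=\t1100\t204\tACGT\tIIII",
    "pair_3\t147\tchr1\t1100\t60\t4M\t=\t900\t-204\tACGT\tIIII",
]
anchored = anchored_mq0_pairs(example_pairs)
print(anchored)
```

Pairs where both ends are MQ0 sit entirely inside ambiguous sequence and tell you nothing about placement.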
