VG Giraffe multi-mapped reads and definition for MAPQ score
16 months ago

Hi,

I have recently been playing with VG giraffe. The tool is amazing, and I want to thank the developers on the VG team. I do have two small questions about the alignment.

  1. What is the definition of MAPQ (mapping quality)? I tried to search within the GitHub wiki, but didn't find anything. It seems the score ranges from 0-60.

  2. How is multi-mapped defined in alignment with the genome graph?

Here is a tricky case I came up with. The read is 100 bp long with the sequence [.....ATCG......].

The genome graph is as follows:

...A -> GC -> G...
     -> TG ->

I believe the read can be aligned through either segment GC or TG with identical alignment scores (sequence identity/similarity). Although the alignment paths differ, I don't think this should count as a multi-mapped case. That is why I am interested in how multi-mapping is handled when aligning to a genome graph.

Any hint is much appreciated! Thank you.

vg • 1.8k views

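1: Mapping quality is a standard metric that quantifies the probability that the indicated mapping is incorrect (I believe originally defined in Li, Ruan, & Durbin 2008 Genome Research). It's also used in the SAM/BAM/CRAM file formats. It is Phred-scaled: a mapping quality of MQ corresponds to a 10^(-MQ/10) chance of error, so MQ 20 means a 1% chance and MQ 60 means a one-in-a-million chance that the mapping is wrong.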
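To make the formula concrete, here is a small Python sketch that converts between mapping quality and error probability. The function names are made up for illustration and are not part of vg; the cap of 60 is an assumption based on the range reported in the question.

```python
import math


def mapq_to_error_prob(mapq: float) -> float:
    """Probability that the mapping is wrong, given a Phred-scaled MAPQ."""
    return 10 ** (-mapq / 10)


def error_prob_to_mapq(p_error: float, cap: int = 60) -> int:
    """Phred-scale an error probability and cap it (cap=60 is an assumption)."""
    if p_error <= 0:
        return cap
    return min(cap, round(-10 * math.log10(p_error)))


# Print the error probability for a few common MAPQ values.
for q in (0, 10, 20, 30, 60):
    print(q, mapq_to_error_prob(q))
```

A MAPQ of 0 corresponds to an error probability of 1 under this formula, which is why mappers typically assign it to reads whose placement is essentially a coin flip between candidates.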

2: There's a conventional distinction between "mapping" and "alignment" accuracy, where mapping refers to finding the right general location in the genome, and alignment refers to getting the bases from the read matched up with the correct bases in the genome. You can have a correct mapping without having a fully correct alignment. The boundary between alignment and mapping is not very firm, but my sense is that usually mapping refers to distances that are larger than the read and alignment refers to distances that are shorter than the read. Mapping quality only refers to mapping accuracy, so your example would be considered unambiguous for the purposes of computing mapping quality.

You bring up an important subtlety in graph mapping: alignment accuracy encompasses not only matching the right bases together but also choosing the right path through the graph. The alignment path can be ambiguous even if the mapping is unambiguous. To my knowledge, the only real attempt to address this is vg mpmap, which uses multipath alignments that align to all of the nearby paths in the graph instead of just one.
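The path ambiguity in the example from the question can be sketched with a toy comparison. This counts mismatches over the four-base window only and is not how vg scores alignments; the `mismatches` helper and the variable names are hypothetical.

```python
def mismatches(read: str, path_seq: str) -> int:
    """Count mismatching positions in an ungapped, equal-length comparison."""
    if len(read) != len(path_seq):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(read, path_seq))


# The part of the read that spans the bubble in the question's graph.
read_window = "ATCG"

# The two walks through the bubble: A -> GC -> G and A -> TG -> G.
paths = {
    "via GC": "A" + "GC" + "G",
    "via TG": "A" + "TG" + "G",
}

scores = {name: mismatches(read_window, seq) for name, seq in paths.items()}
print(scores)
```

Both walks differ from the read at exactly one base, so the alignment path is ambiguous even though both candidates sit at the same place in the graph, which is the case where the mapping itself stays unambiguous.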


Thank you for your detailed reply. It is very helpful to me. Just wondering, is it possible to identify multi-mapped alignments by setting a threshold on MAPQ?


There are occasionally reasons to give a read a poor mapping quality without finding a high-scoring alternate alignment (for example, if it has very low base-calling quality), but the vast, vast majority of the time, low MAPQ does indicate that there are multiple plausible mapping locations.
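As a rough sketch of that kind of thresholding, assuming the alignments have been exported as SAM text (where MAPQ is the fifth tab-separated column), something like the following would split reads into confident and possibly multi-mapped sets. The threshold of 30, the function name, and the example records are all made up for illustration.

```python
def split_by_mapq(sam_lines, threshold=30):
    """Split mapped SAM records into (confident, ambiguous) lists by MAPQ.

    threshold=30 is an arbitrary illustrative cutoff, not a vg recommendation.
    """
    confident, ambiguous = [], []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        name, flag, mapq = fields[0], int(fields[1]), int(fields[4])
        if flag & 0x4:
            continue  # read is unmapped, so MAPQ carries no information
        (confident if mapq >= threshold else ambiguous).append((name, mapq))
    return confident, ambiguous


# Hypothetical records: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
example = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tATCG\tIIII",
    "read2\t0\tchr1\t500\t0\t4M\t*\t0\t0\tATCG\tIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tATCG\tIIII",
]
confident, ambiguous = split_by_mapq(example)
print(confident, ambiguous)
```

Reads in the ambiguous list are only candidates for being multi-mapped, since low base quality can also pull MAPQ down.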


Good to know. Thank you so much!!


Can I ask another question here? Does VG giraffe take advantage of WALKs and PATHs in the GFA? Put another way: if I remove some or all of the WALK/PATH lines, will VG giraffe produce the same result? Thank you!


VG giraffe uses the walks heavily. A large part of its efficiency comes from focusing mostly on the annotated walks, as opposed to all possible walks through the graph. That said, there are diminishing returns on adding new walks, so sometimes a better accuracy/efficiency tradeoff can be achieved by sub-sampling walks.


Great, thank you!
