Mapping vs Alignment
17 months ago

I've been reading up on reference mapping vs reference alignment and some are saying that mapping is where you find the general region of a sequence while with alignment all bases need to match...but then I am also seeing some papers using these terms interchangeably, even directly stating mapping is also referred to as alignment.

Is there actually a difference? Or are these two techniques the same? Each time I feel I am getting closer to understanding the difference, I read something else and they seem the same again.

I also saw somewhere that mapping is a part of alignment. If someone could kindly clarify or point me towards a reputable paper from a recent year noting the difference (if any) this would be much appreciated.

sequencing • 2.7k views
17 months ago
Rob 6.9k

Unfortunately, the terminology has become confused and is used imprecisely in the literature. Many people use the terms mapping and alignment interchangeably, and neither term, by itself, will convey a fine-grained understanding to the reader. When you use these terms in your own documentation or writing, just be precise about what exactly was carried out or what your algorithm does.

That being said, these have conventionally been distinct terms representing distinct ideas. One of the first clear definitions I found was in a 2011 presentation by Heng Li: [image: slide from the presentation]

That is, mapping essentially means localizing a read (finding out where it arises from), while aligning implies drawing a detailed correspondence between each nucleotide of the read and the corresponding nucleotide of the reference (or assigning it to an insertion or deletion). Accordingly, when people use the terms loosely, they often (though not always) do so in an asymmetric way that is compatible with this. It's common for people to say "mapping" when they mean alignment, but much less common for people to say "alignment" when they mean mapping. This also generally follows how these steps are carried out in many algorithms, where a read is first approximately localized (mapped), and then a detailed alignment is computed at that locus via a more computationally intensive dynamic programming approach.
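To make the two phases concrete, here is a minimal toy sketch in Python. It is not how any particular aligner is implemented. The function names (`build_kmer_index`, `map_read`, `align_read`), the k-mer voting scheme, the scoring values, and the example sequences are all assumptions made for illustration. Mapping here only returns an approximate start position; alignment then runs a simple global dynamic program against that window to get the base-to-base correspondence.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Index every k-mer of the reference by its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read, index, k):
    """Mapping: each shared k-mer votes for the reference offset it implies.

    Returns the most-voted start offset, or None if no k-mer matches.
    """
    votes = Counter()
    for j in range(len(read) - k + 1):
        for pos in index.get(read[j:j + k], ()):
            votes[pos - j] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def align_read(read, window, match=1, mismatch=-1, gap=-1):
    """Alignment: global dynamic programming of the read against a window.

    Returns (score, aligned_read, aligned_window), with '-' marking gaps.
    """
    n, m = len(read), len(window)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if read[i - 1] == window[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover the correspondence.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and read[i - 1] == window[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            a.append(read[i - 1]); b.append(window[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(read[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(window[j - 1]); j -= 1
    return score[n][m], "".join(reversed(a)), "".join(reversed(b))


# Made-up example: the read is reference[8:20] with one substituted base.
reference = "TTGACCGATAGCTAGGCTTACGGATCCA"
read = "TAGCTCGGCTTA"
index = build_kmer_index(reference, 4)
start = map_read(read, index, 4)                      # mapping: where?
result = align_read(read, reference[start:start + len(read)])  # alignment: how?
print(start, result)
```

In this example the mapping step alone already tells you the read comes from offset 8; only the alignment step tells you which base differs from the reference.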

This separation of phases allows new and better approaches to each problem to be developed independently, in modular ways that can then be combined. For example, the recent mapquik paper describes improved algorithms for seeding and chaining to efficiently localize long reads, while papers like the recent biwfa paper describe advancements in algorithms for computing nucleotide-to-nucleotide alignments.

Nonetheless, while these distinctions have been drawn somewhat clearly in the past, and these terms were intended to have distinct meanings, contemporary usage is not always consistent and these terms are often used interchangeably. So, when you discuss specific algorithms, tools, or workflows, just be sure to be explicit about what is being computed and what the output is, rather than trying to rely on the terms mapping or alignment to do that work for you.


Thank you! Yes, I came across this presentation in researching this topic, but wanted to get a little more detail. Are there cases in which you would do mapping and not alignment or alignment and not mapping? I saw somewhere that mapping was a part of alignment, so generally do you always map when doing alignment, but you don't always have to align when doing mapping? Apologies if this seems like a simple question, I just want to be sure I am understanding correctly the distinction.

For something like reference-guided de novo assembly, does this usually involve both mapping and alignment?


No need to apologize. There are cases where it makes sense to do "mapping" but not to bother with full alignment. This is common, for example, in tools that deal with "census sequencing", where the goal is to infer the presence/absence or abundance of different molecules. This is the case in assays like RNA-seq (both bulk and single-cell) when you are primarily interested in abundance estimation and not novel transcript discovery, as well as in assays like metagenomic sequencing when you want to do, for example, taxonomic abundance estimation or taxonomic read assignment. For these types of tasks, and others, it often suffices to know from where a read may have been sampled, and it may be unnecessary to know exactly how each of its nucleotides corresponds to the reference. In those cases, one may adopt tools that use mapping (or other "matching based" methods, like Kraken) instead of alignment.
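As a rough illustration of "mapping without alignment" for counting, here is a toy Python sketch. It is not the algorithm used by Kraken or by any RNA-seq quantifier. The function names, the exact k-mer matching rule, the tie handling, and the example sequences are all assumptions made for illustration. Each read is assigned to whichever reference shares the most k-mers with it, and no base-level alignment is ever computed.

```python
from collections import Counter, defaultdict


def build_kmer_to_refs(references, k):
    """Map each k-mer to the set of reference names that contain it."""
    kmer_to_refs = defaultdict(set)
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            kmer_to_refs[seq[i:i + k]].add(name)
    return kmer_to_refs


def assign_read(read, kmer_to_refs, k):
    """Return the reference sharing the most k-mers with the read.

    Returns None when nothing matches or when the top hit is tied.
    """
    hits = Counter()
    for i in range(len(read) - k + 1):
        for name in kmer_to_refs.get(read[i:i + k], ()):
            hits[name] += 1
    if not hits:
        return None
    best_count = max(hits.values())
    best = [name for name, count in hits.items() if count == best_count]
    return best[0] if len(best) == 1 else None


def count_reads(reads, references, k):
    """Tally reads per reference; unmatched or ambiguous reads go to 'unassigned'."""
    table = build_kmer_to_refs(references, k)
    counts = Counter()
    for read in reads:
        counts[assign_read(read, table, k) or "unassigned"] += 1
    return counts


# Made-up references and reads, purely for illustration.
references = {
    "geneA": "ACGTACGGTCAGTTAGC",
    "geneB": "TTTGGCCAATCGGATCC",
}
reads = ["ACGGTCAGT", "GGCCAATCG", "CGGTCAG", "AAAAAAA"]
print(count_reads(reads, references, 5))
```

The output is just a count per reference, which is all an abundance-style analysis needs. If you later wanted to call variants from those same reads, you would need the nucleotide-level alignment step as well.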


This is very helpful, I appreciate your response
