Suppose I have the following read:
AAGGAAGGAAGGAAGGACTTCCTT
I want to align it to one of these two reference sequences:
AAGGAAGGAAGGAAGGACAAGGAA
AAGGAAGGAAGGAAGGCCTTCCTT
Clearly, the second reference sequence is where the read belongs: there's only one mismatch. The maximum mappable prefix, though, is in the first reference sequence (AAGGAAGGAAGGAAGGAC). The remainder of the read would then be aligned correctly.
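To make the example concrete, here is a small sketch that compares the read against each reference position by position. It is only a toy check, not how STAR searches (STAR looks up prefixes in a suffix array of the genome), and the helper names `shared_prefix_len` and `mismatches` are made up for illustration. It assumes the read and both references are aligned at their first base with no gaps.

```python
# Toy check of the example above. NOT STAR's algorithm: STAR finds
# maximal mappable prefixes with a suffix array; this is a direct comparison.
READ = "AAGGAAGGAAGGAAGGACTTCCTT"
REF1 = "AAGGAAGGAAGGAAGGACAAGGAA"
REF2 = "AAGGAAGGAAGGAAGGCCTTCCTT"


def shared_prefix_len(read, ref):
    """Length of the longest exactly matching prefix (hypothetical helper)."""
    n = 0
    for a, b in zip(read, ref):
        if a != b:
            break
        n += 1
    return n


def mismatches(read, ref):
    """Number of mismatching positions in an ungapped, end-to-end comparison."""
    return sum(a != b for a, b in zip(read, ref))


for name, ref in (("ref1", REF1), ("ref2", REF2)):
    print(name, "prefix:", shared_prefix_len(READ, ref),
          "mismatches:", mismatches(READ, ref))
```

With these sequences the first reference shares the longer exact prefix (18 bases versus 16), while the second reference has the fewer mismatches over the full read (1 versus 6), which is exactly the conflict I am asking about.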
The STAR paper says "If the MMP search does not reach the end of a read because of the presence of one or more mismatches, the MMPs will serve as anchors in the genome that can be extended, allowing for alignments with mismatches." Does that mean STAR would search over full alignments implied by each seed and thus align the entire read correctly? If so, does this step of STAR tolerate indels?