Question

What happens during seed stitching when STAR initially gets the wrong MMP?

7.5 years ago

eric.kern13 ▴ 240

Suppose I have the following read:

AAGGAAGGAAGGAAGGACTTCCTT

I want to align it to one of these two reference sequences:

AAGGAAGGAAGGAAGGACAAGGAA
AAGGAAGGAAGGAAGGCCTTCCTT

Clearly, the second reference sequence is where the read belongs: there's only one mismatch. The maximum mappable prefix, though, is in the first reference sequence (AAGGAAGGAAGGAAGGAC, 18 bases). The search for the remainder of the read (TTCCTT) would then find its correct location in the second sequence.

The STAR paper says "If the MMP search does not reach the end of a read because of the presence of one or more mismatches, the MMPs will serve as anchors in the genome that can be extended, allowing for alignments with mismatches." Does that mean STAR would search over full alignments implied by each seed and thus align the entire read correctly? If so, does this step of STAR tolerate indels?
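A toy sketch of what "extending an anchor" could mean. It assumes extension is ungapped: each seed is followed along its own diagonal to both ends of the read, and the implied full-length placement is scored by counting mismatches. The labels `ref1`/`ref2`, the anchor tuple layout and `extend_ungapped` are hypothetical; this is not STAR's code.

```python
from typing import Optional

read = "AAGGAAGGAAGGAAGGACTTCCTT"
refs = {
    "ref1": "AAGGAAGGAAGGAAGGACAAGGAA",
    "ref2": "AAGGAAGGAAGGAAGGCCTTCCTT",
}
# Hypothetical anchors: (reference, start in read, start in reference, seed length).
# These are the two exact-match seeds described in the question.
anchors = [("ref1", 0, 0, 18), ("ref2", 18, 18, 6)]


def extend_ungapped(read: str, ref: str, read_start: int, ref_start: int) -> Optional[int]:
    """Follow the anchor's diagonal over the whole read (no gaps allowed).

    Returns the mismatch count of the implied end-to-end alignment,
    or None if that diagonal runs off the reference.
    """
    offset = ref_start - read_start  # reference position aligned to read position 0
    if offset < 0 or offset + len(read) > len(ref):
        return None
    return sum(1 for k, base in enumerate(read) if base != ref[offset + k])


results = {}
for name, rs, fs, length in anchors:
    ref = refs[name]
    assert read[rs:rs + length] == ref[fs:fs + length]  # sanity check: seed is exact
    results[name] = extend_ungapped(read, ref, rs, fs)

# Keep the anchor whose full-length placement has the fewest mismatches.
best = min((n for n in results if results[n] is not None), key=results.get)
print(results, best)
```

Here the first anchor implies 6 mismatches and the second only 1, so the second sequence wins even though its seed is shorter. Because this sketch only walks a single diagonal, it cannot absorb an insertion or deletion; handling indels needs a gapped scoring step.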

rna-seq STAR • 2.2k views

Updated 7.5 years ago by Santosh Anand 5.8k • written 7.5 years ago by eric.kern13 ▴ 240

score 0 · Answer 1 · 2017-06-05

"Given a read sequence R, read location i and a reference genome sequence G, the MMP(R,i,G) is defined as the longest substring (R_i, R_{i+1}, …, R_{i+MML-1}) that matches exactly one or more substrings of G"
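To make the definition concrete, here is a naive brute-force sketch of MMP(R, i, G) plus a loop that repeats the search on the unmapped remainder of the read. STAR itself uses suffix arrays; this quadratic search is only for illustration. It assumes both sequences sit in one genome string joined by a hypothetical `#` separator, restarts at the first base the previous seed did not cover (STAR's exact restart rules are not reproduced), and the function names are made up.

```python
from typing import List, Tuple

read = "AAGGAAGGAAGGAAGGACTTCCTT"
ref1 = "AAGGAAGGAAGGAAGGACAAGGAA"
ref2 = "AAGGAAGGAAGGAAGGCCTTCCTT"
genome = ref1 + "#" + ref2  # hypothetical separator so no seed spans both sequences


def mmp(read: str, i: int, genome: str) -> Tuple[int, List[int]]:
    """Longest substring of `read` starting at i that occurs at least once in `genome`.

    Returns (length, genome start positions of that substring).
    """
    best_len, best_hits = 0, []
    for length in range(1, len(read) - i + 1):
        sub = read[i:i + length]
        hits = [p for p in range(len(genome) - length + 1) if genome.startswith(sub, p)]
        if not hits:
            break  # no longer extension of `sub` can occur either
        best_len, best_hits = length, hits
    return best_len, best_hits


def seed_search(read: str, genome: str) -> List[Tuple[int, int, List[int]]]:
    """Repeat the MMP search on the unmapped portion until the read is covered."""
    seeds = []
    i = 0
    while i < len(read):
        length, hits = mmp(read, i, genome)
        if length == 0:
            i += 1  # base absent from the genome: skip it
            continue
        seeds.append((i, length, hits))  # (read start, seed length, genome hits)
        i += length
    return seeds


print(mmp(read, 0, genome))
print(seed_search(read, genome))
```

With both sequences in one genome, the MMP from read position 0 is the 18-base prefix in the first sequence, and the remainder TTCCTT seeds at genome position 43, i.e. in the second sequence. Against the second sequence alone, `mmp(read, 0, ref2)` gives only 16 bases, so the MMP does depend on which G it is computed against.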

If I understand correctly, the MMP is defined with respect to a given genome G. So STAR should find a different MMP for each of the two sequences, then extend each one according to the gap/mismatch penalties. Finally, the best-scoring alignment should be reported, which is the one to the second sequence.
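The "score each candidate with gap/mismatch penalties, report the best" step can be sketched with a textbook global alignment (Needleman-Wunsch) as a stand-in. This is not STAR's stitching implementation, and the +1/-1/-2 scores are arbitrary assumptions rather than STAR's parameters; `global_score` is a hypothetical name.

```python
def global_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal score for aligning all of `a` to all of `b` with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]  # DP row for the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(
                diag,              # align a[i-1] with b[j-1] (match or mismatch)
                prev[j] + gap,     # a[i-1] against a gap
                cur[j - 1] + gap,  # b[j-1] against a gap
            )
        prev = cur
    return prev[-1]


read = "AAGGAAGGAAGGAAGGACTTCCTT"
refs = {
    "ref1": "AAGGAAGGAAGGAAGGACAAGGAA",
    "ref2": "AAGGAAGGAAGGAAGGCCTTCCTT",
}

scores = {name: global_score(read, ref) for name, ref in refs.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Under these assumed scores the second sequence gets 22 (23 matches, 1 mismatch) and beats the first. Because the recurrence includes gap moves, a read with an inserted or deleted base still aligns at the cost of a gap penalty; for example, `global_score("ACGT", "ACGGT")` is 2.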