I would like to align PacBio reads to contigs with an end-gap-free, semi-global approach. The exact kind of alignment I'd like is described here as an 'overlap alignment'. At each end, the alignment must continue until it reaches the end of either the read or the contig.
For example, these alignments are all okay:
      AAAAA        AAAAAAAAAAA
      |||||           |||||||
    BBBBBBBBB         BBBBBBB

    AAAAAAAA             AAAAAAAA
       |||||             |||||
       BBBBBBBBB     BBBBBBBBB
But this is not okay because the alignment terminates before the sequences do:
       AAAAAAA
        |||||
    BBBBBBBBBBB
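To make the requirement concrete, here is a minimal sketch of overlap alignment as a dynamic program, assuming a simple linear gap penalty and made-up scores (the function name `overlap_align` and its parameters are hypothetical, not from any tool). Leading gaps are free because the first row and column are initialised to zero, and trailing gaps are free because the best score is taken only from the last row or column, so every reported alignment touches an end of one sequence at both sides.

```python
def overlap_align(a, b, match=1, mismatch=-1, gap=-2):
    """Overlap (end-gap-free) alignment of strings a and b, linear gaps.

    Returns (score, a_start, a_end, b_start, b_end), 0-based, end-exclusive.
    Scores are illustrative assumptions, not tuned for PacBio reads.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j]: best alignment ending after a[:i] and b[:j].
    # Row 0 and column 0 stay 0, so skipping a prefix of a OR b is free.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Free trailing gaps: the alignment must end on the last row or column.
    candidates = [(score[n][j], n, j) for j in range(m + 1)]
    candidates += [(score[i][m], i, m) for i in range(n + 1)]
    best, i, j = max(candidates)
    a_end, b_end = i, j

    # Trace back until we reach the first row or column (an end of a or b).
    while i > 0 and j > 0:
        s = score[i][j]
        if s == score[i - 1][j - 1] + sub(i, j):
            i, j = i - 1, j - 1
        elif s == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, i, a_end, j, b_end


print(overlap_align("GATTACA", "TACAGG"))   # suffix of a overlaps prefix of b
print(overlap_align("ACGT", "TTACGTTT"))    # a is contained in b
```

The first call is a dovetail overlap like the bottom-left diagram; the second is a containment like the top-left one. A local aligner would be free to stop anywhere, which is exactly the "not okay" case.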
I was hoping for an efficient tool (I need to do a lot of these alignments) that handles error-prone long reads well. BLASR and BWA-MEM do local alignment and therefore won't work for me. GraphMap claims to do semi-global alignment and is the best I've found so far, but it too often gives alignments that terminate before reaching the end of either sequence. Are there other appropriate tools I haven't found?
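In the meantime, one workaround is to post-filter an aligner's output and keep only hits that really are overlaps. The sketch below assumes the coordinates have already been extracted into 0-based, end-exclusive values in the style of PAF (read length/start/end, strand, contig length/start/end); the `Hit` type, `is_overlap` function and the `tol` slack parameter are hypothetical names for illustration, not part of GraphMap or any other tool.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One alignment, 0-based end-exclusive coordinates (PAF-style, assumed)."""
    qlen: int     # read length
    qstart: int   # aligned read start, on the read's original strand
    qend: int     # aligned read end
    strand: str   # '+' or '-'
    tlen: int     # contig length
    tstart: int   # aligned contig start
    tend: int     # aligned contig end


def is_overlap(hit: Hit, tol: int = 0) -> bool:
    """True if, at both ends, the alignment gets within `tol` bases of the
    end of the read or the end of the contig."""
    if hit.strand == '+':
        read_left, read_right = hit.qstart, hit.qlen - hit.qend
    else:
        # Reverse-strand hit: the read is reverse-complemented relative to
        # the contig, so its left and right overhangs swap.
        read_left, read_right = hit.qlen - hit.qend, hit.qstart
    contig_left, contig_right = hit.tstart, hit.tlen - hit.tend
    # At each side, at least one of the two sequences must be used up.
    return (min(read_left, contig_left) <= tol
            and min(read_right, contig_right) <= tol)


# Read contained in the contig: accepted.
print(is_overlap(Hit(100, 0, 100, '+', 5000, 1000, 1100)))
# Alignment stops 20 bases short of the read start mid-contig: rejected
# unless a small tolerance is allowed for noisy read ends.
print(is_overlap(Hit(100, 20, 100, '+', 5000, 1000, 1080)))
print(is_overlap(Hit(100, 20, 100, '+', 5000, 1000, 1080), tol=25))
```

A small nonzero `tol` may be useful with error-prone reads, where the last few bases often fail to align even when the overlap is genuine; the right value is a guess to tune on your own data.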
I wasn't familiar with exonerate, but it looks good! It has an alignment model (affine:overlap) that is exactly the alignment type I need. I'll check it out to see how well it performs with long reads and long reference sequences.
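As I understand it, affine:overlap is the same overlap alignment but with affine gap penalties. The sketch below is a score-only version using Gotoh-style recurrences; it is an illustration of that alignment model, not exonerate's code, and the scores and the name `affine_overlap_score` are assumptions. It also shows where the cost comes from: three matrices of (read length + 1) × (contig length + 1) cells are filled exhaustively.

```python
NEG = float("-inf")


def affine_overlap_score(a, b, match=1, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Score-only overlap alignment with affine gaps.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    Leading and trailing gaps in either sequence are free.
    """
    n, m = len(a), len(b)
    # M: ends in a match/mismatch column
    # X: ends in a gap in b (consumes a), Y: ends in a gap in a (consumes b)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    # Free leading gaps: an alignment may start anywhere on row 0 or column 0.
    for i in range(n + 1):
        M[i][0] = 0
    for j in range(m + 1):
        M[0][j] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend)

    def best(i, j):
        return max(M[i][j], X[i][j], Y[i][j])

    # Free trailing gaps: the alignment may end anywhere on the last row or column.
    return max(max(best(n, j) for j in range(m + 1)),
               max(best(i, m) for i in range(n + 1)))


# 16 matches minus one 2-base gap (-3 + -1) = 12
print(affine_overlap_score("ACGTACGTTTACGTACGT", "ACGTACGTACGTACGT"))
```

For a 10 kb read against a 1 Mb contig that is on the order of 10^10 cells per matrix, which is consistent with an exhaustive search being slow without seeding or banding heuristics.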
Unfortunately, I see that exonerate cannot use its heuristics to speed up alignment when in affine:overlap mode. This means it computes the optimal alignment exhaustively, which is far too slow: it took about 2 minutes to align a single read to a single contig.