Semi-global (end-gap-free) alignment of long reads?
8.6 years ago
rrwick ▴ 30

I would like to align PacBio reads to contigs with an end-gap-free, semi-global approach. The exact kind of alignment I'd like is described here as an 'overlap alignment'. The alignment must keep going until it reaches the end of either the read or the contig.

For example, these alignments are all okay:

  AAAAA        AAAAAAAAAAA
  |||||          |||||||
BBBBBBBBB        BBBBBBB

    AAAAAAAA     AAAAAAAA
    |||||           |||||
BBBBBBBBB           BBBBBBBBB

But this is not okay because the alignment terminates before the sequences do:

  AAAAAAA
  |||||
BBBBBBBBBBB
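For reference, the alignment type in the diagrams above can be sketched as a small dynamic program. This is only an illustration of the scoring rule, not any tool's implementation: the function name and the linear match/mismatch/gap scores are assumptions, and it runs in O(n*m) time, so it is far too slow for real PacBio reads against contigs. Gaps before the alignment starts are free (first row and column are zero), and the alignment may only end where the read or the contig ends (last row or last column).

```python
def overlap_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best score of an end-gap-free (overlap) alignment of a and b.

    Illustrative sketch with linear gap costs; the scores are assumptions.
    """
    n, m = len(a), len(b)
    # Row 0 is all zeros: skipping a prefix of b costs nothing.
    prev = [0] * (m + 1)
    # Track the best score in the last column (alignment ends at the end of b).
    best = prev[m]
    for i in range(1, n + 1):
        # Column 0 is zero: skipping a prefix of a costs nothing.
        curr = [0] * (m + 1)
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        best = max(best, curr[m])
        prev = curr
    # The last row covers alignments that end at the end of a.
    return max(best, max(prev))
```

Because the end point is restricted to the last row or column, an alignment like the "not okay" case cannot simply stop early: the remaining bases have to be paid for as mismatches or gaps until one sequence runs out.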

I was hoping for an efficient tool (I need to do a lot of these) that handles error-prone long reads well. BLASR and BWA-MEM do local alignment and therefore won't work for me. GraphMap claims to do semi-global alignment and is the best I've found so far, but it too often returns alignments that terminate before reaching the end of either sequence. Are there other appropriate tools I haven't found?

alignment PacBio • 2.9k views
8.6 years ago

bowtie2 has an --end-to-end mode, but I don't know how it copes with long, error-prone reads. vmatch has a -complete option (which specifies that query sequences must match completely) and is very flexible; it requires a license, which is free for academic use. Finally, exonerate is also very flexible.

I guess the choice also depends on the size of the genome and the number of reads you have to align.

I wasn't familiar with exonerate, but it looks good! It has an alignment model (affine:overlap) which is exactly the alignment type that I need. I'll check it out to see how well it performs with long reads and long reference sequences.

Unfortunately, I see that exonerate cannot use its heuristics to speed up the alignment process when in affine:overlap mode. This means that it computes an optimal alignment exhaustively, which is far too slow: it took about 2 minutes to align a single read to a single contig.

8.6 years ago
rrwick ▴ 30

Since I asked this question, the GraphMap developers have created a new branch that helps make more of its alignments semi-global in the way I require. My current approach is therefore to run GraphMap and then use my own script to filter out alignments that are only local.
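For anyone wanting to do something similar, here is a minimal sketch of that kind of filter. It is not the actual script: the function name, the `tolerance` parameter, and the choice to work from the SAM POS and CIGAR fields are all assumptions. An alignment is kept only if, on each side, either the read is unclipped or the alignment touches the end of the contig.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reaches_ends(pos, cigar, contig_length, tolerance=0):
    """Return True if a SAM alignment is semi-global (hypothetical helper).

    pos is the 1-based SAM POS field, cigar is the CIGAR string, and
    tolerance is the number of unaligned bases allowed at an end.
    """
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    if not ops:
        return False  # unmapped or missing CIGAR ("*")

    # Read bases clipped on the left and right (soft or hard clips).
    left_clip = 0
    for length, op in ops:
        if op not in "SH":
            break
        left_clip += length
    right_clip = 0
    for length, op in reversed(ops):
        if op not in "SH":
            break
        right_clip += length

    # Reference span: operations that consume reference bases.
    ref_start = pos - 1
    ref_end = ref_start + sum(length for length, op in ops if op in "MDN=X")

    left_ok = left_clip <= tolerance or ref_start <= tolerance
    right_ok = right_clip <= tolerance or contig_length - ref_end <= tolerance
    return left_ok and right_ok
```

CIGAR strings are written in reference orientation, so the same check applies to reverse-strand alignments. A small non-zero `tolerance` may be worth trying with noisy reads, since aligners often clip a few bases at the ends.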
