Question

BLAST: One read matches same region multiple times

8.6 years ago

godeludanu ▴ 30

I am aligning nanopore reads to the C. elegans genome to assess coverage across the genome.

There is a region in the C. elegans genome to which a very high number of reads match (an order of magnitude more than to other regions). I think this is because it's a repeat region with lots of homopolymers, so reads from this region contain many errors and their alignment here is ambiguous. As a result, a single read blasted against this annoying region ends up with multiple hits because BLAST can't determine the best alignment.

Can you suggest any strategies to work around this? My current thought is to prevent BLAST from reporting multiple hits for a single read in the same region. Is this a good strategy, and what is the best way to implement it?

Thanks for your time.

blast nanopore • 2.4k views

Updated 8.6 years ago by WouterDeCoster 47k • written 8.6 years ago by godeludanu ▴ 30

I have no experience with Nanopore, but I'm wondering whether BLAST is the right tool for read mapping in general. BLAST is tuned to find regions of similarity between sequences from possibly distant species, so it expects a sequence to align at multiple places, and I think it doesn't have the concept of 'mapping quality' (i.e. the probability that the mapping is wrong, as opposed to an alignment score or e-value). I would suggest trying bwa mem, which is designed to work with long reads, possibly split across large gaps.
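To illustrate what mapping quality buys you once reads have been mapped with a tool that reports it, here is a minimal Python sketch that drops ambiguously placed reads from SAM-formatted lines. It relies only on the SAM layout (FLAG in column 2, MAPQ in column 5). The function name `filter_by_mapq`, the threshold of 20, and the example records are assumptions made for illustration, not values recommended anywhere in this thread.

```python
def filter_by_mapq(sam_lines, min_mapq=20):
    """Return names of mapped reads whose MAPQ is at least min_mapq.

    min_mapq=20 is an arbitrary illustrative threshold (assumption).
    """
    kept = []
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])             # column 2: bitwise FLAG
        mapq = int(fields[4])             # column 5: mapping quality
        if flag & 0x4:                    # 0x4 means the read is unmapped
            continue
        if mapq >= min_mapq:
            kept.append(fields[0])        # column 1: read name
    return kept


# Made-up records: one confident mapping, one ambiguous (MAPQ 0), one unmapped.
example_sam = [
    "@SQ\tSN:chrI\tLN:15072434",
    "read1\t0\tchrI\t5000\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "read2\t0\tchrI\t5100\t0\t10M\t*\t0\t0\tAAAAAAAAAA\t*",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\t*",
]
confident_reads = filter_by_mapq(example_sam)
```

Reads that fall in a repeat usually receive a low MAPQ, so a filter like this removes them from the coverage count rather than counting them several times.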

Reply • 8.6 years ago by dariober 15k

score 2 · Answer 1 · 2016-03-29

8.6 years ago

abascalfederico ★ 1.2k

There is no simple solution for repetitive regions. If you are not interested in them, why don't you mask them in the genome? You can mask according to RepeatMasker, TRF (Tandem Repeats Finder) and/or dust.
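As a rough sketch of what masking means in practice, the Python snippet below replaces given intervals of a sequence with `N`. It assumes you already have repeat coordinates as 0-based, half-open `(start, end)` pairs (the BED convention); the function name `mask_intervals` and the toy sequence are hypothetical, and parsing the output of any particular repeat finder is not shown.

```python
def mask_intervals(sequence, intervals, mask_char="N"):
    """Hard-mask 0-based, half-open (start, end) intervals in a sequence."""
    bases = list(sequence)
    for start, end in intervals:
        # Clamp to the sequence bounds so stray coordinates cannot raise.
        start = max(start, 0)
        end = min(end, len(bases))
        for i in range(start, end):
            bases[i] = mask_char
    return "".join(bases)


# Toy example: an 8-base homopolymer run flanked by unique sequence.
toy_sequence = "ACGT" + "A" * 8 + "CGT"
masked = mask_intervals(toy_sequence, [(4, 12)])
```

Here `masked` keeps the flanking bases and turns the homopolymer run into `N`s, so an aligner has nothing to match in that stretch.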

HTH

While this strategy can work, it sounds like @godeludanu is interested in (finding and) keeping the "best" alignment in this region. I don't know how long the reads are in this case, but trying a different aligner (e.g. LASTZ) may be a better option.

Reply • 8.6 years ago by GenoMax 146k

score 2 · Answer 2 · 2016-03-29

My first thought is: how long are your reads? If they are several kilobases long, use bwa-mem (choose the blasr option) or other mapping tools tuned to align PacBio and PacBio-like reads, which are optimized for long reads. You will then have to filter the output file to find the best alignment for each read.
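If you stay with BLAST, the same kind of filtering can be sketched in Python on tabular output (`-outfmt 6`), where column 1 is the query ID and column 12 is the bit score. This is only an illustration under the assumption that "best" means highest bit score; the function name `best_hit_per_read` and the example rows are made up.

```python
import csv
import io


def best_hit_per_read(tabular_lines):
    """Keep the highest-bit-score hit per query from BLAST tabular output."""
    best = {}
    for fields in csv.reader(tabular_lines, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue                      # skip blank and comment lines
        query = fields[0]                 # column 1: qseqid
        bitscore = float(fields[11])      # column 12: bitscore
        if query not in best or bitscore > float(best[query][11]):
            best[query] = fields
    return best


# Made-up rows: read1 hits the same region twice with different scores.
example = io.StringIO(
    "read1\tchrI\t85.0\t1000\t150\t20\t1\t1000\t5000\t5999\t0.0\t900\n"
    "read1\tchrI\t82.0\t950\t171\t25\t1\t950\t5100\t6049\t0.0\t750\n"
    "read2\tchrII\t88.0\t2000\t240\t30\t1\t2000\t100\t2099\t0.0\t2100\n"
)
hits = best_hit_per_read(example)
```

With this, each read contributes at most one hit to the coverage count. Ties are resolved in favour of the first hit seen, which is a design choice rather than anything BLAST prescribes.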

score 1 · Answer 3 · 2016-03-29

8.6 years ago

WouterDeCoster 47k

LAST is an aligner that is more commonly used for Nanopore data. NanoOK could also tell you a lot about your data: https://github.com/TGAC/NanoOK