I am aligning nanopore reads to the C. elegans genome to measure coverage across the genome.
There is a region in the C. elegans genome with a very high number of matching reads (an order of magnitude more than other regions). I think this is because it's a repeat region with lots of homopolymers, so reads from this region contain a lot of errors and their alignment there is ambiguous. As a result, a single read BLASTed against this annoying region ends up with multiple hits because BLAST can't figure out the best alignment.
Can you suggest any strategies to work around this? My current thought is to prevent BLAST from reporting multiple hits for a single read in the same region. Is this a good strategy, and what is the best way to implement it?
Thanks for your time.
I have no experience with Nanopore, but I wonder whether BLAST is the right tool for read mapping in general. BLAST is tuned to find regions of similarity between possibly distant species, so it expects a sequence to align at multiple places, and I think it has no concept of 'mapping quality' (i.e. the probability that the mapping is wrong, as opposed to an alignment score or e-value). I would suggest trying bwa mem, which is designed to work with long reads, possibly split across large gaps.
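
To make the mapping quality idea concrete, here is a minimal Python sketch of how ambiguous placements could be dropped before counting coverage. It assumes the aligner wrote standard SAM output (MAPQ in the fifth column, flags in the second); the function name `filter_sam_by_mapq`, the cutoff of 20 and the toy records are my own illustrative choices, not something from your data, and I have not tested any of this on Nanopore reads.

```python
# Hypothetical helper: keep only confidently placed primary alignments.
# Assumes standard SAM columns: QNAME, FLAG, RNAME, POS, MAPQ, ...
# The SAM file itself might come from something like
#   bwa mem -x ont2d ref.fa reads.fq > aln.sam
# (file names are placeholders).

UNMAPPED = 0x4         # read has no alignment
SECONDARY = 0x100      # alternative placement of the same read
SUPPLEMENTARY = 0x800  # part of a split (chimeric) alignment


def filter_sam_by_mapq(sam_lines, min_mapq=20):
    """Yield (read, reference, position, mapq) for primary alignments
    whose mapping quality is at least min_mapq (an assumed cutoff)."""
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:      # not a complete SAM record
            continue
        flag = int(fields[1])
        mapq = int(fields[4])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        if mapq >= min_mapq:
            yield fields[0], fields[2], int(fields[3]), mapq


# Toy records, made up for illustration only.
example_sam = [
    "@SQ\tSN:chrI\tLN:100000",
    "read1\t0\tchrI\t1000\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "read2\t0\tchrI\t5000\t0\t10M\t*\t0\t0\tAAAAAAAAAA\t*",
    "read2\t256\tchrI\t5200\t0\t10M\t*\t0\t0\tAAAAAAAAAA\t*",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\t*",
]

kept = list(filter_sam_by_mapq(example_sam))
for record in kept:
    print(record)
```

In this toy input only read1 survives: read2 sits in a repeat-like stretch with mapping quality 0, and its secondary hit is dropped by the flag check. Each read then contributes at most one placement to the coverage count, which is roughly what you were trying to force BLAST to do.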