I have a short sequence (34bp) that I would like to align against the mouse genome. Probably a bit of an odd question, so let me explain in more detail...
At the moment, I can predict regions where this sequence could be present (based on experimental data from our lab). I take the reference sequence for these regions and align my 34bp sequence against it with ClustalW2. It aligns where I expected it to. The alignment is poor: there are mismatches and gaps, but this is to be expected, as the purpose of this is to troubleshoot a problem in our targeted resequencing.
Now I've exhausted the regions where we know or think this sequence occurs, and I would like to generate a list of other positions where it could also be found. The alignment doesn't need to be perfect; I'm after an indication of where these sequences are found.
Of course, ClustalW2 isn't suited to this, as the reference is simply too large. What I'm looking for is a tool that can perform the same kind of gapped and mismatched alignment I'm getting from ClustalW2, but across the whole genome.
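To make clear what I mean by a gapped and mismatched search, here is a minimal sketch in Python using the classic dynamic-programming approach to approximate string matching (Sellers' algorithm). It reports every end position in a reference where the query matches with at most a given number of edits (mismatches, insertions or deletions). The function name `approximate_hits`, the `max_edits` parameter and the toy sequences are made up for illustration, and this is far too slow for a whole genome; it only shows the kind of result I'm after.

```python
def approximate_hits(pattern, text, max_edits):
    """Return (end_position, edits) for every 1-based end position in `text`
    where `pattern` matches some substring with at most `max_edits` edits
    (mismatches, insertions or deletions)."""
    m = len(pattern)
    # column[i] = fewest edits needed to align pattern[:i] with a substring
    # of the text ending at the current text position.
    column = list(range(m + 1))
    hits = []
    for j, base in enumerate(text, start=1):
        prev_diag = column[0]  # column[i - 1] from the previous text position
        column[0] = 0          # a match is allowed to start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == base else 1
            best = min(
                prev_diag + cost,   # match or mismatch
                column[i] + 1,      # extra base in the text
                column[i - 1] + 1,  # base missing from the text
            )
            prev_diag = column[i]
            column[i] = best
        if column[m] <= max_edits:
            hits.append((j, column[m]))
    return hits


# Toy example: an exact occurrence of the query ending at position 6.
print(approximate_hits("ACGT", "TTACGTTT", 0))  # [(6, 0)]
```

With `max_edits` above zero, neighbouring end positions are usually reported as well, so the hits would need to be clustered and only the best one per locus kept.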
Is there such a tool, and does anyone have experience with doing something similar?
Wonderful suggestion. I was using the default parameters which weren't dealing with the mismatches or gaps very well. A bit of tweaking has given me a list of positions that almost perfectly predict the regions where said sequencing problem has occurred. A little more tweaking and I should have a really good list of potentially problematic regions.