Hi,
I have a list of kmers, between 8-12 nt in length, and I would like to align these to a larger sequence returning all ungapped matches with at most 2 mismatches. I would like search to be exhaustive i.e. I do not want to miss anything. I wrote a python script to compute hamming distance for all substrings of my reference to the query, but it is too slow for many (1000s) queries on a reference of ~100,000nt.
What program would you recommend that does this and runs rather quickly. I have looked into Bowtie2, but I am unsure if it was designed to work with such short query sequences.
Thanks for the feedback.
Your solution sounds like the best approach. Incidentally, you can generate mutant kmers with BBDuk, like this:
The number of mutations is specified as hdist (hamming distance).