I am a completely rookie on Bioinformatics, so please bear with me and use simple language (I am a computer scientist) :-)
How can we use k-mers to find out if a gene is similar to our query string?
For example: We have a reference gene r = ACAAGTC
, and a query string q = CATGT
. For sequence alignment we could get the two possible solutions: (see photo on link)
But if we use k-mer with k=2 (reason for 2 is that I tried using k=3 and got 1 match) we could split q into the set = {CA, AT, TG, GT} we can get 2 matches.
I am confused on how to use k-mer for queries. I am guessing that it would also need to know where in the reference string it is found, since order matter. But most importantly, Why can we use K-mers? That's maybe my biggest question mark.
If you intend to identify genes then using this toy example is inappropriate. One would need to use much longer k-mers. This paper may help provide some background reference materials. One more tool.
Separate to my answer below, I would contend that the right hand alignment is all that realistic (no matter the method). Unless you were artificially penalising gaps very low (which is the opposite of the biological reality) I'd be very suprised if you would come across 2 offset gaps like that.