I would like to index all k-mers in a set of nucleotide sequences. I could use a generic string-based hash function, but my experiments indicate that a function that exploits the overlap between consecutive k-mers might be faster. I would also like a k-mer and its reverse complement to hash to the same value.
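A minimal sketch of the overlap idea, for illustration only (not an existing library): encode each base in 2 bits (A=0, C=1, G=2, T=3, an assumed encoding chosen so that the complement of x is 3 - x). As the window slides by one base, update both the forward k-mer code and its reverse-complement code in O(1), then keep the smaller of the two as the strand-independent value. The function name `rolling_canonical_kmers` is hypothetical. In a fixed-width language the code fits in a 64-bit word for k <= 32. Python integers are unbounded, so that limit is not enforced here.

```python
from typing import Iterator, Tuple

# Assumed 2-bit encoding: complement(x) == 3 - x (A<->T, C<->G).
ENC = {"A": 0, "C": 1, "G": 2, "T": 3}


def rolling_canonical_kmers(seq: str, k: int) -> Iterator[Tuple[int, int]]:
    """Yield (start position, canonical 2-bit code) for every k-mer in seq.

    Forward and reverse-complement codes are updated in O(1) per base,
    reusing the previous overlapping k-mer instead of re-reading k bases.
    k-mers that contain a non-ACGT character (e.g. N) are skipped.
    """
    mask = (1 << (2 * k)) - 1  # keep only the last k bases of the forward code
    shift = 2 * (k - 1)        # bit offset of the leftmost base
    fwd = rev = 0
    valid = 0                  # consecutive ACGT bases seen so far
    for i, ch in enumerate(seq.upper()):
        x = ENC.get(ch)
        if x is None:
            # Non-ACGT base: restart both codes after it.
            fwd = rev = 0
            valid = 0
            continue
        # Forward strand: append base x on the right, drop the oldest base.
        fwd = ((fwd << 2) | x) & mask
        # Reverse complement: prepend complement(x) on the left;
        # the right shift drops the complement of the oldest base.
        rev = (rev >> 2) | ((3 - x) << shift)
        valid += 1
        if valid >= k:
            yield i - k + 1, min(fwd, rev)
```

For example, `list(rolling_canonical_kmers("ACGTTG", 4))` walks the three 4-mers while touching each base only once. The resulting integer is an exact key rather than a scrambled hash, so if a more uniform distribution is needed it can be passed through any integer bit mixer afterwards.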
To be precise, I am not looking for hash table implementations or applications, just a fast function that maps a k-mer to a hash value such that a k-mer and its reverse complement get the same value.
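One way to get such a function, sketched here under stated assumptions rather than offered as a standard method: compute the 2-bit code of the k-mer and of its reverse complement, take the smaller one (the "canonical" code, identical for both strands), and scramble it with a bijective 64-bit mixer. The mixer used below is the MurmurHash3 64-bit finalizer (fmix64), swapped in as an example; any good invertible integer mixer would do. The helper names `encode` and `canonical_hash` are hypothetical, and the sketch assumes k <= 32 and ACGT-only input.

```python
MASK64 = (1 << 64) - 1
ENC = {"A": 0, "C": 1, "G": 2, "T": 3}          # assumed 2-bit encoding
COMP = str.maketrans("ACGT", "TGCA")             # base complement table


def fmix64(h: int) -> int:
    """MurmurHash3's 64-bit finalizer: a bijective bit mixer on 64-bit ints."""
    h ^= h >> 33
    h = (h * 0xFF51AFD7ED558CCD) & MASK64
    h ^= h >> 33
    h = (h * 0xC4CEB9FE1A85EC53) & MASK64
    h ^= h >> 33
    return h


def encode(kmer: str) -> int:
    """Pack an ACGT string into an integer, 2 bits per base (first base highest)."""
    code = 0
    for ch in kmer:
        code = (code << 2) | ENC[ch]  # KeyError on non-ACGT characters
    return code


def canonical_hash(kmer: str) -> int:
    """Hash a k-mer so that it and its reverse complement get the same value."""
    kmer = kmer.upper()
    if len(kmer) > 32:
        raise ValueError("this sketch assumes k <= 32 (fits in 64 bits)")
    rc = kmer.translate(COMP)[::-1]
    return fmix64(min(encode(kmer), encode(rc)))
```

Because fmix64 is a bijection on 64-bit values, two k-mers collide only if they have the same canonical code, i.e. if they are equal or reverse complements of each other. The 2-bit codes could equally be maintained incrementally while sliding along a sequence, so that only the final mixing step is paid per k-mer.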
I can invent my own function for this (in fact, I have), but surely something like this already exists?
Do you have any solution for this? Which hash functions are commonly used for DNA k-mers? Thanks.