Hello there. I know that one base can be represented with 2 bits, which means a 64-bit number can represent at most a 32-mer. But if I want to represent a longer k-mer, is there a memory-efficient method?
Yes! The same trick that you use for k <= 32 also works for longer k. The difference is just that the result no longer fits in a single machine word, so some of the bit-twiddling operations become a bit more complex. For example, most modern platforms support 128-bit integers, and many modern languages (like Rust, with its u128 type) expose this type natively. Since each base takes 2 bits, you can pack a k-mer with k <= 64 bases into a 128-bit integer.
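A minimal sketch of the 128-bit approach in Rust, assuming the common A=00, C=01, G=10, T=11 mapping (tools differ on this choice). The helper names pack_u128, roll_u128 and decode_u128 are hypothetical, chosen for illustration. The rolling update shows the kind of bit-twiddling involved: shift in the new base, then mask off the bits of the base that left the window.

```rust
/// Map a nucleotide to its 2-bit code (assumed mapping: A=00, C=01, G=10, T=11).
fn encode_base(b: u8) -> Option<u128> {
    match b {
        b'A' | b'a' => Some(0),
        b'C' | b'c' => Some(1),
        b'G' | b'g' => Some(2),
        b'T' | b't' => Some(3),
        _ => None, // N or other symbols cannot be 2-bit encoded
    }
}

/// Hypothetical helper: pack a k-mer (k <= 64) into a u128,
/// with the first base in the most significant of the used bits.
fn pack_u128(kmer: &[u8]) -> Option<u128> {
    if kmer.len() > 64 {
        return None; // 2 * k would exceed 128 bits
    }
    let mut word: u128 = 0;
    for &b in kmer {
        word = (word << 2) | encode_base(b)?;
    }
    Some(word)
}

/// Hypothetical helper: slide the window one base to the right,
/// dropping the oldest base and appending `next`.
fn roll_u128(word: u128, next: u8, k: usize) -> Option<u128> {
    // Keep only the low 2k bits; for k == 64 that is the whole word.
    let mask = if k >= 64 { u128::MAX } else { (1u128 << (2 * k)) - 1 };
    Some(((word << 2) | encode_base(next)?) & mask)
}

/// Hypothetical helper: turn a packed k-mer back into a string.
fn decode_u128(word: u128, k: usize) -> String {
    let alphabet = b"ACGT";
    (0..k)
        .map(|i| {
            let shift = 2 * (k - 1 - i);
            alphabet[((word >> shift) & 3) as usize] as char
        })
        .collect()
}

fn main() {
    // A 40-mer: too long for a u64, but it fits in a u128.
    let seq = "ACGTTGCA".repeat(5);
    let k = seq.len();
    let word = pack_u128(seq.as_bytes()).expect("valid k-mer");
    println!("{} -> {:#x}", decode_u128(word, k), word);

    // Advance the window by one base.
    let rolled = roll_u128(word, b'G', k).expect("valid base");
    println!("after rolling in G: {}", decode_u128(rolled, k));
}
```

The same code works for k <= 32 with u64 in place of u128; only the type and the mask limit change.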
Beyond that, or more generally, one typically uses an array of bytes of the minimum required size. Each byte is 8 bits and can hold four 2-bit-encoded nucleotides, so a k-mer needs ceil(k / 4) bytes. For example, a 37-mer needs 74 bits, which you could pack into 10 bytes. Many tools have this type of capability. For example, you can take a look at the source code of KMC, or at the Kmer class from the Cuttlefish tool developed in my lab (e.g. here).
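A minimal sketch of the byte-array approach in Rust. The PackedKmer type and its layout (base i stored in byte i / 4 at bit offset 2 * (i % 4)) are illustrative assumptions, not the layout used by KMC or Cuttlefish; real tools often pack into 64-bit words instead of bytes for speed.

```rust
/// Hypothetical type: a k-mer of arbitrary length stored as 2-bit codes
/// packed four to a byte. Unused bits in the last byte are always zero,
/// so the derived Eq/Hash compare k-mers correctly (e.g. as HashMap keys).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct PackedKmer {
    k: usize,
    bytes: Vec<u8>,
}

/// Assumed 2-bit mapping: A=00, C=01, G=10, T=11.
fn code(b: u8) -> Option<u8> {
    match b {
        b'A' | b'a' => Some(0),
        b'C' | b'c' => Some(1),
        b'G' | b'g' => Some(2),
        b'T' | b't' => Some(3),
        _ => None,
    }
}

impl PackedKmer {
    /// Pack a sequence into ceil(k / 4) bytes.
    fn from_seq(seq: &[u8]) -> Option<Self> {
        let k = seq.len();
        let mut bytes = vec![0u8; (k + 3) / 4];
        for (i, &b) in seq.iter().enumerate() {
            // Base i goes into byte i / 4, at bit offset 2 * (i % 4).
            bytes[i / 4] |= code(b)? << (2 * (i % 4));
        }
        Some(PackedKmer { k, bytes })
    }

    /// Read the 2-bit code of base i.
    fn get(&self, i: usize) -> u8 {
        (self.bytes[i / 4] >> (2 * (i % 4))) & 3
    }

    /// Unpack back into a nucleotide string.
    fn decode(&self) -> String {
        (0..self.k).map(|i| b"ACGT"[self.get(i) as usize] as char).collect()
    }
}

fn main() {
    // A 37-mer: 74 bits of payload, stored in 10 bytes.
    let seq = "ACGT".repeat(9) + "A";
    let kmer = PackedKmer::from_seq(seq.as_bytes()).expect("valid k-mer");
    println!("k = {}, bytes = {}", kmer.k, kmer.bytes.len());
    println!("round trip: {}", kmer.decode());
}
```

Storing k alongside the bytes matters: without it, a trailing A (code 00) would be indistinguishable from padding.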
Hi Rob, thanks for your reply. I wrote a demo in Rust; the repo is here: https://github.com/dwpeng/kmer
Thank you for the detailed answer. I will read KMC's and Cuttlefish's source code.