How to represent arbitrary-length k-mers?
3 months ago
dwpeng ▴ 110

Hello there. I know one base can be represented in 2 bits, which means a 64-bit integer can hold at most a 32-mer. If I want to represent a longer k-mer, is there a memory-efficient way to do it?

kmer • 597 views
3 months ago
Rob 6.9k

Yes! The same trick that you can use with k <= 32 also works with longer k. The difference is just that the result no longer fits in a single machine word, so some of the bit-twiddling operations become a bit more complex. For example, many modern languages and compilers support 128-bit integers (Rust, for instance, exposes a native u128 type). At 2 bits per base, you can pack a k-mer with k <= 64 into a 128-bit integer.
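As a rough illustration of the 128-bit case, here is a minimal Rust sketch. The function names (`encode_base`, `pack_u128`, `unpack_u128`) and the A=0, C=1, G=2, T=3 mapping are assumptions made for this example, not taken from any particular tool.

```rust
/// Map a nucleotide to a 2-bit code. The mapping A=0, C=1, G=2, T=3 is an
/// assumption for this sketch; any fixed 2-bit mapping would work.
fn encode_base(b: u8) -> Option<u128> {
    match b {
        b'A' | b'a' => Some(0),
        b'C' | b'c' => Some(1),
        b'G' | b'g' => Some(2),
        b'T' | b't' => Some(3),
        _ => None, // ambiguous bases such as N do not fit in 2 bits
    }
}

/// Pack a k-mer with k <= 64 into a u128. The first base ends up in the
/// highest-order bits that are in use.
fn pack_u128(kmer: &[u8]) -> Option<u128> {
    if kmer.len() > 64 {
        return None; // 64 bases * 2 bits = 128 bits is the limit
    }
    let mut word: u128 = 0;
    for &b in kmer {
        word = (word << 2) | encode_base(b)?;
    }
    Some(word)
}

/// Recover the k bases from a word produced by `pack_u128`.
fn unpack_u128(word: u128, k: usize) -> String {
    assert!(k <= 64, "a u128 holds at most 64 bases");
    (0..k)
        .map(|i| {
            let shift = 2 * (k - 1 - i);
            b"ACGT"[((word >> shift) & 3) as usize] as char
        })
        .collect()
}
```

Note that k has to be stored alongside the word, because leading A bases are encoded as zero bits and cannot be told apart from unused bits.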

Beyond that, or more generally, one typically uses an array of bytes of the minimum required size. Each byte is 8 bits and can hold four 2-bit-encoded nucleotides, so a k-mer needs ceil(k / 4) bytes. If you need a 37-mer, say, you could pack it into 10 bytes (74 bits, rounded up to whole bytes). Many tools have this type of capability. For example, you can take a look at the source code of KMC, or at the Kmer class from the Cuttlefish tool developed in my lab (e.g. here).
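The byte-array approach could look roughly like the Rust sketch below. `PackedKmer` and its methods are hypothetical names for this example and do not reflect how KMC or Cuttlefish actually lay out their k-mers; the A=0, C=1, G=2, T=3 mapping and the low-bits-first order within each byte are likewise assumptions.

```rust
/// A k-mer packed four bases per byte. Hypothetical type for illustration.
struct PackedKmer {
    k: usize,
    bytes: Vec<u8>,
}

impl PackedKmer {
    /// Pack an ASCII sequence; returns None on any non-ACGT character.
    fn from_ascii(seq: &[u8]) -> Option<Self> {
        // ceil(k / 4) bytes, e.g. 10 bytes for a 37-mer.
        let mut bytes = vec![0u8; (seq.len() + 3) / 4];
        for (i, &b) in seq.iter().enumerate() {
            let code = match b {
                b'A' | b'a' => 0u8,
                b'C' | b'c' => 1,
                b'G' | b'g' => 2,
                b'T' | b't' => 3,
                _ => return None,
            };
            // Base i lives in byte i / 4, at bit offset 2 * (i % 4).
            bytes[i / 4] |= code << (2 * (i % 4));
        }
        Some(PackedKmer { k: seq.len(), bytes })
    }

    /// Return the 2-bit code of base i (panics if i is out of range of the byte array).
    fn code_at(&self, i: usize) -> u8 {
        (self.bytes[i / 4] >> (2 * (i % 4))) & 3
    }

    /// Decode back to an ASCII string of length k.
    fn decode(&self) -> String {
        (0..self.k)
            .map(|i| b"ACGT"[self.code_at(i) as usize] as char)
            .collect()
    }
}
```

A fixed-size array sized at compile time would avoid the heap allocation of `Vec<u8>` when k is known in advance; the `Vec` is used here only to keep the sketch short.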

Hi Rob, thanks for your reply. I wrote a demo in Rust. Repo here: https://github.com/dwpeng/kmer

Thank you for the detailed answer. I will read the KMC and Cuttlefish source code.
