I was just wondering whether anyone has any experience using the Linux kernel module zram with assembly (or other tasks)? The basic idea of zram is that it compresses RAM so that more data can be stuffed into the same hardware, without having to resort to the comparatively slow operation of swapping to disk.
Given that the requirement for lots of RAM limits some assemblies, can zram be used to make assembly faster? Are de Bruijn graphs generally compressible? Do de novo assemblers already compress the graph anyway?
Thanks for the answer. You're right that it presents some RAM as a disk, but I think that is beside the point. Because the zram device is compressed, and the system swaps to it before the HDD, the assembler can fit more data into the same physical RAM.
That doesn't entirely remedy the problem, because the graph can't be compressed all that much; I'd guess by much less than an order of magnitude. Still, zram has no big negative consequences that I know of, so why not use it?
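Out of curiosity, you can get a rough ballpark for that ratio with a quick sketch. This uses zlib purely as a stand-in for zram's page compressors (zram actually compresses 4 KB pages with LZO/LZ4/zstd), and simulates k-mers as random DNA, which is close to a worst case for compression; real in-memory graph structures with pointers and hash-table overhead often compress better:

```python
# Rough, illustrative estimate of how compressible raw k-mer text is.
# zlib is a stand-in here; zram itself compresses 4 KB pages with
# LZO/LZ4/zstd, so treat the ratio as a ballpark, not a prediction.
import random
import zlib

random.seed(42)

K = 31
N_KMERS = 20_000

# Simulate k-mers as random DNA: near-maximum-entropy sequence,
# i.e. roughly a worst case for a general-purpose compressor.
kmers = "".join(random.choice("ACGT") for _ in range(K * N_KMERS)).encode()

compressed = zlib.compress(kmers, 6)
ratio = len(kmers) / len(compressed)
print(f"original: {len(kmers)} B, compressed: {len(compressed)} B, "
      f"ratio: {ratio:.2f}x")
```

Even on this worst-case input the ratio lands in the low single digits (ASCII-encoded DNA carries at most 2 bits of entropy per 8-bit character), which is consistent with "less than an order of magnitude".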
It may help, and I would expect that at the very least it won't hurt. I expect it would depend on what data structures the assembler uses and how it stores them in memory. I highly doubt that any de novo assembler implements its own in-memory data compression.
You should definitely try it out. I run zram on all the Linux machines I control. The main benefit I see is that when a process uses up all the system's memory, zram gives me a "grace period" during which the computer is still responsive enough that I can kill the process before it starts swapping to disk. Without zram, the machine would hit the on-disk swap and become completely unresponsive as soon as RAM filled up.
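If you want to check how much zram is actually buying you during a run, the kernel exposes per-device statistics in sysfs. A minimal sketch (the file path and field order follow the kernel's zram documentation, Documentation/admin-guide/blockdev/zram.rst; the `path` parameter is there so you can also point it at a saved copy of the file):

```python
# Parse a zram device's mm_stat to see the achieved compression ratio.
# Per the kernel docs, the first three space-separated fields of
# /sys/block/zram0/mm_stat are:
#   orig_data_size  compr_data_size  mem_used_total   (all in bytes)

def zram_ratio(path="/sys/block/zram0/mm_stat"):
    with open(path) as f:
        fields = f.read().split()
    orig, compr, used = (int(x) for x in fields[:3])
    # mem_used_total includes allocator overhead, so it is the honest
    # denominator for "how much RAM did this actually cost me".
    return orig / used if used else float("inf")

if __name__ == "__main__":
    print(f"effective compression ratio: {zram_ratio():.2f}x")
```

A ratio well above 1x means the grace period is doing real work; a ratio near 1x means your working set is mostly incompressible.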
For SGA and fermi, the data are in a sense compressed by nature; that is partly why they use less memory. Of course you could compress the data a little further, but not much, and the performance hit would be significant.
You would only incur a performance hit if the system needed to swap, and nothing zram does will be slower than swapping to disk. So I don't think there would ever be a noticeable performance hit relative to not having zram enabled, even when it effectively does nothing.
For the de Bruijn graph, check the sparse k-mer scheme used in SOAPdenovo2 (recently published in BMC GigaScience). It makes it possible to assemble a human genome with only 60 GB of memory using a DBG, exploiting the fact that a large portion of the genome is unique and can be represented with a compressed data structure.
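The core of the sparse k-mer idea can be sketched in a few lines. This is a toy illustration, not SOAPdenovo2's actual implementation: instead of storing every overlapping k-mer, store only every g-th one (a real sparse de Bruijn graph would also keep the skipped bases as edge labels so the sequence remains reconstructable), which cuts the node table roughly g-fold:

```python
# Toy sketch of sparse k-mer storage: keep only every g-th k-mer of a
# sequence instead of all of them, shrinking the node set ~g-fold.
# Illustrative only; not SOAPdenovo2's code.

def dense_kmers(seq, k):
    """All overlapping k-mers, as a classic de Bruijn graph stores."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def sparse_kmers(seq, k, g):
    """K-mers starting at positions 0, g, 2g, ... only. In a real
    sparse graph the g-1 skipped bases become edge labels."""
    return {seq[i:i + k] for i in range(0, len(seq) - k + 1, g)}

if __name__ == "__main__":
    import random
    random.seed(0)
    seq = "".join(random.choice("ACGT") for _ in range(100_000))
    k, g = 31, 16
    dense = dense_kmers(seq, k)
    sparse = sparse_kmers(seq, k, g)
    print(f"dense: {len(dense)} k-mers, sparse: {len(sparse)} k-mers, "
          f"saving: {len(dense) / len(sparse):.1f}x")
```

On random sequence the saving is almost exactly g-fold; on a real genome, repeats make both sets smaller but the ratio stays in the same range, which is where the 60 GB figure comes from.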