Is this our future? I've noticed gains similar to those in the paper when I have, e.g., split a large reference database into chunks that just fit in memory: an easy 3-5x improvement in speed simply from eliminating caching.
People have been using RAM disks since ancient times.
Well, it's more about moving from CPU-centric systems (von Neumann architecture) to "Fabric Attached Memory"-centric systems (in order to eliminate I/O).
Edit: BTW, some time ago I learned that there's little point in creating RAM disks in GNU/Linux, as the kernel keeps your stuff nicely in RAM anyway.
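To illustrate that point about the page cache, here is a minimal sketch (not from the thread, and the file path is a made-up placeholder): mapping an index read-only with mmap lets the kernel keep the hot pages in RAM across runs, which is why an explicit RAM disk often buys you nothing.

```
// Minimal sketch (not from the thread): map an index file read-only and let the
// kernel's page cache keep it resident, instead of copying it to a RAM disk.
// The file path is a hypothetical placeholder.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>

int main() {
    const char* path = "/data/reference.idx";   // hypothetical index file
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); close(fd); return 1; }

    // MAP_POPULATE (Linux-specific) asks the kernel to fault the pages in up
    // front; after the first run they typically stay in the page cache, so the
    // next run starts "warm" without any RAM disk involved.
    void* addr = mmap(nullptr, st.st_size, PROT_READ,
                      MAP_PRIVATE | MAP_POPULATE, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    // ... use the mapped index here ...

    munmap(addr, st.st_size);
    close(fd);
    return 0;
}
```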
Oracle did a version with BLAST many years ago. AFAIK it was not widely used, though (perhaps because of the expensive Oracle licenses).
Oracle10g BLAST Functions
A version of BLAST, which is very similar to NCBI BLAST 2.0, has been implemented in the database using table functions. This enables users to perform BLAST queries against data that is held directly inside an Oracle database. Because the algorithms are implemented as table functions, parallel computation is intrinsically supported.
Not sure if any NGS algorithms have been similarly implemented. Have not kept up with Oracle.
It should, but I've noticed much better performance when I manually put things into /dev/shm/.
I had a different experience when I was testing /dev/shm vs. tmpfs vs. ramfs vs. just having my DB initially on a RAID5 array. In essence, there was no noticeable difference in performance. All it took was splitting my DB into chunks that fit in RAM one at a time, and then running things sequentially. This is because Linux loaded the DB into RAM anyway and kept it there.
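For concreteness, here is a rough, self-contained sketch of that chunking pattern. The chunk file names are placeholders and the "search" is a plain substring scan, not a real aligner; the point is the access pattern, not the algorithm.

```
// Toy sketch of "split the database into chunks that fit in RAM and run them
// sequentially". Chunk file names and queries are placeholders.
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> chunk_paths = {"db.part1.txt", "db.part2.txt"};  // placeholders
    std::vector<std::string> queries = {"ACGTACGT", "TTGACCA"};               // toy queries

    long total_hits = 0;
    for (const auto& path : chunk_paths) {
        // Read one chunk fully into memory; chunks are sized to fit in RAM, so
        // on repeated passes the kernel serves them from the page cache.
        std::ifstream in(path, std::ios::binary);
        std::string chunk((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());

        // Run every query against the in-memory chunk before moving on.
        for (const auto& q : queries) {
            for (auto pos = chunk.find(q); pos != std::string::npos;
                 pos = chunk.find(q, pos + 1)) {
                ++total_hits;
            }
        }
    }
    std::cout << "total hits: " << total_hits << "\n";
    return 0;
}
```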
This news release is a good way to understand Memory-Driven Computing. This can indeed be a very different way to do computing.
The article mentions changes to the kallisto code needed to make it MDC-compatible. I am not sure if we have anyone associated with that project who could take a look at the modifications to see how extensive/feasible they were (since similar changes would likely be needed for other code).
In the case of kallisto, it seems like their modifications were quite minor:
In the pseudoalignment use case, we recognized that kallisto uses a static hash map. This hash map allows fast access to reference genome information using a k-mer as the key. Our analysis of this process showed that lookup operations in this hash table consume most of the time in the pseudoalignment phase. Hash maps use slots to store the data. With fewer slots (higher load factor), there are collisions and additional comparisons are needed to find a specific element. With a lower load factor, the number of collisions is reduced (Extended Data Fig. 7a) at the cost of increased memory consumption. Clearly, with MDC we can make use of the abundance of memory to overcome this bottleneck. By decreasing the load factor of the hash map and hence increasing the hash map from 2 GB to 50 GB we removed most of the collisions (Extended Data Fig. 7a). Furthermore, utilizing mmap and the LFS to load the hash map – despite the fact that the index file was increased to 50 GB – was still faster than loading the hash map within the original kallisto (Extended Data Fig. 7b). Since the index file is on LFS, it can be shared between multiple instances.
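For anyone who hasn't played with this knob: the trade-off they describe maps directly onto the maximum load factor of an ordinary hash map. Below is a toy illustration with std::unordered_map (not kallisto's actual hash map): lowering max_load_factor forces more buckets, so fewer collisions per lookup, at the cost of memory.

```
// Toy illustration of the load-factor trade-off, using std::unordered_map
// rather than kallisto's actual hash map.
#include <cstdint>
#include <iostream>
#include <unordered_map>

int main() {
    std::unordered_map<std::uint64_t, std::uint32_t> kmer_index;  // toy: k-mer -> reference id

    // The default max_load_factor() is 1.0. Dropping it to 0.1 makes the table
    // rehash much earlier, so each bucket holds far fewer keys on average.
    kmer_index.max_load_factor(0.1f);

    for (std::uint64_t kmer = 0; kmer < 1000000; ++kmer) {
        kmer_index[kmer] = static_cast<std::uint32_t>(kmer % 1000);
    }

    std::cout << "buckets: " << kmer_index.bucket_count()
              << ", load factor: " << kmer_index.load_factor() << "\n";
    return 0;
}
```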
Is this really a new paradigm, or is this a sign that hardware has moved on but we're still writing code like our nodes are from 5+ years ago? Or are some of us just spoiled and want the software to catch up to us?
I was always told that "early on" memory was the main reason people built clusters and went down the MPI/Infiniband route. You got around the RAM/node limit by stitching nodes together with some MPI implementation and an interconnect. You can always wait a little longer for a job to finish, but if your matrix won't fit in memory it'll never get done. More or less I was taught to think about RAM first, and parallelization second. RAM/core was more important than total number of cores, especially since so many problems had upper limits on the number of cores you could use at once.
Are we headed back in that direction?