Why Do Some Systems Biological Analyses Require So Much Shared Memory?
12.5 years ago by nate ▴ 10

Hi all, I'm new to systems biology and I'm having trouble understanding why so many types of simulations/analyses require access to all elements in shared memory. Would message passing on a cluster be reasonably fast for this? I'm being told that I need special hardware (like an SGI machine) to undertake simulations of biological networks, and I don't fully grasp the computational problem. Can anyone explain the requirements in simple terms for a newbie?

• 1.9k views

Which applications are you talking about, exactly?


I don't think that matters much here; the same question applies to many different types of simulations, e.g. molecular dynamics or climate models.


Simulations are usually computationally intensive because you want to simulate as many factors as possible. For example, simulating the docking of a drug onto a protein is very intensive, because you need to take a lot of factors into account (the position of each atom, its charge, etc.). Can you be more specific about which simulations you are referring to?


Without shared memory the process would be very slow: hundreds of processes run for a single job, and having each of them fetch the data from the cluster or from storage every time is not a good idea, since it eats a lot of system resources. Shared memory is, of course, much faster.

11.2 years ago by Michael 55k

Let's consider an abstract simulation. It consists of a state space S (a gigantic matrix, |S| >> some big number) and a parallel algorithm A operating on S using m CPUs, e.g. each CPU taking on one cell of S. Further, we assume that A is discrete/stepwise (one step per time interval, of which there are t >> some big number) and synchronized, meaning that it consists of discrete steps that update S, and each of the m instances running A needs an up-to-date version of S to start its computation. Also, A updates close to 100% of S in each step.

If the nodes do not have shared memory, there is a large transfer penalty, on the order of |S| per node, over the network: even if every node only sends the portions of S it updated, each node still needs to receive the new state space in every step. As the simulation consists of a large number of steps, the total transfer volume is proportional to |S| * m * t.
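A quick back-of-envelope calculation shows how fast this adds up; the matrix size, node count, and step count below are made-up example numbers, not taken from any real workload:

```python
# Rough estimate of network traffic for a synchronized simulation
# without shared memory (all figures are hypothetical).

state_space_bytes = 10_000 * 10_000 * 8   # |S|: a 10k x 10k matrix of doubles, ~0.8 GB
nodes = 64                                # m: every node must receive the full state each step
steps = 100_000                           # t: number of synchronized time steps

# Total volume moved over the network, proportional to |S| * m * t
total_bytes = state_space_bytes * nodes * steps
print(f"total transfer: {total_bytes / 1e15:.1f} PB")   # ~5.1 petabytes over the run
```

Even with a fast interconnect, volumes like this mean the network, not the CPUs, quickly becomes the bottleneck.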

To improve on this, each node could keep track of which other nodes need its updates and send only to those, reducing the factor m; however, that is more complicated to implement and imposes restrictions on A.

In a shared memory environment, this penalty is virtually 0.
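As a minimal single-machine sketch of the shared-memory case (my own illustration, not part of the original answer): with Python's multiprocessing.shared_memory, several worker processes can update one state array S in place, so no copy of S is ever serialized or sent between them.

```python
# Sketch: m worker processes update a shared state array in place,
# so no copy of S is transferred between them (requires Python >= 3.8 and NumPy).
import numpy as np
from multiprocessing import Process
from multiprocessing.shared_memory import SharedMemory

SHAPE, DTYPE = (1000, 1000), np.float64

def worker(shm_name, row_start, row_end, steps):
    shm = SharedMemory(name=shm_name)                  # attach to the existing block
    S = np.ndarray(SHAPE, dtype=DTYPE, buffer=shm.buf)
    for _ in range(steps):
        S[row_start:row_end] += 1.0                    # update this worker's slice of S in place
    shm.close()

if __name__ == "__main__":
    nbytes = int(np.prod(SHAPE)) * np.dtype(DTYPE).itemsize
    shm = SharedMemory(create=True, size=nbytes)
    S = np.ndarray(SHAPE, dtype=DTYPE, buffer=shm.buf)
    S[:] = 0.0

    m = 4                                              # number of worker processes
    bounds = np.linspace(0, SHAPE[0], m + 1, dtype=int)
    procs = [Process(target=worker, args=(shm.name, bounds[i], bounds[i + 1], 100))
             for i in range(m)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

    print(S.sum())                                     # 1000 * 1000 * 100, updated with zero copying
    shm.close()
    shm.unlink()
```

Large shared-memory machines (like the SGI boxes mentioned in the question) offer the same idea at much bigger scale: many cores see the same S without any explicit message passing.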

This gives some idea of when it might be OK to use a cluster:

  • There is no state space, or S is very small (e.g. BLASTing 1e10 FASTA sequences; embarrassingly parallel)
  • S is guaranteed to be sparse (mostly 0)
  • The jobs need to be synchronized only rarely (say every n seconds, as in real-time strategy games) or not at all, or a missed state of S can be interpolated
  • Only very few changes are made to S in each step
  • Interactions are local, i.e. each cell only needs knowledge of its direct neighbours, e.g. 'Game of Life'-like (see the sketch below)
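To illustrate the last point (a hypothetical sketch of my own, not from the original answer): in a Game-of-Life-style simulation split across nodes, each node only needs the one-row "halo" from its two neighbours per step, so the per-step traffic scales with the boundary length rather than with |S|. The loop below fakes the cluster with a list of strips, but the only data that would cross the network is the halo rows.

```python
# Sketch: domain decomposition for a local-interaction update (Game-of-Life-like,
# periodic boundaries). Only the halo rows would cross the network each step.
import numpy as np

def life_step(strip_with_halo):
    # Count the 8 neighbours of every cell, apply Conway's rules,
    # and return only the interior rows (the halo rows are discarded).
    n = sum(np.roll(np.roll(strip_with_halo, i, axis=0), j, axis=1)
            for i in (-1, 0, 1) for j in (-1, 0, 1)) - strip_with_halo
    alive = (n == 3) | ((strip_with_halo == 1) & (n == 2))
    return alive.astype(np.uint8)[1:-1]

rng = np.random.default_rng(0)
S = rng.integers(0, 2, size=(1024, 1024), dtype=np.uint8)  # global state
m = 4                                                      # pretend cluster nodes
strips = np.array_split(S, m, axis=0)                      # each node owns one strip of rows

for step in range(10):
    new_strips = []
    for k, strip in enumerate(strips):
        top = strips[(k - 1) % m][-1:]     # halo row "received" from the neighbour above
        bottom = strips[(k + 1) % m][:1]   # halo row "received" from the neighbour below
        new_strips.append(life_step(np.vstack([top, strip, bottom])))
    strips = new_strips

halo_bytes_per_step = 2 * S.shape[1] * m   # two halo rows per node per step
print(f"halo traffic per step: {halo_bytes_per_step} bytes, full state |S|: {S.size} bytes")
```

Because the per-step traffic stays small and grows only with the boundary, not with the grid, this kind of workload maps well onto an ordinary message-passing cluster.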