Let's consider an abstract simulation. A simulation consists of a state-space S (a gigantic matrix, |S| >> some big number) and a parallel algorithm A operating on S using m CPUs, e.g. each taking on one cell of S. Further, we assume that A is discrete/stepwise (one update per time interval, of which there are t >> some big number) and synchronized, meaning that it consists of discrete steps updating S, and each of the m instances running A needs an up-to-date version of S to start its computation. Also, A updates nearly 100% of S in each step.
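As a minimal sketch of this model (the names are hypothetical, and a sequential loop stands in for the m parallel instances): every instance reads the same up-to-date S, computes its cells, and no instance may start step k+1 before the complete new S is visible.

```python
import numpy as np

def run_simulation(S, update_cell, t_steps):
    """Synchronized stepwise model: each step reads the current S and
    rewrites (nearly) all of it before the next step may begin."""
    for _ in range(t_steps):
        S_next = np.empty_like(S)
        # Each of the m instances of A would compute its own cell(s) from the full S...
        for idx in np.ndindex(S.shape):
            S_next[idx] = update_cell(S, idx)
        # ...and the barrier: step k+1 only starts once all of S_next is published.
        S = S_next
    return S

# Hypothetical update rule where every cell depends on all of S
# (the worst case for distributing S over a cluster).
result = run_simulation(np.random.rand(8, 8), lambda S, idx: S.mean(), t_steps=3)
```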
If nodes do not have shared memory, there is a transfer penalty on the order of |S| per node and per step over the network; even if every node only sends its updated portion of S, it still needs to receive the new state-space in each step. As the simulation consists of a large number of steps, the total transfer volume is proportional to: |S| * m * t.
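For a feeling of the magnitude (all numbers here are made up for illustration), plugging in some plausible values:

```python
S_bytes = 8e9   # assumed size of the state-space S: 8 GB
m = 64          # assumed number of nodes
t = 1e4         # assumed number of simulation steps

# Each of the m nodes must receive the (almost fully rewritten) S in every step.
total = S_bytes * m * t
print(f"{total / 1e15:.1f} PB")   # ~5.1 PB of network traffic for one run
```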
If one wants to improve this, each node could keep track of which other nodes need its updates and send only to those; this reduces the factor m. However, that is more complicated to implement and imposes restrictions on A.
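A rough sketch of that bookkeeping (the names and the ring-shaped dependency pattern are hypothetical): each node keeps a set of subscriber nodes and pushes its updated slice only to those, so the per-step traffic scales with the dependency pattern of A rather than with m.

```python
# Hypothetical dependency map: node -> nodes that need its updates.
# With a local update rule, each node only feeds its two ring neighbors.
m = 8
subscribers = {rank: {(rank - 1) % m, (rank + 1) % m} for rank in range(m)}

def step_traffic(slice_bytes):
    """Bytes sent in one step if every node sends its slice only to its
    subscribers, instead of broadcasting it to all m-1 other nodes."""
    return sum(len(subs) * slice_bytes for subs in subscribers.values())

print(step_traffic(slice_bytes=1_000_000))  # 16 MB instead of 56 MB (m*(m-1) sends)
```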
In a shared memory environment, this penalty is virtually 0.
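As a minimal single-machine illustration using Python's standard multiprocessing.shared_memory module (assuming S fits in the RAM of one node): all workers map the same buffer, so publishing the new state is just a memory write and nothing is copied over a network.

```python
import numpy as np
from multiprocessing import Process, shared_memory

def worker(shm_name, shape, lo, hi):
    # Attach to the existing block: no copy of S, just a view into the same RAM.
    shm = shared_memory.SharedMemory(name=shm_name)
    S = np.ndarray(shape, dtype=np.float64, buffer=shm.buf)
    S[lo:hi] += 1.0          # update "my" cells in place
    shm.close()

if __name__ == "__main__":
    shape = (1024,)
    shm = shared_memory.SharedMemory(create=True, size=8 * shape[0])
    S = np.ndarray(shape, dtype=np.float64, buffer=shm.buf)
    S[:] = 0.0
    ps = [Process(target=worker, args=(shm.name, shape, i * 256, (i + 1) * 256))
          for i in range(4)]
    [p.start() for p in ps]
    [p.join() for p in ps]
    print(S.sum())           # 1024.0: every worker saw and updated the same S
    shm.close(); shm.unlink()
```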
This gives some idea of when it might be OK to use a cluster:
- There is no state-space, or S is very small (BLAST 1e10 FASTA sequences; embarrassingly parallel)
- S is guaranteed to be sparse (mostly 0)
- The jobs need to be synchronized only rarely (e.g. every n seconds, as in real-time strategy games) or not at all, or S can be interpolated if an update is missed
- Only very few changes are made to S in each step
- Interactions are local (each cell needs only knowledge of its direct neighbors, e.g. 'Game of Life'-like; see the sketch after this list)
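To illustrate the last point, here is a sketch (not a full distributed implementation) of a Game-of-Life-style update: each cell depends only on its 8 neighbors, so a node owning a block of rows needs just one 'halo' row from each neighboring node per step instead of the full S.

```python
import numpy as np

def life_step(block_with_halo):
    """One Game of Life step on a block of rows; the first and last row are
    halo rows received from the neighboring nodes, only the interior is owned."""
    B = block_with_halo
    # Count the 8 neighbors of every interior cell (wrapping horizontally).
    n = sum(np.roll(np.roll(B, dy, axis=0), dx, axis=1)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0))
    new = ((n == 3) | ((B == 1) & (n == 2))).astype(B.dtype)
    return new[1:-1]          # drop the halo rows; they belong to the neighbors

# Per-step traffic per node: two halo rows (one per neighbor), not the full S.
block = np.random.randint(0, 2, size=(6, 16))   # 4 owned rows + 2 halo rows
print(life_step(block).shape)                    # (4, 16)
```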
Which applications are you talking about, exactly?
I think in this case it is not so important; the argument could apply to many different types of simulations, e.g. molecular dynamics or climate models.
Simulations are usually computationally intensive because you want to simulate as many factors as possible. For example, simulating the docking of a drug to a protein is very intensive, because you need to take a lot of factors into account (the position of each atom, its charge, etc.). Can you specify which simulations you are referring to?
Without shared memory the process would be very slow: hundreds of processes run for a single job, and fetching the state from another cluster node or from storage on every access is not a good idea; it eats a lot of system resources. Shared memory is of course faster.