Yes indeed, the kilokluster is not used anymore. The new clusters at UCSC are called swarm, encodek and memk. Swarm is the only interesting cluster here, with around 1000 CPUs on ~250 nodes, built in part by clusterguys.
The key part of all clusters here is a 250 TB IBM GPFS filesystem. GPFS costs money, but it sits in the kernel and has been designed from the start for distributed operation. I don't know how it works or where the caching servers are located; I can only say that it is extremely fast. It is way faster in distributed mode than ZFS, and even the metadata arrives on time. The network links are normal 1 Gbit Ethernet from what I know. Big clusters cannot use NFS at all, to my knowledge.
The other important part is the local hard disks, 250 GB each. They should be bigger, but that's what we have. Genomes and all big data files are stored on the local hard disks and are loaded from there, which avoids clogging the network.
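To make the local-disk idea concrete, here is a minimal sketch of staging a file from the shared filesystem onto a node's local disk before a job reads it. This is not how UCSC actually does it; the function name `stage_to_local` and the paths in the usage note are hypothetical, and the "same size and not older" check is just one simple way to decide whether a copy is current.

```python
import shutil
from pathlib import Path


def stage_to_local(shared_path, local_dir):
    """Copy a file from shared storage to node-local disk unless a current copy exists.

    Returns the path of the local copy.
    """
    src = Path(shared_path)
    dst_dir = Path(local_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / src.name

    src_stat = src.stat()
    if dst.exists():
        dst_stat = dst.stat()
        # Same size and not older than the source: treat it as already staged
        if (dst_stat.st_size == src_stat.st_size
                and dst_stat.st_mtime >= src_stat.st_mtime):
            return dst

    # Copy under a temporary name first so an interrupted copy never looks complete
    tmp = dst.with_name(dst.name + ".part")
    shutil.copy2(src, tmp)  # copy2 also preserves the modification time
    tmp.replace(dst)
    return dst
```

A job would then call something like `stage_to_local("/shared/genomes/hg19.2bit", "/scratch/genomes")` (both paths made up) and read the returned local path, so only the first job on each node touches the network.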
We do not use LSF or Sun/Oracle Grid Engine but parasol, a lightweight job scheduler written by Jim Kent. It can handle millions of jobs, submits a job within milliseconds, and can recover from some failures automatically. Some parts of the group have plugged more complex layers on top of that, notably JobTree by Ben Paten, but that's only worth it if you have complex workflows. JobTree can restart the right jobs after any errors.
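As far as I know, a parasol batch is described by a plain text job list with one command line per job, which is then handed to the `para` tool (`para create`, then `para push`; check the parasol documentation for the details). A minimal sketch of generating such a list, assuming a hypothetical program `myAligner` that takes an input file and an output file:

```python
import shlex
from pathlib import Path


def write_job_list(input_files, out_dir, job_list_path, program="myAligner"):
    """Write one shell command per input file, one command per line.

    `program` is a placeholder name; returns the number of jobs written.
    """
    out_dir = Path(out_dir)
    lines = []
    for in_file in input_files:
        in_file = Path(in_file)
        out_file = out_dir / (in_file.stem + ".out")
        # shlex.quote keeps file names with spaces from breaking the command line
        parts = (program, str(in_file), str(out_file))
        lines.append(" ".join(shlex.quote(p) for p in parts))
    Path(job_list_path).write_text("".join(line + "\n" for line in lines))
    return len(lines)
```

Because each job is just a command that reads and writes files, nothing in the software itself has to know about the scheduler.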
The use of local hard disks plus a distributed filesystem is in the end very similar to what Amazon, Yahoo and Google are doing. They have distributed file systems (like Google FS, HDFS), use local storage (HDFS) and have a fast scheduler. The difference is that we do not have to rewrite our software to use the local storage in a certain way (like MapReduce or Hadoop); we just read and write files. But this is also because, in genomics, our data files are very small compared to what big internet companies have, and jobs are often compute-intensive rather than IO-intensive.
I know a bit about the Sanger infrastructure, but I had better leave that to someone from the EBI to explain in more detail.
You probably know Chris already from the blogosphere; he has lots of info on his blog about clusters for genomics: http://blog.bioteam.net/
At the moment, if you plan to build your own cluster, I would go for one of those massive 48-CPU machines and run parasol on it. That's in my opinion the simplest way to get a small cluster running, but I have never tried it (has anyone here?)
many thanks Maximilianh ! :-)
Excellent update on the UCSC systems, thanks for this Max. Info on the Sanger system would probably be appreciated by all as well.
maximilianh - although I agree huge SMPs are nice, why'd you want to run a batch system on a single machine? IME, batch queuing is better avoided if possible.
If your software is not multithreaded, then a batch queue is the only alternative, right?
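For what it's worth, on a single big machine the "batch queue" can be as small as a worker pool that keeps every core busy with one single-threaded command each. A minimal sketch in plain Python (this is not parasol, just a stand-in to illustrate the idea; `run_job` and `run_job_list` are names I made up):

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_job(command):
    """Run one shell command and return (command, exit code)."""
    result = subprocess.run(command, shell=True)
    return command, result.returncode


def run_job_list(commands, workers=None):
    """Run commands concurrently, at most `workers` at a time.

    Returns the commands that exited with a non-zero status, in input order.
    """
    workers = workers or os.cpu_count() or 1
    # Threads are enough here: each one only waits for its child process
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_job, commands))
    return [cmd for cmd, code in results if code != 0]
```

The returned list of failed commands can simply be fed back in for a retry. What a real scheduler adds on top is mostly bookkeeping across many nodes and automatic recovery.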
By the way, GPFS uses multiple fileservers. Here we use around 30 fileservers for 250 compute nodes.
NFS should not really be a problem even for large clusters, now that there is pNFS (parallel NFS). Also, some vendors have proprietary NFS-like solutions that give you direct access to the storage blades/nodes (after talking to a directory blade), which should make it possible to scale up quite well (we run one such system at UPPMAX). It might depend on what counts as a large cluster, but in our experience a ~340-node cluster (2x quad-core in each node) against a 600 TB parallel storage system of this kind is not really a problem at all.