I am looking at building a computing cluster for my lab. We were originally going to go with Amazon EC2 connected to SLURM, but due to some issues we have now decided to go with hardware. We have been looking at a few used Dell blades to run analyses on and for students to learn how to use the system. Any information would be appreciated. We will primarily be doing RNA-Seq analysis, differential expression testing, and splicing. The most resource-intensive tools we will be looking to run are TopHat, STAR, and Trinity.
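For context, if we do end up running SLURM on the local hardware anyway, the jobs we'd be submitting look roughly like the sketch below (paths, sample names, and resource figures are placeholders, not tested values):

    #!/bin/bash
    #SBATCH --job-name=star_align
    #SBATCH --cpus-per-task=8
    #SBATCH --mem=40G            # STAR wants ~30+ GB for a human genome
    #SBATCH --time=04:00:00

    # paths and filenames below are illustrative only
    STAR --runThreadN "$SLURM_CPUS_PER_TASK" \
         --genomeDir /data/indexes/hg19_star \
         --readFilesIn sample_R1.fastq.gz sample_R2.fastq.gz \
         --readFilesCommand zcat \
         --outSAMtype BAM SortedByCoordinate \
         --outFileNamePrefix sample_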
Look at Rocks for the OS. Since you are buying used, I would suggest getting the most RAM you can find per blade (e.g., STAR requires 30+ GB for the human genome). You can consolidate the RAM from some blades and keep the spares (now without RAM) as backups; this matters because you will have no warranty, so the spare blades can serve as replacements in case of hardware failures down the road. You may want to buy a new switch to hold everything together (not strictly necessary, but a faster switch will provide better bandwidth; I assume you will use Ethernet). A switch may not be needed if you only have one blade enclosure. Invest in a new disk storage array, JBOD or whatever fits in your budget (unless storage will come from some central compute facility). If you are going to administer this yourself, then please find someone who can help with the initial install/hardening of the system in terms of security (especially if the cluster will reside on the open internet).
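To make the memory point concrete, the index-building step is where STAR's appetite shows up. A minimal sketch, with paths and the RAM cap purely illustrative (check the STAR manual for your version):

    # Building a human genome index; --limitGenomeGenerateRAM is in bytes
    # (STAR's default is ~31 GB, which is why low-RAM blades won't cut it)
    STAR --runMode genomeGenerate \
         --runThreadN 8 \
         --genomeDir /data/indexes/hg19_star \
         --genomeFastaFiles hg19.fa \
         --sjdbGTFfile gencode.gtf \
         --sjdbOverhang 99 \
         --limitGenomeGenerateRAM 32000000000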
Thanks. A few things I should have mentioned: this system will be residing offline and only accessible through a locally connected head node. We are looking at 4 nodes, one large one like a Dell R910 with 64 GB of RAM and a few R710s to run web services and handle data transfers. The facility where this will be located currently uses a blade running FreeBSD for data storage and serving duties, so we may look into upgrading that system. Any thoughts on Dell vs. Lenovo for buying individual blades?
Ah, you are referring to compute servers as "blades" (not real blade servers, based on the model numbers you mention). We have had good experience with Dell servers over the years, but Lenovo should be acceptable as well. Look for hardware that has been well cared for, especially since you are not going to have a maintenance contract/warranty (I assume). If you don't want the overhead of looking after a cluster/job scheduler (and only a few users are going to use the servers), the machines could be set up separately as standalone servers and used according to the hardware they have and the type of job being run; you can still farm work across them without a scheduler, as in the sketch below.
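As a rough sketch of that scheduler-less route (hostnames, index paths, and sample files here are made up), GNU parallel can distribute per-sample jobs to standalone machines over SSH, assuming passwordless SSH and identical paths on each node:

    # one TopHat run per sample, one job at a time on each of three nodes
    parallel --sshlogin node1,node2,node3 \
             --jobs 1 \
             'tophat -p 8 -o {.}_out /data/indexes/hg19 {}' \
             ::: sample1.fastq sample2.fastq sample3.fastq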
That was what I was thinking: node 1 being a high-powered R910 to run Trinity and STAR, and then a few smaller servers for all the other toolsets, possibly 4 R610s. Our lab has essentially gone from never doing heavy RNA-Seq to running consistent mapping and splicing analyses overnight, so we are looking for reliability and power.
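For what it's worth, the Trinity runs we're planning on the R910 would look something like the following (memory and thread counts are just our guesses for that box; note that newer Trinity releases take --max_memory, while older ones used --JM):

    # de novo assembly on the big-memory node
    Trinity --seqType fq \
            --left sample_R1.fastq.gz --right sample_R2.fastq.gz \
            --max_memory 50G \
            --CPU 16 \
            --output trinity_out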