Best Hardware Solution For Medium-Size Bioinformatics Lab [15-20 Computers]
5
9
13.2 years ago
Leszek 4.2k

I'm involved in establishing a new bioinformatics lab. It will serve primarily for teaching purposes, but I think it is possible to design a small grid as well. The idea is to have 15-20 desktop computers, 4-8 cores each, and build a grid using half of the cores. The budget is ~30k Euros. Probably some network file system (NFS) will be needed as well. Can you share your experience in this matter? Or maybe you have suggestions for ready-made solutions? Or is it better to buy cheaper desktops and invest the saved money into 2-3 very strong workstations?

What I have experienced so far in bioinformatics labs can be divided into two solutions:

  • all data stored on a network file system (NFS), with the client computers loading everything from it; an independent cluster uses the same NFS. This has the advantage of easy backups and a unified system, but users have limited privileges and have to ask for every piece of software. Unfortunately, if there is trouble with the NFS, no one can work at all :/
  • the client computers store data locally, but there is an NFS, mainly for the cluster and for backup purposes

    I'm awaiting your comments.

  • hardware • 13k views
    ADD COMMENT
    6

    I think you should buy a dedicated server with RAID and then a cheap desktop for each lab member.

    ADD REPLY
    0

    As you say "15-20 desktop computers", I imagine you have a lab of 15 members and you are considering giving each user one desktop that shares part of its resources with the cluster. I do not have much experience, but I think you should consider a dedicated server (with RAID) and buy a cheap desktop for each lab member.

    ADD REPLY
    0

    I like lh3's recommendation of a dedicated server with RAID and a cheap desktop; this will lead to some per-desktop management but I'd expect it to be the solution with the least surprises in terms of maintenance effort, etc.

    ADD REPLY
    0

    What is the goal of the desktops? What kind of applications will you be training them on?

    ADD REPLY
    6
    13.2 years ago
    Darked89 4.7k

    Populating 15+ separate desktops with a number of bioinformatics packages does not seem right to me. While it is doable, you would need something like CFEngine or Chef to automate everything.
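    For illustration, here is a minimal sketch of the kind of automation this implies, written as plain Python over SSH rather than CFEngine or Chef. The hostnames and package list are hypothetical, and it assumes passwordless SSH with sudo rights on apt-based clients:

        #!/usr/bin/env python3
        """Toy stand-in for CFEngine/Chef: push the same package set to every desktop.
        Hostnames, packages and the passwordless-SSH/sudo setup are hypothetical."""
        import subprocess

        CLIENTS = [f"desktop{i:02d}.lab.local" for i in range(1, 21)]   # hypothetical hostnames
        PACKAGES = ["samtools", "bwa", "ncbi-blast+", "emboss"]         # example package list

        failed = []
        for host in CLIENTS:
            cmd = ["ssh", host, "sudo", "apt-get", "install", "-y"] + PACKAGES
            result = subprocess.run(cmd, capture_output=True, text=True)
            if result.returncode != 0:
                failed.append((host, result.stderr.strip()))

        for host, err in failed:
            print(f"{host}: {err}")

    Even this toy version shows why a real configuration-management tool is worth it once you care about idempotence, reporting and configuration drift.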

    You have many options when it comes to exporting particular directories from the NFS server: everything (remote booting of the clients), /biosoft plus the /home dirs, or just /biosoft. So, depending on the setup, some work can still be done during an NFS failure, assuming you have the basic tools in a locally stored /home dir.

    Given a choice, I would go with dumb and cheap clients (but with enough RAM, swap and /tmp) and a few servers with a lot of RAM and cores. Some tasks (e.g. genome/transcriptome assembly) hardly work, if at all, on low-RAM machines.

    If you are concerned about NFS failures then you may go with RAID and mirrored NFS servers.

    ADD COMMENT
    7
    13.2 years ago

    I would recommend Jeremy's BASS approach described here:

    http://jermdemo.blogspot.com/2011/06/big-ass-servers-and-myths-of-clusters.html

    In a nutshell: one powerful central server, with cheap clients connecting to it.

    ADD COMMENT
    0

    The one challenge with this: everything breaks all the time, and one big server is a single point of failure (it's why I hate vertical scaling; the more expensive the box, the less happy you are when it goes down). Maybe not a problem in this scenario, but something to remember.

    ADD REPLY
    0

    I can't think of anything that breaks more than the head node on our cluster.

    ADD REPLY
    0

    Yes, and you should expect it to. The point is that if it's a cheaper box, you just fail over, because it's easier and more manageable to have a failover scenario. If it's an expensive box, you are less likely to have a failover scenario. And what happens when the expensive box becomes too small (say you have 4 TB of data that doesn't quite fit into memory)? Buying a second one is prohibitive. Admittedly, a cluster requires better software, but it's a cheaper and more robust long-term option.

    ADD REPLY
    5
    13.2 years ago
    Gareth Palidwor ★ 1.6k

    Mixing the desktops and the cluster doesn't sound like a good idea. I'd expect it to create a lot of admin issues, especially in a high-use teaching lab.

    I'd recommend minimal systems for the desktops, PXE-booted from the network, with the cluster built separately. This should be a low-admin solution, as a reboot of a desktop will load whatever the new OS changes are. If you're using, for example, gridengine, make the desktops clients so people can qlogin or qsh to the cluster for interactive sessions, and have their home and data dirs automounted locally when they log in. As you say, if there are NFS issues everything goes down, but that's an issue anyway in a networked environment. A mid-way alternative is something like the Rocks distro, which (if I remember correctly; I haven't played with it in a while) will update a local OS install on boot.

    In terms of installs, if it's a teaching lab you really don't want local installs of software. What we do on our cluster is have a /data/binaries directory where the shared binaries are installed; that works fine.
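    As a concrete illustration of that setup, a minimal sketch of submitting a batch job from one of the desktop clients using the drmaa Python bindings is below. It assumes a working gridengine installation with libdrmaa available; the tool under /data/binaries and its arguments are just placeholder examples:

        #!/usr/bin/env python3
        """Submit a job from a desktop configured as a gridengine submit host.
        The /data/binaries tool path and its arguments are hypothetical."""
        import drmaa

        with drmaa.Session() as session:
            jt = session.createJobTemplate()
            jt.remoteCommand = "/data/binaries/blastn"    # shared install, nothing local
            jt.args = ["-query", "query.fa", "-db", "nt", "-out", "hits.tsv"]
            jt.joinFiles = True                           # merge stdout and stderr

            job_id = session.runJob(jt)
            print(f"submitted job {job_id}")

            info = session.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
            print(f"job {info.jobId} finished with exit status {info.exitStatus}")

            session.deleteJobTemplate(jt)

    Interactive use stays as simple as qlogin/qsh; the bindings only matter once students start scripting larger batches.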

    If you want to go very cheap on the cluster hardware, a friend who has assembled many machines has a simple and successful heuristic for choosing good cheap hardware: go to Tom's Hardware (or another such site) and, for each component you need, choose the cheapest of the top 5 or 10. This gives a good price/performance ratio. Given the budget, your cluster will not benefit much from rack mounting, as there will be only a few compute nodes and you pay a premium for rack hardware versus beige boxes.

    Note that cheap hardware can be noisy, particularly under high load as the fans really spin up; this may be a consideration if the cluster nodes are being housed in the same room as the desktops.

    You'll probably want a local switch for the cluster and desktops given the number of nodes, 32 ports at least (desktops and cluster interconnects). You may want to talk to your institute IT about the interconnects between your network and theirs.

    NAS boxes are cheap for storage, and a system like OpenFiler on commodity hardware can get you 24-48 TB of RAIDable storage at a reasonable cost. If you just buy a commodity box and configure it yourself, setup and maintenance may be expensive time-wise, depending on your level of knowledge. I've had good experiences with the more expensive pre-configured appliances (ReadyNAS and others): though the performance isn't great, they're very reliable and need almost zero maintenance.

    I would strongly recommend having some sort of backup mechanism; RAID is not backup. Either tape or another NAS holding backups for key stuff.
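    A minimal sketch of what "another NAS holding backups" can look like in practice is below: a nightly rsync push from the primary storage to a second box, run from cron. The hostnames and paths are hypothetical, and it assumes key-based SSH between the two machines:

        #!/usr/bin/env python3
        """Nightly push of key directories to a second NAS (hypothetical hosts/paths);
        run from cron, e.g. 0 2 * * * /usr/local/bin/nas_backup.py"""
        import subprocess
        import sys

        SOURCES = ["/data/projects/", "/data/binaries/"]   # what we cannot afford to lose
        TARGET = "backup-nas.lab.local:/backup/primary/"   # hypothetical second NAS

        ok = True
        for src in SOURCES:
            # -a preserves permissions and timestamps, --delete mirrors removals
            cmd = ["rsync", "-a", "--delete", src, TARGET + src.strip("/")]
            if subprocess.run(cmd).returncode != 0:
                print(f"backup of {src} failed", file=sys.stderr)
                ok = False

        sys.exit(0 if ok else 1)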

    ADD COMMENT
    3

    +1 for "RAID is not a backup". Physical separation and copies are a must for important data.

    ADD REPLY
    1

    +1000000 for "RAID is not backup". The only reliable form of backup is physical replication and storage across physical facilities (something like tape works but ideally not in the same facility)

    ADD REPLY
    5
    2.2 years ago

    This is the first hit when I google "bioinformatics server hardware", so I figured it needs an answer for 2022. The big-ass-server approach with cheap clients for ssh is still recommended, though today students/staff have their own laptops and access to free tools like VSCode with Remote SSH and Jupyter Notebooks.

    With a budget of EUR 30k adjusted for inflation, the best buy today would be an Illumina Dragen server (USD 25k) plus the software license (USD 10k). These servers have a large NVMe SSD and beefy dual-socket Intel Gold processors. But most importantly, they have an FPGA that makes Illumina's pipelines run dramatically faster, which means processing 10 samples one after another has a reasonable turnaround time (TAT) compared to per-sample parallel execution on a 10-node CPU cluster.
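    To make that turnaround-time claim concrete, here is a rough back-of-the-envelope comparison. The per-sample runtimes are illustrative assumptions (a fast FPGA run versus a conventional CPU pipeline), not benchmarked figures:

        # Illustrative only: assumed per-sample runtimes, not measured benchmarks.
        FPGA_HOURS_PER_SAMPLE = 0.5    # assumed FPGA-accelerated run per genome
        CPU_HOURS_PER_SAMPLE = 24.0    # assumed conventional pipeline on one CPU node
        SAMPLES = 10

        serial_fpga_tat = SAMPLES * FPGA_HOURS_PER_SAMPLE   # one FPGA box, samples queued
        parallel_cpu_tat = CPU_HOURS_PER_SAMPLE             # 10 samples on 10 CPU nodes at once

        print(f"10 samples queued on the FPGA server:        ~{serial_fpga_tat:.0f} h")
        print(f"10 samples in parallel on a 10-node cluster: ~{parallel_cpu_tat:.0f} h")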

    Another good option, with lower upfront investment, is a cloud-based license, where you rent compatible FPGA VMs from Azure or AWS and pay Illumina only for the software license. This also means you don't have to worry about server administration or disaster recovery. For example, Azure CycleCloud lets you set up a Slurm cluster with a scheduler node that is always running and that students can ssh into. The compute nodes can be FPGA VMs that are only allocated when an srun or sbatch command is issued. Each student can get their own node, since only one Dragen FPGA pipeline can run at a time.
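    On the student-facing side, job submission on such a cluster is just ordinary Slurm; a minimal sketch of driving sbatch from Python is below. The partition name, wrapper script and sample names are hypothetical placeholders, not actual CycleCloud or Illumina configuration:

        #!/usr/bin/env python3
        """Submit one job per sample to the Slurm scheduler node; the 'fpga'
        partition and run_dragen_pipeline.sh wrapper are hypothetical."""
        import subprocess

        def submit(sample: str) -> str:
            cmd = [
                "sbatch",
                "--partition=fpga",              # hypothetical autoscaling FPGA partition
                "--nodes=1",                     # one pipeline per node at a time
                f"--job-name=dragen_{sample}",
                f"--output={sample}.%j.log",
                "--wrap", f"run_dragen_pipeline.sh {sample}",
            ]
            out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
            return out.strip()                   # e.g. "Submitted batch job 1234"

        for sample in ["sampleA", "sampleB", "sampleC"]:
            print(submit(sample))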

    ADD COMMENT
    3

    Thanks for following up on an old thread, Cryiac. We have recently purchased a new server for the bioinformatics consulting center.

    After a cost-benefit analysis, we purchased a Linux server from Penguin Computing with 300 TB of storage, 48 CPUs and 1 TB of RAM, with three years of on-site service. The total cost of ownership was around $30K.

    The same capacity in a cloud computing environment would have cost $8-10K per month! We already had a fully owned system, also purchased from Penguin Computing, that ran for 5 years prior. All things considered, our institution's total savings on computing will be over a million dollars.
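    The arithmetic behind that claim, using the figures quoted above (a ~$30K owned server versus $8-10K per month in the cloud, over a roughly five-year system lifetime); the exact cloud figure is of course whatever your provider actually charges:

        # Back-of-the-envelope arithmetic using the figures quoted in this reply.
        OWNED_COST = 30_000                  # new server: purchase + on-site service (USD)
        CLOUD_PER_MONTH = (8_000, 10_000)    # quoted equivalent cloud cost range (USD/month)
        LIFETIME_MONTHS = 5 * 12             # the previous owned system lasted ~5 years

        cloud_low = CLOUD_PER_MONTH[0] * LIFETIME_MONTHS     # $480,000
        cloud_high = CLOUD_PER_MONTH[1] * LIFETIME_MONTHS    # $600,000

        print(f"cloud over one 5-year lifetime:  ${cloud_low:,} - ${cloud_high:,}")
        print(f"owned server over the same time: ${OWNED_COST:,}")
        print(f"difference per system:           ${cloud_low - OWNED_COST:,} - ${cloud_high - OWNED_COST:,}")
        # ~$450K-$570K per system; across the two systems mentioned above,
        # that is indeed on the order of a million dollars.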

    I will also say that claiming cloud computing frees one from system administration is not quite correct. Cloud computing frees one from hosting and network setup costs, but everything else still needs system administration skills, more so than with self-owned equipment. AWS administration skills are needed, and those are more difficult to come by. A wrong setting in AWS can cost thousands of dollars per month.

    ADD REPLY
    1

    All good points about cloud infrastructure. But don't forget the money you save by not needing a sysadmin ($120k p/a) and not needing to upgrade hardware every few years ($$ and disruption).

    I always advocate a hybrid model, where you have just enough on-prem hardware for sustained workloads. For burst workloads, you can spin up cloud VMs and spin them down when you're done. Large organizations will hire cloud expertise into their IT departments, and each lab/division submits tickets to them. For example, a colleague put in an IT ticket for HPC compute/storage, and a few weeks later they had a 100-node Slurm cluster they could ssh into. Only the scheduler node is kept running, while the compute nodes are spun up when you submit a job. Under the hood, IT used Azure CycleCloud, but my colleague did not need to worry about those details; they just submit tickets to IT when there are issues.

    IMO, idle nodes in an on-prem HPC cluster feel like wasted investment. We should just run something on them. Our employers don't charge us for electricity. :)

    ADD REPLY
    0

    I had not considered that most smaller labs would use Dragen, but it might make sense if they are buying that hardware anyway. Bear in mind the AWS Dragen AMI with the recommended specs costs a whopping $18.40/hr (likely more than the grad students using it are paid), but I suppose that is mostly a convenience fee for the AMI?
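    For scale, some quick arithmetic on that $18.40/hr figure; the per-run duration is an illustrative assumption, since actual runtimes depend on the data and the instance type:

        # Quick arithmetic on the quoted $18.40/hr AWS Dragen AMI price.
        RATE = 18.40                  # USD per hour (quoted above)
        HOURS_PER_RUN = 4             # illustrative assumption for a single sample

        print(f"at full-time hours (2080 h/yr): ${RATE * 2080:,.0f} per year")   # ~$38K
        print(f"one {HOURS_PER_RUN}-hour run:   ${RATE * HOURS_PER_RUN:.2f}")    # ~$74

    So the hourly rate is indeed in grad-student-salary territory if an instance is left running, but per completed run it is fairly modest.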

    ADD REPLY
    1
    13.2 years ago
    Fabian Bull ★ 1.3k

    I have no experience with setting up IT infrastructure, but I can describe the infrastructure I am working with at an institute.

    We have a separate high-performance server and a cluster, all mounting a huge NFS network storage. There are backups on an hourly basis, so everybody can share files and back up their own stuff.

    It's a great working experience.

    ADD COMMENT
