Hello everyone,
I am setting up a new bioinformatics service at a research institute (79 research groups, each potentially requiring different types of analyses). I'm currently in the process of selecting a workstation that can handle multiple high-throughput bioinformatics and biostatistics tasks efficiently and simultaneously.
Here are some key points I’m considering:
- CPU: I’m thinking about a high core-count processor (e.g., AMD Threadripper or Intel Xeon). Is 16-32 cores sufficient, or should I aim higher?
- RAM: Considering 128 GB of RAM, with the option to upgrade to 256 GB if needed. How much RAM would you recommend for handling typical omics analyses from multiple groups at once?
- Storage: I’m planning to use a 1-2 TB NVMe SSD for fast access and temporary files during analysis, along with 8-12 TB of HDD storage (possibly in RAID) for long-term data storage. Does this seem like the right balance between speed and capacity, or should I be aiming for more SSD?
- GPU: I’m unsure if investing in a high-end GPU (e.g., NVIDIA RTX 4090) is necessary. Is it worth the cost if we're not heavily focused on machine learning or deep learning applications right now?
- Simultaneous Analysis: We expect to run several analyses at once (RNA-seq, metagenomics, etc.). How should this affect my choice of hardware specs?
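To make the simultaneous-analysis question concrete, here is the back-of-the-envelope arithmetic I have in mind: concurrent jobs multiply both the core and the RAM requirements. All per-job numbers in this sketch are guesses for illustration, not measurements:

```shell
#!/bin/sh
# Rough capacity check: can N concurrent jobs fit on this box?
# The per-job numbers below are illustrative assumptions only.
JOBS=3            # e.g. two RNA-seq alignments + one metagenomics run
CORES_PER_JOB=8   # threads planned per job
RAM_PER_JOB_GB=32 # e.g. a human-genome aligner index can want ~30+ GB

TOTAL_CORES=$(nproc)
TOTAL_RAM_GB=$(( $(grep MemTotal /proc/meminfo | awk '{print $2}') / 1024 / 1024 ))

if [ $((JOBS * CORES_PER_JOB)) -le "$TOTAL_CORES" ] \
   && [ $((JOBS * RAM_PER_JOB_GB)) -le "$TOTAL_RAM_GB" ]; then
    echo "fits: ${JOBS} jobs x ${CORES_PER_JOB} cores / ${RAM_PER_JOB_GB} GB each"
else
    echo "does not fit on ${TOTAL_CORES} cores / ${TOTAL_RAM_GB} GB"
fi
```

The point of the headroom check is that the box should still have cores and memory free for smaller interactive work while the big jobs run.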
If anyone has experience with setting up bioinformatics workstations for a similar scale, I’d really appreciate your advice on specs and whether there's anything crucial I'm missing. Also, would you recommend consulting a hardware specialist to ensure the specs meet our needs?
Thank you so much in advance for your insights!
Do I read it right that you are going to set up a single workstation for 79 research groups? That is not going to be a good solution for so many.
Hi! Yes, you read that right. But they want to start really slowly, beginning with just a few analyses for the few PIs who are in the same building as me. For now, they’re only giving me a regular office computer, so... I know it’s not going to be easy. I guess when they see that I can’t manage, they’ll give me more support.
You don't say where you're based but if in the EU, you most likely have access to free academic HPC and/or cloud resources. Anyway, one machine isn't going to be enough for 79 groups unless they only use it very sporadically or they are OK waiting until it's available.
Yes, it's EU-based; that's the thing. I'm trying to get access to the university servers, and in the meantime I'm looking for a workstation to get started. They don’t have a bioinformatics service yet, and they’re unsure how much it will be used, so I guess that until all the PIs get accustomed to it, I won’t have the workload of a typical 79-group institute.
If you're getting one machine for an institute of 79 research groups, and it's a success, then within a couple of years machine #2 will be on the way.
That's why I'd recommend learning Ansible to configure the machine. You develop a template, or set of templates, which can be applied to any number of machines. It's very useful and a prerequisite for setting up a cluster. Alternatives are Chef and Puppet.
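For context, an Ansible playbook is just a YAML file describing the desired state of a machine, and the same file can later be pointed at machine #2 or a whole cluster. A minimal sketch follows; the host group name, package list, and paths are placeholders I made up, not a vetted setup:

```shell
# Write a minimal Ansible playbook describing the workstation's desired state.
# 'bioinfo_workstations' and the package names are illustrative placeholders.
cat > workstation.yml <<'EOF'
- hosts: bioinfo_workstations
  become: true
  tasks:
    - name: Install common bioinformatics tooling
      ansible.builtin.package:
        name: [samtools, bwa, r-base]
        state: present

    - name: Create a world-writable scratch directory for temporary files
      ansible.builtin.file:
        path: /scratch
        state: directory
        mode: "1777"
EOF

# Once Ansible is installed and an inventory file ('hosts') exists:
# ansible-playbook -i hosts workstation.yml
```

Re-running the same playbook is idempotent, which is exactly what makes it useful when a second machine (or a cluster) arrives.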
Thanks for the suggestion. I'm not familiar with Ansible (I'm still a rookie), but I'll take a look and see if it could be a good option for this situation. I'll keep you posted!
Thank you all so much for your responses. I know I haven’t provided much information, but I really don’t have much more to share. These 79 groups have no bioinformatics support, so until now any analyses they needed have been contracted out to external companies. The idea is to introduce this service gradually, and for now I will start by talking to about 10-15 researchers to see how I can help them.

Right now, we’re trying to get support from a supercomputing group at a university, with the intention of working on their servers. That’s why my question was about a fairly simple workstation that would let me get started. However, I also need something that will last me at least a year, because I don’t want to request an $8K workstation that only lasts two months. In other words, since it’s an investment, I want the workstation to be useful for a while.

I’m not sure if this is still too little information, but the truth is that I’m such a newbie that I don’t even know how much I can share in an open forum. Sorry for the confusion, and thank you again!
The workstation will last a long while, well beyond its useful life, provided there are no hardware failures. It can always be used to connect to other resources as you get access to them off-campus. Generally, if something is going to fail, it will likely do so within the first year; after that, things should keep working as long as the hardware is well taken care of.
The main take-home (in case it was not apparent from the discussion here) is that you can't run analyses that require more resources than you have available. @i.sudbery has given you good pointers on what is minimally needed to run one sample through various commonly used programs. So find the most intensive analysis you are likely to run on one sample and make sure your hardware meets that minimum requirement, with a healthy overhead, so you can run some smaller jobs while the main analysis is going.
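A practical way to get that per-sample baseline is to run the heaviest candidate pipeline once on a representative sample and record its peak memory and runtime with GNU time (note `/usr/bin/time`, not the shell built-in `time`). In the sketch below, `sleep 1` is only a stand-in for your real aligner/assembler command:

```shell
# Profile one representative job: peak RSS and elapsed time.
# 'sleep 1' is a placeholder for your actual pipeline command.
if command -v /usr/bin/time >/dev/null 2>&1; then
    # GNU time's -v prints "Maximum resident set size"; fall back to the
    # plain format if this /usr/bin/time lacks -v (e.g. the BSD variant).
    /usr/bin/time -v sleep 1 2> time_report.txt \
        || /usr/bin/time sleep 1 2> time_report.txt
    grep -iE "maximum resident|elapsed|real" time_report.txt
else
    echo "GNU time not found; it is often installable as the 'time' package"
fi
```

The "Maximum resident set size" line is the number to compare against your planned RAM per concurrent job.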
If bioinformatics research support is going to be your primary responsibility, consider joining the "bioinfo-core" organization (LINK) or ABRF. You will find resources, meet like-minded people, and get up to speed rapidly.