Forum: Ideal PC configurations and operating system for bioinformatics laboratory
2
1
7 months ago
Estevão ▴ 10

I am new to the field of bioinformatics and would like to buy a computer for my lab that allows the analysis of several single cell samples, probably from fastq files. What specifications and operating system would be ideal for my case?

PC • 1.8k views
ADD COMMENT
3

What specifications and operating system would be ideal for my case?

For choice of OS: Linux, to access the large pool of open-source software. You may be able to get away with running WSL2 on Windows 11 (if you must have Windows).

Hardware choice is going to be trickier. Get as much RAM as you can, especially if you are going to be doing single-cell work.

It may also help people if you could put a $ figure (or whichever currency you plan to use) in your post as a guide for your budget.

ADD REPLY
0

Thank you. My budget is around 20 thousand dollars.

ADD REPLY
2

several single cell samples

At the same time? You need a server, not a PC. Google around (and search the site as well with the keyword "configuration"), but you're going to need a few TB of storage, ideally >= 128 GB of RAM and whatever processor is the best these days. Single-cell runs (at least cellranger) are memory and time intensive, and the FASTQ files + results can get bulky really quickly.
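To put a rough number on "a few TB", here is a minimal back-of-the-envelope sketch. The per-sample FASTQ size, the output-to-input ratio and the number of copies are all placeholder assumptions, not measured values; plug in figures from your own sequencing runs.

```python
def estimate_storage_tb(n_samples, fastq_gb_per_sample=50.0,
                        output_ratio=1.0, copies=2):
    """Rough storage need in TB (1 TB = 1000 GB).

    All defaults are hypothetical placeholders:
      fastq_gb_per_sample: compressed FASTQ size per sample
      output_ratio: pipeline results as a fraction of FASTQ size
      copies: working copy plus backups
    """
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    per_sample_gb = fastq_gb_per_sample * (1.0 + output_ratio)
    return n_samples * per_sample_gb * copies / 1000.0


# Example: 20 samples with the placeholder defaults
print(estimate_storage_tb(20))  # → 4.0
```

With these made-up defaults, 20 samples already land at 4 TB, which is why the numbers above are worth checking against real file sizes before buying disks.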

ADD REPLY
0

You'll probably get away with using one of the other analysis pipelines, like Alevin-fry, with much less than 128 GB of RAM. Workstations can be pretty much as beefy as most servers these days, although they aren't really any cheaper.

ADD REPLY
2
6 months ago
  • As others have said, 1-2 workstations are probably the way to go.
  • A Linux OS like Ubuntu is probably the easiest and has the best docs
  • RAM is the real show stopper, get as much as possible. Min 256 GB would be my recommendation - R/Seurat and scanpy both need a lot of RAM
  • Look at various models and configurations online. Dell is very flexible, and its site is a nice way to get an idea of current prices and the many different options.
  • You may be able to get Linux preinstalled, but it is easy enough to do yourself
  • I would not bother with a cluster with just 1-2 machines.
  • Get 4-8 TB of local SSD on the machine, and external hard disks and/or AWS Glacier for backup of old datasets
  • AMD Ryzen or EPYC processors offer a lot of cores for their price and are excellent for bioinformatics
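If you want to check an existing or demo machine against a target like the one above, a minimal sketch could look like this. The 256 GB figure comes from the list; the 16-core minimum is just an assumed placeholder, and `installed_ram_gb` relies on POSIX `sysconf`, so it is meant for Linux.

```python
import os


def installed_ram_gb():
    """Total physical RAM in GiB, read via POSIX sysconf (Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


def meets_spec(ram_gb, cores, min_ram_gb=256, min_cores=16):
    """True if the machine reaches both targets.

    min_ram_gb follows the recommendation above; min_cores is a
    hypothetical threshold chosen only for illustration.
    """
    return ram_gb >= min_ram_gb and cores >= min_cores
```

On the machine itself you would call something like `meets_spec(installed_ram_gb(), os.cpu_count() or 0)`. Note that the reported RAM is usually slightly below the nominal size, so you may want to set `min_ram_gb` a little lower than the sticker value.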
ADD COMMENT
1
6 months ago
emma.a ▴ 130

20k is good money for a few workstations, and you will probably need to run several samples at the same time. For a few thousand you could get a workstation with 8 disk slots (you can start with one 4-8 TB HDD and two 2 TB SSDs), 256 GB RAM, 20 cores (40 threads) and a 10G network card. Later you can add more disks and RAM to the same workstation if you need to. Just check that there is also room to add a GPU in the future without buying a new workstation.

You can also install a job scheduler (SGE, SLURM, etc.) or just use GNU parallel. If you get 2 or more workstations you can also set up a Linux cluster.

Long-term cloud usage is expensive (CPU/RAM); with the same money you can buy workstations to use for 3-5 years. In general, cloud storage is very cheap.

Linux is the best option: it gives you access to all/most open-source tools, and it is very flexible and customizable. I would totally avoid Windows; if you also need Windows, just create a Windows virtual machine on your Linux workstation.
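For anyone wondering what the lightweight "no scheduler" route looks like, here is a minimal Python sketch of the same idea as `parallel -j N`: run one command per sample, with at most a fixed number running at once. The sample names and the commands are placeholders (each "job" just prints its sample name); you would swap in your real pipeline call.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_samples(commands, max_parallel=2):
    """Run one command per sample, at most max_parallel at a time.

    commands maps a sample name to an argv list.
    Returns a dict mapping each sample name to its exit code.
    """
    def _run(item):
        name, argv = item
        completed = subprocess.run(argv, capture_output=True, text=True)
        return name, completed.returncode

    # Threads are enough here: each worker only waits on a subprocess.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return dict(pool.map(_run, commands.items()))


# Placeholder jobs: replace the argv lists with your actual pipeline command.
demo = {
    sample: [sys.executable, "-c", f"print('processing {sample}')"]
    for sample in ("sampleA", "sampleB", "sampleC")
}
exit_codes = run_samples(demo, max_parallel=2)
```

Pick `max_parallel` so that the combined memory of the running jobs fits in RAM; for memory-hungry single-cell steps that number can be quite small.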

ADD COMMENT
0

Just for comparison, I deal with scRNA-seq daily and my workstation is an i7-12700K with 128 GB RAM, and even this is actually more than I need. SSDs are a must these days. A 10G connection only makes sense if your Ethernet actually supports it. Such a workstation without a GPU (because who needs GPUs unless you do long reads and fancy AI; I have not had a GPU in almost 10 years) costs < 2000 euro. Twenty k is a large budget. This all assumes that the initial processing is done on an HPC (which many universities and institutes actually provide, either directly or via cloud resources) and that I only do the downstream analysis locally. That said, I have yet to meet the "normal" lab that does daily runs of CellRanger. You do an experiment, process it and then analyze for weeks or months. OP does not sound like a core facility service provider.

I would totally avoid Windows

Depends. If you do office and presentation stuff and might need things like Teams or other relatively OS-restricted stuff just use Windows and WSL2. I do exactly that. Not that I like Windows, but seriously, I even like Linux GUI less for daily office things. macOS is the best OS overall, but hell their pricing is so off-the-scale these days that I refuse to flush any money down their drains.

I totally recommend AGAINST buying sysadmin-heavy hardware. You're not paid to do sysadmining but to analyze interesting data and find novel biology.

In terms of storage, we operate a local Synology NAS for most things. Not too expensive, very robust. AWS and all that is expensive over time. Sysadmining a NAS is not hard. Local hard drives for double backups of important data are always a good idea. Many institutes also have some sort of storage solution you might want to inquire about.

I strongly recommend against storing any data directly on the workstations. Keep things generic by putting data on a NAS or similar network storage. That also makes it easy to switch workstations in case one is occupied or broken. Data to NAS, code to GitHub, software on DockerHub (or similar). No local storage these days. These times are over (imo).

You can also install a job scheduler (SGE, SLURM, etc.) or just use GNU parallel.

Schedulers take sysadmin work. Just let things run and put a post-it "don't dare to touch". In a reasonably well-working environment people might even obey. Schedulers are for servers. As said, it is unlikely that intensive long-running jobs will run on these machines on a daily basis based on what OP states above (my interpretation).

It is always a good idea to visit collaborators or labs that do pretty much the same as you, talk to people and check the setup they use. Being new to something and then being in charge of important decisions like hardware for 20k is not a good combination.

ADD REPLY
0

It very much depends on whether the OP wants/needs to do the CellRanger runs locally or not. It doesn't really matter if it's only occasional if that key step can't be performed.

Then again, there are now pipelines that require fewer resources than CellRanger.

ADD REPLY
0

Personally, I see workstations as a long-term investment ... maybe today you are only running scRNA-seq, but with a good workstation you can also start running bulk RNA-seq tomorrow without having to buy a new one ... just as an example.

ADD REPLY
0

Bulk RNA-seq is way less intensive in every regard than single-cell. Whatever you buy, be sure it's upgradable, meaning, no Mac :)

ADD REPLY
0

Yeah, a standard 3x3 bulk experiment is very analysable on pretty much any good consumer laptop these days.

ADD REPLY
0

Once you have the counts any n is possible on a regular analysis-grade laptop or workstation. Even hundreds of samples. It's really the preprocessing that can be intense.

ADD REPLY
0

DE with 100s of samples (particularly if your experimental design is more complex than a 2-condition DE) can stretch consumer-grade hardware in my experience, but yes, depending on what you mean by "analysis-grade", it's definitely possible on a not particularly fancy workstation.

ADD REPLY
