I am new to the field of bioinformatics and would like to buy a computer for my lab that allows the analysis of several single cell samples, probably from fastq files. What specifications and operating system would be ideal for my case?
20k is good money for a few workstations ... you probably need to run several samples at the same time ... for a few 1000s you could get a workstation with 8 disk slots (you can start with one 4-8 TB HDD and two 2 TB SSDs), 256 GB RAM, 20 cores (40 threads) and a 10G network card ... in the future you can add more disks and RAM to the same workstation if you need them. Just check that there is also room to add a GPU later, without buying a new workstation. Also you can install a Job Scheduler (SGE, SLURM, etc.) or just use GNU parallel. If you buy 2 or more workstations you can also build a Linux cluster. Long-term cloud usage is expensive (CPU/RAM); with the same money you can buy a few workstations to use for 3-5 years. Cloud storage, in general, is very cheap. Linux is the best option, as it gives you access to all or most open-source tools and it is very flexible and customizable; I should totally avoid Windows - if you also need Windows, just create a Windows virtual machine on your Linux workstation.
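To make the cloud-versus-workstation argument a bit more concrete, here is a rough break-even sketch. The hourly cloud rate and the 4-year lifetime are made-up placeholders, not real quotes, so plug in your own numbers. It also ignores electricity, maintenance and storage, so treat it as a back-of-the-envelope check only.

```python
# Rough cloud-vs-workstation break-even estimate.
# All figures are hypothetical placeholders; replace them with real quotes.

HOURS_PER_YEAR = 365 * 24


def break_even_hours(price: float, hourly_rate: float) -> float:
    """Cloud hours that cost the same as buying the workstation outright."""
    return price / hourly_rate


def utilisation_at_break_even(price: float, hourly_rate: float, years: float) -> float:
    """Fraction of the workstation's lifetime it must be busy for buying to pay off."""
    return break_even_hours(price, hourly_rate) / (years * HOURS_PER_YEAR)


if __name__ == "__main__":
    price = 20_000.0  # USD upfront (the OP's stated budget)
    rate = 2.0        # USD per hour for a comparable cloud VM (assumed, not a real quote)
    years = 4         # within the 3-5 year lifetime mentioned above
    hours = break_even_hours(price, rate)
    share = utilisation_at_break_even(price, rate, years)
    print(f"Break-even after {hours:.0f} cloud hours, i.e. {share:.1%} utilisation over {years} years")
```

With these placeholder numbers it prints a break-even of 10000 cloud hours, i.e. the machine only needs to be busy about 28.5% of the time over 4 years for buying to win.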
Just for comparison, I deal with scRNA-seq daily and my workstation is an i7-12700K with 128 GB RAM, and even this is actually more than I need. SSDs are a must these days. Get a 10G connection only if your network actually supports it. Such a workstation without a GPU (because who needs GPUs unless you do long reads and fancy AI; I have not had a GPU in almost 10 years now) costs < 2000 euro. Twenty k is a large budget. This all assumes that the initial processing is done on an HPC (which many universities and institutes actually provide, either directly or via cloud resources) and that I only do the downstream analysis locally. That being said, I have yet to meet the "normal" lab that runs CellRanger daily. You do an experiment, process it and then analyze it for weeks or months. OP does not sound like a core facility service provider.
I should totally avoid Windows
Depends. If you do office and presentation stuff and might need things like Teams or other relatively OS-restricted software, just use Windows and WSL2. I do exactly that. Not that I like Windows, but seriously, I like the Linux GUI even less for daily office work. macOS is the best OS overall, but hell, their pricing is so off the scale these days that I refuse to flush any money down their drain.
I totally recommend AGAINST buying sysadmin-heavy hardware. You're not paid to do sysadmining but to analyze interesting data and find novel biology.
In terms of storage, we operate a local Synology NAS for storing most things. Not too expensive, very robust. AWS and the like get expensive over time. Sysadmining a NAS is not hard. Local hard drives for double backups of important data are always a good idea. Many institutes also have some sort of storage solution you might want to inquire about.
I strongly recommend against any local storage of data directly on the workstations. Keep it generic by putting it on a NAS or similar network storage. That also allows easy switching between workstations in case one is occupied or broken. Data to NAS, code to GitHub, software on DockerHub (or similar). No local storage these days. Those times are over (imo).
Also you can install a Job Scheduler (SGE, SLURM, etc.) or just use GNU parallel.
Schedulers take sysadmin work. Just let things run and put a post-it "don't dare to touch". In a reasonably well-working environment people might even obey. Schedulers are for servers. As said, it is unlikely that intensive long-running jobs will run on these machines on a daily basis based on what OP states above (my interpretation).
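If you skip a scheduler, something as light as the following Python sketch can cap how many samples run at once, much like `parallel -j 2` would with GNU parallel. The sample names and the command are hypothetical placeholders (the demo just launches Python to print a line); swap in your actual pipeline call.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_samples(samples, make_cmd, max_jobs=2):
    """Run one external command per sample, at most `max_jobs` at a time.

    `make_cmd(sample)` must return an argument list for subprocess.
    Returns a dict mapping each sample to its process return code.
    """
    def run_one(sample):
        # The command runs in its own process; the thread only waits for it to finish.
        result = subprocess.run(make_cmd(sample), capture_output=True, text=True)
        return sample, result.returncode

    # The pool size is the concurrency limit; map() keeps the input order.
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return dict(pool.map(run_one, samples))


def demo_cmd(sample):
    # Hypothetical placeholder: replace with the real command for one sample.
    return [sys.executable, "-c", f"print('processing {sample}')"]


if __name__ == "__main__":
    samples = ["sample_A", "sample_B", "sample_C"]  # hypothetical sample names
    print(run_samples(samples, demo_cmd, max_jobs=2))
```

A non-zero return code in the result tells you which sample to look at; everything else still runs to completion.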
It is always a good idea to visit collaborators or labs who do pretty much the same as you, talk to people and check the setup they use. Being new to something and then being in charge of important decisions like a 20k hardware purchase is not a good combination.
DE with 100s of samples (particularly if your experimental design is more complex than a two-condition DE) can stretch consumer-grade hardware in my experience, but yes, depending on what you mean by "analysis-grade", it's definitely possible on a not particularly fancy workstation.
For the choice of OS: Linux, to access the large pool of open-source software. One may be able to get away with running WSL2 on Windows 11 (if you must have Windows).
Hardware choice is going to be trickier. Get as much RAM as you can, especially if you are going to be doing single cell.
It may also help people if you could put a $ figure (or whichever currency you plan to use) in your post as a guide for your budget.
Thank you. My budget is around 20 thousand dollars.
At the same time? You need a server, not a PC. Google around (and search the site as well with keyword "configuration") but you're going to need a few TB of storage, ideally >= 128 GB of RAM and whatever processor is the best these days. Single cell runs (at least cellranger) are memory and time intensive and the FASTQ files + results can get bulky really quick.
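A quick way to see how many samples you could process simultaneously on a given box is to divide RAM and threads by what one job needs and take the smaller number. The per-job figures below (64 GB, 16 threads) and the 16 GB kept free for the OS are assumptions for illustration, not official requirements of any pipeline; check the documentation of the tool you actually use.

```python
def max_concurrent_jobs(
    total_ram_gb: int,
    total_threads: int,
    ram_per_job_gb: int,
    threads_per_job: int,
    reserve_ram_gb: int = 16,
) -> int:
    """How many identical jobs fit at once, limited by whichever runs out first.

    `reserve_ram_gb` is memory kept free for the OS and other users (an assumed value).
    """
    usable_ram = max(total_ram_gb - reserve_ram_gb, 0)
    by_ram = usable_ram // ram_per_job_gb
    by_threads = total_threads // threads_per_job
    return min(by_ram, by_threads)


if __name__ == "__main__":
    # Hypothetical per-job needs: 64 GB RAM and 16 threads per sample.
    for ram, threads in [(128, 20), (256, 40)]:
        n = max_concurrent_jobs(ram, threads, ram_per_job_gb=64, threads_per_job=16)
        print(f"{ram} GB RAM / {threads} threads -> {n} sample(s) at a time")
```

With these assumed numbers, a 128 GB / 20-thread machine handles one sample at a time and a 256 GB / 40-thread machine two, the latter limited by threads rather than RAM.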
You'll probably get away with much less than 128 GB of RAM using one of the other analysis pipelines, like Alevin-fry. Workstations can get pretty much as beefy as most servers these days, although they aren't really any cheaper.