Question

New Work Station for the Lab

0

3.0 years ago

Giulia.cosenza ▴ 110

Hi! In my lab we are interested in buying a new workstation and we are looking for suggestions. We work with NGS data and human genomes, so we need a fast machine able to handle lots of big files.

Any suggestions?

workstation pc computer • 1.5k views

ADD COMMENT • link updated 9 months ago by GenoMax 150k • written 3.0 years ago by Giulia.cosenza ▴ 110

2

What is "NGS data" and "human genomes", does it mean WGS? What kind of workflows are you running, and what is your budget? For example, my work almost exclusively uses single-threaded operations, so I bought a workstation with few but fast CPUs; everything else gets outsourced to an HPC. Other analysis types might benefit from massive parallelization, so many CPUs with lower per-core clock speed. Is your analysis memory-intensive or I/O-intensive? Do you have HPC-like resources available, and is the workstation for the analyses that come after the preprocessing (alignment etc.)?

ADD REPLY • link 3.0 years ago by ATpoint 87k

0

Thank you for your answer!

I apologize for not being very clear. I've been asked to do some research but I'm not very computer savvy. The analyses we do most often are the ones that come after the preprocessing (alignment etc.). We need a machine that allows us to work with human whole-genome sequencing (WGS) data. We usually have lots of big files such as FASTQ, BAM, and gVCF. Also, there are often several people working at the same time.

ADD REPLY • link 3.0 years ago by Giulia.cosenza ▴ 110

1

And is this patient data? And how much is 'loads of big files'? Like in 1 or 100 Illumina runs per week? And do you have an institutional IT department?

ADD REPLY • link 3.0 years ago by Michael 55k

score 2 · Answer 1 · 2022-04-13

2

3.0 years ago

GenoMax 150k

Suggestions in this past thread are still valid: Bioinformatics Workstation Suggestions. If you are already doing NGS analysis, then you have an idea of what your workflow needs in terms of hardware. Ultimately, much will depend on your budget and whether you need to buy a support contract. Working with local IT support is generally beneficial, since you will need to adhere to local security guidelines/practices.

ADD COMMENT • link 3.0 years ago by GenoMax 150k

0

Hello, could you please update those suggestions for 2024? Our lab works in genomics (mostly panels but sometimes WGS & WES as well), we also work on RNA-Seq and metagenomics, all human. The volume can go up to 100 samples per month. Not sure what info you might need to help me.

ADD REPLY • link 9 months ago by menyawino • 0

0

What is your budget? Are these clinical or research samples? Do you have support for centralized infrastructure at your university? Though it is tempting to have "control" over the hardware, leaving the administration/management to IT pros will leave you/your lab time to do actual science.

ADD REPLY • link 9 months ago by GenoMax 150k

0

They did not declare a budget (apparently they don't have an estimate), so I want to go for the optimum setup for the tasks I have mentioned and from there go higher/lower based on their response to the budget. The samples we analyze are both clinical and research ones (it is a hospital research center, if that makes it easier). The thing here is that we already have an HPC with an IT support team, but it is still maturing and not quite ready: many mistakes happen in the middle of processes, causing analysis disruption and data loss.

ADD REPLY • link 9 months ago by menyawino • 0

0

You can go a couple of different ways. If you have prior experience (and are willing to deal with multiple vendors in case of failures) then you could buy individual components and put a workstation together. OR you could get a pre-configured supported workstation from a vendor that your IT may have a contract with so you can deal with one entity should there be issues.

Without a budget to go by I would say follow suggestions in Michael's post in broad terms.

One or two multi-core CPUs (AMD EPYC should be cheaper) + an appropriate motherboard + at least 128 GB RAM (more is preferable) + enough storage (SSDs or a mix of SSD/spinning disk) to keep some data on disk, plus at least 50% working space. If you anticipate working with nanopore data, then add a supported NVIDIA GPU.
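As a rough way to turn the storage advice into a number, here is a minimal Python sketch. The function name, the per-sample size, and the retention period are all hypothetical placeholders, not figures from this thread; only the 50% working-space margin and the "up to 100 samples per month" volume come from the posts above. Plug in sizes measured from your own FASTQ/BAM/gVCF files.

```python
def required_storage_tb(samples_per_month, gb_per_sample, months_retained,
                        working_fraction=0.5):
    """Estimate disk space in TB (1 TB = 1000 GB here).

    gb_per_sample and months_retained are assumptions you must supply;
    working_fraction=0.5 reflects the "+50% working space" rule of thumb.
    """
    # Data that has to stay on disk over the retention window.
    data_gb = samples_per_month * gb_per_sample * months_retained
    # Add scratch/working space on top of the stored data.
    total_gb = data_gb * (1 + working_fraction)
    return total_gb / 1000


if __name__ == "__main__":
    # Hypothetical example: 100 samples/month, 100 GB each, kept for 6 months.
    print(required_storage_tb(100, 100, 6), "TB")
```

This only sizes the working disk; backup capacity would need to be budgeted separately.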

If you are managing this whole thing on IT side then do NOT forget to design a good/adequate data backup solution (tapes are still the cheapest but you could look into cloud if local policies will allow for it).

ADD REPLY • link 9 months ago by GenoMax 150k

score 2 · Answer 2 · 2022-04-13

A few points here. Setting up such a compute environment should not be taken lightly and may have legal implications.

Talk to your IT dept. first.
I would not recommend going for a workstation in the sense of a computer you put in someone's office, but a server.
Then let everyone who needs access log in to the server via ssh if that is allowed.
I recommend 128GB of memory and 8-12 physical CPUs as the absolute minimum.
Such server hardware and attached storage create heat and noise, so better not to put it in your office; you wouldn't like it.
Think about storage space and backup, how much do you need, can it be easily expanded, how does this integrate into your IT infrastructure and legal framework?
Get legal advice. Human data is often regarded as sensitive data, and specific requirements for secure storage, access, and record keeping may apply. These differ from one jurisdiction to another (possibly a bit laxer in the US).
You may not be able to use public cloud infrastructure unless it is certified. If there is a certified cloud solution or HPC provider already, consider using that instead.
Who is going to administer the resource, keep the OS up to date, apply security patches, etc.?

My prior assumptions on which this is based:

You are doing WGS re-sequencing of human genomes for, e.g., de novo variant identification, rare disease analysis, etc., in addition to RNA-seq and other genomics
These human data are in fact patient data
You have funding that is adequately matched to your project (meaning that if you have to do these tasks, there is a funding allocation for the purchase and depreciation of compute and storage resources that will allow you to do them)
You have an institutional IT provider