Hi! In my lab we are interested in buying a new workstation and we are looking for suggestions. We work with NGS data and human genomes, so we need a fast machine able to manage lots of big files.
Any suggestions?
Suggestions in this past thread are still valid: Bioinformatics Workstation Suggestions. If you are already doing NGS analysis, then you have an idea of what your workflow needs in terms of hardware. Ultimately, much will depend on your budget and whether you need to buy a support contract. Working with local IT support is generally beneficial, since you will need to adhere to local security guidelines/practices.
Hello, could you please update those suggestions for 2024? Our lab works in genomics (mostly panels, but sometimes WGS and WES as well), and we also do RNA-Seq and metagenomics, all human. The volume can go up to 100 samples per month. I'm not sure what other info you might need to help me.
What is your budget? Are these clinical or research samples? Do you have support for centralized infrastructure at your university? Though it is tempting to have "control" over the hardware, leaving the administration/management to IT pros will leave you/your lab time to do actual science.
They did not give a budget (apparently they don't have an estimate), so I want to propose the optimal setup for the tasks I mentioned and then scale it up or down based on their response to the cost. The samples we analyze are both clinical and research ones (it is a hospital research center, if that makes a difference). The complication is that we already have an HPC with an IT support team, but it is still maturing, so it is not fully reliable yet and mistakes in the middle of processes cause analysis disruption and data loss.
You can go a couple of different ways. If you have prior experience (and are willing to deal with multiple vendors in case of failures) then you could buy individual components and put a workstation together. OR you could get a pre-configured supported workstation from a vendor that your IT may have a contract with so you can deal with one entity should there be issues.
Without a budget to go by I would say follow suggestions in Michael's post in broad terms.
One or two multi-core CPUs (AMD EPYC should be cheaper) + an appropriate motherboard + at least 128 GB RAM (more is preferable) + enough storage (SSDs or a mix of SSD/spinning disk) to keep some data on disk, plus at least 50% working space. If you anticipate working with nanopore data, then a supported NVIDIA GPU.
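To put rough numbers on the storage part, here is a back-of-the-envelope sizing sketch in Python. The 100 samples/month figure comes from your earlier post; the per-sample file sizes and the retention period are assumptions for ~30x human WGS and should be replaced with numbers measured from your own pipeline (panels and WES will be much smaller):

```python
# Rough storage sizing sketch -- all per-sample sizes below are assumptions
# for ~30x human WGS; substitute numbers measured from your own pipeline.

SAMPLES_PER_MONTH = 100        # stated workload (panels + some WGS/WES)
RETENTION_MONTHS = 6           # assumption: how long data stays on local disk

# Approximate per-sample footprints in GB (assumptions; vary by coverage/pipeline)
PER_SAMPLE_GB = {
    "fastq": 100,   # compressed paired-end FASTQ
    "bam":   90,    # aligned, coordinate-sorted BAM
    "gvcf":  5,     # per-sample gVCF
}

per_sample_total = sum(PER_SAMPLE_GB.values())
stored_tb = SAMPLES_PER_MONTH * RETENTION_MONTHS * per_sample_total / 1024

# The "+50% working space" rule of thumb from above
working_tb = stored_tb * 0.5
print(f"Per sample: ~{per_sample_total} GB")
print(f"Data kept on disk: ~{stored_tb:.1f} TB")
print(f"With 50% working space: ~{stored_tb + working_tb:.1f} TB usable")
```

Even with far smaller assumptions, the exercise usually shows that storage, not CPU, is where an all-WGS workload gets expensive.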
If you are managing this whole thing on the IT side, then do NOT forget to design a good/adequate data backup solution (tape is still the cheapest, but you could look into cloud if local policies allow for it).
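Whatever backup target you end up with (tape or cloud), it pays to verify the integrity of what you copy. A minimal sketch using Python's standard hashlib, assuming you keep a checksum manifest next to each run directory (the paths are illustrative):

```python
# Minimal integrity-check sketch: write a checksum manifest before backing up,
# so restored copies can be verified later. Paths below are illustrative.
import hashlib
from pathlib import Path

def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so large BAM/FASTQ files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(data_dir: Path, manifest: Path) -> None:
    """Record 'checksum  relative/path' for every file under data_dir."""
    with manifest.open("w") as out:
        for f in sorted(data_dir.rglob("*")):
            if f.is_file():
                out.write(f"{sha256sum(f)}  {f.relative_to(data_dir)}\n")

if __name__ == "__main__":
    write_manifest(Path("/data/run_2024_01"), Path("/data/run_2024_01.sha256"))
```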
A few points here. Setting up such a compute environment should not be taken lightly and may have legal implications.
My prior assumptions on which this is based:
What is "NGS data" and "human genomes", does it mean WGS? What kind of workflows are you running, what what is your budget? For example my work almost exclusively uses single-threaded operations so I bought a workstation with few but fast CPUs, everything else gets outsourced to a HPC. Other analysis types might benefit from massive parallelization, so many CPUs with lower per-core clock speed. Is your analysis memory-intensive, and I/O-intensive? Do you have HPC-like resources available and the workstation is for the analysis that come after the preprocessing (alignment etc)?
Thank you for your answer!
I apologize for not being very clear. I've been asked to do some research but I'm not very computer savvy. The analyses we do most often are actually the ones that come after the preprocessing (alignment etc.). We need a machine that allows us to work with human whole-genome sequencing (WGS) data. We usually have loads of big files such as FASTQ, BAM, and gVCF. Also, there are often several people working at the same time.
And is this patient data? And how much is "loads of big files", like 1 or 100 Illumina runs per week? And do you have an institutional IT department?