Forum: Whole-Exome Analysis on a Mainstream Desktop PC
4.5 years ago

Whole-exome analysis on a 1500 € standard PC: is that possible?

We are regularly asked what kind of computer is needed to analyze whole-exome data. We have tested it for you!

Analysing NGS data is usually the domain of powerful compute servers or high-performance clusters, as the data typically come in massive volumes and require a lot of data crunching. However, CPU performance keeps increasing, and faster processors with more cores are now available on the market at competitive prices. This applies not only to the server “workhorses” but also to “standard” desktop PCs. This sector is characterized by commodity hardware that is produced in high volumes and can be bought at affordable prices. While such PCs used to have just a single core, desktop CPUs with as many as 16 cores can be bought today.

Given the price premium required to enter the high-performance server sector, the question arises: why not use commodity hardware for NGS data analysis? So we ran an experiment: how feasible is a resource-intensive whole-exome sequencing (WES) analysis on a modern desktop PC?

it is possible to easily analyse two NextSeq 500 runs comprising 24 Whole Exomes in a single day on a ‘normal’ Desktop PC

If you are running 24 exomes per day, you can probably afford more than one desktop computer.

Not every laboratory starts with 24/7 whole-exome analyses. Many want to start analyzing this data without buying an HPC for it right away.

Do people buy an HPC to process a few exomes?

We work with laboratories of various sizes, and many think that they need an HPC even for a small number of analyses. Therefore, we want to show that this is not necessary. However, if a laboratory wants to analyse several exomes per day, a standard computer may not be sufficient.

I'm pretty sure that a box built around some nice desktop CPU like the Ryzen Threadripper 3990X would do the job just fine ;)

Edit: I bet your results were mostly due to I/O (the SSD).

I was running exomes on a desktop PC in 2012 and it was fine.

4.5 years ago

This post is more an advertisement than a benchmark. The analysis used Seamless NGS, commercial software marketed by the OP / the OP's employer, which is a conflict of interest if it is not disclosed. The cost of the software is not included in the analysis. The title should have been "Whole-Exome Analysis on a Mainstream Desktop PC with Seamless NGS software". But then it would have been an advertisement for the company. This would have benefited the community more if it had been run on a desktop computer with an open-source pipeline or software available to the wider community (e.g., free-to-use, community-standard tools such as bcftools, BWA, etc.). For that matter, it is not even a 'forum' category post.
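
For illustration, a minimal open-source pipeline of the kind suggested here could chain BWA, samtools, and bcftools per sample. The Python sketch below only builds the shell command strings; the sample and file names (`S1`, `ref.fa`, the FASTQ paths) are hypothetical placeholders, and the chosen steps are an assumption, not a validated WES workflow (no duplicate marking, recalibration, target-region restriction, or annotation).

```python
import shlex


def build_commands(sample: str, ref: str, r1: str, r2: str, threads: int) -> list[str]:
    """Return shell commands for a minimal per-sample map-and-call sketch.

    Assumed steps: bwa mem piped into samtools sort, BAM indexing,
    then bcftools mpileup piped into bcftools call.
    """
    q = shlex.quote  # protect paths containing spaces or shell metacharacters
    bam = f"{sample}.bam"
    vcf = f"{sample}.vcf.gz"
    return [
        # Align paired-end reads and coordinate-sort the alignments in one stream.
        f"bwa mem -t {threads} {q(ref)} {q(r1)} {q(r2)} "
        f"| samtools sort -@ {threads} -o {q(bam)} -",
        # Index the sorted BAM for region-based access.
        f"samtools index {q(bam)}",
        # Pile up reads and call variants (-m multiallelic caller, -v variants only,
        # -Oz bgzip-compressed VCF output).
        f"bcftools mpileup -f {q(ref)} {q(bam)} "
        f"| bcftools call -mv -Oz -o {q(vcf)}",
    ]


if __name__ == "__main__":
    for cmd in build_commands("S1", "ref.fa", "S1_R1.fastq.gz", "S1_R2.fastq.gz", 8):
        print(cmd)
```

This assumes the reference was already indexed (`bwa index`, `samtools faidx`). Each string could then be run with `subprocess.run(cmd, shell=True, check=True)`; `shell=True` is needed because of the pipes.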

For that matter, it is not even a 'forum' category post.

Technically, this was more of a comment than an answer.

4.5 years ago
JC

In some projects I have set up desktop machines for RNA-seq, WES, and bacterial/viral assembly analyses; gaming machines are good for this. No surprises here.

4.5 years ago

I agree that, depending on data volume and turnaround time, doing NGS analysis on a desktop is possible these days. However, I find your comparison with HPC a bit misleading, as I'm not sure that a single server with 40 CPUs can be considered a "high-performance cluster" (by the way, what does the C stand for in HPC? Cluster, computing, computer...?). On an HPC with 16 nodes and 16 CPUs/node (still a small HPC), I would expect a ~10x speedup over the desktop machine.
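
One rough way to read the ~10x estimate (our assumption, not the commenter's stated reasoning) is Amdahl's law, S = 1 / ((1 - p) + p/N), where N is the resource scale-up and p the fraction of the work that scales perfectly. Assuming a 16-core desktop (the upper end mentioned in the post) and equal per-core speed, 16 nodes x 16 CPUs = 256 CPUs gives N = 16, so ~10x instead of the ideal 16x corresponds to roughly 96% of the work scaling, with the rest standing in for I/O, scheduling, and serial steps.

```python
def amdahl_speedup(p: float, n: float) -> float:
    """Speedup predicted by Amdahl's law when a fraction p of the work scales n-fold."""
    return 1.0 / ((1.0 - p) + p / n)


def required_parallel_fraction(target: float, n: float) -> float:
    """Invert Amdahl's law: fraction p needed to reach `target` speedup at scale n."""
    # From 1/target = (1 - p) + p/n  =>  p = (1 - 1/target) / (1 - 1/n)
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / n)


if __name__ == "__main__":
    # Assumed: 256 cluster CPUs vs. a 16-core desktop, equal per-core speed.
    scale = (16 * 16) / 16
    p = required_parallel_fraction(10.0, scale)
    print(f"scale={scale:.0f}x, parallel fraction for ~10x: {p:.2f}")
    print(f"check: {amdahl_speedup(p, scale):.1f}x")
```

In practice, exomes are largely independent of each other, so the shortfall would mostly come from shared storage and I/O rather than from serial code.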

If you are running all 16 exomes in parallel, I think you are penalizing the 40-CPU machine too much on I/O and not making good use of the CPUs. Maybe running a couple of exomes at a time, giving 20 threads each to bwa and the other multithreaded jobs, would speed things up quite a bit...?
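
The trade-off suggested here, many concurrent samples with few threads each versus a few samples with many threads each, can be tabulated with a small sketch. It assumes 16 samples, 40 CPUs, and an even split of CPUs among samples running at the same time; it says nothing about actual runtimes, which depend on I/O and on how well each tool scales with threads.

```python
import math
from typing import NamedTuple


class Plan(NamedTuple):
    concurrent_samples: int  # outer (between-sample) parallelism
    threads_per_sample: int  # inner (per-sample) parallelism, e.g. bwa mem -t
    waves: int               # rounds needed to process all samples


def make_plan(n_samples: int, total_cpus: int, concurrent_samples: int) -> Plan:
    """Split CPUs evenly among the samples that run at the same time."""
    if not 1 <= concurrent_samples <= min(n_samples, total_cpus):
        raise ValueError("concurrent_samples must be between 1 and min(n_samples, total_cpus)")
    threads = total_cpus // concurrent_samples
    waves = math.ceil(n_samples / concurrent_samples)
    return Plan(concurrent_samples, threads, waves)


if __name__ == "__main__":
    for k in (16, 8, 4, 2):
        print(make_plan(n_samples=16, total_cpus=40, concurrent_samples=k))
```

The `concurrent_samples=2` row (20 threads per sample, 8 waves) matches the "couple of exomes at a time" idea; whether it beats running all 16 at once with 2 threads each is the I/O question and would have to be measured.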

I'm curious about the "annotation" step, as it seems to take longer than the variant calling itself. From what I remember, Ensembl VEP should take a fraction of the time of the variant calling.

(1) Maybe the abbreviation HPC is a bit ambiguous. We consider HPC to mean a high-performance computer, not a cluster. These systems are often equipped with expensive server hardware and come as rack systems, which also require a rack with wiring, cooling, and monitoring in place, often inside a dedicated server room. The flip side, however, is that the CPUs of older HPCs are slower and, most importantly for NGS, their storage was not yet based on SSDs. This illustrates that you can currently buy systems that may even outperform older, much more costly ones. Yes, this is a common pattern with computers following Moore's law, but we wanted to highlight specifically what it means for whole-exome analysis right now. Of course, you could also upgrade the older machine or buy a new HPC that would greatly outperform the 1500-euro computer, but that was not the point we wanted to make.

(2) The exomes were not all run at the same time. In some cases we parallelized only at the per-sample level, and in other cases, e.g., mapping, at both the per-sample (inner) and between-sample (outer) levels. This could certainly be optimized for efficiency, since I/O is definitely a bottleneck, especially for the HPC, but the common user will not do this either. Someone with the knowledge and resources to optimize the pipeline for the current system, or even for a cluster, is not a common user but an experienced one, and experienced users were not the target audience of this blog entry.

(3) "Annotation" is just a word and does not specify in detail what has been done. Here, we annotate with a number of databases and prediction scores (some use pre-calculated scores, while others, such as some splicing predictions, perform the calculation on the data itself), which together take up a considerable amount of time. Of course, one could reduce this to VEP with default settings and end up with only a fraction of the time. The question is whether that is sufficient for a thorough assessment of the variants later on. The time is probably better spent by the analysis pipeline than by a clinician who then needs more time for the evaluation or has to search other databases manually.

4.5 years ago
Lars

Nice article, but I was also aware that you can analyze whole exomes on a normal desktop PC.
