Forum: <$3000 in funding for computer equipment. Should I save this for cloud computing costs or invest in computational infrastructure?
6.4 years ago
Arnold Kwok ▴ 10

I am relatively new to bioinformatics. My lab has recently shifted towards bacterial genome assemblies, amplicon analysis, and metagenomics using MiSeq data. We currently have no computational infrastructure, not even workstations, and I was wondering whether the money would be best spent on a cloud compute cluster on Google or Amazon EC2, or whether we should invest in local infrastructure.

Secondly, if we do invest in infrastructure, how important is server-grade hardware (with ECC memory) compared with workstations?

My understanding is that the datasets I will be working with are small by bioinformatics standards, but I want the best value for the funding I have.

hardware • 2.3k views
@Arnold: You should read and consider @Eric Lim's comments about cloud computing in this thread (C: What server do you use?).

$3K is not a lot of money for compute infrastructure. At best you are looking at a decent workstation (have you looked at DDR4 RAM prices lately? ECC will add to the cost).

I agree. With that money, you could invest in a powerful workstation. Think long term about disk space. For reference, my personal laptop has 16 GB RAM, a 2 TB hard disk, and a 4-core CPU, and cost ~600 British pounds (~$1000). For a bit more than that, you could buy a much more powerful workstation with a larger disk. You could consider running Linux (Ubuntu Server) on it. Any local IT professional could provide ad hoc help in the future.

What context are you working in? Some (university) researchers can get access to government-funded cloud computing. One example would be ELIXIR's cloud services.

6.4 years ago
Eric Lim ★ 2.2k

Just want to provide some data points on what we pay AWS. Stoke is a small biotech and our entire compute infrastructure is cloud-centric. Our workflow is mostly snakemake on AWS Batch using spot instances. The figure covers various genomic experiments we processed in May, totaling over 300 poly-A RNA sequencing samples from internal and public sources. Prices are shown in USD for STAR's first and second pass. Costs generally correlate well with sequencing depth, and for the majority of our data we're paying less than $0.25/sample, plus a potential 10-20% for storage needed during computation. I should add that these are approximate numbers and should be treated as such.

Many tend to think processing samples in the cloud is extremely expensive and that it comes with hidden costs. After 6 years of using the cloud, my experience has been largely positive. For processing these data, we paid $11.35 for data transfer, $3.38 for CloudWatch to monitor services, and $0.91 for the ECR registry to host Docker images. If you were to store genomic data in the cloud permanently, there would be an additional, but often predictable, cost.
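To make the arithmetic concrete, here is a rough sketch that combines the figures quoted above into a cost range for one batch. It assumes every sample costs the $0.25 upper bound and treats the transfer, CloudWatch, and ECR charges as a flat per-batch amount, which is a simplification (data transfer in particular would likely scale with volume). The function and constant names are made up for illustration, and the numbers come from RNA-seq alignment with STAR, so they may not carry over to assembly or metagenomics workloads.

```python
# Rough per-batch cost estimate built from the figures quoted in this post.
# Assumption: every sample costs the $0.25 upper bound.
COMPUTE_PER_SAMPLE_USD = 0.25    # "less than $0.25/sample"
STORAGE_OVERHEAD = (0.10, 0.20)  # "potential 10-20% for storage"
# Assumption: these charges are treated as fixed per batch.
FIXED_USD = 11.35 + 3.38 + 0.91  # data transfer + CloudWatch + ECR


def estimate_run_cost(n_samples, overhead):
    """Return the estimated USD cost of processing one batch of samples."""
    compute = n_samples * COMPUTE_PER_SAMPLE_USD
    return compute * (1 + overhead) + FIXED_USD


low = estimate_run_cost(300, STORAGE_OVERHEAD[0])
high = estimate_run_cost(300, STORAGE_OVERHEAD[1])
print(f"300 samples: ${low:.2f} to ${high:.2f}")
```

Under those assumptions a 300-sample batch lands at roughly $98 to $106, which is small next to a $3000 budget.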

Switching to cloud certainly comes with a lot of technical challenges, but IMHO, the price shouldn't be at the very top of that list.

Hope this is helpful.

[Figure not shown: costs in USD for STAR's first and second pass across the samples processed in May]

6.4 years ago

If your institution has a high performance computing cluster, I would recommend looking into getting access to it, or to one at a nearby institution. Regardless, you will always need some compute capacity in the lab, so it's probably best to start with that.

Cloud computing is good as a temporary solution, for infrequent use, or for computing on large data sets already hosted in the cloud, but it can become expensive once the need becomes permanent. Also, since errors can be costly, only run well-debugged pipelines in the cloud. Using the cloud doesn't completely remove your sysadmin needs either: you still have to provision and set up your cloud machines. You just don't have hardware maintenance to worry about. In the end, what makes more sense depends on your usage pattern, and with cloud pricing being the opaque jungle it currently is, cost projections are not easy to make.

Concerning ECC, it's recommended for large amounts of RAM that are continuously in operation, as the probability of errors increases with the amount of RAM used. Most server-grade hardware comes with ECC RAM by default anyway.
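One simple way to frame the "depends on your usage pattern" point is a break-even calculation: how many months of steady cloud spending it takes to exceed the up-front price of a workstation. This is only a sketch. The function name is made up for illustration and the example inputs are placeholders, not real prices or quotes from anyone in this thread.

```python
def breakeven_months(hardware_usd, local_monthly_usd, cloud_monthly_usd):
    """Months until buying hardware becomes cheaper than renting cloud capacity.

    Returns None if the cloud never costs more per month than running locally.
    """
    monthly_saving = cloud_monthly_usd - local_monthly_usd
    if monthly_saving <= 0:
        return None
    return hardware_usd / monthly_saving


# Hypothetical inputs: a $3000 workstation, $25/month for power and upkeep,
# and $150/month of equivalent cloud usage.
print(breakeven_months(3000, 25, 150))  # 24.0
```

With light or bursty usage the monthly cloud bill stays low and the break-even point moves far out, which is the case where the cloud tends to make more sense.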

6.4 years ago
Joe 21k

Something to consider is who your ‘sys admin’ is going to be. Buying local infrastructure comes with a lot more caretaking. That said, I think it gives you a lot more freedom too. We are currently in the situation where our sys admins have been a PI, who has now moved, and myself, a PhD student due to finish fairly soon, so the servers we have are a little in jeopardy.

I think you would be fine with some workstations, FWIW. I would expect you can get them kitted out with Xeons for a decent thread count and ECC memory within that price point.

Could you explain the difference between Xeons and i7s of similar clock speeds? I've been racking my brain over this for a while because I have found some very cheap Core i7 workstations on the market.

See this.
