Including the cost of personnel, lab equipment, consumables, sequencing, and everything else that makes your lab able to produce and publish science, what proportion of these costs (as a percentage) is related to computing (computers, accounts, storage...), not including personnel?
To get a more meaningful cost number, you will need to define the scope of the cost calculation. Examples:
For this PNAS paper that has a large sequencing/computational component, how much of the total cost of research to generate this paper is related to computation?
For a lab that publishes a lot of genomic research (so you can amortize some of the infrastructure costs) how much of the total cost per year is related to computation?
For this sequencing experiment that I want to do, how much of the cost is computation if I have to be able to store/distribute the end result for 3 years?
Often the infrastructure costs are shared among many labs in a centralized compute facility, so the answer will depend on how many people are sharing the hardware. If you put data on the cloud, it will depend on how long that data must remain available and accessible.
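To make the scoping concrete, here is a minimal sketch of how such a calculation could be set up. It is not anyone's actual accounting method: the function names, the straight-line amortization, and every number in the usage lines are hypothetical placeholders, to be replaced with your own figures.

```python
def shared_hardware_cost_per_lab(hardware_cost, lifetime_years, n_labs):
    """Yearly cost of shared hardware for one lab.

    Assumes straight-line amortization over the hardware lifetime and an
    equal split between all labs sharing it (both are simplifications).
    """
    return hardware_cost / lifetime_years / n_labs


def cloud_storage_cost(size_tb, price_per_tb_month, months):
    """Cost of keeping size_tb terabytes available for a number of months."""
    return size_tb * price_per_tb_month * months


def computing_percentage(computing_cost, total_cost):
    """Computing cost as a percentage of the total cost in the chosen scope."""
    return 100.0 * computing_cost / total_cost


# Hypothetical numbers, for illustration only.
yearly_share = shared_hardware_cost_per_lab(100_000, lifetime_years=5, n_labs=10)
storage = cloud_storage_cost(size_tb=2, price_per_tb_month=20, months=36)
print(yearly_share, storage)
```

The point of the sketch is that the result changes with the scope you pick: the number of labs sharing the hardware and the number of months the data must stay available both enter directly into the total.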
updated 23 months ago by Ram (44k) • written 9.6 years ago by Ying W (4.3k)
This kind of depends on how you classify the salaries of people whose job is computational analysis/pipeline building/etc. But here, I'd estimate 50% for all computational costs including said personnel.
I've never been a PI so I can only guess, but the labs I've worked in ranged from probably less than 2% (basically, a server and a CLC license) to something akin to 20-30% (a cluster with 7 nodes, one of them with 512 GB of memory, an NFS storage server, and lots of SAS disks; but that lab also did lots of wet lab work, bought Illumina sequencers, etc.) to close to 100% in a "pure" bioinformatics lab.
This is a tricky calculation to do well. For us, though, storage has come out to be much more challenging and expensive than compute, both in terms of personnel and in terms of infrastructure.
Storage is indeed quite a challenge. Including the storage costs, can you do an estimate of your computing costs as a percentage of your lab's money sources?
It is still quite small. For our entire institution, scientific compute costs are still quite small (though enterprise computing costs, such as desktops, financial systems and the like, are somewhat larger). That is likely to be true at any institution.
Personnel costs are usually more than 50% and sometimes approaching 70% of an academic institution's budget (this is an observation, not based on any study).
I would consider doing some cost comparisons for your approximate use case. There are two major approaches--do-it-yourself versus commercial cloud. Each has advantages. Be sure to include the personnel costs in there.
I understand (or suspect) that this is a joke, but it may not be too far from the truth. I did a sort of back-of-the-envelope estimate and got about the same number.
Sorry Eric! It was a joke, or at least an answer in the form of a joke, because there is no possible way for me to give an answer to this. If the question simply related to consumables, an accountant could give you a value, but because you said "and everything that makes your lab able to produce and publish science", the question itself becomes difficult to define. I suspect what you're trying to do, whether you know it or not, is Cost Accounting, also known as "Scientific Decision Accounting", which tries to define the cost of something and weigh it against the value it returns, for the purpose of decision making (rather than just plain old accounting). Typically the reasoning goes: "Well, they spend X in Y to achieve Z. We want to achieve Z, so we should spend X too!"
Traditionally, this is really hard to do right and often abused (outside of science) because costs are difficult to define.
Let's say I wanted to buy a new €4000 workstation to do my alignments. The €4000 aside, we have the costs involved in purchasing the computer (getting quotes, etc.), depreciation for however many years we keep it, the cost of maintenance and repair, the cost of electricity, and the cost of resale/scrapping. If we hope to resell the computer after 2 years for €2000, what is the cost of the machine right now? €4000? €2000? €2000 - interest on €4000 + inflation?
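As a rough illustration of why the answer depends on what you count, here is one possible way to put the workstation example into numbers. Only the €4000 purchase price, the €2000 resale value and the 2-year horizon come from the example above; the function, its parameters, and the running cost, overhead and interest figures are hypothetical, and other accounting conventions would give other results.

```python
def net_workstation_cost(purchase, resale, years,
                         yearly_running=0.0,
                         purchasing_overhead=0.0,
                         interest_rate=0.0):
    """One of many possible definitions of what a machine 'costs'.

    purchase            -- price paid up front
    resale              -- amount recovered when the machine is sold
    years               -- how long the machine is kept
    yearly_running      -- maintenance, repair and electricity per year (assumed)
    purchasing_overhead -- time spent getting quotes etc., in money terms (assumed)
    interest_rate       -- yearly rate the purchase money could have earned (assumed)
    """
    # Interest forgone by tying up the purchase price for the whole period.
    opportunity_cost = purchase * ((1 + interest_rate) ** years - 1)
    return (purchase + purchasing_overhead
            + yearly_running * years
            + opportunity_cost
            - resale)


# Purchase price minus resale value only.
print(net_workstation_cost(4000, 2000, years=2))

# Same machine with hypothetical running costs, overhead and interest.
print(net_workstation_cost(4000, 2000, years=2,
                           yearly_running=300,
                           purchasing_overhead=100,
                           interest_rate=0.05))
```

The two calls describe the same machine and still disagree, which is the difficulty: the figure you get depends on which cost items you decide to include.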
And that's just costs. The other side of the coin was 'everything required to do science', which means I also have to think about value. Could I have done the same amount of science with a smaller computer? Did I really NEED that computer, or did I just buy it to use up the last bit of money before the end of the tax year? Would I have still published paper xyz without that computer? Would I have still received grant xyz for next year if I hadn't bought that computer?
And after doing all of that, and coming to a value of something on the order of <5%, what has that really told us? That in my very specific environment/lab, my computing costs are <5%. What about someone working in China, where hardware and human labour are cheaper? Do they have lower costs, or more computers? What about someone working for an institute with a lot of shared computing resources?
In short, this would be really really hard to answer outside of science. Inside of science, where value is impossible to define, there's no chance of either generating a meaningful %, or even with that value, translating it into any decision making process on your end :(
updated 23 months ago by Ram (44k) • written 9.6 years ago by John (13k)
I know it can become tricky, but I still think you can put a price on it: During the last 5 years, how much actual money did you spend on computing (computers, processors, disks, outsourcing computing and storage, etc.)? What percentage of your lab's financing did that represent?
The cost of computing is very low here, I would say. We (i.e. the lab for which I'm the bioinformatician) have bought a server (it cost about £3,000-4,000, which was a bargain I think), each group member has a Mac laptop and/or a Mac desktop, and we have a bunch of external disk drives (nothing fancy). The heavy-duty jobs are done on our institute's cluster. Storage and archiving are also provided by our institute.
To put things in perspective, in the last 3-4 years I went through about 3000 fastq files (~35 billion sequences) in ~1000 libraries, plus a bunch of microarrays (not much compared to sequencing).
This is an excellent question, but unfortunately the best answer I can come up with is 'it depends'.
Even if the overall use cases remain constant (i.e. the type of research you're doing and the scale of the data), it will depend on what your existing resources are and whether they are sufficient. For example, here at Indiana we have access to excellent University-wide HPC resources (e.g. http://rt.uits.iu.edu/bigred2/), but our group also has standalone Linux VMs and high-memory machines that are housed in our CS building. This gives us added flexibility and speed (no queue times) in our research.
However, once those (capital) costs are allocated, the rest of the spend is relatively small (storage, maintenance, support). So overall my (very) rough estimate would be something on the order of 5% of one's research budget, but possibly higher if a significant upgrade is immediately needed.