Hello,
Several questions pertain to hardware purchases, but it's difficult to know whether they are still up to date.
Is there a wiki somewhere that summarizes some Big Ass Server (BAS) related info that is specifically relevant for Bioinformatics?
There must be FAQs such as:
Which processors are better for BLAST or for bwa?
If you have something like a Dell R815, can each processor access all of the RAM?
What are the current prices of BAS and storage solutions that people have purchased?
I couldn't find anything. Perhaps a "community wiki" here is a good place to start? (Or a Google spreadsheet.)
Cheers,
Yannick
The R815 has shared memory, which means all CPU cores can access all of the RAM. In my limited experience, Intel CPUs are usually faster than AMD at the same clock speed (their L2 caches are not really comparable), but AMD gives better performance per unit cost.
I just purchased a 108 TB server (only 4U!), primarily for backups. The cost was 18k UK pounds from DNUK.
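For anyone comparing quotes, it can help to reduce figures like these to a cost per TB. Below is a minimal sketch of that arithmetic using only the two numbers from this post; the function name `cost_per_tb` is my own, and it ignores anything the post does not state (RAID overhead, VAT, support contracts).

```python
def cost_per_tb(total_cost: float, capacity_tb: float) -> float:
    """Return the cost per terabyte of quoted capacity."""
    if capacity_tb <= 0:
        raise ValueError("capacity_tb must be positive")
    return total_cost / capacity_tb


# Figures from the post: 18k UK pounds for a 108 TB server.
backup_server = cost_per_tb(18_000, 108)
print(f"{backup_server:.2f} GBP per TB")
```

The same function works for any other quote, as long as the capacities being compared are measured the same way (raw vs usable).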
A similar question on compute servers/workstations was asked here and here. The issues of Amazon cloud vs cluster vs workstations are also touched on.
To answer your questions: all processors should be able to access any of the RAM. I have seen no noticeable performance differences between processor brands. More cores and more L2 cache are always good, and any extra cash should be put into RAM. At the end of the day, your budget will be the limiting factor.
You're right, I think there is a definite need for a hardware dork to write a blog exclusively about Big-Ass Servers™ in bioinformatics. I imagine IBM and Dell would happily keep it plastered with ads.
With regard to storage, we have been using a NAS sold by Isilon. Although I have recently experienced some odd quirks with this system, it is scalable to 15 petabytes - enough to keep you going until next year.
The BAS post is not just about processing time and the benefits of shared memory, but also about human time, which is far more expensive. Since I wrote that post we have had more and more issues with our cluster (endless image updates, ABI's new software doesn't work on it) and more good experiences with our BAS (RStudio Server, GNU Parallel) that only confirm my convictions. For a small bioinformatics core, clusters look cheaper on paper, but in reality they are cheaper only if your time is worthless.
Disregarding human resources, I am afraid that I do not buy most of the technical reasons in your post, except for de novo assembly. The human time argument is mainly true when you do not keep the machine busy all the time. When the CPUs are always busy, a big fraction of the time spent waiting for CPU resources counts as human time as well. A BAS may save human time, but it does not always do so, even for a small group.
This blog post, which lh3 and others just shared on Twitter, reported that the most efficient instance type for an alignment was an m6g.8xlarge (which has 128 GB of RAM).
EDIT: I just read the post. In my view, a BAS is ONLY needed for de novo assembly of mammalian genomes. For the rest, a BAS is convenient, but a waste. You can get much more CPU power at the same price by having a cluster. Even in terms of convenience, a BAS does not have a big margin. At least in NGS analysis, embarrassingly parallel processing works very well for most tasks, except de novo assembly of course.
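To illustrate what "embarrassingly parallel" means here, below is a rough sketch rather than a real pipeline: the input is split into independent chunks, each chunk is processed on its own, and the results are concatenated in order. All names (`gc_fraction`, `split_into_chunks`, `run_parallel`) are made up for the example, and GC content is just a stand-in for real per-read work such as alignment. It uses a thread pool only so that it runs anywhere; on a cluster each chunk would typically be a separate job, and on a single machine you would more likely pass `ProcessPoolExecutor` as `executor_cls` for CPU-bound work.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in a sequence (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous chunks, preserving order."""
    size = max(1, math.ceil(len(items) / n_chunks))
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_chunk(chunk):
    """Work on one chunk; it needs nothing from any other chunk."""
    return [gc_fraction(seq) for seq in chunk]


def run_parallel(reads, n_workers=4, executor_cls=ThreadPoolExecutor):
    """Process independent chunks concurrently and merge results in input order."""
    chunks = split_into_chunks(reads, n_workers)
    with executor_cls(max_workers=n_workers) as pool:
        # map() returns results in the same order as the submitted chunks.
        results = list(pool.map(process_chunk, chunks))
    return [value for chunk in results for value in chunk]


if __name__ == "__main__":
    print(run_parallel(["GGCC", "ATAT", "GCAT"], n_workers=2))
```

Because no chunk depends on another, the work scales across cluster nodes with no shared memory. De novo assembly does not decompose this way, which is why it is the exception.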
I (still) love this acronym but now that I think of it perhaps it should be
BASS
ok, i will inform the media