7.5 years ago
CY
I am new to cluster computing.
I was trying to run my pipeline on an EC2 cluster. However, that is not real distributed computing, right? The tasks are not being distributed; each node just runs the same or different jobs separately.
So why is distributed computing rarely used in bioinformatics? Is it because most current bioinformatics tools do not support distributed computing?
Can anyone share some ideas on this? I'd really appreciate it!
Citation needed. Where did you get this information?
Some of the best known examples of truly distributed computing are bioinformatics ones. Specifically I'm thinking of Folding@Home and Phylo.
A recently developed piece of software that I can think of (because we use it) is Hail. I don't see where OP gets their working assumption.
Oh yeah, I forgot about the massively distributed ones.
Distributed programs tend to be a pain to write and debug. Then, once they are written... they're often less portable and harder to install and run than single-node software, which immediately eliminates most potential users.
Bioinformatics applications are often more I/O-intensive than compute-intensive, so distributed versions offer little advantage. One primary reason for distributed computing in bioinformatics is to escape the memory limitation of a single node (particularly in assembly). This is much harder to do efficiently than distributing compute-intensive problems. But there are several distributed assemblers out there, such as Omega/Disco, Ray, and HipMer. They can be substantially less efficient than single-node assemblers (or than themselves when run on a single node), but they may be able to assemble things that simply won't fit on a single node, and with sufficient nodes they often have a faster (even extremely fast) wall-clock time.
For alignment, which can sometimes be slow, it's pretty trivial for the user to split input reads into a bunch of files, align them on different nodes, and merge the output together. That's distributed computing the easy way!
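As a rough illustration of the split-and-merge approach, here is a minimal Python sketch. It assumes unpaired reads in plain 4-line FASTQ format; the function names (`read_fastq`, `split_round_robin`, `merge_outputs`) are made up for this example and do not come from any particular tool. The alignment step itself is left out, since that is whatever aligner you already run on each node.

```python
from itertools import islice
from typing import Iterable, Iterator, List

Record = List[str]  # one FASTQ record = 4 lines (header, sequence, '+', qualities)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one 4-line FASTQ record at a time (assumes no wrapped sequences)."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def split_round_robin(lines: Iterable[str], n_chunks: int) -> List[List[Record]]:
    """Deal records out to n_chunks lists, one chunk per node."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    for i, record in enumerate(read_fastq(lines)):
        chunks[i % n_chunks].append(record)
    return chunks


def merge_outputs(per_chunk_outputs: Iterable[List[Record]]) -> List[Record]:
    """Concatenate per-node results back into a single list."""
    merged: List[Record] = []
    for out in per_chunk_outputs:
        merged.extend(out)
    return merged
```

In practice each chunk would be written to its own file and submitted as a separate job, and the per-node alignment files would be combined with a format-aware tool (for BAM output, something like `samtools merge`) rather than simple concatenation, because headers must not be duplicated. Paired-end data would need both mates kept in the same chunk.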