Question

Distributed / parallel computing in bioinformatics

7.7 years ago

CY ▴ 750

I am new to cluster computing.

I was trying to run my pipeline on an EC2 cluster. However, that is not real distributed computing, right? The tasks are not being distributed. Each node is just running the same or different tasks separately.

So why is distributed computing rarely used in bioinformatics? Is it because most current bioinformatics tools do not support distributed computing?

Can anyone share some ideas on this? Any input would be really appreciated!

distributed-computing

written 7.7 years ago by CY (750) • updated 22 months ago by Ram (45k)

So why distributed computing is rarely used in bioinformatics?

Citation needed. Where did you get this information?

reply by Ram (45k), 7.7 years ago

Some of the best known examples of truly distributed computing are bioinformatics ones. Specifically I'm thinking of Folding@Home and Phylo.

reply by Joe (22k), 7.7 years ago

A recently developed piece of software that I can think of (because we use it) is Hail. I don't see where OP gets their working assumption.

reply by Ram (45k), 7.7 years ago

Oh yeah, I forgot about the massively distributed ones.

reply by Brian Bushnell (20k), 7.7 years ago

Distributed programs tend to be a pain to write and debug. Then, once they are written... they're often less portable and harder to install and run than single-node software, which immediately eliminates most potential users.

Bioinformatics applications are often I/O-intensive rather than compute-intensive, so distributed versions offer little advantage. A primary reason for distributed computing in bioinformatics is to escape the memory limit of a single node (particularly in assembly). This is much harder to do efficiently than distributing compute-intensive problems. But there are several distributed assemblers out there, such as Omega/Disco, Ray, and HipMer. They can be substantially less efficient than single-node assemblers (or than themselves when run on a single node), but they may be able to assemble datasets that simply won't fit on a single node, and with enough nodes they often have a faster (sometimes much faster) wall-clock time.
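To illustrate why spreading memory across nodes helps, here is a minimal sketch assuming one common pattern: each k-mer is assigned to a single "owner" node by a stable hash, so each node stores only roughly 1/N of the k-mer table. The function names are hypothetical, the "nodes" are just in-memory tables simulating separate machines, reverse complements are ignored for brevity, and this is not a claim about how Omega/Disco, Ray, or HipMer actually partition their data.

```python
import zlib
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer of seq (forward strand only; no canonicalization)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def owner(kmer, n_nodes):
    # Use a stable hash (crc32) so every node agrees on ownership;
    # Python's built-in hash() for str is salted per process.
    return zlib.crc32(kmer.encode()) % n_nodes


def partition_kmer_counts(reads, k, n_nodes):
    """Simulate n_nodes machines, each counting only the k-mers it owns.

    In a real distributed tool each k-mer would be sent over the network
    to its owner; here the 'nodes' are just a list of Counters.
    """
    tables = [Counter() for _ in range(n_nodes)]
    for read in reads:
        for km in kmers(read, k):
            tables[owner(km, n_nodes)][km] += 1
    return tables


if __name__ == "__main__":
    reads = ["ACGTACGT", "CGTACGTA"]
    for node, table in enumerate(partition_kmer_counts(reads, 4, 3)):
        print(node, dict(table))
```

The point of the design is that no single table ever has to hold every k-mer; the price is that building the graph now requires lots of network traffic between nodes, which is one reason these tools can be less efficient than single-node assemblers.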

For alignment, which can sometimes be slow, it's pretty trivial for the user to split input reads into a bunch of files, align them on different nodes, and merge the output together. That's distributed computing the easy way!
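A minimal sketch of the split and merge steps, assuming plain (uncompressed) single-end FASTQ input and SAM output; the function names are hypothetical, and the actual alignment of each chunk on its own node (with whatever aligner you use) is left out.

```python
from itertools import islice
from pathlib import Path


def split_fastq(fastq_path, out_dir, n_chunks):
    """Distribute FASTQ records (4 lines each) round-robin into n_chunks files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"chunk_{i}.fq" for i in range(n_chunks)]
    outs = [open(p, "w") for p in paths]
    try:
        with open(fastq_path) as fh:
            n = 0
            while True:
                record = list(islice(fh, 4))  # one FASTQ record
                if not record:
                    break
                if len(record) < 4:
                    raise ValueError("truncated FASTQ record")
                outs[n % n_chunks].writelines(record)
                n += 1
    finally:
        for o in outs:
            o.close()
    return paths


def merge_sam(sam_paths, merged_path):
    """Keep the @ header lines of the first SAM file, then append all alignments."""
    with open(merged_path, "w") as out:
        for idx, path in enumerate(sam_paths):
            with open(path) as fh:
                for line in fh:
                    if line.startswith("@"):
                        # Header line: only take it from the first chunk.
                        if idx == 0:
                            out.write(line)
                    else:
                        out.write(line)
```

Each chunk is then aligned independently on a different node, and `merge_sam` stitches the per-chunk outputs back together. For paired-end data, splitting both mate files with the same round-robin keeps pairs in matching chunks. The merged file is not coordinate-sorted, so it may still need a sort (e.g. with samtools) before downstream use.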

ADD REPLY • link 7.7 years ago by Brian Bushnell 20k