Question

Is Galaxy Ok For Speed?

0

12.1 years ago

corn8bit ▴ 140

I'd like to use Galaxy for my cluster pipelines. It should make it easier for less tech-savvy team members to run pipelines.

It looks like Galaxy starts ALL processes inside a Python wrapper (when running top, I see python instead of bwa). Will this be a speed issue? Speed is important to me, and I need to use all (many) threads effectively.

Why oh why does Galaxy start things in python wrappers? Will this hurt my speed?

Additional data:

I'm currently doing tests myself and have searched for this question. I apologize if I missed the answer. I also know that Galaxy duplicates intermediate data, but HDD reads aren't a bottleneck for me so this is no problem and I'll automate the deletions later. This question is CPU targeted.

galaxy • 3.5k views

score 4 · Answer 1 · 2013-06-25

12.1 years ago

Björn ▴ 670

Hi,

Galaxy does not run everything in Python wrappers. Most of the wrappers are bash-like scripts. A few are written in Python, but this is not a speed limitation. All these wrappers do is abstract the inputs and outputs (temp files etc.). In that case the program is usually invoked through subprocess, so there are no speed issues. By the way, Galaxy can also handle the deletion of intermediate data, so you do not need to take care of it yourself.
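To illustrate what such a wrapper does, here is a minimal sketch in Python. It is not taken from Galaxy's code base: the function name `run_wrapped_tool` and the stand-in tool are hypothetical, and a child Python process plays the role of a real tool such as bwa so the example runs anywhere.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_wrapped_tool(command, input_text):
    """Hypothetical wrapper: stage the input in a temp file, run the tool, return its output."""
    with tempfile.TemporaryDirectory() as workdir:
        in_path = Path(workdir) / "input.txt"
        out_path = Path(workdir) / "output.txt"
        in_path.write_text(input_text)
        # The tool runs as its own process; the wrapper only waits for it to exit.
        with out_path.open("w") as out_handle:
            subprocess.run(command + [str(in_path)], stdout=out_handle, check=True)
        return out_path.read_text()


# Stand-in for a real tool (assumption): a child process that upper-cases a file.
STAND_IN_TOOL = [
    sys.executable,
    "-c",
    "import sys; print(open(sys.argv[1]).read().upper(), end='')",
]

result = run_wrapped_tool(STAND_IN_TOOL, "acgt\n")
print(result)
```

The wrapper's own work is limited to file handling before and after the call, so its cost is small compared with the tool's runtime.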

Hope that helps,

Bjoern

0

Thanks, that's good to know and saves me a lot of time. I'm glad that this is the case. It makes much more sense! What also confused me is the "load balancing" documentation, which makes it sound like an issue. They must be talking about 100+ users at a time.

Reply • 12.1 years ago by corn8bit ▴ 140

2

Correct. The Galaxy application itself is subject to the Python Global Interpreter Lock. You can bypass this by specifying multiple instances. It's a little tricky but definitely doable, and you definitely won't need to do it until you regularly have multiple simultaneous users.

Reply • 12.1 years ago by Dan D 7.4k

score 3 · Answer 2 · 2013-06-25

The main Galaxy server gets a lot of use, so I would consider it slow due to the number of users.

This is why some institutions set up their own Galaxy mirror (where user access can be limited, decreasing the total number of users). If you had a local mirror, you could benchmark NGS tasks and definitely see a difference. I wouldn't consider speed a problem for a mirror installation.

score 3 · Answer 3 · 2013-06-25

12.1 years ago

Dan D 7.4k

I've deployed a local installation of Galaxy on a cluster. If you examine the Python wrappers carefully, you'll see that they're constructing and then executing a command line. Thus the tools they're wrapping are not subject to the Python Global Interpreter Lock. Galaxy won't run tools any slower than they would run from a plain command line if you're submitting jobs to a cluster.
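The point about the Global Interpreter Lock can be checked with a small sketch. The helper `build_command` and the file names `ref.fa` and `reads.fq` are made up for illustration and are not Galaxy code. The second half launches a child Python process as a stand-in for the tool and shows that it has its own process ID, which is why the parent interpreter's lock does not apply to it.

```python
import os
import shlex
import subprocess
import sys


def build_command(executable, threads, reference, reads):
    """Hypothetical helper: assemble the argument list a wrapper would execute."""
    return [executable, "mem", "-t", str(threads), reference, reads]


# The command line is built but not run here, since bwa may not be installed.
command = build_command("bwa", 8, "ref.fa", "reads.fq")
command_line = shlex.join(command)  # requires Python 3.8+
print(command_line)

# Stand-in child process (assumption): it reports its own process ID.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.getpid())"],
    capture_output=True,
    text=True,
    check=True,
)
child_pid = int(child.stdout)

# A different PID means a separate OS process that can use its own threads.
print(os.getpid(), child_pid)
```

Because the tool is a separate process, a thread count passed on its command line is handled by the tool itself, not by the wrapper's interpreter.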

Galaxy also includes scripts to automatically delete datasets according to parameters you specify. More information here.