Why Should I Use Galaxy?
12.4 years ago

Don't get me wrong, I'm strongly convinced that Galaxy is a powerful tool to manipulate data interactively.

But as a trained bioinformatician with deep knowledge of the Linux command line, do you consider it only a tool for the "biologist/end-user", or do you actually use Galaxy for your everyday data manipulation (and why)?

We're about to install Galaxy here, and some colleagues believe that it will be used from 'A' to 'Z' for our NGS analysis. Why not? But why should I spend time in front of my web browser when I can write a shell script (and save it in git)?

Pierre

UPDATE 2017-04

galaxy workflow • 22k views

People always ask me if I use Galaxy. I cannot see a reason why a bioinformatics practitioner would need Galaxy. I would love to know how I am wrong.


I agree with Zev. We have a Galaxy installation, but I cannot really think of anyone who is actually using it. Last time I checked there was still no versioning support, which I think is quite important when creating pipelines/doing analysis. Maintaining your workflows within Galaxy is also tricky, whereas archiving scripts in a repository seems almost natural.


I've been maintaining a local galaxy-dist installation for our lab, and while I agree with all the points about reproducible and sensible workflows, the one thing that really bothers me about Galaxy is that every input and output is a file. There is really no way to process data as a stream - e.g. piping the output of one process into the next part of the pipeline. For big data processing this creates unnecessary delays and large datasets that must be cleaned up periodically.
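To make the stream-versus-file point concrete, here is a minimal sketch of what piping one step into the next looks like when done by hand. It is only an illustration: `sort` and `uniq -c` stand in for real pipeline steps, the helper name `count_sorted_unique` is made up for this example, and it assumes a POSIX system with both commands on the PATH and non-empty input values.

```python
import subprocess


def count_sorted_unique(lines):
    """Stream `sort` into `uniq -c` through an OS pipe, with no intermediate file.

    Hypothetical helper: `sort` and `uniq -c` are placeholders for real
    pipeline steps. Assumes every input value is non-empty.
    """
    sort_proc = subprocess.Popen(
        ["sort"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    # The second step reads directly from the first step's stdout.
    uniq_proc = subprocess.Popen(
        ["uniq", "-c"], stdin=sort_proc.stdout, stdout=subprocess.PIPE, text=True
    )
    # Close our copy of the read end so `sort` gets SIGPIPE if `uniq` exits early.
    sort_proc.stdout.close()

    # Feed the input, then close stdin so `sort` sees end-of-file.
    sort_proc.stdin.write("".join(line + "\n" for line in lines))
    sort_proc.stdin.close()

    output, _ = uniq_proc.communicate()
    sort_proc.wait()

    # `uniq -c` prints "<count> <value>" per line; parse that into a dict.
    counts = {}
    for row in output.splitlines():
        count, value = row.split(None, 1)
        counts[value] = int(count)
    return counts


if __name__ == "__main__":
    print(count_sorted_unique(["chr2", "chr1", "chr2"]))
```

Nothing is written to disk between the two steps here, whereas a file-per-step design would store the sorted output as its own dataset before the counting step could start.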


Reproducibility does not come without cost. Like many others, I also use Galaxy for teaching. Students benchmark different segments of Galaxy and learn how to design an efficient processing pipeline for large data from scratch. The course is called "Architecture of large bioinformatics systems", but maybe I should rename it to "Tradeoffs of reproducibility".


Hello Pierre, I want to detect lncRNA in some RNA-seq data (FASTQ format). As you know, this takes several steps: aligning reads, providing a GTF file, annotation, and differential expression. How can Galaxy help me? Please give me some information, because I am new to this web tool.

Because I don't know anything about Linux, Galaxy is a good fit for me.

Thank you for your help.

12.4 years ago

Our group has implemented Galaxy for two reasons:

1) We are hoping to improve the documentation of our workflows, so that it is clear who ran what script, what version of what script, and what the output was. This (we hope) should improve reproducibility and we are anticipating it will help to address reviewer comments that bemoan the "data was analyzed using local scripts" sentence many of us may have written into our methods. This is probably the main reason a bioinformatics group might begin using Galaxy. Of course, there are other solutions to this -- but Galaxy is just one of those solutions.

2) It allows non-computational investigators in the group to begin learning NGS data analysis without first taking 4-6 months to learn the Linux command line. Not all groups will have this need; we happen to. So it will probably be group dependent.

And a third reason, I have to add, the Galaxy development team are an excellent group of professionals to work with.


Good point on documentation.

12.4 years ago

Do you remember the "Works on my machine" certification program [1]? Galaxy provides an environment for reproducibility of a workflow. If I take your shell script, I need to make sure I have the same libraries, environment variables, paths, etc. But I can take your Galaxy workflow, assuming you have a reasonably standard Galaxy instance, and expect it to run on my data without thinking about compatibility.

What about an attempt to reproduce a workflow from a 10-year-old paper? Even if you have the source code, there is almost a 100% chance it is not going to work. Galaxy is a substitute for a virtual machine.

[1] http://www.codinghorror.com/blog/2007/03/the-works-on-my-machine-certification-program.html


This is an honest question: What do you think the chances are that a Galaxy pipeline from today will work again in 10 years? Who knows what the computing (desktop/laptop) world will look like in 10 years?

12.4 years ago

I think the main reason for a power user like yourself to use it would be that you could actually contribute to Galaxy itself. That way your power tools could become available to all these non-power users, and you could help clean up that overfilled Tool Shed by removing things you see in there that are not as good as other things. (According to ISMB talks, there are literally thousands of tools in that shed.)

12.4 years ago

Command-line savvy power users are not its target audience, and I don't see any reason why you and I should be using it. Just like most GUIs, it fills an important niche for non-computational types, but is limiting to a power user.

12.4 years ago

In spite of my CLI knowledge, I use and appreciate Galaxy (the JGI implementation), mainly for the workflows. As a more general thought, I really do think that being a bash and command-line expert is a real advantage as a bioinformaticist; however, when something good happens in the GUI world, I won't throw it away. For instance, who still uses pine or lynx? ;-)


Lynx? Real hackers use curl ;-)


lynx predates curl by ~5 years :-)


Lynx was my browser of choice (well, there really wasn't a choice) for years.

12.4 years ago

For me the main reason to #usegalaxy is in terms of training students and dealing with collaborators. First, Galaxy allows you to teach bioinformatics separately from computing (e.g. UNIX, programming). Second, Galaxy allows you to reduce the time-consuming back-and-forth of communicating methods/results between you and students/collaborators. I estimate that somewhere upwards of 60% of a project can be simply communicating methods and results - sharing histories reduces this drastically. Third, by providing constraints on what a user can do, it prevents a naive user from tearing down a machine/cluster, so you spend less time f-ing around on system administration with a newbie, which puts many of them off from being self-sufficient. Fourth, it establishes a best-practice environment for learning bioinformatics - e.g. Galaxy allows you to teach concepts of workflows easily, as well as code organization/testing/documentation when developing local tools. Finally, by having new users start with Galaxy and learn its limits for themselves, they quickly become motivated to take their training wheels off and learn how to ride for themselves. In sum, Galaxy is worth adopting so that biologists without computational training can help themselves, which gives you more time to focus on the real science, rather than being a bioinfortechnician. As I say to many people: "Use Galaxy, it's out of this world."

11.1 years ago

Other points on this thread are key, but I'll add my experience as a core facility manager. Apart from the obvious advantage of giving inexperienced computer users a GUI environment, an insta-GUI for new command-line scripts and programs (one that automatically plays nice with other programs) enables rapid prototyping. The user can thus play with options that allow them to import results easily into other programs and workflows, which may even be set up by the core beforehand.

The library feature is also useful, allowing data to be shared with selected groups.

11.1 years ago

I mainly use Galaxy for manual jobs, like intersecting my regions of interest with genome tracks from UCSC. In bioinformatics you really need to keep control of your data by looking into it manually from time to time; that's why GUI tools are useful. I do this during downstream functional analysis, and I believe it is the easiest way in most cases. However, I do not use it to pipe data from raw reads to filtering/mapping and so on (even though Galaxy has a nice workflow system), as I feel running shell scripts is more stable.

11.1 years ago
Tonyzeng ▴ 310

I did benefit from using Galaxy for training when I had no bioinformatics experience or basic computer skills at all, especially when I had no access to a high-performance computer, only my personal computer. At least I got a general picture of the whole pipeline in my head, from read quality control at the beginning through to SNP calling.

Now that I have access to a powerful computer and know some Linux and sequencing software, that earlier training with Galaxy gives me at least some idea of how to do the next steps using protocols provided by people in the sequencing community, and makes it clear to me what is right or wrong in the output files, like...

7.8 years ago
ropolocan ▴ 830

In my experience with some instances of Galaxy, while using it you will often find interesting tools that you want to adopt for your command-line workflows. In those cases, you can go to the Tool Shed and clone the repository with Mercurial. The tool in question might be a script that has been wrapped for use in Galaxy, in which case the repository will also have an XML wrapper file. Hopefully, the script is well documented so that you can use it locally.
