Question

Forum: pros and cons: programming skills vs. GUI

6

8.9 years ago

TriS ★ 4.7k

I was asked to give an introductory lecture about bioinformatics in cancer research, and I wanted to spend a slide or two comparing the pros and cons of coding (e.g. Perl, Python, R, Java, UNIX...) vs. using GUI tools (e.g. Galaxy, cBioPortal, etc.).

The reason is that most of the students are enrolled in a genetics program with little or no bioinformatics (bfx) knowledge, and they are "scared" of learning how to code, learning statistics, etc. So I wanted to explain that 1) bfx is not all coding, and a number of analyses can be done without writing lines and lines of code, and 2) although coding has a steeper learning curve, it is extremely powerful.

There are a few posts online that touch on the subject, but I wanted to hear what the thoughts were here and, besides the obvious reasons, what you think the most important messages to convey to grad students should be.

programming lecture • 8.2k views

ADD COMMENT • link updated 19 months ago by Ram 44k • written 8.9 years ago by TriS ★ 4.7k

1

Just to note that some applications do need a GUI, such as phylogenetic tree viewing/editing, alignment viewers (IGV), genome browsers, assembly viewers (consed and Bandage), network visualization, etc. For these, a well-thought-out and well-implemented GUI is essential.

ADD REPLY • link updated 3.6 years ago by Ram 44k • written 8.9 years ago by lh3 33k

0

Yes, here a GUI is essential; however, these tools still require upstream work that could suffer because of the limitations of other GUI/pre-canned analysis tools.

ADD REPLY • link updated 3.6 years ago by Ram 44k • written 8.9 years ago by TriS ★ 4.7k

0

Well, CLI is essential, however, CLI tools still require upstream wetlab work to generate data. It is not necessary for everyone to know everything. Occasionally, when you work on specific areas, even GUI alone can be ok. That is how CLC etc have survived for years.

ADD REPLY • link updated 3.6 years ago by Ram 44k • written 8.9 years ago by lh3 33k

0

I like Istvan's analogy!

Another limitation of GUIs (I'm thinking Galaxy) is that you're stuck with the older versions of software tools that are integrated into the interface. However, there are a number of fairly routine workflows (differential gene expression, ChIP-Seq peak calling) where the limited GUI vocabulary may suffice.

ADD REPLY • link updated 3.6 years ago by Ram 44k • written 8.9 years ago by harold.smith.tarheel ★ 5.0k

1

The problem with routine workflows is that they can only solve "routine" problems, and it is almost impossible to tell beforehand whether a problem belongs to a new class or is the same old one.

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 8.9 years ago by Istvan Albert 101k

0

But some problems ARE routine, and it's not impossible to anticipate the outcome. For example, if the goal is to identify a list of mouse genes whose expression levels change the most in response to drug treatment, then a Galaxy workflow of Bowtie/Tophat/Cufflinks/CuffDiff would be adequate. Sure, there are more sensitive/sophisticated/powerful/flexible tools for the job, and this pipeline is likely to miss some candidates, but that may be okay for the user.

ADD REPLY • link updated 3.6 years ago by Ram 44k • written 8.9 years ago by harold.smith.tarheel ★ 5.0k

2

Well, years ago we had the paper that proved that RPKM is inconsistent across samples, see:

http://blog.nextgenetics.net/?e=51

Three years later people still use RPKM because that's what Cuffdiff implements. Now obviously every routine analysis using cuffdiff will be wrong because the units themselves are badly defined. That is before considering the actual biology or the many confounding factors. The units themselves are incorrect, how absurd is that? The question is how wrong are they? It all depends on the diversity of transcripts, if there are many new transcripts the values are fatally wrong. If there are no new transcripts RPKM will work. So now the validity of the routine analysis depends solely on the number of transcripts that express only in one condition.
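
A toy calculation can make the unit problem concrete. The sketch below is not taken from the linked post: the gene names, lengths, and read counts are invented, and TPM is brought in here only as a commonly used alternative for comparison. It computes RPKM for two samples in which one gene is expressed only in the second sample.

```python
# Invented toy data: gene -> (length in bp, reads in sample A, reads in sample B).
# g3 is expressed only in sample B, the situation described above.
GENES = {
    "g1": (1000, 100, 100),
    "g2": (2000, 200, 200),
    "g3": (500, 0, 300),
}


def rpkm(counts, lengths):
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths)]


def tpm(counts, lengths):
    """Transcripts per million: length-normalise first, then rescale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths)]
    scale = sum(rates)
    return [r / scale * 1e6 for r in rates]


lengths = [v[0] for v in GENES.values()]
counts_a = [v[1] for v in GENES.values()]
counts_b = [v[2] for v in GENES.values()]

rpkm_a, rpkm_b = rpkm(counts_a, lengths), rpkm(counts_b, lengths)
tpm_a, tpm_b = tpm(counts_a, lengths), tpm(counts_b, lengths)

print("sum of RPKM:", round(sum(rpkm_a)), "vs", round(sum(rpkm_b)))
print("sum of TPM: ", round(sum(tpm_a)), "vs", round(sum(tpm_b)))
```

With these numbers the RPKM totals of the two samples differ by a factor of two, so the same RPKM value stands for a different share of the transcript pool in each sample, while the TPM totals are identical by construction. How much this matters in a real experiment depends on the data, as the comment says.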

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 8.9 years ago by Istvan Albert 101k

0

Istvan, I'm aware of the limitations of RPKM (which is one of the reasons I don't use the Tuxedo package). I also agree wholeheartedly that CLI is preferable to GUIs for the many reasons cited. But the example I gave still holds. The mouse transcriptome is well-studied, so it's highly unlikely that drug treatment will produce a host of novel transcripts. Despite its flaws, RPKM would identify some subset of the most differentially expressed genes. If that's the user's only objective, I don't see the problem.

ADD REPLY • link updated 3.6 years ago by Ram 44k • written 8.9 years ago by harold.smith.tarheel ★ 5.0k

0

Just to clarify - it is not about novel transcripts - the problems arise when there are transcripts or isoforms that can be found in one sample but not the other.

I don't disagree that pipelines "work"; it is just never clear how well they do and when they cross from "kind of right" to "no, that's obviously not right". The more automated and "routine" a process, the less likely one is to investigate it (but this is true regardless of the approach, command line or GUI).

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 8.9 years ago by Istvan Albert 101k

0

I meant 'novel' in the sense of 'unique to one sample', which is the condition that you describe.

And I strongly agree that the user needs to understand the tool, be it CLI or GUI. Caveat emptor.

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 8.9 years ago by harold.smith.tarheel ★ 5.0k

1

You are never stuck in Galaxy. It's open source, and it's pretty easy to update tools or ask the big community to update tools. Actually, for a few tools we have wrappers before the paper comes out, because more and more people are talking to the Galaxy community and contributing to it during the publishing process.

I guess this is just a matter of time and priorities. If someone spends the time in compiling a new version of tool X and integrating it in their own make file rather than integrating it in Galaxy it will take longer for all of us :)

ADD REPLY • link updated 3.6 years ago by Ram 44k • written 8.9 years ago by Björn ▴ 670

0

Sorry, I should have specified the public Galaxy site at PSU. Given that the OP was addressing a class with little/no CLI expertise, I assumed that updating the tools would be beyond their skill level. Plus, I'm fairly certain that you can't update Galaxy using only the GUI...

Note that this post is in no way a criticism of Galaxy. I think it's a very useful suite of tools, it lowers the activation barrier for learning bioinformatics, and the automatic tracking of workflows is a strong selling point.

ADD REPLY • link updated 3.6 years ago by Ram 44k • written 8.9 years ago by harold.smith.tarheel ★ 5.0k

0

You can update Galaxy tool versions using only the GUI (and Björn provides a Docker container with Galaxy that makes the setup vastly simpler) :) Granted, you only get what's in the Tool Shed, but that's sufficient 99% of the time.

ADD REPLY • link updated 3.6 years ago by Ram 44k • written 8.9 years ago by Devon Ryan 104k

0

Thanks for the clarification, Devon. I should have been more precise. By updating, I was referring to Björn's comment about writing wrappers for the latest versions of tools. I consider the Tool Shed part of Galaxy proper, so of course it's possible to use the GUI to access versions contained there.

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 8.9 years ago by harold.smith.tarheel ★ 5.0k

Ram · Answer 1 · 2015-11-30

The important thing here is to allow people to recognize that each is a spectrum: chaining command-line tools together is far easier than writing a program in a programming language. At the same time, a GUI-based program may require the user to make use of programming concepts. For example, entering formulas into an Excel worksheet is programming. So the divide is far less sharp than many might think.

My 2 cents on the matter is that graphical user interfaces are akin to speaking using only a limited number of predetermined sentences. You are given a limited number of choices, and as long as what you want to say can be expressed with them, it "seems" easier. But soon one runs into not being able to express what one really wanted to say.

Command line programming is like free speech, you can express far more detailed thoughts but at the same time you can say complete nonsense with ease.

Ram · Answer 2 · 2015-12-02

For our teaching courses (and from your question I think we are targeting the same audience), we try to take away our students' fear. For this we train them in Galaxy and demonstrate how easy reproducible science and HPC computing can be. If you can make the point that everyone can reproduce (or not ;)) a Nature/Cell paper with Galaxy, you can motivate your students a lot without scaring them away with complicated installations and command line hacking.

Galaxy is very much like a shell pipe, and I guess a key message is understanding the concept of chaining together simple, already existing commands into complex workflows.
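
The chaining idea can be sketched in a few lines. The example below is only an illustration and has nothing to do with how Galaxy is implemented: the function names and the three-column table are hypothetical, and each stage loosely mirrors a familiar shell command.

```python
from collections import Counter


def read_lines(text):
    """Stage 1: yield the non-empty lines of the input (roughly `cat`)."""
    for line in text.splitlines():
        if line.strip():
            yield line


def drop_comments(lines):
    """Stage 2: skip header/comment lines (roughly `grep -v '^#'`)."""
    return (line for line in lines if not line.startswith("#"))


def take_column(lines, index):
    """Stage 3: keep one tab-separated column, 0-based (roughly `cut -f`)."""
    return (line.split("\t")[index] for line in lines)


def count_values(values):
    """Stage 4: tally identical values (roughly `sort | uniq -c`)."""
    return Counter(values)


# Hypothetical table: chromosome, position, gene.
TABLE = (
    "#chrom\tpos\tgene\n"
    "chr1\t100\tgeneA\n"
    "chr2\t200\tgeneB\n"
    "chr1\t300\tgeneC\n"
)

# Each stage consumes the output of the previous one, like commands in a pipe.
counts = count_values(take_column(drop_comments(read_lines(TABLE)), 0))
print(counts)
```

None of the stages is interesting on its own; the workflow comes from the way they are connected, which is the same message whether the connections are drawn in a GUI or typed as a pipe.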

If you are afraid of restricting your users and taking away freedom, don't worry any more. We have integrated IPython and RStudio in Galaxy.

So you can now combine both worlds: shiny workflows and free speech. Isn't this what we are aiming for? Enabling every researcher to take advantage of his/her skills without restricting them?

P.S. We are using Galaxy/IPython to teach programming lessons with real life-science data. If you are interested in teaching material, let me know or have a look at: https://github.com/bgruening/

Ram · Answer 3 · 2015-11-30

Coding gives you flexibility. With GUIs you're limited to the options offered. When GUIs become very flexible then they can reach a level of complexity close to that of programming, with an equally steep learning curve. In the end, even for using GUI tools, one ends up coding if only just to assemble data or convert data to a format acceptable by the GUI tool. Also, GUIs take time to develop so they are only available for established and/or well-funded tools. When you're in a lab exploring new avenues, you often need to develop your own tools (or adapt existing ones). If you can't code, then you're limiting your options. Given the importance of computational and statistical tools in modern biology or science in general, I don't see how one could be a scientist (or at least have a lasting career in science) and not have some basic skill/understanding in these areas. In my view if someone is not willing to learn the tools of the trade, then they should reconsider their career choices.

Ram · Answer 4 · 2015-12-02

I often try in some intro courses to stress the importance of learning a little bit of UNIX/Linux and the command line, or push all wet-lab people towards OS X and learning a little command line so they can have the best of both worlds. Because really, if you are scripting or programming with a fairly standard language (Python, Perl, C, C++, Java), you won't really be able to do much that is terribly effective in Windows, and even there they would need to learn the Windows command line to do anything remotely interesting. And Windows is terrible for bioinformatics analyses in general.

Learning just a little bit of the command-line opens up all of the command-line tools that exist. Even if they are running their workflow manually they are one step ahead. I think it is a shorter jump from there to programming (at least light scripting).

Ram · Answer 5 · 2015-12-01

1

8.9 years ago

5heikki 11k

IMO the most important message is that putting a few months into learning the command-line will enable them to do stuff that would be simply impossible to achieve through a GUI during a human lifespan. A simple example would be blasting 1 million protein sequences against nr. Imagine how long it would take if you did it one by one through the NCBI web service GUI (OK, maybe they allow more than one sequence, but certainly not 1 million).
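
As a rough sketch of what this looks like in a script: the code below splits a FASTA input into batches and builds one `blastp` command per batch. It only constructs the command lines and does not run them. The file names and batch size are hypothetical, the `blastp` options shown assume a local BLAST+ installation with an `nr` database, and a real run of this size would also need a cluster or a lot of patience.

```python
from itertools import islice


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def batches(records, size):
    """Group records into lists of at most `size` items."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def blast_command(query_path, out_path, db="nr"):
    """Build (but do not run) a blastp command as an argument list."""
    return ["blastp", "-query", query_path, "-db", db,
            "-out", out_path, "-outfmt", "6"]


# Five invented protein records stand in for the one million.
TOY_FASTA = ">p1\nMKT\n>p2\nMAAL\nGG\n>p3\nMSS\n>p4\nMLV\n>p5\nMQR\n"

records = list(read_fasta(TOY_FASTA.splitlines()))
groups = list(batches(records, size=2))
commands = [
    blast_command(f"batch_{i}.fa", f"batch_{i}.tsv")  # hypothetical file names
    for i, _ in enumerate(groups)
]
for command in commands:
    print(" ".join(command))
```

The loop is the whole point: the same few lines handle five sequences or a million, which is the part that has no practical equivalent in a web form.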

ADD COMMENT • link updated 3.6 years ago by Ram 44k • written 8.9 years ago by 5heikki 11k

0

Remember that the OP is giving an introductory lecture to a group of geneticists, most of whom have zero command-line expertise. If, on the basis of one lecture, s/he can convince even a single student to embrace the power of CLI, it would be nothing short of miraculous!

ADD REPLY • link updated 3.6 years ago by Ram 44k • written 8.9 years ago by harold.smith.tarheel ★ 5.0k

0

The issue is that for all intents and purposes, getting more wet-lab folks to learn a little bit of command-line would be far more useful in the long run than programming. I'll expand on this in my own answer as it is a bit long for a comment :)

ADD REPLY • link updated 3.6 years ago by Ram 44k • written 8.9 years ago by DG 7.3k

0

I think that, as far as this lecture goes, I will have neither the time nor the space to teach them any command line/programming, but I can definitely try the "miraculous task" of convincing them to learn some programming language/stats on the side.

But I still do agree that if they spent some time learning, they would at least have a better understanding of how analyses work and how to replicate/apply/explore new analytical avenues for their own research.

ADD REPLY • link updated 3.6 years ago by Ram 44k • written 8.9 years ago by TriS ★ 4.7k

Ram · Answer 6 · 2015-12-02

A strong point in favour of scripting is that a job accomplished with a script is reproducible and self-documented (it's called a script for a reason...!). This makes it easier in the future to understand what you have done. Sure, you can write very obfuscated code, but that is still better than GUIs, where no record is left^*. From this perspective, the task at hand doesn't have to be very sophisticated or "high throughput" to justify scripting over a GUI; it can be as simple as renaming a column in a data file.

On the other hand, as pointed out before, GUIs are great for data exploration and for generating hypotheses. Think IGV, but also a quick look at a table in Excel can reveal something interesting for further analyses. So there shouldn't be a competition between the two, really.
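
To illustrate with the column-renaming case, here is a minimal sketch using only the standard library. The column names and the tab-separated layout are assumptions made up for the example; the point is that the script itself is the record of what was changed.

```python
import csv
import io


def rename_column(in_handle, out_handle, old, new, delimiter="\t"):
    """Copy a delimited table, renaming one header field and nothing else."""
    reader = csv.reader(in_handle, delimiter=delimiter)
    writer = csv.writer(out_handle, delimiter=delimiter, lineterminator="\n")
    header = next(reader)
    if old not in header:
        raise ValueError(f"column {old!r} not found in header {header}")
    writer.writerow([new if name == old else name for name in header])
    writer.writerows(reader)  # data rows pass through unchanged


# In-memory stand-ins for real files, so the example runs as-is.
source = io.StringIO("gene\tlogFC\tpval\ngeneA\t1.5\t0.01\n")
result = io.StringIO()
rename_column(source, result, old="pval", new="p_value")
print(result.getvalue())
```

With real data the two `io.StringIO` objects would be replaced by files opened with `open(...)`, and the script can be kept next to the data or under version control.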

My 2p...

^* As far as I know Galaxy is pretty good in recording the executed commands, so it's a notable exception.

Ram · Answer 7 · 2015-12-02

0

8.9 years ago

nathaniel.echols ▴ 30

For an audience of non-experts, absolutely focus on basic concepts and the GUI. I come from a different background (protein crystallography) but I spent several years going to workshops and training scientists in the use of the software I helped develop. At the beginning we just taught command-line tools and it was torture - at least half of any given audience had almost no command-line experience and we would waste time going over the basics. And we weren't even trying to teach them programming, just basic use of Unix commands. Once we had a GUI available, the difference in what material we could cover and what the students could absorb in the relatively short time allotted was enormous.

Of course this depends on the availability of a decent GUI; I have no experience with Galaxy so I can't comment on that. But I do not think that the average biologist will become a better scientist by learning Linux command-line use; they would be far better off learning math and statistics instead.

ADD COMMENT • link updated 4.9 years ago by Ram 44k • written 8.9 years ago by nathaniel.echols ▴ 30

1

"But I do not think that the average biologist will become a better scientist by learning Linux command-line use; they would be far better off learning math and statistics instead"

I agree with the part of the statement where you say that users/students/scientists should learn math and statistics. However, I don't think I have ever met a statistician or a mathematician who does not have programming experience. I think that if you start off with math and stats, programming comes with the territory. If instead you start from the other side (biology), then programming becomes an add-on that you can learn while improving your math and stats skills. But I do see those two as being intertwined, at least as far as R/MATLAB/Mathematica programming goes.

Regarding Linux/UNIX, I do believe that it is still connected. For example, when I analyze data I often use our UNIX servers since I can't really manage big fastq files on my PC/Mac, and if I didn't know some UNIX command line/programming then I wouldn't be able to run my analyses.

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 8.9 years ago by TriS ★ 4.7k