Best Language For Introductory Programming Course From Within An Introduction Course On Bioinformatics.
13.4 years ago

What language would you recommend to introduce programming to an audience of biology/life science students at a bachelor level?

In our introductory Bioinformatics course we currently use Perl as the teaching language to introduce life science students to the concepts behind programming. Due to a change in the curriculum we need to reassess the structure. Also, the number of students will increase to the point where continuing with Perl practical sessions could become too labour intensive.

On Stack Overflow I saw an almost duplicate question, with some really nice answers. My question here boils down to: are the answers to the Stack Overflow question also applicable to a course for an audience of bioinformaticians/biologists?

programming subjective • 13k views

I'm surprised no one asked you so far: how many hours is the course? That might have a huge impact on the suggestions (teaching proper programming vs. some quick hacks).


Oh, and the main purpose of this course is not to teach them programming, but to make them understand enough about what programming can do that they will ask somebody for help or decide to learn when appropriate.


Here is a relevant thread from Ask Metafilter, in which I recommend Python.


Absolutely, that is in fact why we do Perl now. It is just a very quick introduction. Total contact hours 6, total workload about 20.


Then you might also want to consider http://www.taverna.org.uk/, which gets you used to the train of thought without any actual programming.


Actually, with Taverna you're doing visual programming. Most people only learn 2 or 3 textual programming languages and innocently recommend them everywhere.


You should choose the language with the smallest semantic gap for your audience. If you want your students to copy and paste code, then you may teach them any language; if you want to make them think, then choose a language which doesn't bother you. Don't buy problems for free. Choose the language with the fewest keywords, the least syntactic sugar, and the fewest unnecessary concepts to learn. Do some research to find the most human-oriented language.

13.4 years ago
Konrad ▴ 710

As many here and on Stack Overflow have mentioned, I would recommend Python (often called "executable pseudocode"), especially as a first language, for several reasons:


Here is a relevant thread from Ask Metafilter, in which I also recommend Python.


Another big advantage to Python is interactive plotting with matplotlib (http://matplotlib.sourceforge.net/). On top of being just plain useful, incorporating it into a course could be beneficial for students who are visual learners.


Where is the evidence that supporting many programming paradigms is better than supporting just one and supporting it well? Even programming in Assembler or Lisp can be fun, and believe me, Lisp is way more used in the scientific community (or don't believe me and search for Lisp in ieeexplore). Besides, most mainstream languages are free and open source and have thousands of teaching materials, so that is no longer a remarkable point. It's incredible how people buy technology without serious or formal training in computer science.


The advantage of having several paradigms in one language is that it makes it easy to present them without switching between different languages. I made the point regarding free/open source because there are still many educational institutions which use MatLab, and students later realize that the license fees are significant (I have real cases in my research environment where people asked me to re-implement programs because they brought old MatLab programs from their previous labs and did not want to buy MatLab licenses from their own budget). Yes, there is Octave, but the compatibility is not 100% when you do more complex stuff. PS: Don't take the "It's fun" statement too seriously. ;) The question was about introductory languages. Easily obtained results make people happy and motivate them to continue, and Python definitely offers this. Once they have the basics they can start with LISP, Assembler, LOLCODE, brainfuck or whatever they want and can have a lot of fun with that, too.


It seems you've assumed that just one language can be enough for many paradigms. A paradigm shift involves a language shift (any book on the history of science supports that claim). Pretending that one programming language fits many paradigms is an illusion, sorry for the bad news ;). An introductory language is one made for that purpose, like Pascal, Scheme or Smalltalk, because they include few and clear concepts. Python was a language created for interfacing with the Amoeba OS because its developer was not very proficient in Bourne shell scripting. Good point about MatLab, by the way.


Well, he solved a personal problem and invented an elegant language (or, better said, heavily modified another one), which is fine for me :). And yes, you are right: if you want to dive deeply into a paradigm you really have to switch to a language that really embraces it. But as I said, the question was about an introductory language for beginners who want to get stuff done, not about how to fill the curriculum of a computer science student. "Start with Python and keep your mind open" was implied by the bold font of "first language" in my recommendation. (PS: I can highly recommend "Seven Languages in Seven Weeks" by Bruce A. Tate to anyone who would like to get a taste of different paradigms in a very entertaining manner.) But Python actually is a great introductory language that can be properly used in practice (not the case for Pascal IMO; Scheme I have really only used for training purposes so far). Anyway, if you have a different feeling about this topic you can simply propose the languages you mentioned as an answer here. I don't have hard numbers on the educational efficiency of different languages, so maybe others share your opinion.

13.4 years ago
Agapow ▴ 270

There was a lengthy discussion on this at LinkedIn, and I'm going to largely repeat my comments from there. You'll get a lot of different opinions on this because:

  1. it's a religious issue (i.e. comes down a lot to subjective judgements and personal experience),
  2. there are a lot of possible considerations for language choice in bioinformatics courses: teachable to people who aren't just going to be programmers and may not have programmed before, has a lot of useful libraries, has a community behind it, good for quick and dirty / one-off scripting solutions, useful for web development, etc., and
  3. what "bioinformatics" means to one person and another can be quite different. I'm a bioinformaticist, you're a computational biologist, you're a genomicist and you just do a few stats ...

So a few thoughts about different languages:

Old school compiled languages, e.g. C/C++: No. Learning curve too high, no good for quick-and-dirty problems, weak in web development. Relatively little bioinformatic work happening here. Not a good place to start.

Java: Lots of libraries and BioJava is pretty damn good. But it's not a great first language, and always feels a bit "heavy" when I'm trying to solve a small problem. Still, I expect to see a lot of development in this area with the JVM-enabled languages like Jython, JRuby, Groovy, where you can script and still use the Java libraries. Not for novices.

Perl: was the undisputed choice for bioinformatics 10 years ago but that lead has evaporated. Quirky, opaque and write once. The whole Perl 6 morass doesn't help. I think you can do better. Still, there's a lot of code here and a lot of the older significant tools are written in this (e.g. GBrowse etc.)

Ruby: I've got a love-hate relationship with Ruby. There's a lot of Good Stuff there, and the web development is excellent. People seem to like learning Ruby too. But there are a few quirks in the language and BioRuby is still a work in progress. Still, a lot of enthusiasm here.

Python: this is where the weight of attention is. BioPython has really come along in the last few years and many of the newer, excellent tools (e.g. Galaxy) are written in it. Easy to learn, kind to beginners, big community, good scientific computing support (IPython, NumPy, etc.). There's an odd aspect or two I wish was developed more (I'd really like anonymous closures and better functional programming) but you couldn't go wrong here.

Javascript: many people rave about what a great language JS is, and there are occasional feints at doing bioinformatics in it. But while you _can_ do work in it, _should_ you? Nope.

R: A lot of ecologists & mathematical biologists use R, and it's got graphics & visualization to die for. The IDE is great for beginners as well, allowing packages to easily be installed locally. I confess to a bit of a blindspot with R (some of the syntax is a bit weird), but this could be the right choice for the right group of students.

13.4 years ago
Lyco ★ 2.3k

I am sure that mine will be a minority opinion, but alas, I am a biologist myself and therefore see this question from a different angle. In my experience, biologists and related life scientists will need programming languages mainly for scripting (e.g. writing command pipelines), and for processing large amounts of textual or numerical data.

I would really recommend sticking with Perl, as I consider it the most accessible language for non-CS people. In my opinion, the major advantage of Perl is that you can avoid object-oriented programming. (Make no mistake, OOP is a very powerful concept for professionals, but I don't think that a biologist should be bothered with it.) Perl is also very powerful for text processing and has very complete support for regular expressions. This might not be the tool of choice for the people hanging out at BioStar, but for the average biologist things look quite different.

I have seen R being recommended here. I definitely recommend teaching R to biology students, but I never quite got around the concept of R as a 'general programming language'. I would rather recommend teaching students how to run R procedures from another programming language (e.g. Perl).


+1 for including Perl (it is a valid option, even though it might introduce bad habits), -1 for excluding R as a general programming language ;) Of course it is one; the fact that you didn't get 'around the concept' doesn't make it less usable. And yes, it is a full-blown programming language (if you still don't believe that, you might have to read up on the theoretical background of programming languages). It's just that it is more appropriate, or makes it easier to get to a solution, for certain types of problems, but that is true for any language anyway.


Michael: of course R is Turing complete, but using it for general-purpose programming is just awkward.


Agreed. Never thought R was a 'general programming language', for me, it's just a tool for statistical analyses.


+1. R as a general-purpose language is horrible. Don't agree that Perl is a good choice though for beginners (too many implicit commands).


I agree with Michael S. R is very powerful for statistics and plotting. I encourage biologists to learn it. As a programming language, R is bad but still acceptable. However, when it comes to the implementation, the official one is easily the least efficient. Use R where it has strengths and never treat R as a serious general-purpose programming language.

13.4 years ago
lh3 33k

From all I have seen so far, biologists mostly need programming for large-scale text processing and for doing simple statistics on large data sets. I think a combination of Python and R is the best for them. If you have to choose one language, then choose Python, and you can teach your students how to use the existing modules (numpy?) to do statistics. The problem with R is that it is frequently awkward for text processing and for handling huge data sets (over 10GB, for example), while Python does not have this problem and you can still use Python to do most of the basic things R can do.

PS: Personally I know little about Python and think R, as a programming language, is implemented very badly (actually the worst). I like Javascript and Lua more these days, but for biologists, Python+R should suit them much better. MatLab is better than R as a programming language IMHO, but it is not free and perhaps lacks the rich packages in R. Perl is still a decent choice even today. Some advanced modules only exist in Perl (though the same may be true for Python; I do not know).
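
As a rough illustration of the "text processing plus simple statistics" workflow described above, the sketch below streams a tab-delimited file line by line, so memory use stays flat however large the file is, and computes a per-group mean. The column layout (a gene name in column 1, a numeric value in column 2) and the function name `mean_by_key` are assumptions made up for this example, not something from the thread.

```python
import io
from collections import defaultdict


def mean_by_key(lines, key_col=0, value_col=1, sep="\t"):
    """Return {key: mean of values}, reading the input one line at a time."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split(sep)
        key = fields[key_col]
        totals[key] += float(fields[value_col])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Hypothetical input; a real script would pass an open file handle instead.
example = io.StringIO("#gene\tvalue\ngeneA\t2.0\ngeneB\t10.0\ngeneA\t4.0\n")
means = mean_by_key(example)
print(means)
```

Because only the running totals are kept, the same function should work on a file far larger than available memory, as long as the number of distinct keys stays modest.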

13.4 years ago

Ruby for the first time and for all time. Pick up other languages as you need them.


I'm on the Ruby train as well. Not a lot of love so far from the other answers, but 1) it's super quick to pick up the basics (created to make programmers happy, and it does!) 2) powerful text manipulation: it does what Perl does, but with fewer gotchas, less memorization, and less ugliness (IMHO) 3) documentation is plentiful 4) you can move on to using Rails if you want to build database-driven websites 5) tons of gems and options for integrating with 3rd party tools (an important skill to learn).


I started on Ruby a few years ago, and loved it. I then switched to Perl, Java, and R for a job. Recently I came back to Ruby and realized how many wonderful things there are in it that are totally missing from other scripting languages like Python and Perl.


Thanks for giving upvotes to Ruby. I also use R, Bash, GNU programs, Python, and SQL extensively and consider myself an expert at those. In addition I have multiple years of experience in each of: JS, PHP, C, C++, VB, C#. And some experience in: Java, Perl, and VHDL. Among all of those, Ruby, R, the GNU tools, and Bash really stand out. Bash: pipelining/streaming as the OS. GNU tools do wonders in numerous text processing niches. R: reproducible research and condensed scientific intelligence in every line. But Ruby is the most concise and beautiful in general; I use it to connect the components.


13.4 years ago
Benm ▴ 710

I also think Perl, Python, and R are the most useful languages for bioinformatics. They are easy to learn, and they have powerful module and package collections such as CPAN, CRAN, BioPerl, and BioPython (NumPy, SciPy), so that you can use them flexibly to deal with your tons of biological data.

For learning bioinformatics, data analysis, or programming, I recommend these books for beginners:

  • Python Scripting for Computational Science
  • Bioinformatics Programming Using Python
  • Bioperl Course
  • Python course in Bioinformatics
  • Python for Bioinformatics
  • Bioinformatics, Biocomputing and Perl
  • Genomic Perl: From Bioinformatics Basics to Working Code
  • Mastering Perl For Bioinformatics
  • Applied Statistics for Bioinformatics using R
  • Statistics Using R with Biological Examples

if possible, a delimiter for your book titles would be helpful.


made it a list :)


I can share them via Dropbox. Do you have a Dropbox account?


thank you Michael

13.4 years ago
  • A simple SQL language? Storing and querying data using SQLite, or extracting data from the UCSC MySQL server?
  • Tools that require nothing but a browser: JavaScript, XSLT
  • Using a document-oriented database (CouchDB, neo4j ...)
  • Simple Unix command lines
  • ...
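
To give a feel for the first suggestion (storing and querying data with SQLite), here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `genes` table, its columns, and the rows are invented for illustration; nothing here reflects the schema of the UCSC server.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (name TEXT, chrom TEXT, length INTEGER)")
conn.executemany(
    "INSERT INTO genes VALUES (?, ?, ?)",
    [("geneA", "chr1", 1200), ("geneB", "chr1", 800), ("geneC", "chr2", 3000)],
)

# Count genes and average their length per chromosome.
rows = conn.execute(
    "SELECT chrom, COUNT(*), AVG(length) FROM genes GROUP BY chrom ORDER BY chrom"
).fetchall()
for chrom, n_genes, avg_length in rows:
    print(chrom, n_genes, avg_length)
conn.close()
```

The query itself is plain SQL, so students could type the same `SELECT` statement into any SQLite shell without writing Python at all.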

You seem to be suggesting that they teach SQL AND JavaScript. Plus, depending on the context, teaching bioinformatics requires much more than just databases.


No, what I wanted to say is that those solutions are cheap and simple to teach.

13.4 years ago
Boboppie ▴ 550

Python is undoubtedly an ideal language for teaching basic programming. It's very elegantly designed and easy to read. I personally think Perl is still The language for biological science due to its large collection of libs/modules (Python is growing rapidly in that respect). But Perl is also evolving, and Perl 6 might bring us some surprises. For now, though, I'd recommend Python if only one language is to be chosen.

13.4 years ago
Dave Lunt ★ 2.0k

It is very easy to see this from the point of view of bioinformaticians, but as Lyco says here, "[Perl] This might not be the tool of choice for the people hanging out at BioStar, but for the average biologist things look quite different". This was an excellent point.

I see the benefits of both Perl and Python, but (a) Perl books are MUCH better for the biologist audience, (b) most of the advantages of Python just don't exist for a biologist learning to write a simple script, and (c) most biologists Google their problem to find code snippets, and there are more solutions in Perl (at least for the problems I Google).

Now, if you were training graduate students who needed to build programming skills you might argue this another way, but Perl is a great choice here.

13.4 years ago
brentp 24k

There is no "best" language. But, since it's not been mentioned, I'd add that awk is a very good choice.

One can become very efficient in awk without writing much code as there are implicit loops over each line of input. This makes it very simple, even for a beginning programmer, to do useful stuff. In addition, it's tied closely to the shell--which is another language they'll eventually want to learn--so things like reading from stdin and writing to stdout will become more familiar.

Plus, many of the skills/conventions one learns in awk will translate (so to speak) well to other languages.
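
To make the "implicit loop" concrete: a one-liner such as `awk '$3 > 10 { print $1 }' file` reads the file line by line, splits each line into fields, and prints the first field whenever the third exceeds 10. A rough Python equivalent, with the loop and the splitting written out by hand, might look like the sketch below. The function name and the sample data are invented for illustration, and the sketch assumes the third field is always numeric.

```python
import io


def select_first_field(lines, threshold=10.0):
    """Yield field 1 of every line whose field 3 is numerically above threshold."""
    for line in lines:                     # awk runs this loop for you
        fields = line.split()              # awk splits on whitespace into $1, $2, ...
        if len(fields) < 3:
            continue                       # assumption: shorter lines are skipped
        if float(fields[2]) > threshold:   # the awk pattern: $3 > 10
            yield fields[0]                # the awk action: print $1


example = io.StringIO("geneA chr1 12\ngeneB chr1 7\ngeneC chr2 30\n")
selected = list(select_first_field(example))
print("\n".join(selected))
```

Comparing the two versions shows what awk gives a beginner for free: the loop over lines, the field splitting, and the pattern/action pairing are all built in.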

13.4 years ago
Biogeek ▴ 170

Yes, functional languages, in particular those that students can start using right away, seem to be the most successful. JavaScript is the most obvious option, as most life scientists/biologists have come across it, even if just while playing with CSS/HTML.

13.4 years ago
Will 4.6k

I teach the exact course you're describing here at Drexel. I would suggest either Matlab (if your school already has the licenses) or Python. Both languages abstract away the nitty-gritty of data structures, which really helps to speed up the teaching of computational biology rather than computer science.

13.4 years ago
Pasta ★ 1.3k

No one mentioned PHP here. Of course, this language is perfect for web development and can easily interact with databases. But what people forget is that PHP is also a scripting language that you can launch from a console. When I was a biochemistry student, it was the first programming language I learnt and I found it pretty easy to learn, especially compared to Perl...

It is a really easy language to write programs with; you can also use regular expressions and the BioPHP libraries.

If you need a good alternative and you are not afraid to swim against the mainstream, go PHP!

13.4 years ago
Aurobhima ▴ 100

I can only speak from personal experience. I entered the world of bioinformatics with limited programming experience. I have worked exclusively with Python for the last two and a half years and found it to be one of the most enjoyable and easy-to-use languages I have ever encountered (others being C, C++, Java and Pascal). It also comes with a great community to get help from when you are in trouble.

Python was designed to teach people to program and encourages good programming habits. BioPython also offers a lot of useful Bioinformatics tools, apparently not as extensive as BioPerl, but still very useful.

13.4 years ago
Tiffani ▴ 150

I would say Perl and Ruby are also your best bet.
