Forum: Why do bioinformaticians need to know programming languages?
4
6
7.6 years ago
nilo ▴ 90

Hi All!

I am a newbie in bioinformatics and I want to learn about it.

One question I have is whether bioinformaticians really need to learn programming languages, and if so, why. On the one hand, I see a lot of posts saying that there is plenty of software out there for performing bioinformatics analyses and that "reinventing the wheel" is dangerous. On the other hand, I see many people saying that knowing programming languages is important for bioinformaticians.

If there are already many programs that do bioinformatics analysis, why does a bioinformatician need to learn a programming language? [I assume the point of learning to program is to develop new software.]

Thank you!

Python R Programming • 6.7k views
ADD COMMENT
5

You can think of the software that is out there as your basic toolbox: hammer, screwdriver, monkey wrench, saw, whatever. You apply those tools to your raw data, which gives you more (processed) data. E.g., running bwa on your reads and a reference turns your raw .fastq reads into a .sam file of aligned reads. Now, what do you do with those aligned reads? If you're doing something simple, you might load them into a genome viewer, and maybe that is already enough to see what you're looking for.

When dealing with huge amounts of data in a research environment, though, you will end up with a pile of processed data that you need to sift through. Do you want to do that manually? Inspect each and every pair of your millions of aligned reads individually? Or your blast hits? Do you want to always run every tool manually? Those are some of the situations where it comes in handy to know a little bit of scripting. The idea is that you extend the existing set of tools with specialised scripts or programs in order to solve the problem you're working on. The idea is not that you have to develop new, generally applicable tools, unless you're interested in doing so (or it is part of your job).

Since you have tagged this question with "R": R packages provide solutions for specific problems (e.g. differential gene expression); however, you still need to write a couple of lines of code to use those solutions, which is also a kind of coding.

In general, you should know a scripting language (Python, Perl, Ruby being the most common), bash scripting and bash command line tools in addition to the typical bioinformatics tools (although tools can always be learned on the job, in my opinion). There is no need to learn more about languages such as C/C++, Java, etc. unless you want to look into developing efficient bioinformatics tools (but then it is not just the languages, then you also need to learn algorithms and data structures). Just for data analysis, these latter languages/concepts/skills should not be required.
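
To make the "sifting through aligned reads" point a bit more concrete, here is a minimal sketch of the kind of throwaway script meant above. It assumes plain-text SAM input and uses a hypothetical helper name (`count_confident_alignments`) and an arbitrary MAPQ cutoff of 30; it is an illustration, not a replacement for samtools.

```python
from collections import Counter


def count_confident_alignments(sam_lines, min_mapq=30):
    """Count mapped reads per reference, keeping alignments with MAPQ >= min_mapq."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):       # SAM header lines start with '@'
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:           # SAM has 11 mandatory columns
            continue
        flag = int(fields[1])          # column 2: bitwise FLAG
        reference = fields[2]          # column 3: reference sequence name
        mapq = int(fields[4])          # column 5: mapping quality
        if flag & 0x4:                 # 0x4 set means the read is unmapped
            continue
        if mapq < min_mapq:
            continue
        counts[reference] += 1
    return counts


if __name__ == "__main__":
    # Tiny made-up example; real input would come from open("aligned.sam").
    example_sam = [
        "@SQ\tSN:chr1\tLN:1000",
        "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "read2\t16\tchr1\t200\t10\t4M\t*\t0\t0\tACGT\tIIII",
        "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
        "read4\t0\tchr2\t300\t42\t4M\t*\t0\t0\tACGT\tIIII",
    ]
    for reference, n in sorted(count_confident_alignments(example_sam).items()):
        print(reference, n)
```

A dozen lines like this answer a question ("how many confidently mapped reads per chromosome?") that you would never want to answer by eye.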

ADD REPLY
1

Thank you :)

I just need two clarifications:

  1. If I only want to do bioinformatics analysis, I just need to know enough programming to run the existing software and use it to analyse the data. Correct?

  2. Under what circumstances does one need to develop new software, algorithms, or tools? For some jobs, I see the job description says that the successful candidate must know programming languages to develop tools and software.

Thank you :)

ADD REPLY
2

  1. For analysis you just need to know enough to a) run the various analysis tools efficiently, by building pipelines, e.g. via bash scripting (assuming you're not using a platform like Galaxy), and b) "filter" and format your results into something usable.

  2. Unless you're in an algorithm-heavy lab (usually in a Computer Science or Math setting), you will probably not develop the next super-fast, super-accurate short-read aligner or variant caller. But remember, algorithms are just recipes, i.e. lists of steps to perform in order to achieve something (e.g., you sort a list by performing certain steps in a structured fashion), so even when building a simple pipeline you will be developing one (mind you, it might not be the fastest or most efficient, but it nevertheless counts). New software usually gets developed when there is no (easy-to-use, easily accessible) solution available for a certain problem. For instance, I am currently developing a pipeline for assessing a specific gene family in plants, using specific exome captures and PacBio sequencing. I am tying various tools together with a Python script and developing methods in Python that do not yet exist for my specific problem. Ultimately, I will end up with a new software tool, even if it is just another pipeline with custom-built components under the hood. Hope that makes sense.
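
As an illustration of point 1b ("filter" and format your results), here is a rough sketch that keeps only the best hit per query from BLAST tabular output. It assumes the default 12-column `-outfmt 6` layout; the function name `best_hits` and the e-value cutoff are made up for the example.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle, max_evalue=1e-5):
    """Return {query: (subject, bitscore)} for the highest-scoring hit per query."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):   # skip blank and comment lines
            continue
        hit = dict(zip(BLAST6_COLUMNS, row))
        evalue = float(hit["evalue"])
        bitscore = float(hit["bitscore"])
        if evalue > max_evalue:                 # discard weak hits
            continue
        current = best.get(hit["qseqid"])
        if current is None or bitscore > current[1]:
            best[hit["qseqid"]] = (hit["sseqid"], bitscore)
    return best


if __name__ == "__main__":
    # Made-up hits; real input would be open("hits.tsv", newline="").
    table = io.StringIO(
        "geneA\tsubj1\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t360\n"
        "geneA\tsubj2\t80.0\t150\t30\t0\t1\t150\t5\t155\t1e-20\t120\n"
        "geneB\tsubj3\t60.0\t50\t20\t0\t1\t50\t1\t50\t0.5\t25.0\n"
    )
    for query, (subject, score) in best_hits(table).items():
        print(query, subject, score, sep="\t")
```

Whether "best" should mean highest bitscore, lowest e-value, or something else depends on your question, which is exactly why this kind of filtering tends to end up in a small custom script.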

ADD REPLY
1

The main reason you need to learn a programming language for bioinformatics, I'd say, is that any given workflow is an exercise in connecting the dots. You feed files from program A -> B -> C and so on. However, much of the file conversion needed to go between programs is most easily accomplished if you're handy with Bio(perl|python), bash, awk, and so on.

Much of the 'heavy lifting' of an analysis workflow might be handled by a self-contained program that you don't need to interact with very extensively, but there will always be intermediate bits that you have to get your hands dirty with.
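
A typical "intermediate bit" is a format conversion between program A and program B. Below is a minimal sketch that turns FASTQ into FASTA using only the standard library. It assumes simple four-line (unwrapped) FASTQ records, and the function name `fastq_to_fasta` is just for illustration; for real work, a library such as Biopython's `SeqIO` will be more robust.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Convert four-line FASTQ records to FASTA; return the number of records written."""
    records = 0
    while True:
        header = fastq_handle.readline()
        if not header:              # end of file
            break
        if not header.strip():      # tolerate blank lines between records
            continue
        sequence = fastq_handle.readline().strip()
        separator = fastq_handle.readline()
        quality = fastq_handle.readline().strip()
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near: {header.strip()!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in: {header.strip()!r}")
        # FASTA keeps the read name and sequence and drops the qualities.
        fasta_handle.write(">" + header[1:].strip() + "\n" + sequence + "\n")
        records += 1
    return records


if __name__ == "__main__":
    # In-memory demo; real use would pass open("reads.fastq") and open("reads.fasta", "w").
    reads = io.StringIO("@read1 sample=A\nACGTAC\n+\nIIIIII\n@read2\nGGCC\n+\nFFFF\n")
    fasta = io.StringIO()
    print(fastq_to_fasta(reads, fasta), "records converted")
    print(fasta.getvalue(), end="")
```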

ADD REPLY
0

Thank you :)

I just need two clarifications:

  1. If I only want to do bioinformatics analysis, I just need to know enough programming to run the existing software and use it to analyse the data. Correct?

  2. Under what circumstances does one need to develop new software, algorithms, or tools? For some jobs, I see the job description says that the successful candidate must know programming languages to develop tools and software.

Thank you :)

ADD REPLY
0

You could have added this supplementary question to the original post rather than posting it 3 times.

ADD REPLY
2
7.6 years ago
LLTommy ★ 1.2k

Bioinformatics is a big field, and it is hard to define 'bioinformatics'. Consequently, programming is not necessary in every position. For example, data curators would often be considered bioinformaticians too, and they might not have to program themselves. Most jobs in bioinformatics involve wrangling data, which is heavy on statistics, and that is where you need Linux/bash skills plus R or Python, just as jrj.healey described above. As a third group, bioinformatics can go heavily into software development, beyond R and the like, into all kinds of programming languages, web design, databases, etc.

So, yes, the majority of bioinformatics obviously has to do with some sort of programming. It is no surprise though, 'informatics' is in the name!

ADD COMMENT
1

Thank you :)

I just need two clarifications:

  1. If I only want to do bioinformatics analysis, I just need to know enough programming to run the existing software and use it to analyse the data. Correct?

  2. Under what circumstances does one need to develop new software, algorithms, or tools? For some jobs, I see the job description says that the successful candidate must know programming languages to develop tools and software.

Thank you :)

ADD REPLY
4

job description says that the successful candidate must know programming languages to develop tools and software

Present job descriptions are 50% requirements and 50% wish-lists (perhaps not that bad, but you get the idea). If one can find an overqualified candidate for the same salary, why not? As long as you possess the "core" competencies listed for the job, you should go ahead and apply.

ADD REPLY
2

That point on wish-lists is really good advice, genomax :) I never really thought of it like that before.

ADD REPLY
0

Well, no. 1: If you say "I only want to do bioinformatics analysis, so I need to know a bit of programming" ... don't underestimate it. It depends on what you want to accomplish, but these analyses can become very intense and can involve more than just 'a bit' of programming. Also, as mentioned, do not underestimate the mathematical/statistical part of it! R, after all, is for statistics!

No. 2: Ahm... You say some job descriptions say you must program and develop software... I guess that is one of the circumstances where one has to develop software and new tools? ...so I don't really understand the question, I guess.

ADD REPLY
2
7.6 years ago
John 13k

How do you know what a tool does unless you can read its code? :)

ADD COMMENT
1

While I agree with this as an optimal case, it is a bit far-fetched for the common bio-scientist, no? :P I mean, we're also not digging through the software of the qPCR machine or the Nanodrop or do we?

ADD REPLY
1

we're also not digging through the software of the qPCR machine or the Nanodrop or do we?

But you do know how to use it. Similarly, if you want to use any tool, you should know a little programming.

ADD REPLY
1

Nanodrop in R

Stolen from (

)

ADD REPLY
0

Yes, it is far-fetched. However, so is curing cancer. Yet we attempt to solve such problems regardless :)

Bioinformatics is a rapidly maturing field. Software that was worthy of immense praise 20 years ago would be unacceptable by today's standards. People are realising that bad software often does more harm than good, and it won't be long until it becomes mainstream knowledge that bad in silico analysis does more harm than good. I think it will be uncommon to find a bioinformatician who cannot program in Python or similar in the next 5-10 years. Not because we want people writing tools, but because we want people understanding them.

I understand your point about the Nanodrop, but it's a bit of a false equivalence, as the software in a Nanodrop or qPCR machine is probably the least important part of the machine. The fundamentals of these machines are not software but the biochemical processes that take place inside them, and yeah, if you don't know exactly how the machine works at a fundamental level, you might not get any results. The problem with bioinformatics software is that even if you do not understand the fundamentals, or even anything at all, you might get something back that looks perfectly valid. Hell, it might look so valid that you decide to publish this amazing new result.

The next couple of years in the bioinformatics space are going to be really interesting. You're going to have experiments suggesting false results, and there's going to be a lot of finger-pointing and discussion of where the responsibility for violating unspoken assumptions lies. I threw up a poll on Reddit the other day, and the results were pretty much 50:50.

ADD REPLY
0

I understand your point about the Nanodrop, but it's a bit of a false equivalence, as the software in a Nanodrop or qPCR machine is probably the least important part of the machine.

Yeah, you're right. Didn't think that through.

ADD REPLY
2
7.6 years ago
GenoMax 148k

Having an idea of how to cook gives you a chance to spot/fix a recipe that is going wrong. You don't need to be a great cook for that to work.

As others have said, you can get by without serious programming chops, but fiddling with existing code and/or using programs (or even operating systems) optimally requires an adequate understanding of their inner workings.

ADD COMMENT
1
7.6 years ago
h.mon 35k

Two simplistic answers:

1) bash is both a shell to interact with the operating system and a programming language. Using bash and concocting a script into a crude pipeline is already programming. Knowing general programming concepts will help you write less crude, more reliable scripts.

2) you will probably have to massage data from one format to another, and sometimes the only option left is for you to write the code (there is a wonderful but sad quote about bioinformatics being 95% converting data, but I can't remember where I read it).
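
As a rough illustration of point 1 (a "less crude, more reliable" pipeline), here is a sketch of a tiny runner that executes steps in order and stops at the first failure instead of silently carrying on. It is written in Python rather than bash purely for the example; `run_pipeline` is a hypothetical helper, and the demo steps are stand-ins that only print text, not real bioinformatics commands.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run (name, command) steps in order; raise at the first non-zero exit code."""
    completed = []
    for name, command in steps:
        # command is a list of arguments, so no shell quoting issues arise.
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            raise RuntimeError(
                f"step {name!r} failed with exit code {result.returncode}: "
                f"{result.stderr.strip()}"
            )
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Placeholder steps; in practice each command would call an actual tool.
    demo_steps = [
        ("align", [sys.executable, "-c", "print('pretend alignment')"]),
        ("sort", [sys.executable, "-c", "print('pretend sorting')"]),
    ]
    print("finished steps:", run_pipeline(demo_steps))
```

The bash equivalent of the fail-fast behaviour is putting `set -e` (often `set -euo pipefail`) at the top of the script; either way, the point is that a later step should never run on the broken output of an earlier one.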

Variations of this question pop up from time to time; see for example "Thoughts on Bioinformatics and programming".

ADD COMMENT
1

there is a wonderful but sad quote about bioinformatics being 95% converting data, but I can't remember where I read it

It could be this one:

ADD REPLY
2

That is a good one, but no, it is either a post here on BioStars, or a blog post, or something else I can't remember. I would say it's 60% / 35% / 5% chances, respectively, although I got these numbers from a nanoDrop.dsDNA.conc()-like function.

edit: found it, at least one version of it, here at "Very Bad Things". I don't know if this post was the first time I read this quote. My memory inserted percentages where there were none.

edit 2: see how knowing how to code a bit is important to help you socialize with fellow bioinformaticians? What if someone tells a code-related joke and you are the only one not laughing? Thinking about it, that has to be the number one reason to learn some coding.

ADD REPLY
1

edit 2: see how knowing how to code a bit is important to help you socialize with fellow bioinformaticians? What if someone tells a code-related joke and you are the only one not laughing? Thinking about it, that has to be the number one reason to learn some coding.

Seriously? You think the most important reason to learn programming is to be able to laugh? I hope that was sarcastic!

ADD REPLY
1

No sarcasm, it was just a light-hearted joke.

ADD REPLY
2

Now I think LLTommy is making a joke about not being able to get a joke :-D v. meta!

What's that famous bioinformatics joke... there are only two problems in bioinformatics: sequence assembly, protein folding, and off-by-1 errors.

ADD REPLY
1

I don't get it, there are three problems in the list!

ADD REPLY
1

But only two are worth solving :)

ADD REPLY
