Forum: A Farewell To Bioinformatics
11.8 years ago

Original source: http://madhadron.com/?p=263

A farewell to bioinformatics

Published: March 26, 2012

Category: biology:nontechnical

I’m leaving bioinformatics to go work at a software company with more technically ept people and for a lot more money. This seems like an opportune time to set forth my accumulated wisdom and thoughts on bioinformatics.

My attitude towards the subject after all my work in it can probably be best summarized thus: “Fuck you, bioinformatics. Eat shit and die.”

Bioinformatics is an attempt to make molecular biology relevant to reality. All the molecular biologists, devoid of skills beyond those of a laboratory technician, cried out for the mathematicians and programmers to magically extract science from their mountain of shitty results.

And so the programmers descended and built giant databases where huge numbers of shitty results could be searched quickly. They wrote algorithms to organize shitty results into trees and make pretty graphs of them, and the molecular biologists carefully avoided telling the programmers the actual quality of the results. When it became obvious to everyone involved that a class of results was worthless, such as microarray data, there was a rush of handwaving about “not really quantitative, but we can draw qualitative conclusions” followed by a hasty switch to a new technique that had not yet been proved worthless.

And the databases grew, and everyone annotated their data by searching the databases, then submitted in turn. No one seems to have pointed out that this makes your database a reflection of your database, not a reflection of reality. Pull out an annotation in GenBank today and it’s not very long odds that it’s completely wrong.

Compare this with the most important result obtained by sequencing to date: Woese et al.’s discovery of the archaea. (Did you think I was going to say the human genome? Fuck off. That was a monument to the glory of that god-bobbering asshole Francis Collins, not a science project.) They didn’t sequence whole genomes, or even whole genes. They sequenced a small region of the 16S rRNA, and it was chosen after pilot experiments and careful thought. The conclusions didn’t require giant computers, and they didn’t require precise counting of the number of templates. They knew the limitations of their tools.

Then came clinical identification, done in combination with other assays, where a judicious bit of sequencing could resolve many ambiguities. Similarly, small scale sequencing has been an incredible boon to epidemiology. Indeed, its primary scientific use is in ecology. But how many molecular biologists do you know who know anything about ecology? I can count the ones I know on one hand.

And sequencing outside of ecology? Irene Pepperberg’s work with Alex the parrot dwarfs the scientific contributions of all other sequencing to date put together.

This all seems an inauspicious beginning for a field. Anything so worthless should quickly shrivel up and die, right? Well, intentionally or not, bioinformatics found a way to survive: obfuscation. By making the tools unusable, by inventing file format after file format, by seeking out the most brittle techniques and the slowest languages, by not publishing their algorithms and making their results impossible to replicate, the field managed to reduce its productivity by at least 90%, probably closer to 99%. Thus the thread of failures can be stretched out from years to decades, hidden by the cloak of incompetence.

And the rhetoric! The call for computational capacity, most of which is wasted! There are only two computationally difficult problems in bioinformatics, sequence alignment and phylogenetic tree construction. Most people would spend a few minutes thinking about what was really important before feeding data to an NP complete algorithm. I ran a full set of alignments last night using the exact algorithms, not heuristic approximations, in a virtual machine on my underpowered laptop yesterday afternoon, so we’re not talking about truly hard problems. But no, the software is written to be inefficient, to use memory poorly, and the cry goes up for bigger, faster machines! When the machines are procured, even larger hunks of data are indiscriminately shoved through black box implementations of algorithms in hopes that meaning will emerge on the far side. It never does, but maybe with a bigger machine…

Fortunately for you, no one takes me seriously. The funding of molecular biology and bioinformatics is safe, protected by a wall of inbreeding, pointless jargon, and lies. So you all can rot in your computational shit heap. I’m gone.

Please send questions and comments to Fred Ross (keep the 'madhadron.com:' at the start of your subject).

Disclaimer: This is not my rant, but I thought it made some interesting points.

Sorry that I addressed it as 'your opinions'; I thought you were referencing your own blog. It is definitely a polemic, and even a relatively good one. As with any good polemic rant, it contains some truth and is good at concealing that, in its overly generalized tone, it totally misses the point. See my response as an open response to the anonymous author.

See also: "Hello Fred, It’s been almost two years since “A farewell to bioinformatics”. I’m sure there are people who want to know how well it worked out for you. What parts of bioinformatics do you miss?" http://www.homolog.us/blogs/blog/2014/02/17/horror-awaits-quit-bioinformatics-academia/

In addition to all the comments, I recommend the paragraph "The problems underlying the programs for identifying TEs and repeats" in the article "Identifying repeats and transposable elements in sequenced genomes: how to find your way through the dense forest of programs". It expresses my own thoughts, especially that more programs should be open-source and easy for new contributors to join, instead of everyone creating new tools. How to achieve this remains a rhetorical question :) Any comments are welcome.

11.8 years ago
Michael 55k

Yes, it can be frustrating to work in an interdisciplinary field, and yes, there can be a lot of misunderstanding and lack of communication. You sound a bit like a sourpuss though. I can only assume that your fascination for discerning the biology by means of informatics, and maybe your enthusiasm, weren't strong enough to stand this test. But that is ok, it is ok to move on.

What is not ok is to swear and to talk in a very disgusting way about your recent field of occupation, because that primarily devalues your own work, competences and contributions to the field. And then it affects the work of your colleagues and (former) fellow bioinformaticians, who are often trying hard to work closely with lab biologists, design experiments in close collaboration, and discover new problems and computational solutions or help to improve functional predictions on all sorts of biomolecules.

By the way, in science it is ok to try things; it is called testing hypotheses. Not all hypotheses work, and not all kinds of technologies (e.g. microarrays, RNA-seq) yield extreme progress in a very short time frame. (The same holds somewhat true for the economy: not all products sell, some companies go bankrupt, imagine, there might even be a fiscal crisis in some countries.) Making exaggerated predictions to the general public about the value of any technology (e.g. finding the 'ultimate cure for cancer' using XYZ), I agree, has been observed, and can be criticized. Raising exaggerated expectations might have been motivated by the aim of raising more public and private funding, while funding bodies might have shifted towards an increasingly utilitarian approach to science. And again this might apply to all of academia. Yes, it can be criticized, but this rant contributes nothing constructive to this discussion.

The fact that you only take into account very few classical algorithms on sequences, ignoring all kinds of approaches for prediction, e.g. gene prediction or predictions of molecular functions via machine learning techniques, shows me that your general view on computational biology is possibly way too narrow. In bioinformatics, not only those NP-hard problems are worthy of computational power. What you are writing about obfuscation or reproducibility, however, could be said about almost any computational field, and a lot of people are concerned about this and are trying to address reproducibility of scientific code in many ways. It is easy to rant, but much harder to come up with a generally acceptable solution for reproducibility. And I would also like to point out to you that often the incentive to perform biological experiments in a more reproducible, standardized and statistically valid manner has come from computational biologists or (bio)statisticians.

I also do not want to discourage you any further, but consider the following: What makes you believe that working in the software industry is going to be any better for you? Different problems, yes, but better, except for the pay? An honest question; please share your views after working in your new job for, say, one year. I guess a lot of people in bioinformatics (including myself) have thought of leaving once or twice.

tl;dr: Yes it can be difficult to work with non-computer scientists (in particular for computer scientists).

Btw, these are solely my own thoughts, no copy-pasting done.

The original author wrote this in May last year, so it might be worth emailing him for an updated perspective.

I wrote to Fred Ross (the original author) and told him about the discussion here.

11.8 years ago

I think that the author's problems stem mainly from the fact that he worked on the wrong problem in what might have been the wrong environment. Going into the "software industry" is in no way a cure for either of these. Everyone knows their own field's shortcomings and limitations well, but that does not mean that some other domain or industry must be better. The software industry in particular is rife with horror stories of various kinds. The main lesson to learn is not that this field or that one is bad, but rather that it is important to pick a line of work that makes you happy and satisfied, and, if that does not seem to come about, to switch to something that does.

If I am correct, martinahansen is the author of biopieces, so he has been working on a very useful, user-friendly and popular piece of software.

Looks like this was credited to "Fred Ross", not sure if this is a pseudonym. The post is from somewhere else and martinahansen was just re-posting it here.

11.8 years ago
lh3 33k

Someone has submitted Fred's rant to the programming sub-reddit, where you can find the link to Fred's response, titled "Public comments considered harmful". While the response is flawed in general, it correctly points out a trend which is further confirmed by the discussion in the programming sub-reddit: his rants got more support in the community of common programmers - about half of outsiders second Fred's opinions.

A summary of related discussions:

One wonders why the community of programmers agrees with Fred's sentiment ... after all, Computer Science as a field is just as prone to producing useless and terribly written software. Everything that Fred laments in bioinformatics also happens in computer science, possibly at a much higher rate. Moreover, the majority of it is done via conferences and for-profit gatherings that are closed to public scrutiny, and are not stored, archived or searchable in any standardized fashion. Recall SCIgen, the computer science paper generator, and other similar stories. Do we conclude that Computer Science is useless as a whole?

11.8 years ago
Kate ▴ 370

I have a related question for you all.

I was discouraged that this article being on the front page of Hacker News would dissuade many able computer scientists from considering bioinformatics. In fact, some of my pure computer science friends had taken a bioinformatics course and agreed completely with this article. I said that things seem to be getting better over time, but they said they keep hearing about vague "improvement" with no concrete examples. As just an undergrad (in CS and Molecular Biology), I couldn't come up with much to say, so I turn to you. Help me persuade the computer scientists!

Can you give any specific examples of how bioinformatics has improved over time? From specific tools and file formats to general practices?

Thanks!

Fuzzy thinking is not encouraged in CS. Whereas fuzzy data is all you will get in biology. I really think that is the fundamental problem that most programmers have a hard time reconciling their training with. They have to realize that just because the data is fuzzy doesn't mean the thinking has to be.

Sure, there might be a higher tendency to not think explicitly in biology by using hand-wavy generalizations, but that is more a fault with the academic culture than the science.

A good point. Fuzzy measurements in biology come from the variability of biological processes. In my opinion, variability is an inherent component of all biological systems and of evolution, a way to cope with an ever-changing and fluctuating environment, which in turn has provided robustness and survivability: life's recipe for success. Otherwise, with the first 'deep impact', all life might have gone extinct. That is also why strict optimization methods do not apply to biology. It might be hard to accept for CS people, who are used to a deterministic and strictly logical way of thinking, but in fact deterministic processes are the exception rather than the rule.

Perhaps you could consider opening a new question. I think what you are asking is a valid Biostar question, and one I don't know the answer to.

Thanks for the encouragement, @ih3 and others. I have posted a new forum question on this topic How Has Bioinformatics Improved Over Time?.

I think we could easily deconstruct every single 'argument' from the rant. I chose not to address each misguided point made in the post because none of them is backed by fact, and the author didn't show any intention of providing evidence either. I agree with you that it would be very sad if such posts were read as a statement against bioinformatics in general and not just as the subjective view of a single angry and discouraged person.

11.8 years ago

A lot of hyperbolic statements. But I think it also brings up many unsaid sentiments of the field. Academia is losing a lot of talent to industry.

+1, I agree, it comes off way too bitter to be taken seriously. There are lots of sentiments expressed here that are frustrating about bioinformatics, but I don't think these are necessarily exclusive to bioinformatics (I just say that because I purposely re-read Thomas Kuhn's Structure of Scientific Revolutions last year and was surprised how - this time, didn't see it a decade ago - I read the context of bioinformatics into the history of science). I do worry about the drain of talent away from academia, but I am worried about the loss of research funding even more, which I think is a large reason for the drain.

11.8 years ago

There are a couple of good points in this grenade along with some rather bizarre assertions, but overall nothing that hasn't been already said with more precision elsewhere.

I don't want to trivialize his gripes but I suspect this guy just needed to blow off some steam. I bet he'll be back for more punishment in a couple years. Software companies have their own share of disgruntled idealists.

The fact that he was working close to the Bioconductor people makes me wonder why he didn't just join them.

More discussion at Hacker News: http://news.ycombinator.com/item?id=5123022

Kate's question made me reread the HackerNews thread more carefully. It is actually interesting to read. My summary of that thread is: industrial bioinformaticians and professional programmers in other fields think academic bioinformaticians do not know how to write efficient and professional code, while academic bioinformaticians only want to get the research done and do not care about code quality. Note that HackerNews, as its name implies, is a place for programmers. Their discussions focus on the programming aspect of the original post.

11.8 years ago
brian.calves ▴ 70

Everyone, please look up "ad hominem" and resolve not to make such arguments anymore.

Fred was obviously venting anger and his words will fall on deaf ears, which is why he departed the field in the first place.

It saddens me to say this, but it is difficult to imagine why an aspiring computer programmer would pursue bioinformatics.

I applaud Kate for pursuing a double-major in computer science and molecular biology. It is essential to embrace both disciplines without reservation.

While I have upvoted you, which means I second your opinion in general, I need to point out that an aspiring programmer still has reasons to work on bioinformatics. I know a few professional programmers who are simply bored with tedious and mechanical daily work. They want to try something new and changing, something without a definite answer. They feel that solving practical problems they are interested in makes their lives happier, even if they are paid relatively less. These programmers do find a nice niche in bioinformatics, but Fred unfortunately did not.

The point on which I agree with you is that the bioinformatics community should be more open to those whose primary interest is not biology and to those who think differently from the majority (for example, my primary interest is math, algorithms and programming, but this does not stop me from making contributions). At the same time, I think more people should look at this from Fred's angle to understand his difficulty and frustration. I am quite sympathetic towards him.

I've found my interest drifting towards math, algorithms and programming over the course of my PhD. I agree that there is perhaps an underlying sentiment in the community that it's not really "biology" and thus not worthwhile. And that sentiment is probably what drives people to industry.

11.8 years ago
skymningen ▴ 330

There is a reason why people choose to do different things. Imagine if they didn't. I chose bioinformatics and I haven't regretted it (at least not for longer than the occasional 5 minutes you have when everything seems to go wrong). For Fred Ross, obviously one decision proved wrong. I am sorry, Fred, if this was not what you expected it to be. But there are downsides to everything.

If I am asked to work with shitty data, I stand up and tell people that their data will not provide useful results. Most of them are quite happy if you try to help them improve their data. If they aren't, well, they should have asked somebody else if they don't want to hear the (friendly told) truth.

I started in this field in 2006. I have followed the field since about 2003. I feel that there is a lot going on. I am naturally curious and love change and working with new things. Bioinformatics gives me the possibility to do so. I like the way it builds links between logic and life. If nobody stops me fast enough, I can talk about how wonderful this is for hours on end. And still there are days when I look at the screen and nothing I do seems to make sense. I guess you have those times in every profession.

Fred Ross made a wrong decision, which is not bad in itself. Even ranting about it is not bad in itself. (It surely helps.) However, ranting about the field of bioinformatics, rather than about the realisation that it was the wrong choice for him, is surely the wrong way.

EDIT: There is a slight difference between nothing and everything.

Good points. When I started in science I used to believe statements that started with "everyone doing X" or "the entire field of Y", "the future is in Z", then I became increasingly skeptical. Today I am mostly amused when I read them. They strike me as incredibly limited, almost parochial in nature, as if one wanted to describe the entire planet just based on observations in their neighborhood.

11.8 years ago

Such a grumpy bioinformatician!

Every science is chaos with some very specific rules and a lot of errors in everything, BUT IT'S F...ING WORKING! I have run into a lot of errors in the databases and in the algorithms I use, not to mention data exchange and replication in cheminformatics, because in 99% of cases it sucks terribly. BUT that is life.

Yes, the sad fact is that 90% of programmers are not very clever and make mistakes all the time. Read Joel on Software for more on this.

I would not necessarily qualify it that way anymore. Programming is a unique task and it is unlike most other jobs. It is very easy to push past the limits of one's understanding.

I don't agree that programmers are not very clever. There is definitely a problem with communication between biology/chemistry people and coders.

That's what I wrote in the conclusions of my review paper, currently under peer review. Here's an excerpt:

In order to see significant improvement in healthcare utilising genomic, transcriptomic, and epigenomics data, there must be greater interdisciplinary cross talk between scientists. This includes, but is not limited to, physicians, clinical geneticists, computational biologists, and policy makers. New and recent technology can help to improve treatment, but only in the context of an understanding of disease mechanisms.

7.1 years ago

Just seeing this for the first time. They look like the words of a person in the wrong field - that's all.

I began as a computer scientist but didn't quite like it; then I was a biologist, and didn't feel too comfortable there either. I ended up analysing biological data for supervisors because data loads were increasing and I had experience in computer science. I now love both aspects of my job, i.e., developing methodologies to process data in ways that help me better understand its biological meaning.

11.8 years ago
Honey ▴ 200

I work with programmers, each of whom is strongest in one or two languages; in addition, I work with colleagues who are exceptional in Bioconductor. I, however, am a biologist with very little knowledge of hard-core programming, trying to move along and get my biological questions solved. With time I have realized that everyone is concerned about credit: who is most important, who is less so, and so forth. Since I am a biologist, I ask the question that will solve the problem from the biology point of view and have them work on that issue, so I may always be seen as the one taking the major credit. At the end of the day, I believe it is all a combined effort and everyone is indispensable; that is how one has to move forward. But, with due apology, pure IT people need to take into account that working in academia is always challenging and different from working in the pure IT industry. It is always a joint effort.

10.5 years ago
BioApps ▴ 800

Take a biologist. Feed him some high-level pseudo programming languages and a huge pile of libraries and ask him to process a 20GB file. The magic result will be....... bioinformatics.

Really, is nobody using compilers today? And why is everybody too lazy to put a graphical interface on top of their command line tool? It is not that hard.

  1. Most bioinformatics tools are written in compiled languages; don't mistake the scripts that connect tools for the actual tools.
  2. Most people here prefer command line tools because those allow for automation. A typical experiment that we have to look at may contain over 20 different files. No one here wants to click and open 20 files separately.
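
To illustrate the automation point in item 2, here is a minimal Python sketch. It assumes a hypothetical directory of small FASTA files and uses a placeholder per-file step (counting sequence records); the function names `count_fasta_records` and `process_all`, the file names, and the data are all made up for illustration and are not any particular tool's interface.

```python
from pathlib import Path
import tempfile


def count_fasta_records(path: Path) -> int:
    """Placeholder per-file step: count header lines that start with '>'."""
    with path.open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


def process_all(directory: Path, pattern: str = "*.fasta") -> dict[str, int]:
    """Apply the same step to every matching file in the directory."""
    return {p.name: count_fasta_records(p) for p in sorted(directory.glob(pattern))}


if __name__ == "__main__":
    # Build 20 tiny example files in a temporary directory (made-up data).
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        for i in range(20):
            content = f">seq{i}a\nACGT\n>seq{i}b\nTTGA\n"
            (tmp_dir / f"sample_{i:02d}.fasta").write_text(content)
        # One loop handles all files; nothing has to be opened by hand.
        for name, n_records in process_all(tmp_dir).items():
            print(name, n_records)
```

The same loop works unchanged for 20 files or 2,000, which is roughly the argument for command line tools over a GUI here.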
This might be true for some cases. But for most cases:

  • A bioinformatician should know coding either in interpreted languages (python, perl) or compiled languages (java, c or c++) or both
  • He should also be familiar with one of the statistical programming languages (R, Matlab, Scala etc)
  • He should know of server side and client scripting (js libraries) and do web development
  • He should know databases (RDBMS mostly)
  • He should know advanced statistical methods such as ML
  • He should know cluster and/or cloud computing
  • He should be familiar with *nix OS and, sometimes, administration.

These are the requirements nowadays for most bioinformaticians. Biology is the lowest priority for a bioinformatician in most interviews and job requirements. If a bioinformatician is able to converse with a biologist, that is enough to pass as a bioinformatician. It is mostly informatics and statistics now, with no biology in bioinformatics. This is apparent in most of the requirements for bioinformatics posts all over the job portals and in interviews. IMHO, bioinformatics is a misnomer, and misleading, when there is no place for biologists in it.
