Question

Which Application Is Truly Missing In Bioinformatics?

9

14.7 years ago

Jarretinha 3.4k

It's a simple and straightforward question. Just think about an app that, when you found it, your first thought would be "OMG!!! That's it", or something like "I wish I had found/written/devised it earlier". It doesn't need to be a bioinformatics Swiss Army knife or a MacGyver paper clip. Just something that would make your life much happier/easier.

My example is quite simple. I really wish that some sort of Monte Carlo simulator of generic urn models (population genetics rules!) would just appear on the net, with a nice, clean and well-documented API (written in C) and bindings for my favorite scripting languages. That's what I really miss right now. What's your story?
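
To make the wish concrete, here is a minimal sketch of the simplest urn model in population genetics, the neutral Wright-Fisher model, where each generation is a draw with replacement from the previous one. It is written in Python rather than C, and the function names (`next_generation`, `wright_fisher_urn`, `fixation_probability`) are hypothetical, not taken from any existing library.

```python
import random


def next_generation(count, pop_size, rng):
    """Draw pop_size balls with replacement from an urn holding `count` marked balls."""
    freq = count / pop_size
    return sum(1 for _ in range(pop_size) if rng.random() < freq)


def wright_fisher_urn(pop_size, init_count, generations, seed=0):
    """Return the trajectory of marked-allele counts over the given generations."""
    rng = random.Random(seed)
    counts = [init_count]
    for _ in range(generations):
        counts.append(next_generation(counts[-1], pop_size, rng))
    return counts


def fixation_probability(pop_size, init_count, replicates, seed=0):
    """Monte Carlo estimate of the chance that the marked allele reaches fixation."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        count = init_count
        # Run until the allele is either lost (0) or fixed (pop_size).
        while 0 < count < pop_size:
            count = next_generation(count, pop_size, rng)
        fixed += (count == pop_size)
    return fixed / replicates


if __name__ == "__main__":
    print(wright_fisher_urn(pop_size=20, init_count=10, generations=10))
    print(fixation_probability(pop_size=20, init_count=10, replicates=500))
```

A generic simulator would presumably let you swap the drawing rule in `next_generation` for other urn schemes; this sketch hard-codes just one.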

-- Edit --

I'm considering starting a bounty for this question. Maybe for the first answer to get 10 upvotes. Where are people's wishes? Don't be afraid, bioinformaticians like me are lousy critics.

subjective general • 11k views

ADD COMMENT • link updated 12 months ago by Ram 44k • written 14.7 years ago by Jarretinha 3.4k

0

Yeah ! I'm a math nerd . . .

ADD REPLY • link 14.7 years ago by Jarretinha 3.4k

score 12 · Answer 1 · 2010-03-26

12

14.7 years ago

Neilfws 49k

I think it would be incorrect to imagine that there is a single, "killer app" for bioinformatics. I don't see bioinformatics as a field, discipline or topic. For me it is about: (1) inputs - many, diverse types of biological data, (2) processes - the code that we write to handle the data and (3) outputs - what the code produces and the subsequent biological interpretation.

That said, I'm sure there are tools we would all like to see when we handle whatever data type comes our way. I'd like to see:

1. A RESTful API for every public, online database
2. Better web applications for data integration - so that when I search for, e.g., a gene, everything that's known about that gene and its products is presented to me in a way that enables effective data exploration

There are those who say that the "linked data web" is the answer to (2), which remains to be seen...
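
To illustrate what the RESTful API wish looks like in practice: one database that exposes this kind of interface is Ensembl. The sketch below assumes its public server at `https://rest.ensembl.org` and its `lookup/symbol/:species/:symbol` endpoint; check the service's own documentation before relying on either. The helper names `lookup_url` and `fetch_gene` are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Ensembl's public REST server; other databases would use their own base URL.
BASE_URL = "https://rest.ensembl.org"


def lookup_url(species, symbol):
    """Build the resource URL for one gene symbol in one species."""
    return "{}/lookup/symbol/{}/{}".format(
        BASE_URL,
        urllib.parse.quote(species, safe=""),
        urllib.parse.quote(symbol, safe=""),
    )


def fetch_gene(species, symbol, timeout=10):
    """Request the gene record as JSON and return it as a dict (needs network access)."""
    request = urllib.request.Request(
        lookup_url(species, symbol),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(lookup_url("homo_sapiens", "BRCA2"))
```

The appeal is that every record gets a predictable URL, so a script can fetch it with nothing beyond the standard library.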

ADD COMMENT • link 14.7 years ago by Neilfws 49k

1

The Linked Data web is proving really powerful, just by using simple standards... there is the obvious learning curve, and people don't always use the right implementation for the right problem, but the standards are starting to rule...

ADD REPLY • link 14.2 years ago by Egon Willighagen 5.4k

0

It doesn't need to be a generic killer app. Just the one that would make your life more colorful!!! Maybe we are all dreaming about the same thing...

ADD REPLY • link 14.7 years ago by Jarretinha 3.4k

Ram · Answer 2 · 2010-03-25

I wish ~~R had an entire web application framework like Rails and~~ (Shiny covers this) that there was an easier way to go between genome browsers and analysis and back.

There are also not enough tools to properly organize individual genetic variation in humans, much less in poorly characterized species. I guess we'll have to see what the 1000 Genomes Project comes up with.

I would also like a proper "finishing tool" to visualize and reconcile Velvet assemblies.

Ram · Answer 3 · 2010-03-25

7

14.7 years ago

Phis ★ 1.1k

/irony on

Something like this is missing for bioinformatics.

/irony off

Seriously, I think whatever it is, a clean and well documented API - as you say - is something it should provide.

ADD COMMENT • link updated 5.2 years ago by Ram 44k • written 14.7 years ago by Phis ★ 1.1k

1

rotfl.... +100!

ADD REPLY • link 14.7 years ago by Giovanni M Dall'Olio 28k

score 7 · Answer 4 · 2011-01-24

7

13.8 years ago

In The Hope Of A Better World ▴ 70

Well, I'm not looking for a killer app but for the killer file format: I would like to see mandatory use of XML (or any other standard) for sequences, phylogenies, etc.

Isn't it true that most of our work is related to formatting and struggling with strange output files? For instance, who can tell whether a Newick string contains information on probabilities, distances or time measurement? This heavily depends on the program which generated the output. Therefore, I demand a common file structure with clearly defined fields for any relevant information.
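
A small sketch of the Newick ambiguity, using a made-up tree string and a hypothetical helper `internal_nodes`. The code can pull out the label attached to an internal node, but nothing in the string says whether `0.95` is a bootstrap proportion, a posterior probability or just a node name, nor whether the branch lengths are distances or time.

```python
import re

# Made-up example tree: the internal node (A,B) carries the label "0.95".
NEWICK = "((A:0.1,B:0.2)0.95:0.3,C:0.4);"

# Text directly after ")" is an internal-node label; the optional number
# after ":" is the length of the branch leading to that node.
INTERNAL = re.compile(r"\)([^:,;()]*)(?::([0-9.eE+-]+))?")


def internal_nodes(newick):
    """Return (label, branch_length) string pairs for labelled internal nodes."""
    return [(label, length) for label, length in INTERNAL.findall(newick) if label]


if __name__ == "__main__":
    for label, length in internal_nodes(NEWICK):
        # The format itself gives no hint about what `label` measures.
        print("internal label:", label, "branch length:", length)
```

The regular expression is only good enough for this toy string; quoted labels and comments in real Newick files would need a proper parser.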

ADD COMMENT • link 13.8 years ago by In The Hope Of A Better World ▴ 70

4

XML is probably the best format for phylogenetic trees. Nonetheless, XML is notoriously bad for huge data sets, hard to index, and at odds with nearly all useful Unix utilities.

ADD REPLY • link 13.8 years ago by lh3 33k

1

Well, I do not think one can deny XML is Unix unfriendly. As for huge data sets, look at tracedb, NGS reads and alignments. No XML at all. Even for relatively smaller data sets such as variants and genotyping results, almost no XML. XML is good for many things, but do not abuse it.

ADD REPLY • link 13.8 years ago by lh3 33k

1

That's the truth, right in the face. We do spend a lot of time curating/converting/trimming data from various sources. Some sort of standard/spec or guidelines is much needed. I don't even know why databases like NCBI still offer such a variety of different file formats. I think that this idea deserves a topic of its own.

ADD REPLY • link 13.8 years ago by Jarretinha 3.4k

1

Readability, complexity and efficiency are in conflict with each other. Different people have different preferences, too. I do not see how a single format can be suitable for all purposes, just as no single programming language can rule the world. All we should do is choose the right format for the right thing.

ADD REPLY • link 13.8 years ago by lh3 33k

1

XML has lots of problems for bioinformatics. It adds huge bloat (in a field already drowning in data) and it's awkward to represent bio symbols (eg "<" ">" for secondary structure). JSON would be better, but really, IMNSHO, this kind of complaint is as old as the hills, and tends to reflect a lack of understanding of exactly how and why formats in bioinformatics are so disparate, and the difficulties of unifying them (almost insurmountable except in the context of some massive new project which could drive a whole lot of other little changes, such as file format migrations, in its wake)
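
A rough sketch of the escaping point, using only the Python standard library. The structure string and the `structure` field name are made up for illustration; the comparison shows that XML has to escape every "<" and ">" while JSON stores them as-is.

```python
import json
from xml.sax.saxutils import escape, unescape

# Hypothetical secondary-structure annotation using angle brackets for base pairs.
structure = "<<<<....>>>>"

# In XML, every "<" and ">" in the payload has to be escaped.
as_xml = "<structure>{}</structure>".format(escape(structure))

# In JSON, the same characters need no escaping at all.
as_json = json.dumps({"structure": structure})

if __name__ == "__main__":
    print(as_xml, len(as_xml))
    print(as_json, len(as_json))
    # Both encodings round-trip back to the original string.
    print(unescape(escape(structure)) == structure)
    print(json.loads(as_json)["structure"] == structure)
```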

ADD REPLY • link 13.8 years ago by Ihh ▴ 20

0

lh3 that's just your humble opinion isn't it ;) what you are saying is debatable at best.

ADD REPLY • link 13.8 years ago by Michael 55k

0

Even Word and Excel files are stored in an XML-like way (docx). For me, this format is just a method to attach meaningful additional data to the original dataset. Something like EXIF for images.

ADD REPLY • link 13.8 years ago by In The Hope Of A Better World ▴ 70

0

BioXSD looks great! Hopefully it will make its way! My support is for sure.

ADD REPLY • link 13.8 years ago by In The Hope Of A Better World ▴ 70

0

lh3: "Well, I do not think one can deny XML is Unix unfriendly". Well, I can, because the term "Unix unfriendly" is not defined, and whether or not something is "friendly" towards an OS is totally irrelevant in many cases. Learn to make an informed choice of tools without advocating UNIX, XML or anything else as an ideology; everything has its pros and cons. UNIX was there long before you, me and XML. There are a lot of useful tools that run under various OSes, including UNIX, Linux and BSD; search your Linux package manager for XML.

ADD REPLY • link 13.8 years ago by Michael 55k

0

In mass spec proteomics, XML seems to be prevalent not just for metadata but for core data too (spectra ~ peak lists, peptide/protein identifications): PRIDE XML, mzML, mzIdentML... and I am saying this neutrally.

ADD REPLY • link 13.7 years ago by Attila Csordas ▴ 520

0

In mass spec, there is not that much data (compared to NGS)

ADD REPLY • link 13.7 years ago by Aaron Statham ★ 1.1k

Ram · Answer 5 · 2010-03-26

5

14.7 years ago

Darked89 4.7k

A web-based genome browser (WGB) which does not require 110 Perl modules (one of which simply will not install). A WGB which can handle multiple eukaryotic genomes and next-gen sequence data, and can render things in a way more pleasant to the eye than >>>>>---- or sharp square boxes.

Maybe with this: http://genometools.org/annotationsketch.html or Google Maps.

ADD COMMENT • link updated 12 months ago by Ram 44k • written 14.7 years ago by Darked89 4.7k

1

FYI, you can link up (and I have linked up) GenomeTools AnnotationSketch to this: http://code.google.com/p/genome-browser/ to make a slippy-map genome browser.

ADD REPLY • link updated 5.2 years ago by Ram 44k • written 14.7 years ago by brentp 24k

1

X:map, at http://xmap.picr.man.ac.uk/, might be something you'd like to take a look at. It's very cool.

ADD REPLY • link 14.3 years ago by Cbare ▴ 60

0

X:map might be something you'd like to take a look at.

ADD REPLY • link 14.3 years ago by Cbare ▴ 60

0

JBrowse relies only on BioPerl (and that requirement will probably soon be eliminated)

ADD REPLY • link 13.8 years ago by Ihh ▴ 20

Ram · Answer 6 · 2010-03-25

4

14.7 years ago

Pierre Lindenbaum 164k

I want a bio2rdf for the whole scientific corpus.

ADD COMMENT • link updated 12 months ago by Ram 44k • written 14.7 years ago by Pierre Lindenbaum 164k

0

Now that's a miracle !!! :)

ADD REPLY • link 14.7 years ago by Jarretinha 3.4k

score 3 · Answer 7 · 2011-01-24

3

13.8 years ago

Tg ▴ 320

Something like CPAN, but for bioinformatics programs instead of Perl modules.

I'm just too lazy to look for a link.

ADD COMMENT • link 13.8 years ago by Tg ▴ 320

Ram · Answer 8 · 2010-04-15

2

14.6 years ago

Madelaine Gogol 5.3k

I think there's room for a killer app in genome browsers, definitely. Each browser has strengths and weaknesses, and if someone could just wrap up all the strengths with none of the weaknesses, bam.

I think more development in the sharing/public data realm would be great for bioinformatics, though I'm not sure if it necessarily has to take the form of an application. Galaxy might be the answer there.

R is good, but there's a lot of room for improvement. Getting some nice streamlined robust documented code for short read analysis would be nice, especially if they could do away with one or two of their complicated objects. (If you think I'm wrong, please point me to your code).

bedtools kind of felt like a killer app to me, as did Galaxy.

ADD COMMENT • link 14.6 years ago by Madelaine Gogol 5.3k

2

If it's data originating from a biological experiment and you're using a computer to do something to it, why can't you call it bioinformatics?

ADD REPLY • link 14.6 years ago by Madelaine Gogol 5.3k

2

@Jarretinha: I can't agree with your statement "bioinformatics is not about dealing with sequence data". Sequence analysis (small, large or big-data in scale) is an integral part of bioinformatics, not just the tip. Basically, any experimental approach that generates biology-related data matters to bioinformatics. In that sense, sequence data is a key player. Can you name any other data category that has contributed more methods or approaches than sequence data?

ADD REPLY • link 14.6 years ago by Khader Shameer 18k

0

As noted before, bioinformatics is not about dealing with sequence data. Just a reminder: bioinformatics was born inside the crystallography community and borrowed a lot from phylogenetics and molecular evolution. Despite the popularity of sequence data, they're just the tiny tip of a huge databerg.

ADD REPLY • link 14.6 years ago by Jarretinha 3.4k

0

I agree with Khader

ADD REPLY • link 13.9 years ago by Thaman ★ 3.3k

0

Bioinformatics started inside the crystallography community. NCBI was founded in 1988 and PDB in 1971. Bioinformatics started with X-ray data and 3D structural data. These are definitely not sequence data. Sequences are just popular right now. When I think big about bioinformatics, the first thing that comes to mind is CASP. They're not just contributing new methods, they're rigorously benchmarking them. There's nothing similar to it for sequence data, not even in phylogenetics. I'm a sequence guy now, and I can say: we are far behind!

ADD REPLY • link 13.8 years ago by Jarretinha 3.4k

0

@Jarretinha: I agree with your points on NCBI and PDB, but the Atlas of Protein Sequence and Structure (http://www.dayhoff.cc/MODAtlas.html, started in 1961) is considered to be a seminal work in cataloging and archiving molecular biology data, which led to the data revolution in bioinformatics. I appreciate your opinion and views on CASP, but my point is still valid: even CASP is about fold recognition, assigning a query 'sequence' to a 'fold' in a database of folds. There, too, the entity that we use to search is 'sequence'.

ADD REPLY • link updated 12 months ago by Ram 44k • written 13.8 years ago by Khader Shameer 18k

score 2 · Answer 9 · 2011-01-24

I wish we had an integrated pipeline to call SNPs, INDELs and CNV/SVs jointly in one go, and before that I would like to see how far we can get with INDELs and SVs. These are far from solved problems.

I wish we had a de novo assembler that is able to assemble a mammalian genome with reasonable compute resources and achieve a quality close to that of the human reference genome. Most alternative human assemblies are much worse.

I wish we had an aligner that maps sequence not only to the reference genome, but to a collection of genomes. We have made progress, but there is still a long way to go.

Like Jarretinha, I wish we had a flexible software suite that estimates population parameters. I am thinking a sort of combination of MrBayes and ms instead of a library.

As to web development, my impression is that there are too many alternative approaches, all good but each making tradeoffs. Different people have different preferences. Just like the variety of programming languages, the variety in web development will be with us for a long time.

Ultimately I wish we could standardize each subfield to avoid countless choices (one or two programming languages, genome browsers, ontologies, mappers, assemblers, ...). This may happen, and has already happened, in a few subfields, but is never going to happen in all.

Ram · Answer 10 · 2011-03-26

Here's one I loved--not as it was, I'd want to modify it a bit. But this is the one I would love to be using: Caleydo. http://www.icg.tugraz.at/project/caleydo/

I talked about it here in a bit more detail: http://blog.openhelix.eu/?p=3578.

But I like it because it could conceivably put the 5 tools I need to visualize a lot of what I'm doing up in the 3D box. But I want to be able to pick the 5 tools I'm looking at: say UCSC on the box floor, dbSNP on one side, Reactome on another, expression data in another...etc. And if I could link them together with lines for my train of thought...that would solve the integration that I need for my level of exploration in many cases. Want.

score 2 · Answer 11 · 2011-04-05

2

13.7 years ago

Andrewjgrimm ▴ 460

Automating grant application writing.

What's the difference between a grant application and a spam email anyway? Both are asking for money, and they both have rejection rates.

ADD COMMENT • link 13.7 years ago by Andrewjgrimm ▴ 460

score 1 · Answer 12 · 2010-04-01

1

14.7 years ago

Yannick Wurm ★ 2.5k

I want to be able to orally describe what I want to visualize and it to just happen without me having to think about how to do it.

ADD COMMENT • link 14.7 years ago by Yannick Wurm ★ 2.5k

1

Okay, this one is just kind of hilarious.

ADD REPLY • link 13.8 years ago by Madelaine Gogol 5.3k

0

Why not a slot/controller in your brain with a direct connection to the machine? No need to talk!!! But how could this be useful for bioinformatics?

ADD REPLY • link 14.7 years ago by Jarretinha 3.4k

0

Because a lot of time in bioinformatics is spent massaging data to get results and visualizations...

ADD REPLY • link 14.7 years ago by Yannick Wurm ★ 2.5k

0

If only someone would make this cartoon just like that conversation with a biostatistician... http://blog.openhelix.eu/?p=5355

ADD REPLY • link 13.7 years ago by Mary 11k

0

@Mary: you can by typing the text at http://www.xtranormal.com/makemovies/

ADD REPLY • link 13.6 years ago by Andra Waagmeester 3.2k

score 1 · Answer 13 · 2011-01-27

1

13.8 years ago

Egon Willighagen 5.4k

The killer bioinformatics application I find missing is a sequence aligner that operates in O(1/N).

ADD COMMENT • link 13.8 years ago by Egon Willighagen 5.4k

0

Not even a quantum computer could accomplish that. Maybe a non-deterministic Turing machine or an oracle ...

ADD REPLY • link 13.8 years ago by Jarretinha 3.4k

score 0 · Answer 14 · 2010-03-26

0

14.7 years ago

Yuri ★ 1.7k

I wish a Bioconductor-like repository existed for MATLAB. Probably not gonna happen. :(

ADD COMMENT • link 14.7 years ago by Yuri ★ 1.7k

score 0 · Answer 15 · 2016-12-02

0

8.0 years ago

rkostadi ▴ 60

I am looking for one that gives me the answer to life, the universe and everything. I guess something like echo "42". Killer apps need killer problems to solve.

ADD COMMENT • link 8.0 years ago by rkostadi ▴ 60

score 0 · Answer 16 · 2020-12-15

0

3.9 years ago

jerry ▴ 130

This is an old but interesting thread. Lots of good ideas (and some funny ones).

As you can see, it's been 10 years since this question was asked, but we are still using many of the same tools (samtools, bedtools, bowtie, UCSC genome browser, dbSNP, etc.). The problem is that there just isn't a market for innovation in bioinformatics: the community and its user base are small. Given how difficult it is to create great bioinformatics tools, it's not worth the effort in most cases.

ADD COMMENT • link 3.9 years ago by jerry ▴ 130

1

I think the field has matured to a point where we are able to benefit from tools not specifically built for bioinformatics. Conda, Snakemake, Nextflow etc. are examples of general-purpose tools that we have adapted for our purposes.

The tools you describe are best at dealing with specific file formats or in cases you have not mentioned (such as plink), specific tasks or families of tasks. We have the basic pieces and putting them together doesn't need as much ab initio work.

ADD REPLY • link 3.9 years ago by Ram 44k