This is something that most bioinformaticians must have thought about or encountered. It seems generally accepted that there is a distinction between biological results from an experiment (in situ, gels, ...) and bioinformatics data (NGS, microarray, models).
The common view is that bioinformatics data, by itself, is intuitively less tangible and thus requires experimental validation.
Well-designed experiments often give a very narrow range of results. Is the lack of a black-and-white answer from the massive noisy datasets we usually generate contributing to this view? Is there really an inherent lack of quality in bioinformatics data compared to experimental data? Is the distinction between the two a false dichotomy?
What are your thoughts on this common perspective in the field?
I do think microarray and NGS data are "experimental data". I don't even know what could be "bioinformatics data". In silico experiments? Simulations? Maybe. Then, aren't they "experimental"? Not sure.
I think this is a false dichotomy. As scientists, we should always validate experiments with more experiments.
There is no difference between a biological experiment where all data analysis is done with paper and pencil and one where it is done on a supercomputer.
That is why I think a gel and a microarray are really the same thing: each is a physical part of an experiment.
Models, which you classed under bioinformatics data, are a different matter. A model is used to predict behaviour, and models are validated using experiments.
Since any bioinformatics tool chain uses multiple models (e.g. from the model of light reflection off a nucleotide during sequencing to a gene model), there is simply more to be validated.
Yeah, both NGS and microarrays are physical data generated by an experiment, after all. They require more complex statistical analyses than many other "wet-lab" experiments, but they are still real biological, experimental data.
Yeah, I agree that, ultimately, both are results that come from physical experiments. I guess the scepticism that comes with bioinformatics data is perhaps down to the number of independent sources of data: designing small-scale experiments gives you data from multiple perspectives, whereas a high-throughput method gives you high-resolution data, but only from one perspective.
I would not make a distinction between bioinformatics data and experimental data. All experimental data is processed by some computational instrument, including our own eyes and brain, so labeling data one way or another based only on the type of computation does not seem right. As far as I am concerned, NGS and bioinformatics data are experimental data.
What is more important is to properly assess the limitations of each methodology and to properly quantify the confidence in, and potential uses of, the various observations we can make from a particular dataset and the computation associated with it.
For example, no one would claim to cure cancer by observing bands on a western blot (because we understand well what it is), whereas there is no shortage of people making all kinds of lofty claims based on a particular bioinformatics approach (because we usually don't understand exactly what happens).
As practicing bioinformaticians, we need to work on this latter problem.
The dichotomy is, to me, only partially justified. I guess there is one main reason why one would be skeptical: the use of models, and therefore approximations. This obviously narrows the range of applicability of a study. The model layer is often larger in bioinformatics than in experiments, but that is not always the case: some experiments have a large interpretation/model component that tends to be underestimated. Validation with an experiment only serves to show that the approximations are valid, and this is true of experimental conditions as well.
I think that some of the skepticism is grounded in concerns over reproducibility. Every experiment that is published in a peer-reviewed journal should be reproducible at an independent site, right? We already know this is not a universal truth. Some "wet lab" scientists may be put off by the initial crudeness and low quality of microarray work. As an example, here is a paper that describes issues related to reproducibility between expression microarrays and more "traditional" methods such as RT-PCR. The problem with these types of papers is that they come out almost a decade after a technology has matured; before then, only the groups developing analysis methods have a handle on what kind of false discovery rate a platform may have.

Many wet lab methods were developed decades ago and have matured to the point that most concerns about their validity have been dispelled. We are not at that point yet for microarray and second-generation sequencing. We will arrive there, but right now some well-founded skepticism about "big data" experiments is not harmful. In fact, it could save you some time to understand why you can't "validate" all of your "hits" from a microarray experiment (I've heard my peers complain about this many times).
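To make that last point concrete, here is a minimal back-of-the-envelope sketch in Python. The numbers (200 hits, 5% FDR) are made up for illustration, not taken from any particular study; the point is just that calling hits at a given false discovery rate means some of them are expected to fail follow-up validation even when nothing went wrong.

```python
# Hypothetical example: if a differential expression analysis reports
# hits at a given false discovery rate (FDR), a proportion of those
# hits are expected to be false positives, so they will not validate
# by RT-PCR no matter how carefully the follow-up is done.

def expected_false_hits(n_hits, fdr):
    """Expected number of false positives among n_hits called at the given FDR."""
    return n_hits * fdr

n_hits = 200   # genes called differentially expressed (made-up number)
fdr = 0.05     # FDR threshold used in the analysis (made-up number)

false_hits = expected_false_hits(n_hits, fdr)
print(f"Of {n_hits} hits at FDR {fdr:.0%}, roughly {false_hits:.0f} "
      f"are expected to be false positives and to fail validation.")
```

So failing to validate a handful of hits is not, by itself, evidence that the platform or the analysis is broken; it is built into how these lists are generated.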