Are Procedural Failures In Microarray-Based Research A Real Issue?
14.0 years ago
Blunders ★ 1.1k

The US FDA's MicroArray Quality Control (MAQC) consortium's latest study suggests that human error in handling DNA microarray data analysis software could delay the technology's wider adoption in clinical research.

  • Do you agree with this analysis?
  • How common are procedural failures in microarray-based research relative to non-bioinformatics-based research?
  • More importantly, how do researchers control for human error, report errors, and assess the risks of errors?

14.0 years ago
David Quigley

The take-home messages of the paper are sound, and follow what I would have expected:

  1. Some problems are easy (Guess the sex!), while others are hard.

  2. Experienced teams do better than novice teams.

  3. Most standard algorithms are equally good at finding clear signals.

  4. It's very easy to screw up the initial bookkeeping and hose your whole project.

  5. Cross-validate, and don't pick arbitrary "training" and "test" sets (see the sketch after this list).

  6. Almost no one actually does reproducible research. The only way to publish a reproducible result is to hand someone your raw data and a turn-key script that reproduces your classifier.
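
To make point 5 concrete, here is a minimal sketch in R of k-fold cross-validation with the feature selection done inside each fold, so the held-out samples never influence which genes are picked. The expression matrix expr (samples in rows, genes in columns) and the class factor labels are hypothetical placeholders for your own data.

    library(MASS)                 # for lda()
    set.seed(1)
    k <- 5
    folds <- sample(rep(1:k, length.out = nrow(expr)))
    acc <- numeric(k)
    for (i in 1:k) {
      train <- folds != i
      # pick the 50 most variable genes from TRAINING samples only;
      # selecting on the full dataset would leak test-set information
      top <- order(apply(expr[train, ], 2, var), decreasing = TRUE)[1:50]
      fit  <- lda(expr[train, top], grouping = labels[train])
      pred <- predict(fit, expr[!train, top])$class
      acc[i] <- mean(pred == labels[!train])
    }
    mean(acc)                     # cross-validated accuracy estimate

Selecting genes on the full dataset before splitting is one of the subtler ways to produce an optimistic classifier that no one else can reproduce.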

I think protocol errors (off-by-one, etc.) are common in bioinformatic analysis, but my own opinion is that this is true of any high-throughput method in inexperienced hands (Mass Spec, FACS, etc.). There are numerous ways to reduce these protocol errors, and the reference list for that paper has many good suggestions. Anyone can make a mistake, but the more analysis you do, and the more systematic you are about it, the better you get. I think the key is being relentlessly systematic and working so that you can always reproduce the whole analysis. Develop recipes for analytical problems so that the mundane business is routine. Do your work so that you can turn a key and re-do your whole analysis from raw data on command. Assume your dataset has batch effects. Assume that Keith Baggerly will be the next person to look at your methods section.
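
As a hedged illustration of the "turn a key" idea, here is what such an entry-point script might look like in R with Bioconductor's affy package. The directory names are hypothetical, and the commented ComBat line assumes a recorded batch variable in a hypothetical pheno table.

    library(affy)                                # Affymetrix preprocessing
    raw  <- ReadAffy(celfile.path = "data/raw")  # every CEL file in that directory
    eset <- rma(raw)                             # background-correct, normalize, summarize
    # if a batch variable was recorded, inspect and adjust for it, e.g.:
    # exprs(eset) <- sva::ComBat(dat = exprs(eset), batch = pheno$batch)
    write.csv(exprs(eset), "results/expression_matrix.csv")
    sessionInfo()                                # record package versions with the run

A script like this, stored next to the raw data, is the "raw data plus turn-key script" artifact that point 6 above asks for.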


My own methods (worked out over years of struggling with this issue) are to perform my analytical work in a way that leaves an artifact that is as completely reproducible as I can make it, for me or another observer. In practice, this means that the code used to normalize datasets is stored in a standard location next to those data, and all code required to generate publishable results can be extracted from an online lab notebook. I can pull out the R code from my notebook, run it unchanged, and generate Figure 1, Figure 2, and so on. I still make mistakes, but they can be traced.
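
A minimal sketch of that pattern, with hypothetical paths: each figure script is self-contained, reading the stored normalized data and writing the figure file, so "run it unchanged" really is unchanged.

    # regenerate Figure 1 from the stored normalized expression matrix
    expr <- read.csv("results/expression_matrix.csv", row.names = 1)
    pdf("figures/figure_1.pdf")
    boxplot(expr, las = 2, main = "Normalized expression by sample")
    dev.off()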


@David Quigley: I agree that "this is true of any high-throughput method in inexperienced hands" -- which is exactly the problem: based on my own anecdotal observations, a significant number of scientists do not have a background in high-throughput systems and have grown into the role. Further, from what I'm seeing so far, it's not the norm to have turn-key systems end-to-end, or controls in place to confirm the effects of any changes made to the system. Are your systems turn-key end-to-end, and do you have controls that test the effects of any changes on the system's output?
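
For concreteness, one simple form such a control could take is a frozen-reference regression check: after any change to the system, re-run the pipeline and fail loudly if the output no longer matches a trusted snapshot. A sketch in R, with hypothetical file names:

    ref <- read.csv("results/reference_expression_matrix.csv", row.names = 1)
    cur <- read.csv("results/expression_matrix.csv", row.names = 1)
    stopifnot(identical(dim(ref), dim(cur)))
    # tiny numeric drift is tolerated; anything larger stops the run
    stopifnot(isTRUE(all.equal(as.matrix(ref), as.matrix(cur), tolerance = 1e-8)))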

14.0 years ago
Neilfws 49k

If you want a real microarray horror show, you should read:

Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology (Baggerly & Coombes, Annals of Applied Statistics, 2009)

which analyses 5 case studies and finds a large number of errors, most of which relate to "simple" tasks such as labeling samples correctly.
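
To show how cheap some of those "simple" checks can be, here is a sketch of a sex-consistency check that catches many label swaps. It assumes a hypothetical expression matrix expr (gene symbols as row names, samples as columns) and a hypothetical sample sheet pheno with a recorded sex column.

    # XIST is high in female samples; RPS4Y1 (Y-linked) is high in male samples
    predicted  <- ifelse(expr["XIST", ] > expr["RPS4Y1", ], "F", "M")
    mismatches <- which(predicted != pheno$sex)
    if (length(mismatches) > 0)
      warning("possible label swaps: ",
              paste(colnames(expr)[mismatches], collapse = ", "))

Any sample flagged here deserves a second look before it goes anywhere near a classifier.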

As to your questions:

  1. The MAQC study looks like a very good analysis (and much as you'd expect)
  2. Procedural failures are quite common in all research, both computational and lab-based
  3. The best error control is to have multiple eyes examine your work; unfortunately, many researchers work in near-isolation

Not sure I'd agree that this will delay clinical adoption of microarray technology. There are already many diagnostic labs that use microarrays, and you'd imagine that, to be certified, they require stringent QC practices. My (personal, subjective) feeling is that microarrays are on the way out anyway and will be largely replaced by deep sequencing methods in the next 5-10 years.


I'm looking forward to a sequencing-based world. However, deep sequencing methods will still be susceptible to all of the fundamentally hard parts of gene expression analysis. There's a reasonable argument to be made that while more informative, sequencing will also provide many new opportunities for error and unintentional bias.


@neilfws: +1. Agree with all your points, and thanks for the additional resource on the subject.


Completely agree - there'll always be error and bias. But hopefully less noise, since a sequence should either be present or not.

14.0 years ago
User 59 13k

Hmm. So much time spent removing batch effects from experiments. So many times trying to work out which samples have been mislabelled by the company that ran the arrays...

I don't think it's going to delay adoption much though. I understand that the clinical environment is somewhat different to the research environment, but weight of evidence wins out in the end. I endorse Neil's answer wholeheartedly with regards to the 'end of days' for arrays.
