A well-delivered, highly entertaining lecture on the importance of reproducible research in high-throughput biology. Keith Baggerly walks us through an exercise in 'forensic bioinformatics'.
Those with bioinformatics experience may find it entertaining and cringe-inducing. Those new to the field may find it illuminating and educational on topics such as classification, principal component analysis, and gene expression estimation.
The lecture video:
The Importance of Reproducible Research in High-Throughput Biology: Case Studies in Forensic Bioinformatics
The data and analysis being dissected relate to the now-famous Duke/Potti controversy, described by the New York Times, 60 Minutes, and others. In it, the work of biostatisticians Keith Baggerly and Kevin Coombes ultimately led to the retraction of ten papers by Potti and the cancellation of clinical trials. A recent Nature article provides a follow-up and guidance for uncovering research misconduct.
It is an interesting case study in three things: the importance of being able to reproduce complicated computational analyses; how some very problematic work from an otherwise productive and potentially useful area of research (personalized chemotherapy prediction) went horribly wrong; and how we might use this experience to promote transparent, open-source, and reproducible research in bioinformatics. One way to do that is with tools like 'Sweave' (homepage, the wiki, and a demo). Another approach is to use R Markdown and knitr.