I want to give a 20-30 minute talk on reproducible science in biology. I think the audience would enjoy watching me run a small, self-contained experiment live, coding, documenting, and commenting as I go.
I would like to start from an existing template. I have seen nice examples posted on blogs in the past that featured both code and results. Is there such an example that you would suggest for my talk?
Here are some ideas I have for the talk:
- Do analyses that are interesting for biologists
- Produce cool figures
- Keep it simple for biologists (e.g., avoid complicated analyses or packages)
- Start from an empty directory
- Get the data with wget
- Use R (the most familiar language for biologists where I work)
- Use RStudio to make it more palatable than my green-on-black terminal :)
- Use a bash script (or a VERY simple Makefile) to run everything
- Create all the output dynamically (figures, tables...)
- Use GitHub to version control everything, including the data (smallish text file)
- Test the whole pipeline on another data set to show how much time this approach can save
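To make the plan above more concrete, here is a rough sketch of the kind of driver script I have in mind. Everything specific in it is a placeholder I made up for illustration: the URL, the `run_all.sh` and `analysis.R` file names, and the `data/` and `results/` directories. It also assumes that `analysis.R` reads the input file and the output directory from its command-line arguments.

```shell
#!/usr/bin/env bash
# run_all.sh: hypothetical driver script for the live demo (a sketch, not a finished pipeline).
set -eu

# Placeholders: swap in the real data URL and script name.
DATA_URL="${DATA_URL:-https://example.org/measurements.tsv}"  # hypothetical URL
DATA_FILE="data/$(basename "$DATA_URL")"
ANALYSIS_SCRIPT="analysis.R"                                   # hypothetical R script
OUT_DIR="results"

fetch_data() {
    mkdir -p data
    # Download only if we do not already have a non-empty copy.
    if [ ! -s "$DATA_FILE" ]; then
        wget --quiet --output-document="$DATA_FILE" "$DATA_URL"
    fi
}

run_analysis() {
    mkdir -p "$OUT_DIR"
    # Assumes analysis.R takes the input file and output directory as arguments
    # and writes every figure and table into the output directory.
    Rscript "$ANALYSIS_SCRIPT" "$DATA_FILE" "$OUT_DIR"
}

# Pick the step from the first argument; print usage by default.
case "${1:-help}" in
    all)     fetch_data; run_analysis ;;
    fetch)   fetch_data ;;
    analyse) run_analysis ;;
    *)       echo "usage: DATA_URL=<url> bash run_all.sh {all|fetch|analyse}" ;;
esac
```

Re-running on a second data set would then be a one-liner such as `DATA_URL=https://example.org/other.tsv bash run_all.sh all` (again, a made-up URL), which is the time-saving point I would like to demonstrate.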
Do you have any suggestions for such a talk?
Thanks in advance for your input.
Thank you Pierre. It's definitely a cool example, but from the point of view of a biologist with basic notions of R, I don't think we can pretend this is even remotely simple. I am trying to lure them in, not scare them off ;)