Hadley Wickham completed his undergraduate studies at the University of Auckland and his PhD at Iowa State University. He currently serves as Chief Scientist at RStudio and is an adjunct Assistant Professor of statistics at Rice University.
Before Dr. Wickham's contributions, the R statistical language was often considered powerful yet occasionally counterintuitive, lacking the simplicity and consistency that make programming fun. That began to change with the release of the ggplot2 package, which implemented a novel scheme for data visualization. Called the Grammar of Graphics, it breaks graphs up into semantic components such as scales and layers. This allows users to generate complex, multifaceted, and layered visualizations with unprecedented ease.
Besides ggplot2, Hadley Wickham is also developing libraries based on functional programming that are infusing the R language with an exciting new flavor. Packages like:
- plyr - generalized function application
- stringr - simple string manipulation
- reshape - restructuring data in R
together with ggplot2 are radically altering the way we think about and use R. These libraries greatly improve the productivity of scientists across the globe and affect society at large.
Dr. Wickham maintains a long list of open source software on GitHub: https://github.com/hadley
Hadley Wickham of ggplot2 and RStudio
What hardware do you use?
Macbook 15" retina. I've recently re-started using it with my old 30" Apple Cinema monitor. Most of the time I live inside a single maximized RStudio instance, but having more screen real estate is really handy when I'm working on multiple projects. It also makes it surprisingly pleasant to review articles purely electronically (I normally prefer paper).
What is your text editor?
95% RStudio + 5% Sublime Text 2. There are only a handful of things that I leave RStudio for: mostly column selections and sorting lines (both of which are on the long-term todo list for RStudio).
What software do you use for your work?
Here's what I have open now:
- RStudio (only 1 project, which is unusual). That's where I'm writing this response. I use it for R, Markdown and C++.
- Adium: my chat client. Unfortunately it's been getting steadily less useful because Google keeps removing features from the Jabber gchat endpoint.
- OS X Calendar, Notes, Reminders: I use the OS built-ins just because they're there and they sync painlessly with my phone. Shared reminders are pretty useful for shopping lists etc.
- Safari & Chrome: I've switched back to Safari as I feel like Google is getting creepier and creepier with what it does with my data. Also the auto-updating broke and I couldn't figure out how to fix it. I still use Chrome occasionally for sites that need it.
- Terminal: just the standard OS X terminal for me.
- Dropbox: I store large datasets here, and use public folders to share code, data and slides when I teach.
What do you use to create plots and charts?
ggplot2 (duh). I'm increasingly using ggvis - we're slowly adding features to bring it up to parity with ggplot2, and having interactivity is awesome. I really like that ggvis is designed to seamlessly interleave with dplyr - data manipulation is a crucial part of data visualisation.
What do you consider the best language to do data science with?
R! I really strongly believe that R is the best language for data science, and I spend most of my time trying to make it even better. That's not to say there aren't lots of other awesome tools, and in reality, most people will use multiple tools. In every data science team that I've spoken with, there are people using Python and R, and often JavaScript as well.
SQL is a really important data language, since so much data lives in databases, and if you're doing simulations or developing new techniques, being able to use a high-performance language (like C++ or Julia) is really important.
Personally, I've found learning C++ and Rcpp (which makes it painless to call C++ functions from R) to be transformative. For example, I'm currently incorporating the ideas and code from bigvis into a new package called ggcomp, which will power the statistical transformations in ggvis and will make it possible to use ggvis with tens to hundreds of millions of points. That wouldn't be possible without C++!
What tools/software do not get enough recognition?
Here's three that I love and not enough people know about:
- Selectorgadget: if you ever do any web scraping, you will love the way it learns CSS/XPath selectors based on positive and negative examples.
- iDoneThis: we use this at RStudio. It's a great way to keep track of what you've achieved, and to see what your colleagues are working on.
- appear.in: super simple video chat. No logins, just share a link, and the quality is way better than Google Hangouts.
See all posts in this series: https://www.biostars.org/tag/uses-this/
To be notified of new posts in the series, follow the first post: Jim Robinson of the Integrative Genomics Viewer (IGV) uses this