I teach an R workshop at my university that's targeted toward researchers with little background in stats or computing. I'm working on expanding this into a series that will include (1) an intro to R, (2) an advanced data manipulation workshop (read: dplyr and tidyr), and (3) an advanced data visualization workshop (read: ggplot2).
In the past I've used the diamonds dataset for ggplot2 examples and the nycflights13 dataset for showing off dplyr. What I'd really like to find is a dataset that will resonate with biomedical researchers and that's big (10,000+ rows) and complex enough to motivate exploring with dplyr and ggplot2: namely, one with some continuous measures that may correlate or behave differently depending on the levels of other factor variables in the data. Something like drug trial by cell line data, or some other kind of clinical measurements by cancer type, etc. Anyone have any pointers?
Thanks
Thanks, this is helpful. After limiting to multivariate, matrix, and mixed data types with at least 1000 samples, it looks like the covertype dataset might be a good candidate. I'll have to look further. Thanks again.