Advanced R Programming
Dates: 25-29 June 2018
Instructor: VP Nagraj (University of Virginia)
Overview
R is a popular open-source statistical computing language. This course will introduce techniques for programming in R, including how to work with a variety of data structures, write functions and package code for distribution. Each lesson will be interactive with hands-on activities to solidify concepts covered in lectures. The week will conclude with a package development practicum, during which attendees will have the opportunity to implement methods covered in class for a project of their own design.
Target Audience
The goal of this course is to provide training on how to start using R as a programming language: to write new functions, develop new tools or, at the very least, interact with data programmatically. While some examples may be drawn from bioinformatics, statistics and/or epidemiology, the material will not focus on data analysis techniques. As such, the course will generally be domain-agnostic. Researchers from a variety of disciplines are welcome. Those interested should have basic familiarity with R or another programming language.
Structure
The course material will be delivered over 5 days, in 10 half-day sessions. The lessons will build on one another and feature a mix of lecture and in-class exercises.
Program
Monday 25th. 09:30-17:30
Session 1: Data Structures
In the first session we will survey data structures in R. We'll cover assignment, indexing and programmatic interaction with a number of types of objects, including vectors, matrices, lists and data frames.
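As a rough illustration of the kind of material in this session (a sketch that is not taken from the course materials, with hypothetical object names), creating and indexing the four object types might look like this:

```r
# Illustrative sketch only; all object names are hypothetical.
v  <- c(a = 1, b = 2, c = 3)                    # named numeric vector
m  <- matrix(1:6, nrow = 2)                     # 2 x 3 matrix, filled column by column
l  <- list(id = 1L, tags = c("x", "y"))         # list holding elements of mixed type
df <- data.frame(name  = c("ann", "bob"),
                 score = c(90, 85))             # data frame with two columns

v_b     <- v[["b"]]               # extract a single element by name
m_12    <- m[1, 2]                # element in row 1, column 2
l_tags  <- l$tags                 # list element by name
df_high <- df[df$score > 88, ]    # rows that satisfy a condition
```

Single brackets generally return an object of the same type as the original, while `[[` and `$` extract one element from it.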
Session 2: Conditionals and Iteration
Building on the data structures overview, we'll explore approaches for implementing logic, testing conditions and iterating over R objects. The material will include comparisons via operators, looping, vectorization, "apply" functions and control flow.
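To give a flavor of these approaches, here is a minimal sketch (not drawn from the course materials; the variable names are hypothetical) that performs the same computation with a loop, an "apply" function and a vectorized call:

```r
# Illustrative sketch only; variable names are hypothetical.
x <- c(1, 4, 9, 16)

# 1. Explicit loop, writing into a preallocated result vector
roots_loop <- numeric(length(x))
for (i in seq_along(x)) {
  roots_loop[i] <- sqrt(x[i])
}

# 2. "apply" family: vapply() calls sqrt() on each element and
#    checks that every result is a single numeric value
roots_apply <- vapply(x, sqrt, numeric(1))

# 3. Vectorized: sqrt() operates on the whole vector at once
roots_vec <- sqrt(x)

# Vectorized conditional using a comparison operator
size_labels <- ifelse(x > 5, "big", "small")
```

All three approaches produce the same values here. The vectorized form is usually the most concise when a vectorized function is available.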
Tuesday 26th. 09:30-17:30
Session 3: Functions
This session will introduce essentials of writing functions in R. We will discuss methods for passing arguments, establishing return values and handling errors. We'll also spend time covering R environments, as well as the system of searching for objects from within a function and beyond.
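The ideas in this session can be sketched as follows. This is a minimal example rather than course material, and `safe_divide` and `make_counter` are hypothetical names chosen for illustration:

```r
# Illustrative sketch only; function names are hypothetical.

# A function with a default argument, input checking and a return value
safe_divide <- function(x, y = 1) {
  if (!is.numeric(x) || !is.numeric(y)) {
    stop("both arguments must be numeric")   # signal an error to the caller
  }
  x / y                                      # last evaluated expression is returned
}

# Handle the error instead of letting it halt execution
result <- tryCatch(
  safe_divide(10, "a"),
  error = function(e) NA_real_
)

# Environments: the inner function finds and modifies `count` in the
# environment where it was created, not in the global environment
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}

counter <- make_counter()
first  <- counter()
second <- counter()
```

Each call to `make_counter()` creates a fresh environment, so separate counters would not share state.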
Session 4: Debugging, Performance and Optimization
Writing a function is one thing ... fixing semantic and syntactic errors is another altogether. We'll dive into this topic by way of looking at tools that are built into R and RStudio for debugging code. This lesson will also cover methods for diagnosing performance roadblocks and profiling code.
Wednesday 27th. 09:30-17:30
Session 5: Object Oriented Approaches
In the context of data analysis, R is most commonly used as a functional programming language. This session will introduce methodology for object oriented programming (OOP) in R. The material will include how to define S3 and S4 classes, as well as how to establish methods for generic functions.
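As a small illustration of the S3 side of this topic (a sketch under assumptions, not course material; the `record` class and `describe` generic are hypothetical), a class, a new generic and its methods might be defined like this:

```r
# Illustrative sketch only; the class and generic names are hypothetical.

# Constructor: an S3 object is an ordinary R object with a class attribute
new_record <- function(name, age) {
  structure(list(name = name, age = age), class = "record")
}

# Method for the existing generic print()
print.record <- function(x, ...) {
  cat("Record:", x$name, "\n")
  invisible(x)
}

# A new generic, with a default method and a method for "record"
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "an R object"
describe.record  <- function(x, ...) paste(x$name, "is", x$age, "years old")

r <- new_record("Ada", 36)
described_record <- describe(r)      # dispatches to describe.record
described_other  <- describe(1:3)    # falls back to describe.default
```

S4 classes follow a more formal route, using `setClass()`, `setGeneric()` and `setMethod()` from the methods package.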
Session 6: Version Control with Git and GitHub
Git is a powerful program for version control and collaboration. This session will introduce workflows for using Git and GitHub, a popular platform for managing projects and distributing code. We'll pay particular attention to how these tools are used in the context of R programming and package development.
Thursday 28th. 09:30-17:30
Session 7: Package Development Tools (Part 1)
In this session, we'll survey tools, techniques and use-cases for R package development. In addition to version control tools (e.g. Git), there are quite a few packages, frameworks and convenient integrations with the RStudio IDE that can facilitate the package development workflow. We will focus on package structure, documentation, building and checking, handling dependencies and unit testing.
Session 8: Package Development Tools (Part 2)
We'll continue to explore tools for package development in this lesson. In particular, we'll discuss how to implement more robust checks, continuous integration methods, vignettes, protocols for releasing a package and best practices for including data.
Friday 29th. 09:30-17:30
Session 9: Package Development in Practice
With a baseline understanding of tools for package development, we'll explore several case studies that further illustrate the motivations for, and practical applications of, developing your own package. This session will also feature time allocated for individualized research and development on a project of your choosing.
Session 10: Next Steps
R is a dynamic resource: new features are added regularly, and new trends emerge in how it is used as a programming language. We'll conclude with an overview of emerging areas of R development, including some specific technologies like htmlwidgets and Shiny. This session aims to equip attendees with the perspective needed to follow new developments and apply the skills from the course in future projects.
For more information about the course, please visit our website