Learn the tidyverse

R for data science

The best place to start learning the tidyverse is R for Data Science (R4DS for short), an O’Reilly book written by Hadley Wickham and Garrett Grolemund. It’s designed to take you from knowing nothing about R or the tidyverse to having all the basic tools of data science at your fingertips. You can read it online for free, or buy a physical copy.

We highly recommend pairing R4DS with the RStudio cheatsheets. These cheatsheets have been carefully designed to pack a lot of information into a small amount of space. You can keep them handy at your desk and quickly jog your memory when you get stuck. Most of the cheatsheets have been translated into multiple languages.

Books

(Do you have a book you’d like to see listed here? Please submit a pull request!)

Online courses

(Do you have a course you’d like to see listed here? Please submit a pull request!)

  • Writing functions in R by Hadley and Charlotte Wickham, hosted on DataCamp. This course will teach you the fundamentals of writing functions in R so that, among other things, you can make your code more readable, avoid coding errors, and automate repetitive tasks.

  • Introduction to the tidyverse by David Robinson, hosted on DataCamp. This is an introduction to the dplyr and ggplot2 packages through exploration and visualization of country data over time. It is suitable for people who have little or no experience with R and are interested in learning to perform data analysis.

  • Data visualisation with ggplot2 by Rick Scavetta, hosted on DataCamp. This course covers the basics of ggplot2 and is followed by part 2, which covers more advanced topics.

  • Exploratory data analysis in R: case study by David Robinson, hosted on DataCamp. This course brings ggplot2 and dplyr into action in an in-depth analysis of United Nations voting data. The course also introduces broom for tidying model output and the tidyr package for wrangling data into an explorable shape.

University courses

(Do you have a course you’d like to see listed here? Please submit a pull request!)

2017

  • Data Challenge Lab. Stanford University; Hadley Wickham and Bill Behrman. This is a 5-unit course using a flipped classroom. The curriculum is designed to cover each main thread of R4DS multiple times, diving a little deeper at each pass.

  • M.Sc. Industrial Analysis: An International Perspective. HEC Montreal; Thierry Warin. Graduate program in Data Science for International Business (DS4IB), where students learn how to use RStudio, R Markdown, the tidyverse and open data in a reproducible research workflow. Hosted at Dr.HECtoR.

  • Better Living with Data Science. Duke University; Mine Cetinkaya-Rundel. Data science course for first-year undergraduates with little to no computing background. It combines techniques from statistics, math, computer science, and social sciences to teach students how to use data to understand natural phenomena, explore patterns, model outcomes, and make predictions. Topics include data wrangling, exploratory data analysis, predictive modeling, data visualization, and effective communication of results, with discussions around reproducibility, data sharing, and data privacy.

  • Statistical Computing. Duke University; Colin Rundel. MS-level statistical computing course focusing on best practices and software development for reproducible results, selecting topics from: use of markup languages, understanding data structures, design of graphics, object-oriented programming, vectorized code, scoping, documenting code, profiling and debugging, building modular code, and version control, all in the context of specific applied statistical analyses.

  • FE8828 Programming Web Applications in Finance. Nanyang Technological University; Dr. Yang Ye. Master for Financial Engineering. An intermediate-to-advanced level programming course in R for data analytics and interactive content via the web. It teaches R Markdown, Shiny, and the tidyverse (dplyr/tidyr/ggplot2/lubridate).

  • Computing for the Social Sciences. University of Chicago; Benjamin Soltoff. This is an applied course for social scientists with little-to-no programming experience who wish to harness growing digital and computational resources. The focus of the course is on generating reproducible research through the use of programming languages and version control software. Major emphasis is placed on a pragmatic understanding of core principles of programming and packaged implementations of methods. Students will leave the course with basic computational skills implemented through many computational methods and approaches to social science. While students will not become expert programmers, they will gain the knowledge of how to adapt and expand these skills as they are presented with new questions, methods, and data.

2016

  • Stat545. UBC; Jenny Bryan. Data wrangling, exploration, and analysis with R.

2012

  • Stat405. Rice University; Hadley Wickham. Mainly included for historical interest: you can see some of the work that led up to the creation of the tidyverse.