This website hosts training materials for “Introduction to R for Applied Epidemiology”. This course teaches the fundamentals of R for applied epidemiologists and public health practitioners.
Applied Epi is a nonprofit organization supporting frontline practitioners through open-source analytical tools, training, and support. Our Epidemiologist R Handbook is a free R reference manual that has been used by 150,000 people around the world.
Follow these step-by-step instructions to download the course files, set up an RStudio project, and download and begin the course’s interactive exercises.
- Download course files
Click here to download a zipped folder to use in the course exercises.
Unzip the folder and save it on your computer’s desktop (not on a shared drive). To “unzip” a folder once it is downloaded, right-click on the folder and select “Extract All”. If offered a choice of location to save the unzipped folder, save it to your desktop.
If you are unable to download, try this link, or ask your course organizer for an emailed version.
- Create a new RStudio project
- Open RStudio. Make sure you open RStudio, not just R.
- In RStudio, click File -> New Project. In the pop-up window, select “Existing directory”.
- Click “Browse” and select the “intro_course” folder on your desktop (the folder you downloaded earlier, which contains the course materials).
- Click “Create project”.
Voila! This will be the project for ALL of your work in this course.
- Access Applied Epi course exercises
Ask your instructor for the link to the online interactive exercises. If you have unreliable internet, ask your instructor for assistance.
- Begin the first exercise
The course exercises will now appear within RStudio. Each course module has a corresponding exercise, which can be accessed through the “Tutorial” pane in RStudio (upper-right). The gif below introduces you to the exercise environment (you do not need to follow the steps shown right now).
- Click on the “Tutorial” tab in the upper-right RStudio pane (which also contains a tab holding your “Environment”).
- Scroll down and review the listed exercises. If you do not see any “Applied Epi” exercises listed, close and re-open RStudio. They may take a minute to appear.
- Select the exercise “Applied Epi - R setup, syntax, data import”.
- The exercise will load. Once the Applied Epi logo appears in the Tutorial pane, you can begin the exercise.
- To see the sidebar in the exercise, you may need to widen the Tutorial pane. You can also adjust the zoom from the “View” menu.
- You can view the exercise in this pane, or click the small icon in the upper-left to pop it out into a separate window.
We welcome you to the course and dive into the basics of how to interact with R and RStudio, basic R syntax, and how to organize your analytical projects using public health examples. We then cover R functions and packages, and introduce the core functions used to import data. Using these, we import the Ebola case study surveillance linelist, and begin to inspect and review it.
- Slides: Welcome, course logistics, RStudio, and basic R syntax
- Live demonstration (instructor guide)
- Exercise:
Now that we have our surveillance linelist in R, we cover which “data cleaning” steps are necessary and how to execute them in R. Along the way, we introduce many core R functions for adjusting column names, deduplicating and filtering rows, selecting and modifying columns, recoding values, and more. Together, we write a sequence of “pipes” that cleans the linelist step-by-step in a clear, reproducible manner, so that our dataset is ready for preliminary analysis!
- Exercise:
Informative tables are the bedrock of epidemiological and public health practice. In this module we introduce three tools to produce tables of summary statistics: {dplyr} for flexibility, {janitor} for speed, and {gtsummary} for beauty. Finally, we explore {flextable}, which can be used to beautify any of the above approaches, add colors and highlights, and save tables to Word, PNG, HTML, etc.
- Exercise:
Using the {ggplot2} package to maximum effect rests on understanding how to apply its “grammar of graphics” to build a plot layer-by-layer. We tackle this by introducing the grammar piece-by-piece, so that you build on previous knowledge to construct informative and colorful bar plots, scatter plots, histograms, line plots, plot text labels that automatically refresh with updated data (very useful for epidemiological reports!), and more.
- Exercise:
- Exercise:
Public health analytics rarely involves just one data set, so now we practice joining data by adding hospital, laboratory, and case investigation data to our surveillance linelist. We ingrain best practices for conducting joins, and prepare you for doing data transformations independently. In the second part of this module, we address pivoting, which in R means transforming data between “long” and “wide” formats. This is particularly relevant in public health, where each format has distinct benefits.
- Exercise:
- Exercise:
In this second data visualization module we encourage you to practice learning R independently (a necessary skill once you leave the class!) but with our support. We tackle visualizations that are central to descriptive epidemiology: the intricacies of crafting an accurate epidemic curve, conveying patterns in three variables using a heat plot, and creating age/sex pyramids to describe demographics. If there is time, we finish with a demonstration of R’s GIS/geospatial capabilities.
- Exercise:
In this module, we take the R code for the Ebola case study that you have been building throughout the course and convert it into a reproducible, automated report (Word, PDF, HTML, etc.). We teach you the variations in syntax, and the opportunities that come from producing professional-looking documents that update when incoming data are refreshed and can be sent to inform public health partners and stakeholders.
- Live demonstration (instructor guide)
- Exercise:
In this last module, your skills are tested as you produce an R Markdown report using a COVID-19 case linelist. Unlike with the Ebola case study, you will not have the answer code available to you. When you finish, we perform “code reviews”, simultaneously improving your coding skills and teaching you how to review others’ code. Before closing, we touch upon how to find your particular community of R users and the resources available to you for questions, and we close with a feedback survey.
- Slides: COVID case study
- Exercise materials: See the folder “learning_materials/covid_case_study” for the Word document report to replicate, the data, and a tip sheet.
Our instructors know public health. One of the signature features of Applied Epi’s training is that we provide follow-up support to your team, to help you apply your new skills to your work context.
We schedule five 1.5-hour sessions with your team in the 3 months after the training. In these sessions, we help you troubleshoot code, advise you on analytical strategies, or guide you in any new learning you need.
- Please note that all of our case study training materials use fake example data in which no person is identifiable and the actual values have been scrambled.
- Modifications are possible so that the course uses data from your jurisdiction. Email us at [email protected] to discuss.
Authors and contributors to this course curriculum from Applied Epi include:
- Neale Batra
- Arran Hamlet
- Mathilde Mousset
- Alex Spina
- Paula Blomquist
- Amy Mikhail
- The Fulton County Board of Health graciously provided example data (anonymized and scrambled) for a case study.
- The {outbreaks} package formed the basis for the fake dataset in the Ebola case study.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Please email [email protected] if you would like to use these materials for an academic course or epidemiologist training program.
If you would like to make a content contribution, please contact us first via GitHub issues or by email. We are implementing a schedule for updates and are creating a contributor guide.
Please note that the Epi R Handbook project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.