rr-intro

People:

Mine Çetinkaya-Rundel (@mine-cetinkaya-rundel), Paul Magwene (@pmagwene), Pat Schloss (@pschloss), Kristina Riemer (@KristinaRiemer)

Lesson synopsis:

In this session we will start by reviewing case studies in which a lack of reproducibility led research to go wrong. Participants will then work on two reproducibility exercises. The first is a simple data manipulation and analysis exercise using any software they generally work with. The second is the same exercise (and extensions to it) using RMarkdown in RStudio as a better alternative, highlighting how this approach makes documentation, organization, automation, and dissemination easier.

Syllabus:

  • Recognize the problems that reproducible research helps address.
  • Identify pain points in getting your analysis to be reproducible.
  • Understand the role of documentation, sharing, automation, and organization in making your research more reproducible.
  • Get introduced to some tools that solve these problems, specifically R/RStudio/RMarkdown.

Intro sessions:

At the beginning of this workshop, participants should be able to

  • use a spreadsheet program to generate a plot
  • use a word processor (Word, Google Docs, etc.) to communicate

At the end of the intro sessions, participants will be able to

  • recognize the problems that reproducible research helps address
  • identify pain points in getting their analysis to be reproducible

The specific problems to be addressed in each session are as follows:

  • Session 1: documentation and sharing
  • Session 2: automation and organization

The second session will also introduce participants to some tools that can be used to solve these problems, specifically R/RStudio/RMarkdown.

Session 1:

Pre-workshop

Participants install R and RStudio.

See the Intro Session 1 materials for an email template.

Intro

Ex 1: Motivating reproducibility

Data analysis task + share/reproduce + discuss. Outline is in the instructor notes (intro-01-instr-notes.Rmd).

Coffee break:

Catch everyone up with R/RStudio instructions

Session 2:

  • Provide an RMarkdown approach to what was done in Session 1 (intro-01-template.Rmd)

Ex 2: Extending your analysis

Demonstrate how an approach based on executable scripts and self-documenting code makes it easier to automate, organize, and extend our analyses. The outline can be found in the instructor notes (intro-02-instr-notes.Rmd).
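As a rough illustration of why a scripted analysis is easier to extend, the sketch below wraps one summary step in a function so that rerunning it for a different year is a one-argument change. It is not taken from the instructor notes: `mean_life_exp` is a hypothetical helper name, and the column names (`year`, `continent`, `lifeExp`) are assumed to match the data frame shipped in the gapminder R package.

```r
# Mean life expectancy per continent for a single year.
# Assumes `data` has columns year, continent, and lifeExp,
# as in the gapminder package's data frame.
mean_life_exp <- function(data, target_year) {
  one_year <- data[data$year == target_year, ]
  aggregate(lifeExp ~ continent, data = one_year, FUN = mean)
}

# Only runs when the gapminder package is installed.
if (requireNamespace("gapminder", quietly = TRUE)) {
  # Extending the analysis to another year means changing one argument,
  # rather than repeating manual spreadsheet steps.
  print(mean_life_exp(gapminder::gapminder, 1952))
  print(mean_life_exp(gapminder::gapminder, 2007))
}
```

Because the steps live in a function, the same code documents what was done and can be rerun unchanged when the data are updated.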

  • Wrap up by reviewing the reproducibility checklist (checklist.md).

Data attribution

  • Gapminder data. Gapminder data is licensed CC-BY 3.0.

  • Processed and subset Gapminder data as an R package (population size, life expectancy, GDP per capita; every 5 years starting in 1952; complete records only). The data-raw sub-directory reveals the journey from Gapminder.org's Excel workbooks to increasingly clean and tidy data.

    • The clean dataset can be located in R as follows (after installing the package):

      pathToTsv <- system.file("gapminder.tsv", package = "gapminder")
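The located path can then be passed to a reader function. The following is a minimal sketch, not part of the lesson materials: `find_gapminder_tsv` and `read_gapminder` are hypothetical helper names, and the fallback to an `extdata` folder is an assumption about where other versions of the package may keep the file.

```r
# Return the path to gapminder.tsv inside the installed gapminder package.
# Assumption: the file sits either at the package root (as in the line above)
# or under extdata/ in other package versions, so both locations are tried.
find_gapminder_tsv <- function() {
  candidates <- c(
    system.file("gapminder.tsv", package = "gapminder"),
    system.file("extdata", "gapminder.tsv", package = "gapminder")
  )
  # system.file() returns "" when nothing is found.
  found <- candidates[nzchar(candidates)]
  if (length(found) == 0) {
    stop("gapminder.tsv not found; is the gapminder package installed?")
  }
  found[[1]]
}

# Read a tab-separated file into a data frame, keeping text columns as text.
read_gapminder <- function(path) {
  read.delim(path, stringsAsFactors = FALSE)
}

# Only runs when the gapminder package is installed.
if (requireNamespace("gapminder", quietly = TRUE)) {
  gap <- read_gapminder(find_gapminder_tsv())
  str(gap)
}
```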

