

rr-intro

Lesson synopsis:

In this session we will start by reviewing case studies of (lack of) reproducibility gone wrong. Participants will then work on two reproducibility exercises: first, a simple data manipulation and analysis exercise using whatever software they usually work with; then the same exercise (with extensions) using R Markdown in RStudio as a better alternative, highlighting how this approach makes documentation, organization, automation, and dissemination easier.

Syllabus:

  • Recognize the problems that reproducible research helps address.
  • Identify pain points in getting your analysis to be reproducible.
  • Understand the role of documentation, sharing, automation, and organization in making your research more reproducible.
  • Get introduced to some tools that solve these problems, specifically R, RStudio, and R Markdown.

Goals:

At the beginning of this session, participants should be able to:

  • use a spreadsheet program to generate a plot
  • use a word processor (Word, Google Docs, etc.) to communicate results

By the end of the session, participants will be able to:

  • recognize the problems that reproducible research helps address
  • identify pain points in getting their analysis to be reproducible

The specific problems to be addressed in each session are as follows:

  • First half (01): motivating reproducibility
  • Second half (02): introduce R Markdown as a reproducible data analysis tool

The first half of the intro session is language-agnostic. If a workshop uses a programming language other than R, only intro-02 will need to be modified.

Pre-workshop:

Participants install R and RStudio.

See the email template.

First half (01):

See instructor notes (intro-01-instr-notes.Rmd) for details.

  • Welcome + go over schedule
  • Motivating reproducibility slides
  • Group discussion about current tools people are using for documentation / reproducibility
  • Ex 1: Motivating reproducibility

Second half (02):

See instructor notes (intro-02-instr-notes.Rmd) for details.

  • Provide an R Markdown approach to what was done in the first half (intro-template.Rmd).

  • Wrap up by pointing participants to the reproducibility checklist.

Data attribution

  • Gapminder data, licensed under CC-BY 3.0.

  • Processed and subsetted Gapminder data (population size, life expectancy, and GDP per capita; every 5 years starting in 1952; complete records only), available as an R package. The data-raw sub-directory reveals the journey from Gapminder.org's Excel workbooks to increasingly clean and tidy data.

    • The clean dataset can be located in R as follows (after installing the package):

      # The TSV file ships in the package's extdata directory
      pathToTsv <- system.file("extdata", "gapminder.tsv", package = "gapminder")

People and credits

This lesson was originally created at the 1st Reproducible Science Curriculum Hackathon. The corresponding author is Mine Çetinkaya-Rundel (@mine-cetinkaya-rundel). See the commit log for other contributors.

Please post feedback and issues with the lesson on the repository's issue tracker. For instructor questions about teaching this lesson, you can also contact the corresponding author directly.


Contributors

abbycabs, brandoncurtis, erinbecker, evanwill, fmichonneau, gvwilson, hlapp, iamciera, ianlee1521, jduckles, jpallen, jsta, katrinleinweber, kcranston, kristinariemer, mawds, maxim-belkin, mine-cetinkaya-rundel, mr-c, naught101, neon-ninja, pbanaszkiewicz, pipitone, pmagwene, pschloss, rgaiacs, synesthesiam, tracykteal, twitwi, wclose


rr-intro's Issues

RFC: Migration to R 4.0

During the June Maintainer meeting, we asked for comments and experiences with the migration to R 4.0 so that we could create guidance for maintainers and instructors. We have drafted a short blog post (carpentries/carpentries.org#830) to be released next week (2020-08-03) that describes our recommendations for migration. You can find a preview of the blog post here. Please look over the blog post and make comments by 2020-07-30 so that we can incorporate any changes before the post goes live.

To help identify the differences between R 3.6 and R 4.0, I have run this lesson in both versions and posted the results that show the differences in output chunks and entire markdown files.

Intro_02 length

How long should the 02_intro section (including the demo) take? The slide deck has been moved to the 'slides' directory. Does 30 minutes seem reasonable for the demo?

Transition to standardized GitHub labels

The lesson infrastructure committee unanimously approved the proposal of using the same set of labels across all our repositories during its last meeting on May 23rd, 2018.

This repository has now been converted to use the standard set of labels.

If this repository used the set of labels previously recommended by Software Carpentry, they have been converted to the new set using the following rules:

SWC legacy label       New 'The Carpentries' label
bug                    type:bug
discussion             type:discussion
enhancement            type:enhancement
help-wanted            help wanted
newcomer-friendly      good first issue
template-and-tools     type:template and tools
work-in-progress       status:in progress

The label instructor-training was removed as it is not used in the workflow of certifying new instructors anymore. The label question was left as is when it was in use, and removed otherwise. If your repository used custom labels (and issues were flagged with these labels), they were left as is.

The lesson infrastructure committee hopes the standard set of labels will make it easier for you to manage the issues you receive on the repositories you manage.

The lesson infrastructure committee will evaluate how the labels are being used over the next few months and will solicit your feedback at that stage. In the meantime, if you have any questions or concerns, please leave a comment on this issue.

-- The Lesson Infrastructure subcommittee

PS: we will close this issue in 30 days if there is no activity.

Reproducibility checklist

The reproducibility checklist in lesson 02 isn't actually a checklist. I'll work on creating one.
