pipeline_tools_targets's Introduction

Using pipeline tools to ensure analyses are reproducible and understandable, using R's targets package

This half-day course explains why pipeline tools such as make, snakemake and targets are indispensable for (reproducible) data analyses.

Feedback survey

If you have taken this course and are willing to answer a short (1-2 minute) survey, please fill it in here.

Lecture and problem sets

There is a short lecture here.

Applied example: there are two folders:

  • this contains a blank R project along with the data in a subdirectory, which can be a starting point for developing a reproducible data analysis
  • this contains a completed R project along with the required _targets.R file

Steps of analysis

  1. get and clean data
  2. fit a model: ozone ~ temperature
  3. plot the model's fit versus data
  4. diagnose any issues with model fit; if necessary, change the model and rerun the above steps
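
Before any pipeline tool is involved, the four steps above can be written as plain R functions. The following is a minimal sketch rather than the course's own solution: it assumes R's built-in airquality dataset as a stand-in for the course CSV (so the temperature column is called temp after lowercasing), and the function names are hypothetical.

```r
# Stand-in data: R's built-in airquality dataset. This is an assumption;
# the course ships its own CSV, whose columns may differ.
raw <- datasets::airquality

# 1. get and clean data: lowercase the column names, drop rows with any NA
clean_data_frame <- function(df) {
  names(df) <- tolower(names(df))
  na.omit(df)
}

# 2. fit a model: ozone as a linear function of temperature
fit_model <- function(df) {
  lm(ozone ~ temp, data = df)
}

# 3. plot the fitted line over the observed data
plot_fit <- function(df, model) {
  plot(df$temp, df$ozone, xlab = "temp", ylab = "ozone")
  abline(model)
}

# 4. diagnose: residuals versus fitted values, to spot systematic misfit
plot_residuals <- function(model) {
  plot(fitted(model), resid(model), xlab = "fitted", ylab = "residual")
  abline(h = 0, lty = 2)
}

cleaned <- clean_data_frame(raw)
model <- fit_model(cleaned)
plot_fit(cleaned, model)
plot_residuals(model)
```

Writing each step as a function that takes its inputs as arguments and returns its result is what later lets a pipeline tool work out which steps depend on which.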

Applied target steps

  1. clone the repo and double click on the r_project_blank_slate.Rproj icon to launch RStudio
  2. install the targets package via install.packages("targets")
  3. load the package with library(targets) and type use_targets() in the console, which should create a _targets.R file
  4. add tidyverse to the packages argument of tar_option_set() in the _targets.R file
  5. comment out tar_source()
  6. remove the example targets that have been generated in the _targets.R file (but keep the enclosing list() call that held them)
  7. create a folder called scripts that has in it a file called clean_data.R
  8. in clean_data.R write a function that reads the data/raw/airquality.csv file, renames the columns using only lowercase letters and removes any rows that have NA values in them
  9. in the preamble of _targets.R add source("scripts/clean_data.R") to ensure that targets has access to your function
  10. in the _targets.R file, add a target for the cleaned data via tar_target(data_airquality_cleaned, clean_data(filename))
  11. in the _targets.R file, create a leaf target for the file itself via tar_target(filename, "data/raw/airquality.csv", format = "file")
  12. visualise the network via tar_visnetwork(names=data_airquality_cleaned)
  13. build the cleaned data by running the pipeline with tar_make()
  14. revisualise the network to check that everything is up to date
  15. type tar_read(data_airquality_cleaned) in the console to view the cleaned data
  16. continue with this methodology to create a full data analysis
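
As a rough guide, after steps 4 to 11 the _targets.R file might look something like the sketch below. It is not the repository's completed solution. The body of clean_data() is one possible base-R implementation, and it is defined inline here only to keep everything in one listing; in the project layout described above it would live in scripts/clean_data.R and be loaded with source().

```r
# _targets.R (sketch)
library(targets)

# Packages made available to every target when the pipeline runs (step 4).
tar_option_set(packages = c("tidyverse"))

# tar_source()  # commented out, as in step 5

# In the project this line replaces the inline definition below (step 9):
# source("scripts/clean_data.R")

# One possible clean_data(): read the CSV, lowercase the column names and
# drop every row that contains an NA (step 8).
clean_data <- function(filename) {
  df <- read.csv(filename)
  names(df) <- tolower(names(df))
  na.omit(df)
}

# The pipeline: a file target for the raw CSV and a target for the cleaned
# data that depends on it (steps 10 and 11).
list(
  tar_target(filename, "data/raw/airquality.csv", format = "file"),
  tar_target(data_airquality_cleaned, clean_data(filename))
)
```

Because filename is declared with format = "file", targets tracks the contents of the CSV itself, so editing the raw data marks data_airquality_cleaned as outdated the next time the network is visualised. Forward slashes are used in the path because a backslash inside an R string starts an escape sequence.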

Prerequisites

  • basic R skills (although not essential)
  • some experience of having done data analysis

pipeline_tools_targets's People

Contributors

ben18785
