makepipe's Issues

Add execution information to pipeline

  • Whether a source/recipe was executed (store, but maybe don't display on chart)
  • Whether a source/recipe encountered any errors (store and display)

Allow user to supply labels

Allow user to supply labels to attach to nodes

make_with_source(
  source = c("Data prep" = "1 data_prep.R"),
  targets = c("Renamed/recoded data" = "data/1 dat.Rds"),
  dependencies = c(
    "Survey data" = "scratch/data/0_raw_data.csv",
    "Postcode concordance" = "scratch/lookup/concordance.csv"
  )
)

Refactor Pipeline

Implement a Segment class to serve as the basic building block for the Pipeline. This will clarify the link between the fundamental make_*() functions and the Pipeline object and should make it easier to implement #3, #24, etc.

What should `make_with_*()` return?

The options are:

  • NULL invisibly
  • TRUE/FALSE invisibly depending on whether or not the targets were up-to-date
  • NULL or the result of executing the recipe. Note that this only applies to make_with_recipe(), since whether the results of sourcing a script are attached to the global environment depends on the arguments passed through to source()

Make dependencies optional

We may want to begin with a script that pulls data from a remote source. It should be straightforward to make the dependencies argument optional.
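If dependencies were made optional, a data-pull step might look something like this (a hypothetical call; the script and target names are made up for illustration):

```r
# Hypothetical once `dependencies` is optional: a script with no upstream
# files, re-run whenever the target is missing or the script itself changes.
make_with_source(
  source  = "0 pull_data.R",
  targets = "data/0_raw_data.csv"
)
```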

Expand tests

A large part of the package is devoted to pipeline visualisations, which are challenging to test. But I need to find a way to boost the test coverage to at least 80%.

Add tests argument to make functions

It'd be cool to have a tests argument to the make functions, either an expression or a file/directory path, that is run at the end to validate the make.

Depending on the type:

  • An expression is evaluated as the test set
  • A file or directory path is evaluated using test_dir() or test_file() as appropriate

Would need to think about how to parse test results while also allowing maximum flexibility, since different testing approaches would return different things (e.g. test_dir/test_file return the test results, an expression block with a test_that call in it would return either NULL or throw an error).

For e.g.

make_with_source(
  dependencies = c("data/0_raw_data.csv", "lookup/concordance.csv"),
  source = c("1 data_prep.R"),
  targets = c("data/1_data.Rds"),
  tests = "tests/test-1-data.R"
)

Rename package

Unfortunately, piper is taken; makepipe is the next best alternative.

Don't fail if `dependencies` don't exist

Currently make_*() will throw an error if a supplied dependency doesn't exist.

However, there are certain scripts one might wish to run which gracefully take care of missing dependencies (e.g. they check whether the dependency exists and, if not, set up a dummy version within the script).

It might be preferable to leave it to the underlying script/code to throw the error.

cc: @paddytobias

Add `nomnoml` support

The flowcharts made by visNetwork are nice and interactive, but they become cluttered pretty quickly and they can't be easily exported as, e.g., png. I think using nomnoml by default may alleviate these issues. https://github.com/rstudio/nomnoml

Should move visNetwork to Suggests.

Add pipeline initialisation helper

As a first approximation, we expect the dependencies to be anything read in with readRDS(), read_*() and read.*() and the targets to be anything written out with saveRDS(), write_*() or write.*().

So it should be possible to do static analysis of the scripts in a directory to pull out the likely dependencies and targets and produce a pipeline consisting of multiple make_with_*() blocks.
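As a rough sketch of that static analysis (a hypothetical helper, not part of makepipe), quoted paths can be pulled out of read/write calls with base-R regexes; a real implementation would probably parse the code instead:

```r
# Sketch (hypothetical helper): guess a script's dependencies and targets by
# extracting quoted paths from read*/write* calls.
guess_io <- function(lines) {
  extract <- function(pattern) {
    hits <- regmatches(lines, regexpr(pattern, lines, perl = TRUE))
    sub('.*["\']([^"\']+)["\'].*', "\\1", hits)
  }
  list(
    dependencies = extract('(readRDS|read[._]\\w+)\\s*\\(\\s*["\'][^"\']+["\']'),
    targets      = extract('(saveRDS|write[._]\\w+)\\s*\\([^,]+,\\s*["\'][^"\']+["\']')
  )
}
```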

Getting started vignette

Should write a vignette which highlights:

  • Conditional execution logic
  • Default execution environment
  • Return values

Alternatively, should build out the relevant section in the README.

Bug in pipeline

Out-of-dateness still not flowing through the pipeline properly despite 7a6a7cc.

Pipeline segments were being invalidated if the source file was newer than the dependencies. Need to prevent that from occurring.

Targets are out-of-date if and only if they are older than their dependencies, or their dependencies are themselves out-of-date by the same rule, and so on recursively.
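The rule above can be sketched recursively (a hypothetical helper, not makepipe's actual implementation), with deps_of mapping each target to its direct dependencies:

```r
# Sketch of the intended rule: a target is out-of-date if it is missing,
# older than any direct dependency, or any dependency is itself out-of-date.
# `deps_of` is a named list mapping targets to their direct dependencies.
out_of_date <- function(target, deps_of) {
  if (!target %in% names(deps_of)) return(FALSE)  # raw inputs are never stale
  if (!file.exists(target)) return(TRUE)
  for (d in deps_of[[target]]) {
    if (out_of_date(d, deps_of)) return(TRUE)
    if (file.mtime(d) > file.mtime(target)) return(TRUE)
  }
  FALSE
}
```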

Store label/note in segment

Currently these are stored in the pipeline$nodes object but they should really go into the segment object so that they can get printed along with the other information.

Make registration fails if quiet=TRUE

I must've been asleep at the wheel when I wrote this control flow. It doesn't make any sense:

if (!quiet) {
  msg <- "`make_register()` called outside of a 'makepipe' pipeline"
  if (exists("__makepipe_register__", parent.frame())) {
    register_env <- get("__makepipe_register__", parent.frame())
    if (!is.environment(register_env)) warning(msg, call. = FALSE)
  } else {
    warning(msg, call. = FALSE)
    return(invisible(value))
  }
}
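One possible rearrangement (a sketch only; the registration step itself is elided here, just as in the snippet above): resolve the register regardless of quiet, and use quiet only to silence the warning.

```r
# Sketch of a fix: look up the register unconditionally, so that quiet = TRUE
# suppresses the warning but not the registration itself.
msg <- "`make_register()` called outside of a 'makepipe' pipeline"
register_env <- NULL
if (exists("__makepipe_register__", parent.frame())) {
  env <- get("__makepipe_register__", parent.frame())
  if (is.environment(env)) register_env <- env
}
if (is.null(register_env)) {
  if (!quiet) warning(msg, call. = FALSE)
  return(invisible(value))
}
```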

Use roxygen-like tags to declare dependencies

At the top of each script, something like

#' @title Merge
#' @description Merge population variables into survey data.
#' @dependencies ["data/2 dat.Rds", "data/0 pop.Rds"]
#' @targets "data/3 dat.Rds"
NULL

Then something like make_with_dir() could either generate an R script containing make_with_*() blocks or else simply execute the pipeline.

Implement clean and rebuild functions

Once the pipeline has been executed, it stores information on the locations of all targets, dependencies and source files on disk. It also stores each recipe as a string.

We can leverage this to implement clean and rebuild functions:

  • clean_pipeline() would delete all targets
  • build_pipeline() would execute make_with_source() or make_with_recipe() for each source/recipe in the pipeline in the appropriate order
  • rebuild_pipeline() would call clean_pipeline() and then build_pipeline()

The most challenging part of this will be figuring out what the 'appropriate order' of execution is. GNU make decides where to start using topological sorting. Possibly there's an algorithm in visNetwork that I can use. Otherwise, I'll need to code something up.
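Kahn's algorithm is small enough to code up directly; a minimal sketch over a named list mapping each node to its direct dependencies (a hypothetical representation of the pipeline graph):

```r
# Kahn's algorithm sketch: `deps_of` maps each node to the nodes it depends
# on; the result lists nodes so every dependency precedes its dependents.
topo_sort <- function(deps_of) {
  nodes <- union(names(deps_of), unlist(deps_of))
  indeg <- setNames(integer(length(nodes)), nodes)
  for (n in names(deps_of)) indeg[n] <- length(deps_of[[n]])
  queue <- nodes[indeg == 0]
  out <- character(0)
  while (length(queue) > 0) {
    n <- queue[1]; queue <- queue[-1]
    out <- c(out, n)
    for (m in names(deps_of)) {
      if (n %in% deps_of[[m]]) {
        indeg[m] <- indeg[m] - 1
        if (indeg[m] == 0) queue <- c(queue, m)
      }
    }
  }
  if (length(out) < length(nodes)) stop("cycle detected in pipeline")
  out
}
```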

Once this is done, it might be neat to introduce a pipeline() function for defining a pipeline in one fell swoop:

my_pipeline <- Pipeline$new(
  list(
    dependencies = c("data/0_raw_data.csv", "lookup/concordance.csv"),
    source = c("1 data_prep.R"),
    targets = c("data/1_data.Rds")
  ),
  list(
      dependencies = c("data/1_data.Rds", "data/0_pop.Rds"),
      recipe = {
        dat <- readRDS("data/1_data.Rds")
        pop <- readRDS("data/0_pop.Rds")
        merged_dat <- merge(dat, pop, by = "id")
        saveRDS(merged_dat, "data/2_data.Rds")
      },
      targets = c("data/2_data.Rds")
  )
)

rebuild_pipeline(my_pipeline)


Allow user to add annotations in `make_*()`

Something like:

make_with_source(
  desc = "Load and clean raw data",
  dependencies = c("data/0_raw_data.csv", "lookup/concordance.csv"),
  source = "1 data_prep.R",
  targets = "data/1_data.Rds"
)

`make_with_recipe`: Error in basename(as.character(nodes$id)): path too long

It looks like make_with_recipe runs into an error when the recipe has more than a few lines of code

library(makepipe)
#> Warning: package 'makepipe' was built under R version 4.0.5
saveRDS(cars, 'raw_dat.rds')
make_with_recipe(
  recipe = {
    new_cars <- cars
    new_cars$a <- 1
    new_cars$b <- 2
    new_cars$c <- 3
    new_cars$d <- 4
    new_cars$e <- 5
    new_cars$f <- 6
    new_cars$g <- 7
    new_cars$h <- 8
    new_cars$i <- 9
    new_cars$i <- 10
    saveRDS(new_cars, 'processed_dat.rds')
  },
  dependencies = 'raw_dat.rds',
  targets = 'processed_dat.rds'
)
#> Error in basename(as.character(nodes$id)): path too long

Created on 2022-05-11 by the reprex package (v1.0.0)

Seems like it may have something to do with the makepipe visualisation?

Text summary

Would it be possible to provide a text summary of the pipeline as an alternative to the flowchart?

For example a make_with_source() block could be rendered,

**Merge - `3 merge.R`**
Merge population variables into survey data.

- Dependencies: `data/2 dat.Rds`, `data/0 pop.Rds`
- Targets: `data/3 dat.Rds`

Implement `make_return()`

As per #2, we want a formal way to specify return values that unifies make_*() variants.

Something like this should work:

make_return <- function(x) {
    signalCondition(rlang::cnd("makepipe_return_cnd", res = x))
    invisible(x)
}

make_with_source <- function(...) {
  ...
  out <- tryCatch(
    source(...),
    makepipe_return_cnd = function(x) {
      x$res
    }
  )
  ...
  out
}

Add pkg option to `make`

Some pipelines will need to be re-run if a key package has been updated.

The mechanism will be the same as for other dependencies: find the package location and compare modification times.

pkg_locs <- paste0(find.package(pkgs), "/DESCRIPTION")
out_of_date(target, pkg_locs)

Will need to think about how to incorporate into the pipeline viz

Submit to CRAN

First release:

Prepare for release:

  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()

Submit to CRAN:

  • devtools::submit_cran()
  • Approve email
  • Action feedback:
    • Add some more details about the package functionality and implemented methods in the Description text.
    • Replace call to installed.packages() with find.package()
  • Resubmit

Wait for CRAN...
