makepipe's Issues

Add execution information to pipeline

  • Whether a source/recipe was executed (store, but maybe don't display on chart)
  • Whether a source/recipe encountered any errors (store and display)

Allow user to supply labels

Allow user to supply labels to attach to nodes

make_with_source(
  source = c("Data prep" = "1 data_prep.R"),
  targets = c("Renamed/recoded data" = "data/1 dat.Rds"),
  dependencies = c(
    "Survey data" = "scratch/data/0_raw_data.csv",
    "Postcode concordance" = "scratch/lookup/concordance.csv"
  )
)

Refactor Pipeline

Implement a Segment class to serve as the basic building block for the Pipeline. This will clarify the link between the fundamental make_*() functions and the Pipeline object and should make it easier to implement #3, #24, etc.

What should `make_with_*()` return?

The options are:

  • NULL invisibly
  • TRUE/FALSE invisibly depending on whether or not the targets were up-to-date
  • NULL or the result of executing the recipe. Note that this only applies to make_with_recipe(), since whether the results of sourcing a script are attached to the global environment depends on the arguments passed through to source()

Make dependencies optional

We may want to begin with a script that pulls data from a remote source. It should be straightforward to make the dependencies argument optional.
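If dependencies were made optional, a data-pull step might look something like this (a hypothetical call; the script and target names are made up for illustration):

```r
# Hypothetical once `dependencies` is optional: a script with no upstream
# files, re-run whenever the target is missing or the script itself changes.
make_with_source(
  source  = "0 pull_data.R",
  targets = "data/0_raw_data.csv"
)
```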

Expand tests

A large part of the package is devoted to pipeline visualisations, which are challenging to test. But I need to find a way to boost the test coverage to at least 80%.

Add tests argument to make functions

It'd be cool to have a tests argument to the make functions, either an expression or a file/directory path, that is run at the end to validate the make.

Depending on the type:

  • An expression is evaluated as the test set
  • A file or directory path is evaluated using test_dir() or test_file() as appropriate

Would need to think about how to parse test results while also allowing maximum flexibility, since different testing approaches would return different things (e.g. test_dir/test_file return the test results, an expression block with a test_that call in it would return either NULL or throw an error).

For e.g.

make_with_source(
  dependencies = c("data/0_raw_data.csv", "lookup/concordance.csv"),
  source = c("1 data_prep.R"),
  targets = c("data/1_data.Rds"),
  tests = "tests/test-1-data.R"
)

Rename package

Unfortunately, piper is taken; makepipe is the next best alternative.

Don't fail if `dependencies` don't exist

Currently make_*() will throw an error if a supplied dependency doesn't exist.

However, there are certain scripts one might wish to run which gracefully take care of missing dependencies (e.g. they check whether the dependency exists and, if not, set up a dummy version within the script).

It might be preferable to leave it to the underlying script/code to throw the error.

cc: @paddytobias

Add `nomnoml` support

The flowcharts made by visNetwork are nice and interactive, but they become cluttered pretty quickly and they can't be easily exported as, e.g., png. I think using nomnoml by default may alleviate these issues. https://github.com/rstudio/nomnoml

Should move visNetwork to Suggests.

Add pipeline initialisation helper

As a first approximation, we expect the dependencies to be anything read in with readRDS(), read_*() and read.*() and the targets to be anything written out with saveRDS(), write_*() or write.*().

So it should be possible to do static analysis of the scripts in a directory to pull out the likely dependencies and targets and produce a pipeline consisting of multiple make_with_*() blocks.
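As a rough sketch of that static analysis (a hypothetical helper, not part of makepipe), quoted paths can be pulled out of read/write calls with base-R regexes; a real implementation would probably parse the code instead:

```r
# Sketch (hypothetical helper): guess a script's dependencies and targets by
# extracting quoted paths from read*/write* calls.
guess_io <- function(lines) {
  extract <- function(pattern) {
    hits <- regmatches(lines, regexpr(pattern, lines, perl = TRUE))
    sub('.*["\']([^"\']+)["\'].*', "\\1", hits)
  }
  list(
    dependencies = extract('(readRDS|read[._]\\w+)\\s*\\(\\s*["\'][^"\']+["\']'),
    targets      = extract('(saveRDS|write[._]\\w+)\\s*\\([^,]+,\\s*["\'][^"\']+["\']')
  )
}
```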

Getting started vignette

Should write a vignette which highlights:

  • Conditional execution logic
  • Default execution environment
  • Return values

Alternatively, should build out the relevant section in the README.

Bug in pipeline

Out-of-dateness still not flowing through the pipeline properly despite 7a6a7cc.

Pipeline segments were being invalidated if the source file was newer than the dependencies. Need to prevent that from occurring.

Targets are out-of-date if and only if they are older than their dependencies, or their dependencies are themselves out-of-date by the same rule, and so on recursively.
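The rule above can be sketched recursively (a hypothetical helper, not makepipe's actual implementation), with deps_of mapping each target to its direct dependencies:

```r
# Sketch of the intended rule: a target is out-of-date if it is missing,
# older than any direct dependency, or any dependency is itself out-of-date.
# `deps_of` is a named list mapping targets to their direct dependencies.
out_of_date <- function(target, deps_of) {
  if (!target %in% names(deps_of)) return(FALSE)  # raw inputs are never stale
  if (!file.exists(target)) return(TRUE)
  for (d in deps_of[[target]]) {
    if (out_of_date(d, deps_of)) return(TRUE)
    if (file.mtime(d) > file.mtime(target)) return(TRUE)
  }
  FALSE
}
```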

Store label/note in segment

Currently these are stored in the pipeline$nodes object but they should really go into the segment object so that they can get printed along with the other information.

Make registration fails if quiet=TRUE

I must've been asleep at the wheel when I wrote this control flow. It doesn't make any sense:

if (!quiet) {
  msg <- "`make_register()` called outside of a 'makepipe' pipeline"
  if (exists("__makepipe_register__", parent.frame())) {
    register_env <- get("__makepipe_register__", parent.frame())
    if (!is.environment(register_env)) warning(msg, call. = FALSE)
  } else {
    warning(msg, call. = FALSE)
    return(invisible(value))
  }
}
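One possible rearrangement (a sketch only; the registration step itself is elided here, just as in the snippet above): resolve the register regardless of quiet, and use quiet only to silence the warning.

```r
# Sketch of a fix: look up the register unconditionally, so that quiet = TRUE
# suppresses the warning but not the registration itself.
msg <- "`make_register()` called outside of a 'makepipe' pipeline"
register_env <- NULL
if (exists("__makepipe_register__", parent.frame())) {
  env <- get("__makepipe_register__", parent.frame())
  if (is.environment(env)) register_env <- env
}
if (is.null(register_env)) {
  if (!quiet) warning(msg, call. = FALSE)
  return(invisible(value))
}
```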

Use roxygen-like tags to declare dependencies

At the top of each script, something like

#' @title Merge
#' @description Merge population variables into survey data.
#' @dependencies ["data/2 dat.Rds", "data/0 pop.Rds"]
#' @targets "data/3 dat.Rds"
NULL

Then something like make_with_dir() could either generate an R script containing make_with_*() blocks or else simply execute the pipeline.

Implement clean and rebuild functions

Once the pipeline has been executed, it stores information on the locations of all targets, dependencies and source files on disk. It also stores each recipe as a string.

We can leverage this to implement clean and rebuild functions:

  • clean_pipeline() would delete all targets
  • build_pipeline() would execute make_with_source() or make_with_recipe() for each source/recipe in the pipeline in the appropriate order
  • rebuild_pipeline() would call clean_pipeline() and then build_pipeline()

The most challenging part of this will be figuring out what the 'appropriate order' of execution is. GNU make decides where to start using topological sorting. Possibly there's an algorithm in visNetwork that I can use. Otherwise, I'll need to code something up.
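Kahn's algorithm is small enough to code up directly; a minimal sketch over a named list mapping each node to its direct dependencies (a hypothetical representation of the pipeline graph):

```r
# Kahn's algorithm sketch: `deps_of` maps each node to the nodes it depends
# on; the result lists nodes so every dependency precedes its dependents.
topo_sort <- function(deps_of) {
  nodes <- union(names(deps_of), unlist(deps_of))
  indeg <- setNames(integer(length(nodes)), nodes)
  for (n in names(deps_of)) indeg[n] <- length(deps_of[[n]])
  queue <- nodes[indeg == 0]
  out <- character(0)
  while (length(queue) > 0) {
    n <- queue[1]; queue <- queue[-1]
    out <- c(out, n)
    for (m in names(deps_of)) {
      if (n %in% deps_of[[m]]) {
        indeg[m] <- indeg[m] - 1
        if (indeg[m] == 0) queue <- c(queue, m)
      }
    }
  }
  if (length(out) < length(nodes)) stop("cycle detected in pipeline")
  out
}
```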

Once this is done, it might be neat to introduce a pipeline() function for defining a pipeline in one fell swoop:

my_pipeline <- Pipeline$new(
  list(
    dependencies = c("data/0_raw_data.csv", "lookup/concordance.csv"),
    source = c("1 data_prep.R"),
    targets = c("data/1_data.Rds")
  ),
  list(
      dependencies = c("data/1_data.Rds", "data/0_pop.Rds"),
      recipe = {
        dat <- readRDS("data/1_data.Rds")
        pop <- readRDS("data/0_pop.Rds")
        merged_dat <- merge(dat, pop, by = "id")
        saveRDS(merged_dat, "data/2_data.Rds")
      },
      targets = c("data/2_data.Rds")
  )
)

rebuild_pipeline(my_pipeline)


Allow user to add annotations in `make_*()`

Something like:

make_with_source(
  desc = "Load and clean raw data",
  dependencies = c("data/0_raw_data.csv", "lookup/concordance.csv"),
  source = "1 data_prep.R",
  targets = "data/1_data.Rds"
)

`make_with_recipe`: Error in basename(as.character(nodes$id)): path too long

It looks like make_with_recipe runs into an error when the recipe has more than a few lines of code

library(makepipe)
#> Warning: package 'makepipe' was built under R version 4.0.5
saveRDS(cars, 'raw_dat.rds')
make_with_recipe(
  recipe = {
    new_cars <- cars
    new_cars$a <- 1
    new_cars$b <- 2
    new_cars$c <- 3
    new_cars$d <- 4
    new_cars$e <- 5
    new_cars$f <- 6
    new_cars$g <- 7
    new_cars$h <- 8
    new_cars$i <- 9
    new_cars$i <- 10
    saveRDS(new_cars, 'processed_dat.rds')
  },
  dependencies = 'raw_dat.rds',
  targets = 'processed_dat.rds'
)
#> Error in basename(as.character(nodes$id)): path too long

Created on 2022-05-11 by the reprex package (v1.0.0)

Seems like it may have something to do with the makepipe visualisation?

Text summary

Would it be possible to provide a text summary of the pipeline as an alternative to the flowchart?

For example a make_with_source() block could be rendered,

**Merge - `3 merge.R`**
Merge population variables into survey data.

- Dependencies: `data/2 dat.Rds`, `data/0 pop.Rds`
- Targets: `data/3 dat.Rds`

Implement `make_return()`

As per #2, we want a formal way to specify return values that unifies make_*() variants.

Something like this should work:

make_return <- function(x) {
    signalCondition(rlang::cnd("makepipe_return_cnd", res = x))
    invisible(x)
}

make_with_source <- function(...) {
  ...
  out <- tryCatch(
    source(...),
    makepipe_return_cnd = function(x) {
      x$res
    }
  )
  ...
  out
}

Add pkg option to `make`

Some pipelines will need to be re-run if a key package has been updated.

The mechanism will be the same as for other dependencies: find the package location and compare modification times.

pkg_locs <- paste0(find.package(pkgs), "/DESCRIPTION")
out_of_date(target, pkg_locs)

Will need to think about how to incorporate into the pipeline viz

Submit to CRAN

First release:

Prepare for release:

  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()

Submit to CRAN:

  • devtools::submit_cran()
  • Approve email
  • Action feedback:
    • Add some more details about the package functionality and implemented methods in the Description text.
    • Replace call to installed.packages() with find.package()
  • Resubmit

Wait for CRAN...
