
marshal's Issues

DESIGN: Save package (versions) instead of unmarshal function

I think it might be a good idea to change the default marshal() implementation so that it no longer stores the unmarshal function in the marshalled object, but instead saves the required packages (and their versions).
I.e. I am talking about this line here: https://github.com/HenrikBengtsson/marshal/blob/72e976c61f00861ceb0b4c0c0fd7016b4eee417a/R/marshal.default.R#L30

This would mean that, instead of only having to implement the marshal.<class> method, one would have to implement both marshal.<class> and unmarshal.<class>. The unmarshal() generic could then verify that the packages required to unmarshal the object (including the package that implements the unmarshal method) are loaded before dispatching to the method.
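A minimal sketch of this idea in plain R, independent of the marshal package. The generic names `marshal2()`/`unmarshal2()`, the `packages` attribute and the `myenv` example class are all hypothetical. The marshal method records which packages (and versions) are needed, and the unmarshal generic checks them before calling `UseMethod()`.

```r
# Hypothetical generics; this is not the marshal package's API.
marshal2 <- function(x, ...) UseMethod("marshal2")

unmarshal2 <- function(x, ...) {
  # Named character vector: package name -> version used when marshaling
  pkgs <- attr(x, "packages", exact = TRUE)
  for (pkg in names(pkgs)) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop(sprintf("Package '%s' is required to unmarshal this object", pkg),
           call. = FALSE)
    }
    installed <- utils::packageVersion(pkg)
    if (installed < pkgs[[pkg]]) {
      warning(sprintf("Object was marshaled with %s %s, but %s is installed",
                      pkg, pkgs[[pkg]], as.character(installed)),
              call. = FALSE)
    }
  }
  # Dispatch only after the required namespaces could be loaded
  UseMethod("unmarshal2")
}

# Hypothetical example class: an environment stored as a plain list.
# 'utils' stands in for the package that would implement these methods.
marshal2.myenv <- function(x, ...) {
  structure(
    list(data = as.list.environment(x, all.names = TRUE)),
    class = "myenv_marshaled",
    packages = c(utils = as.character(utils::packageVersion("utils")))
  )
}

unmarshal2.myenv_marshaled <- function(x, ...) {
  e <- list2env(x$data)
  class(e) <- "myenv"
  e
}
```

Only a short named character vector travels with the object, rather than a closure (and possibly its srcref).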

The advantages of this approach would be:

  • memory efficiency: saving the required packages (and versions) uses less space than saving the unmarshal function. Especially with the --with-keep.source option, the size of marshaled objects can explode with the current approach (admittedly, one could also just remove the srcrefs manually to address the latter problem). This is somewhat reminiscent of the R6 problems we were facing in mlr3, and our current workaround there is similar to what I am suggesting here, i.e. not storing the methods alongside the objects but keeping them in the package.
  • Storing the package versions as well would allow even better error messages, and version compatibility could become part of the standard (e.g. a compatibility matrix is stored in the package implementing the (un)marshal methods and is consulted when calling (un)marshal). Here we would have to take care of what happens when an object written with package version A is read with package version B, where B < A: version B cannot know whether it can read the format written by version A, because A did not exist when B was released.
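One possible way to handle the B < A case, sketched under assumptions: each writer records an integer format number (plus its own version for the error message), and a reader accepts only the format numbers it knows, refusing anything newer instead of guessing. The attribute names `format` and `writer_version` and the objects `supported_formats` and `check_format()` are hypothetical.

```r
# Formats this (reader) package version knows how to read (hypothetical)
supported_formats <- 1:2

check_format <- function(x) {
  fmt <- attr(x, "format", exact = TRUE)
  writer <- attr(x, "writer_version", exact = TRUE)
  if (is.null(writer)) writer <- "unknown"
  if (is.null(fmt)) {
    stop("Marshaled object does not record a format version", call. = FALSE)
  }
  if (!fmt %in% supported_formats) {
    # A format unknown to this reader, e.g. written by a newer release:
    # compatibility cannot be judged, so refuse with an informative error.
    stop(sprintf(
      "Format %d (written by version %s) is not supported; please upgrade",
      fmt, writer
    ), call. = FALSE)
  }
  invisible(TRUE)
}
```

A fuller compatibility matrix could replace `supported_formats` with a table mapping format numbers to the reader code that handles them; the key design choice is that an older reader rejects unknown formats rather than trying to read them.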

The disadvantages would be:

  • The package that implements the (un)marshal methods must be loaded, which is not the case right now.
    However, if this was to become a standard, the package that implements (un)marshal should usually also be the package that implements the actual functionality to do the marshaling.
  • With the current approach, the unmarshal function is guaranteed to be the same one that was stored when the object was marshaled.

Package 'marshal': serialize objects

https://en.m.wikipedia.org/wiki/Marshalling_(computer_science)

This package should provide an API for serializing and unserializing R objects.

It should use S3 generic functions so it can be extended and customized downstream.

It could also have plug-in support for different types of serialization protocols.

Marshalling comes before serialization, and unmarshalling comes after unserialization.
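The ordering, together with plug-in serialization protocols, can be sketched with S3 generics that default to the identity (most objects need no marshaling). The names `marshal2()`, `unmarshal2()`, `rds_protocol`, `to_bytes()` and `from_bytes()` are hypothetical; base R's `serialize()`/`unserialize()` serve as one example protocol.

```r
# Hypothetical generics with identity defaults
marshal2 <- function(x, ...) UseMethod("marshal2")
marshal2.default <- function(x, ...) x
unmarshal2 <- function(x, ...) UseMethod("unmarshal2")
unmarshal2.default <- function(x, ...) x

# A serialization "protocol" is a pair of functions; any other pair
# (e.g. from a different serialization package) could be plugged in.
rds_protocol <- list(
  write = function(x) serialize(x, connection = NULL),
  read = function(bytes) unserialize(bytes)
)

# Marshal first, then serialize ...
to_bytes <- function(x, protocol = rds_protocol) {
  protocol$write(marshal2(x))
}

# ... and unserialize first, then unmarshal.
from_bytes <- function(bytes, protocol = rds_protocol) {
  unmarshal2(protocol$read(bytes))
}
```

Because marshaling is separated from the byte-level protocol, a class only needs marshal/unmarshal methods, and those work with whatever protocol is plugged in.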

The API should also provide methods that compare objects against known accept and reject lists. It should be possible to update these lists too.
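A minimal sketch of class-based accept and reject lists, assuming the lists hold class names and are stored in an environment so they can be updated at runtime. The names `marshal_lists`, `update_list()` and `check_object()` are hypothetical, and the initial entries are only examples.

```r
# Hypothetical registry; an environment allows in-place updates
marshal_lists <- new.env()
marshal_lists$accept <- c("numeric", "integer", "character", "logical",
                          "data.frame")
marshal_lists$reject <- c("connection", "externalptr")

# Add class names to one of the two lists
update_list <- function(which = c("accept", "reject"), classes) {
  which <- match.arg(which)
  marshal_lists[[which]] <- union(marshal_lists[[which]], classes)
  invisible(marshal_lists[[which]])
}

# Returns "reject", "accept" or "unknown"; rejection takes precedence
check_object <- function(x) {
  cls <- class(x)
  if (any(cls %in% marshal_lists$reject)) return("reject")
  if (any(cls %in% marshal_lists$accept)) return("accept")
  "unknown"
}
```

Letting the reject list win means an object whose class vector matches both lists (e.g. a subclass of an accepted class that wraps a connection) is treated conservatively.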

This package should also provide methods to scan for references such as external pointers.
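A minimal sketch of such a scan, assuming that list elements and attributes are the places to look (the `problems` attribute of a readr tibble is one such case). `find_refs()` is a hypothetical name. It reports external pointers and environments but does not descend into environments or closures.

```r
# Returns the paths of all external pointers and environments reachable
# from `x` through list elements and attributes.
find_refs <- function(x, path = "x") {
  found <- character(0)
  type <- typeof(x)
  if (type %in% c("externalptr", "environment")) {
    found <- c(found, sprintf("%s <%s>", path, type))
  }
  # Environments are reported but not entered, which also avoids cycles.
  if (type == "list") {
    nms <- names(x)
    for (i in seq_along(x)) {
      el_path <- if (!is.null(nms) && nzchar(nms[i])) {
        sprintf("%s$%s", path, nms[i])
      } else {
        sprintf("%s[[%d]]", path, i)
      }
      # .subset2() bypasses any class-specific `[[` method
      found <- c(found, find_refs(.subset2(x, i), el_path))
    }
  }
  attrs <- attributes(x)
  for (a in names(attrs)) {
    found <- c(found, find_refs(attrs[[a]], sprintf('attr(%s, "%s")', path, a)))
  }
  found
}
```

For a data frame with an environment stored in a `problems` attribute, `find_refs(df, "df")` returns `'attr(df, "problems") <environment>'`.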

tibble: A `tbl` may contain an external pointer via attribute `problems` set by readr

A tbl may contain an external pointer via attribute problems, e.g.

spc_tbl_ [25,000 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ Index        : num [1:25000] 1 2 3 4 5 6 7 8 9 10 ...
 $ Height_Inches: num [1:25000] 65.8 71.5 69.4 68.2 67.8 ...
 $ Weight_Pounds: num [1:25000] 113 136 153 142 144 ...
 - attr(*, "spec")=
  .. cols(
  ..   Index = col_double(),
  ..   Height_Inches = col_double(),
  ..   Weight_Pounds = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 

Add can_be_marshalled()

Add a function, can_be_marshalled(), that can be used to test if an object can be marshalled or not.
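One possible shape for this, sketched as an S3 generic so that packages can override the answer for their own classes (e.g. when they provide a marshal method). The default method treats an object as marshallable unless an external pointer is reachable through list elements or attributes; environments and closures are not inspected. The class `my_ptr_wrapper` is hypothetical.

```r
# Hypothetical generic
can_be_marshalled <- function(x, ...) UseMethod("can_be_marshalled")

# Default: FALSE as soon as an external pointer is found
can_be_marshalled.default <- function(x, ...) {
  if (typeof(x) == "externalptr") return(FALSE)
  if (typeof(x) == "list") {
    for (i in seq_along(x)) {
      if (!can_be_marshalled(.subset2(x, i))) return(FALSE)
    }
  }
  for (a in attributes(x)) {
    if (!can_be_marshalled(a)) return(FALSE)
  }
  TRUE
}

# A class that wraps an external pointer but knows how to marshal itself
can_be_marshalled.my_ptr_wrapper <- function(x, ...) TRUE
```

For experimenting, `methods::new("externalptr")` creates a (null) external pointer that can be placed in a list or attribute.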

DESIGN: `clone` / `inplace` argument

When marshaling objects that have reference semantics, having a clone / inplace parameter for the (un)marshal generics might be handy.

The pseudocode below illustrates a call to marshal() where cloning is not necessary, and a call to unmarshal() where it is.

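g <- function() {
  # callr::r() takes a function; the object is created and marshaled in a
  # child R process and not used there afterwards, so no clone is needed
  x_marshaled <- callr::r(function() {
    x <- f()  # f(): placeholder returning an object with reference semantics
    marshal::marshal(x, clone = FALSE)  # `clone` is the proposed argument
  })
  # x_marshaled is returned below, so it must not be modified in place
  x_unmarshaled <- unmarshal(x_marshaled, clone = TRUE)
  y <- h(x_unmarshaled)  # h(): placeholder
  return(list(x_marshaled, y))
}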

To stay on the safe side, marshal methods for objects with reference semantics should always clone by default and not modify the object that is being marshaled in place. Because marshal() is often called right before sending the object to another process, it might be worth optimizing the special case where in-place modifications are allowed (or, more generally, where the object being marshaled is not used afterwards).
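A minimal sketch of what a `clone` argument could do for an environment-based object. The function `marshal_model_env()`, its `model` field and the class names are hypothetical, and `serialize()` only stands in for whatever real marshaling the field would need. With `clone = TRUE` the fields are first copied into a new environment; with `clone = FALSE` the caller's environment is modified directly.

```r
# Hypothetical marshal function with a clone argument (default: safe)
marshal_model_env <- function(x, clone = TRUE) {
  if (clone) {
    # Shallow copy: fields go into a new environment, so the caller's
    # object is left untouched (nested environments would stay shared).
    x <- list2env(as.list.environment(x, all.names = TRUE))
  }
  # Without a clone, `x` is the caller's environment, so the assignments
  # below modify the caller's object in place.
  x$model <- serialize(x$model, connection = NULL)
  class(x) <- "model_env_marshaled"
  x
}
```

The copy costs time and memory proportional to the object's fields, which is the overhead that `clone = FALSE` avoids when the original is not used after marshaling.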

DESIGN: global state dict

When marshaling container objects, I think it is important that a global hashmap is initialized by the top-level marshal() call, which can be used to preserve reference identities, i.e. by keeping track of which objects were already marshaled.

The code below (hopefully) illustrates this problem, which is a major challenge for R6.

library(marshal)

# this is our custom environment class generator for which we want to implement a custom marshaler
custom_env = function(data) {
  e = new.env()
  e$data = data
  e$other_fn = function() {
    print("do some stuff")
  }
  class(e) = "custom_env"
  return(e)
}

# because the `$other_fn` can simply be re-created, we don't want to marshal it, i.e. we only have to marshal the `data` field.
registerS3method("marshal", "custom_env", function(x, ...) {
  structure(list(marshaled = x$data), class = c("custom_env_marshaled", "marshaled"))
})
registerS3method("unmarshal", "custom_env_marshaled", function(x, ...) {
  custom_env(x$marshaled)
})

# a problem then arises when the marshal method for container objects redirects work by calling marshal on its
# contents.
container = function(...) {
  structure(list(...), class = "container")
}

# Below, `marshal()` is applied to the same (identical) environment twice
registerS3method("marshal", "container", function(x, ...) {
  structure(list(marshaled = lapply(x, marshal)), class = c("container_marshaled", "marshaled"))
})

registerS3method("unmarshal", "container_marshaled", function(x, ...) {
  do.call(container, args = lapply(x[[1L]], unmarshal))
})

ce = custom_env(1)

cont = structure(list(a = ce, b = ce), class = "container")

cont_rec = unmarshal(marshal(cont))
identical(cont[[1]], cont[[2]])
#> [1] TRUE
identical(cont_rec[[1]], cont_rec[[2]])
#> [1] FALSE

Created on 2024-03-15 with reprex v2.0.2
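A minimal sketch of the proposed bookkeeping, using hypothetical generics `marshal2()`/`unmarshal2()` that take an explicit `state` argument, and hypothetical entry points `marshal_top()`/`unmarshal_top()` that create it. Base R has no general hashmap keyed by object identity, so this sketch scans a list with `identical()`, which compares environments by reference (R >= 4.2.0 also ships an experimental `utils::hashtab()` that might replace the linear scan).

```r
# Constructors as in the reprex above
custom_env <- function(data) {
  e <- new.env()
  e$data <- data
  e$other_fn <- function() print("do some stuff")
  class(e) <- "custom_env"
  e
}
container <- function(...) structure(list(...), class = "container")

# The state is an environment, so updates are visible to every method
new_state <- function() {
  state <- new.env()
  state$objects <- list()
  state
}
marshal_top <- function(x) marshal2(x, state = new_state())
unmarshal_top <- function(x) unmarshal2(x, state = new_state())

marshal2 <- function(x, state, ...) UseMethod("marshal2")
marshal2.default <- function(x, state, ...) x
unmarshal2 <- function(x, state, ...) UseMethod("unmarshal2")
unmarshal2.default <- function(x, state, ...) x

marshal2.custom_env <- function(x, state, ...) {
  # Linear scan for an environment that was already marshaled
  for (id in seq_along(state$objects)) {
    if (identical(state$objects[[id]], x)) {
      return(structure(list(id = id), class = "marshaled_ref"))
    }
  }
  id <- length(state$objects) + 1L
  state$objects[[id]] <- x
  structure(list(id = id, data = x$data), class = "custom_env_marshaled")
}

marshal2.container <- function(x, state, ...) {
  structure(list(items = lapply(x, marshal2, state = state)),
            class = "container_marshaled")
}

unmarshal2.custom_env_marshaled <- function(x, state, ...) {
  e <- custom_env(x$data)
  state$objects[[x$id]] <- e
  e
}

# A reference points to an object unmarshaled earlier, because both
# directions traverse the container in the same order
unmarshal2.marshaled_ref <- function(x, state, ...) state$objects[[x$id]]

unmarshal2.container_marshaled <- function(x, state, ...) {
  do.call(container, lapply(x$items, unmarshal2, state = state))
}
```

With this bookkeeping, `unmarshal_top(marshal_top(container(a = ce, b = ce)))` returns a container whose two elements are the same environment again.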

Plan for CRAN release

Thanks for the work on this useful package! We are currently implementing something similar for the mlr3 ecosystem. Is there a plan for when this package will make it to CRAN so we could potentially build upon it?
