henrikbengtsson / marshal
[PROTOTYPE] R package: marshal - Framework to Marshal Objects to be Used in Another R Processes
Home Page: https://marshal.futureverse.org/
License: Other
I think it might be an idea to replace the default marshal() implementation by not storing the unmarshal function in the marshalled object, but instead by saving the required package (versions).
I.e. I am talking about this line here: https://github.com/HenrikBengtsson/marshal/blob/72e976c61f00861ceb0b4c0c0fd7016b4eee417a/R/marshal.default.R#L30
This would mean that instead of only having to implement the marshal.<class> method, one would have to implement both marshal.<class> and unmarshal.<class>. The unmarshal generic could then verify that the packages required to unmarshal the object are loaded (including the package that contains the unmarshal generic) before dispatching onto the method.
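A minimal sketch of what this proposal could look like, written without the marshal package so that it is self-contained. The generics marshal_sketch() and unmarshal_sketch(), the class my_model, and the "required_packages" attribute are all hypothetical names chosen for illustration; they are not the package's API.

```r
# Hypothetical generics; names are illustrative, not the marshal package API.
marshal_sketch <- function(x, ...) UseMethod("marshal_sketch")

unmarshal_sketch <- function(x, ...) {
  # Check the recorded package requirements before dispatching
  for (pkg in attr(x, "required_packages")) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop("Package needed for unmarshalling is not available: ", pkg)
    }
  }
  UseMethod("unmarshal_sketch")
}

# Methods for a made-up class "my_model"
marshal_sketch.my_model <- function(x, ...) {
  structure(
    list(payload = unclass(x)),
    required_packages = "stats",  # assumed dependency, for illustration only
    class = "my_model_marshalled"
  )
}

unmarshal_sketch.my_model_marshalled <- function(x, ...) {
  structure(x$payload, class = "my_model")
}

x <- structure(list(coef = c(1, 2)), class = "my_model")
m <- marshal_sketch(x)    # holds data and package names, but no function
x2 <- unmarshal_sketch(m)
```

In this sketch the marshalled object carries only data plus a character vector of package names, so no closure (and no srcref) travels with it.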
The advantages of this approach would be:
--with-keep.source option, the size of marshaled objects can explode with the current approach (admittedly, one could also just remove the srcrefs manually to address the latter problem). This is somewhat reminiscent of the R6 problems we were facing in mlr3, and our workaround now is similar to what I am suggesting here, i.e. not storing the methods alongside the objects but in the package.
The disadvantages would be:
For example, if we detect a connection with index K, we can change its index to -1. If the connection is never used, all is fine, and if it's used it'll trigger a run-time error.
The background is that R does not protect against using connections in other R sessions. This should ideally be fixed in R itself, cf. HenrikBengtsson/Wishlist-for-R#81
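The index-invalidation idea can be sketched roughly as follows. The helper invalidate_connection() is a hypothetical name, not part of the marshal package, and the sketch assumes that base R rejects a connection object whose index is negative.

```r
# Hypothetical helper: return a stand-in for a connection whose index is -1,
# so that any later use fails instead of silently hitting an unrelated
# connection that happens to share the same index in another R session.
invalidate_connection <- function(con) {
  stopifnot(inherits(con, "connection"))
  structure(-1L, class = class(con))
}

con <- textConnection("hello")
dead <- invalidate_connection(con)
close(con)

# Using the invalidated connection is expected to signal a run-time error
res <- tryCatch(readLines(dead), error = function(e) e)
```

If the invalidated object is never used, nothing happens; the error only appears at the point of use.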
https://en.m.wikipedia.org/wiki/Marshalling_(computer_science)
This package should provide an API for serializing and un-serializing R objects.
It should use S3 generic functions so it can be extended and customized downstream.
It could also have plug-in support for different types of serialization protocols.
Marshalling and de-marshalling come before serialization.
The API should also provide methods that compare objects against known accept and reject lists. It should be possible to update these lists too.
This package should also provide methods to scan for references such as external pointers.
A tbl may contain an external pointer via attribute problems, e.g.
spc_tbl_ [25,000 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ Index : num [1:25000] 1 2 3 4 5 6 7 8 9 10 ...
$ Height_Inches: num [1:25000] 65.8 71.5 69.4 68.2 67.8 ...
$ Weight_Pounds: num [1:25000] 113 136 153 142 144 ...
- attr(*, "spec")=
.. cols(
.. Index = col_double(),
.. Height_Inches = col_double(),
.. Weight_Pounds = col_double()
.. )
- attr(*, "problems")=<externalptr>
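One possible way to detect a case like this is a recursive scan over list elements and attributes. The function has_external_pointer() below is a hypothetical helper, not an existing function in the marshal package, and it deliberately does not traverse environments or closures.

```r
# Hypothetical helper: TRUE if `x` holds an external pointer anywhere in its
# list elements or attributes. Environments and closures are not traversed.
has_external_pointer <- function(x) {
  if (typeof(x) == "externalptr") return(TRUE)
  # Check attributes, e.g. a "problems" attribute on a data frame
  for (a in attributes(x)) {
    if (has_external_pointer(a)) return(TRUE)
  }
  # Recurse into list elements
  if (is.list(x)) {
    for (el in unclass(x)) {
      if (has_external_pointer(el)) return(TRUE)
    }
  }
  FALSE
}

# Build a data frame that mimics the structure above
df <- data.frame(Index = 1:3)
attr(df, "problems") <- methods::new("externalptr")

found_in_df <- has_external_pointer(df)
found_in_vector <- has_external_pointer(1:3)
```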
Add a function, can_be_marshalled(), that can be used to test if an object can be marshalled or not.
When marshaling objects that have reference semantics, having a clone / inplace parameter for the (un)marshal generics might be handy.
The pseudocode below illustrates a call to marshal(), where cloning is not necessary, and another call to unmarshal(), where it is necessary.
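g <- function() {
  # Marshal in a child process: the child exits afterwards, so x can be
  # modified in place and no clone is needed
  x_marshaled <- callr::r(function() {
    x <- f(...)
    marshal(x, clone = FALSE)
  })
  # x_marshaled is returned below, so unmarshal() must not modify it
  x_unmarshaled <- unmarshal(x_marshaled, clone = TRUE)
  y <- h(x_unmarshaled)
  return(list(x_marshaled, y))
}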
To stay on the safe side, marshal methods for objects with reference semantics should always clone by default and not modify the object that is being marshaled in-place. Because marshal() is often called right before sending the object to another process, it might be worth optimizing the special case where in-place modifications are allowed (or, more generally, where the object that is being marshaled is not used any further).
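The difference between the two modes can be illustrated with a plain environment. This is only a sketch under assumptions: marshal_env_sketch() and the field name handle are hypothetical, and the clone is shallow and does not copy attributes such as the class.

```r
# Hypothetical marshal-like function for environment-backed objects
marshal_env_sketch <- function(x, clone = TRUE) {
  stopifnot(is.environment(x))
  target <- if (clone) {
    # Shallow copy: a new environment holding the same bindings
    list2env(as.list.environment(x, all.names = TRUE), parent = parent.env(x))
  } else {
    x  # modify the caller's object directly
  }
  # Drop a field that is assumed not to survive a process boundary
  if (exists("handle", envir = target, inherits = FALSE)) {
    rm(list = "handle", envir = target)
  }
  target
}

obj <- new.env()
obj$data <- 1:3
obj$handle <- "pretend this is a live resource"

# Safe default: the original keeps its handle
m1 <- marshal_env_sketch(obj, clone = TRUE)
kept_after_clone <- exists("handle", envir = obj, inherits = FALSE)

# In-place: cheaper, but the original loses its handle
m2 <- marshal_env_sketch(obj, clone = FALSE)
kept_after_inplace <- exists("handle", envir = obj, inherits = FALSE)
```

The in-place variant avoids the copy, which only pays off when the caller is known to discard the original afterwards.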
From https://future.futureverse.org/articles/future-4-non-exportable-objects.html#package-rcpp:
Rcpp::sourceCpp(code = "
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
int my_length(NumericVector x) {
  return x.size();
}
")
This produces my_length(), which holds an external reference pointer.
Can such an object be marshalled?
When marshaling container objects, I think it is important that a global hashmap is initialized by the top-level marshal() call, which can be used to preserve reference identities, i.e. by book-keeping which objects were already marshaled.
The code below (hopefully) illustrates this problem, which is a major challenge for R6.
library(marshal)
# this is our custom environment class generator for which we want to implement a custom marshaler
custom_env = function(data) {
  e = new.env()
  e$data = data
  e$other_fn = function() {
    print("do some stuff")
  }
  class(e) = "custom_env"
  return(e)
}
# because the `$other_fn` can simply be re-created, we don't want to marshal it, i.e. we only have to marshal the `data` field.
registerS3method("marshal", "custom_env", function(x, ...) {
  structure(list(marshaled = x$data), class = c("custom_env_marshaled", "marshaled"))
})
registerS3method("unmarshal", "custom_env_marshaled", function(x, ...) {
  custom_env(x$marshaled)
})
# a problem then arises when the marshal method for container objects redirects work by calling marshal on its
# contents.
container = function(...) {
  structure(list(...), class = "container")
}
# Below, `marshal()` is applied to the same (identical) environment twice
registerS3method("marshal", "container", function(x, ...) {
  structure(list(marshaled = lapply(x, marshal)), class = c("container_marshaled", "marshaled"))
})
registerS3method("unmarshal", "container_marshaled", function(x, ...) {
  do.call(container, args = lapply(x[[1L]], unmarshal))
})
ce = custom_env(1)
cont = structure(list(a = ce, b = ce), class = "container")
cont_rec = unmarshal(marshal(cont))
identical(cont[[1]], cont[[2]])
#> [1] TRUE
identical(cont_rec[[1]], cont_rec[[2]])
#> [1] FALSE
Created on 2024-03-15 with reprex v2.0.2
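A rough sketch of the book-keeping idea, kept independent of the marshal package: each unique environment is marshaled once, and repeated occurrences are stored as references to the same entry. The functions marshal_all() and unmarshal_all() are hypothetical, the lookup is a simple linear search with identical() rather than a real hashmap, and class attributes are ignored for brevity.

```r
# Hypothetical sketch: marshal a list of environments while remembering
# which ones were already seen, so shared references stay shared.
marshal_all <- function(objs) {
  seen <- list()      # environments already marshaled
  payloads <- list()  # one payload per unique environment
  refs <- integer(length(objs))
  for (i in seq_along(objs)) {
    idx <- 0L
    for (j in seq_along(seen)) {
      if (identical(seen[[j]], objs[[i]])) {
        idx <- j
        break
      }
    }
    if (idx == 0L) {
      idx <- length(seen) + 1L
      seen[[idx]] <- objs[[i]]
      payloads[[idx]] <- as.list.environment(objs[[i]], all.names = TRUE)
    }
    refs[i] <- idx
  }
  list(payloads = payloads, refs = refs, names = names(objs))
}

unmarshal_all <- function(m) {
  # Restore each unique environment once, then share it among all references
  restored <- lapply(m$payloads, function(p) list2env(p, parent = globalenv()))
  out <- restored[m$refs]
  names(out) <- m$names
  out
}

e <- new.env()
e$data <- 1
rec <- unmarshal_all(marshal_all(list(a = e, b = e)))
same_after_roundtrip <- identical(rec$a, rec$b)
```

With this kind of registry, the two elements of the restored container point to one environment again, which is the property that is lost in the reprex above.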
The bundle package has bundle() and unbundle() for marshalling and unmarshalling of model-based objects, cf. https://github.com/rstudio/bundle/tree/main/R. Implementing marshal() and unmarshal() S3 methods for those will be a good case study and add more test cases.
/ht @dfalbel /cc @simonpcouch @juliasilge
I've added vignette https://marshal.futureverse.org/articles/known_cases.html that currently only holds a table. The data from that table is pulled from https://github.com/HenrikBengtsson/marshal/blob/develop/inst/known_cases.json.
This is a first attempt at providing an overview of what happens when "exporting" different types of objects in R. I anticipate that this table will keep growing as we become aware of more cases.
Thanks for the work on this useful package! We are currently implementing something similar for the mlr3 ecosystem. Is there a plan for when this package will make it to CRAN so we could potentially build upon it?