
Comments (3)

agoscinski commented on August 29, 2024

As far as I understand, naively pickling an object in-band (without an external buffer) involves two copies:

  • one copy to convert the object to a bytes object
  • one copy to write that bytes object into the pickle stream

The doc page somewhat intermingles this with out-of-band pickling (specifying an external buffer), which additionally avoids another copy but requires storing the buffer somewhere.
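To make the in-band versus out-of-band distinction concrete, here is a minimal sketch using only the standard library. The `bytearray` is just a stand-in for a large array buffer (an assumption for illustration), and the variable names are made up.

```python
import pickle

data = bytearray(b"x" * 1_000_000)  # stand-in for a large array buffer

# In-band: the buffer content is written into the pickle stream itself.
in_band = pickle.dumps(pickle.PickleBuffer(data), protocol=5)

# Out-of-band: the stream only records that a buffer is expected; the
# PickleBuffer itself is handed to the callback and must be stored somewhere.
buffers = []
out_of_band = pickle.dumps(
    pickle.PickleBuffer(data), protocol=5, buffer_callback=buffers.append
)

# The same buffers have to be provided again when loading.
restored = pickle.loads(out_of_band, buffers=buffers)

print(len(in_band), len(out_of_band), len(buffers))
```

The out-of-band stream stays tiny because the payload travels separately in `buffers`, which is the "storage of the buffer somewhere" mentioned above.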

PickleBuffer to prevent copies:

How numpy implemented it:

Looking at the implementation in the issue, it seems rather trivial, because numpy supports conversion to a bytes-like object (PickleBuffer(np.array(5))).

I think they mainly implemented it on the C side, rather than on the Python side like the original implementation in the issue, to reduce dependencies.

So instead of using PyPickleBuffer_FromObject you should build pickle buffers by importing the PickleBuffer class from the pickle module and then calling on the array (as you would do in Python). This way in the C code you only manipulate it as a PyObject and do not introduce a build dependency. I am not familiar enough with the C API but I am pretty sure this is possible.

numpy/numpy#11161 (comment)

I am not sure how to translate this into an implementation suggestion for our code. It seems to me that we can just iterate over all arrays in a TensorMap and apply PickleBuffer to each of them.

from pickle import PickleBuffer

class Arrays:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __reduce_ex__(self, protocol):
        # LOW_MEMORY_META_INFORMATION_LIKE_SHAPE is a placeholder for small metadata
        # (e.g. the shape); _reconstruct would be a classmethod rebuilding the object.
        if protocol >= 5:
            return type(self)._reconstruct, (PickleBuffer(self.a), PickleBuffer(self.b), *LOW_MEMORY_META_INFORMATION_LIKE_SHAPE,), None
        else:
            # PickleBuffer is forbidden with pickle protocols <= 4.
            return type(self)._reconstruct, (bytearray(self.a), bytearray(self.b), *LOW_MEMORY_META_INFORMATION_LIKE_SHAPE,)
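A self-contained variant of the sketch above that can actually be run might look like the following. It is only an illustration: the arrays are plain `bytearray` objects instead of numpy arrays, and the `shape` attribute and the body of `_reconstruct` are assumptions filling in for the metadata placeholder.

```python
import pickle
from pickle import PickleBuffer


class Arrays:
    def __init__(self, a, b, shape):
        self.a = a
        self.b = b
        self.shape = shape  # small metadata, always pickled in-band

    @classmethod
    def _reconstruct(cls, a, b, shape):
        # a and b arrive as bytearray (in-band) or PickleBuffer (out-of-band);
        # memoryview(...).obj gives back the underlying object without copying.
        with memoryview(a) as ma, memoryview(b) as mb:
            return cls(ma.obj, mb.obj, shape)

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            args = (PickleBuffer(self.a), PickleBuffer(self.b), self.shape)
        else:
            # PickleBuffer is forbidden with pickle protocols <= 4.
            args = (bytearray(self.a), bytearray(self.b), self.shape)
        return type(self)._reconstruct, args


original = Arrays(bytearray(range(6)), bytearray(range(6, 12)), shape=(2, 3))

# Protocol 5, out-of-band: both arrays are passed to the callback.
buffers = []
stream = pickle.dumps(original, protocol=5, buffer_callback=buffers.append)
restored = pickle.loads(stream, buffers=buffers)

# Protocol 4 falls back to copying the data into the stream.
restored_v4 = pickle.loads(pickle.dumps(original, protocol=4))
```

For real arrays, `_reconstruct` would presumably wrap the buffers back into arrays (for example with something like `np.frombuffer`) using the stored shape.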

I haven't had much time to look at how torch deals with this. I found this issue, pytorch/pytorch#9168 (comment), which suggests to me that they haven't implemented anything like this, but I only had a quick glance. Converting everything to numpy arrays for pickling, and converting it back to the original data type and device while unpickling, seems like a valid approach.


agoscinski commented on August 29, 2024

  • step 0: write to a temporary file & read it back. This allows us to use pickle for other things in a backward-compatible way
  • step 1: write the load function, which should be the easiest
  • step 2: write the save function

Originally posted by @Luthaf in #238 (comment)
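As a rough illustration of step 0, pickling through a temporary file could be sketched as below. Everything here is hypothetical: `save` and `load` stand in for the existing file-based IO functions (implemented with JSON only to keep the example self-contained), and `Container` stands in for the real data class.

```python
import json
import os
import pickle
import tempfile


def save(path, values):
    """Hypothetical stand-in for the existing file-based save function."""
    with open(path, "w") as fd:
        json.dump(values, fd)


def load(path):
    """Hypothetical stand-in for the existing file-based load function."""
    with open(path) as fd:
        return json.load(fd)


class Container:
    def __init__(self, values):
        self.values = values

    @classmethod
    def _from_bytes(cls, raw):
        # Unpickling: write the bytes to a temporary file, then reuse load().
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "data.bin")
            with open(path, "wb") as fd:
                fd.write(raw)
            return cls(load(path))

    def __reduce__(self):
        # Pickling: reuse save() on a temporary file and read the bytes back.
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "data.bin")
            save(path, self.values)
            with open(path, "rb") as fd:
                raw = fd.read()
        return type(self)._from_bytes, (raw,)


restored = pickle.loads(pickle.dumps(Container([1, 2, 3])))
```

The pickle stream then only contains the on-disk format as bytes, so steps 1 and 2 could later replace the temporary file with in-memory load and save without changing what is stored in the pickle.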

For step 1 I am not sure yet how I can connect the existing rust code
https://github.com/lab-cosmo/equistore/blob/724b83e778f77b8d6b9e00e882aad6c035efbd5c/equistore-core/src/io/mod.rs#L52
From the C side I would get a const char *char_ptr that we need to iterate through. It can be wrapped with CStr::from_ptr(char_ptr), but then it needs to be wrapped in something that implements the traits Read + Seek so it can be passed to our io::load function (and from there to ZipArchive::new).

This stackoverflow suggested to use Cursor. Need to look at this a bit more.

For the C-side this worked for me

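# Python
import ctypes

lb = bytes(l)  # l is the data to pass (not shown in this snippet)
ptr = ctypes.cast(lb, ctypes.POINTER(ctypes.c_byte))
reduce_add = lib.reduce_add  # lib is assumed to be the shared library loaded with ctypes
reduce_add.argtypes = [ctypes.POINTER(ctypes.c_byte)]
reduce_add.restype = ctypes.c_int

// C
int reduce_add(const char * ptr) {
    ...
    return 0;
}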
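The Python half of this can be checked without compiling any C code. The sketch below is only an illustration (the payload and variable names are made up); it casts a bytes object to a pointer and reads the data back through that pointer. Note that the pointer carries no length, so the receiving side needs the size passed separately.

```python
import ctypes

payload = bytes([1, 2, 3, 4])

# Cast the bytes object to a raw pointer; `payload` must stay alive for as
# long as the pointer is in use.
ptr = ctypes.cast(payload, ctypes.POINTER(ctypes.c_ubyte))

# The pointer itself has no length, so the size is given explicitly.
total = sum(ptr[i] for i in range(len(payload)))
round_trip = ctypes.string_at(ptr, len(payload))

print(total, round_trip)
```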


Luthaf commented on August 29, 2024

This stackoverflow suggested to use Cursor. Need to look at this a bit more.

Yes, it sounds like Cursor is the way to go. Instead of CStr I would do a &[u8] though, since we want to manipulate a bunch of bytes, not a string. &[u8] already implements Read, and Cursor<&[u8]> implements Seek.

For the C-side this worked for me

I'm not sure what you mean here

