Comments (8)

yongtang commented on June 3, 2024

@areeh PR #407 has been opened, which should support your case. The PR allows reading a local NumPy array through the same address space.

It may have limitations, but if your process is local, performance might be good for large NumPy arrays, as there is no serialization overhead beforehand.

Support for a dict/tuple of features has been added as well.

BryanCutler commented on June 3, 2024

@yongtang Arrow supports converting NumPy arrays into record batches, so I don't think it would be much effort to add this, but there would be a limit on dimensionality, at least for now. Hopefully that will change in the future.

yongtang commented on June 3, 2024

@BryanCutler I think there are two issues: one is the in-memory conversion between Apache Arrow and NumPy, and the other is reading data from the npy or npz file formats. I haven't found a way to open npy or npz files with the Arrow library. However, the npy and npz file formats seem quite straightforward, so a simple parser should be enough. I will look into it and see if I can come up with something quick.
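To show how small such a parser can be, here is a minimal stdlib-only sketch, assuming C-ordered arrays and only the few dtypes listed in `STRUCT_CODES`. An `.npy` file starts with the magic string `\x93NUMPY`, one major and one minor version byte, and a little-endian header length (2 bytes in version 1.x, 4 bytes in 2.x and 3.x). The header is a Python dict literal with `descr`, `fortran_order`, and `shape` keys, and the raw array data follows it. An `.npz` file is a zip archive holding one `.npy` member per array. The helper names (`read_npy_header`, `read_npy`, `read_npy_bytes`, `read_npz`) are hypothetical; this is not the tensorflow/io implementation.

```python
import ast
import io
import struct
import zipfile

MAGIC = b"\x93NUMPY"

# Assumption of this sketch: only these NumPy dtype strings are handled,
# mapped to the matching struct format codes.
STRUCT_CODES = {
    "<f4": "f", "<f8": "d",
    "<i2": "h", "<i4": "i", "<i8": "q",
    "|u1": "B", "|i1": "b", "|b1": "?",
}


def read_npy_header(f):
    """Parse an .npy header and return (descr, fortran_order, shape).

    Leaves `f` positioned at the first byte of the array data.
    """
    if f.read(6) != MAGIC:
        raise ValueError("not an .npy stream")
    major, minor = f.read(2)
    if major == 1:
        (header_len,) = struct.unpack("<H", f.read(2))  # version 1.x: 2-byte length
    elif major in (2, 3):
        (header_len,) = struct.unpack("<I", f.read(4))  # versions 2.x/3.x: 4-byte length
    else:
        raise ValueError(f"unsupported .npy version {major}.{minor}")
    encoding = "utf-8" if major == 3 else "latin1"
    # The header is a dict literal, padded with spaces and ending in a newline.
    header = ast.literal_eval(f.read(header_len).decode(encoding))
    return header["descr"], header["fortran_order"], tuple(header["shape"])


def read_npy(f):
    """Return (shape, flat tuple of values) for a C-ordered array of a supported dtype."""
    descr, fortran_order, shape = read_npy_header(f)
    if descr not in STRUCT_CODES:
        raise ValueError(f"dtype {descr!r} is not handled by this sketch")
    if fortran_order:
        raise ValueError("Fortran-ordered arrays are not handled by this sketch")
    count = 1
    for dim in shape:
        count *= dim
    fmt = f"<{count}{STRUCT_CODES[descr]}"
    return shape, struct.unpack(fmt, f.read(struct.calcsize(fmt)))


def read_npy_bytes(data):
    """Convenience wrapper for an .npy file already loaded into memory."""
    return read_npy(io.BytesIO(data))


def read_npz(file):
    """Read every .npy member of an .npz (zip) archive into {name: (shape, values)}."""
    arrays = {}
    with zipfile.ZipFile(file) as zf:
        for name in zf.namelist():
            if name.endswith(".npy"):
                with zf.open(name) as member:
                    arrays[name[: -len(".npy")]] = read_npy(member)
    return arrays
```

Returning the shape plus a flat tuple keeps the sketch free of NumPy; reshaping is left to the caller.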

BryanCutler commented on June 3, 2024

Actually, the conversion from NumPy to Arrow is zero-copy, so it wouldn't consume any more memory, but you're right, Arrow doesn't support reading these files. If you're able to read them directly in the op, that would be cool!
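The zero-copy idea can be illustrated without Arrow: the new object wraps the existing memory buffer instead of duplicating it, so it needs no extra memory for the data and sees later writes to the original. The sketch below uses Python's built-in `memoryview` purely as an analogy for what a zero-copy NumPy-to-Arrow conversion does with the NumPy buffer; `float32_view` is a hypothetical helper, not an Arrow or tensorflow/io API.

```python
import struct


def float32_view(buffer, offset=0):
    """Return a zero-copy float32 view of `buffer` starting at byte `offset`.

    No bytes are copied: the view reads the caller's memory directly, so later
    writes to `buffer` are visible through it. Values are interpreted in the
    machine's native byte order.
    """
    return memoryview(buffer)[offset:].cast("f")


# Pack three float32 values in native byte order into a writable buffer.
data = bytearray(struct.pack("=3f", 1.0, 2.0, 3.0))
view = float32_view(data)
copy = list(view)                     # an explicit copy, for contrast
struct.pack_into("=f", data, 4, 9.0)  # overwrite the second element in place
print(view.tolist())  # [1.0, 9.0, 3.0]: the view sees the write
print(copy)           # [1.0, 2.0, 3.0]: the copy does not
```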

areeh commented on June 3, 2024

Performant solutions using tf.data with data on disk are quite painful right now, so this would be a welcome addition. I just want to mention that it would be particularly useful if you support a case where the data looks like:

{ 'feature1': ([...], dtype=np.float32), 'feature2': ([...], dtype=np.int16), }
and so on. This is somewhat similar to how the example in the guide splits features and labels, if I'm understanding correctly.
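To make the requested layout concrete: a dict of per-feature arrays that share a first dimension is consumed one example at a time, each example being a dict with one value per feature, optionally split into a (features, label) pair. This mirrors how `tf.data.Dataset.from_tensor_slices` slices a dict of tensors along its first axis. The minimal sketch below uses plain Python lists in place of NumPy arrays, and `iter_examples`, `split_label`, and the `"label"` key are hypothetical.

```python
def iter_examples(features):
    """Yield one {name: value} dict per example from a dict of equal-length columns."""
    lengths = {name: len(column) for name, column in features.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"features have different lengths: {lengths}")
    names = list(features)
    # zip() walks all columns in lockstep, i.e. slices along the first dimension.
    for row in zip(*(features[name] for name in names)):
        yield dict(zip(names, row))


def split_label(example, label_key):
    """Split one example dict into a (features, label) pair."""
    features = dict(example)  # copy so the caller's dict is left untouched
    label = features.pop(label_key)
    return features, label


columns = {
    "feature1": [0.5, 1.5],  # stands in for an np.float32 array
    "feature2": [3, 4],      # stands in for an np.int16 array
    "label": [0, 1],         # hypothetical label column
}
for example in iter_examples(columns):
    print(split_label(example, "label"))
```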

areeh commented on June 3, 2024

@yongtang The PR looks great; it supports everything I had in mind when I wrote my comment. Thank you!

kvignesh1420 commented on June 3, 2024

@yongtang can this be closed?

yongtang commented on June 3, 2024

@kvignesh1420 Ah yes, thanks for the reminder 👍
