Comments (8)
@areeh PR #407 has been opened, which should support your case. The PR allows reading a local NumPy array through the same address space.
It may have limitations, but if your process is local, performance might be good for large NumPy arrays, as there is no serialization overhead beforehand.
Dict/tuple features have been added as well.
@yongtang Arrow supports converting NumPy arrays to record batches, so I don't think it would take much effort to add this, but there would be a limit on dimensionality, at least for now. Hopefully that will change in the future.
@BryanCutler I think there are two issues: one is the in-memory conversion between Apache Arrow and NumPy, and the other is reading data from the npy or npz file formats. I haven't found a way to open npy or npz files with the Arrow library. However, the npy and npz file formats seem quite straightforward, so a simple parser should be enough. I will look into it and see if I can come up with something quick.
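To show how small such a parser can be, here is a stdlib-only sketch. A .npy file is a magic string, a version, a header length, a Python dict-literal header (`descr`, `fortran_order`, `shape`), then the raw data; an .npz file is a zip archive of .npy members. The helper names `read_npy`, `read_npz` and `write_npy_v1` are hypothetical, not part of tensorflow-io. It assumes simple numeric dtypes only, returns flat Python lists rather than tensors, and reports `fortran_order` without reordering the data.

```python
import ast
import io
import math
import struct
import zipfile

NPY_MAGIC = b"\x93NUMPY"

# Assumption: only a few simple numeric dtype descriptors are handled;
# structured and object dtypes are not.
_STRUCT_CODES = {"i1": "b", "u1": "B", "i2": "h", "i4": "i", "i8": "q", "f4": "f", "f8": "d"}
# Byte-order prefix in the descriptor -> struct byte-order prefix.
_ENDIAN = {"<": "<", ">": ">", "|": "<", "=": "="}


def read_npy(f):
    """Read a .npy stream; return (shape, fortran_order, flat list of values)."""
    if f.read(6) != NPY_MAGIC:
        raise ValueError("not an .npy stream")
    major, minor = f.read(2)
    if major == 1:
        (header_len,) = struct.unpack("<H", f.read(2))
    elif major in (2, 3):
        (header_len,) = struct.unpack("<I", f.read(4))
    else:
        raise ValueError(f"unsupported .npy version {major}.{minor}")
    # The header is a dict literal, padded with spaces and terminated by "\n".
    header = f.read(header_len).decode("utf8" if major == 3 else "latin1")
    meta = ast.literal_eval(header)
    descr, shape = meta["descr"], tuple(meta["shape"])
    fmt = _ENDIAN[descr[0]] + str(math.prod(shape)) + _STRUCT_CODES[descr[1:]]
    values = struct.unpack(fmt, f.read(struct.calcsize(fmt)))
    return shape, meta["fortran_order"], list(values)


def read_npz(file):
    """Read every .npy member of an .npz (zip) archive into a dict keyed by name."""
    out = {}
    with zipfile.ZipFile(file) as zf:
        for name in zf.namelist():
            with zf.open(name) as member:
                key = name[:-4] if name.endswith(".npy") else name
                out[key] = read_npy(member)
    return out


def write_npy_v1(values, shape, descr="<f8"):
    """Serialize a flat list as a version 1.0 .npy payload in C order."""
    header = "{'descr': %r, 'fortran_order': False, 'shape': %r, }" % (descr, tuple(shape))
    # Pad so magic + version + length field + header is a multiple of 64 bytes.
    total = len(NPY_MAGIC) + 2 + 2 + len(header) + 1
    header += " " * (-total % 64) + "\n"
    code = _STRUCT_CODES[descr[1:]]
    body = struct.pack(f"{_ENDIAN[descr[0]]}{len(values)}{code}", *values)
    return NPY_MAGIC + bytes([1, 0]) + struct.pack("<H", len(header)) + header.encode("latin1") + body


# Round trip: write two small arrays into an in-memory .npz and read them back.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("x.npy", write_npy_v1([7, 8], (2,), "<i4"))
    zf.writestr("y.npy", write_npy_v1([0.5, 1.5, 2.5, 3.5], (2, 2)))
archive.seek(0)
loaded = read_npz(archive)
```

In a real op the data section would be handed to a tensor buffer instead of being unpacked into Python values; the header parsing is the same either way.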
Actually, the conversion from NumPy to Arrow is zero-copy, so it wouldn't consume any more memory. But you're right, Arrow doesn't support reading these files. If you're able to read them directly in the op, that would be cool!
Performant solutions using tf.data and disk are quite painful right now, so this would be a welcome addition. I just want to mention that it's particularly useful if you support a case where data looks like:
{ 'feature1': ([...], dtype=np.float32), 'feature2': ([...], dtype=np.int16), }
and so on. This is somewhat similar to the feature/label split in the example from the guide, if I'm understanding correctly.
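A dict of per-feature arrays like the one above can live in a single .npz file, since `np.savez` stores one member, with its own dtype, per key. The sketch below uses hypothetical feature values and a hypothetical `iter_examples` helper that slices every feature along the first axis, roughly what `tf.data.Dataset.from_tensor_slices` does for a dict. A generator like this could be wrapped with `tf.data.Dataset.from_generator`, although that pulls the data through Python rather than through a native op like the one discussed here.

```python
import io

import numpy as np

# Hypothetical features: one array per key, each with its own dtype,
# all sharing the leading (example) dimension.
features = {
    "feature1": np.array([[0.5, 1.5], [2.5, 3.5], [4.5, 5.5]], dtype=np.float32),
    "feature2": np.array([1, 2, 3], dtype=np.int16),
}

# Save all features into one in-memory .npz archive (one .npy member per key).
buffer = io.BytesIO()
np.savez(buffer, **features)
buffer.seek(0)


def iter_examples(npz_file):
    """Yield one dict per example, slicing every feature along axis 0."""
    with np.load(npz_file) as data:
        arrays = {key: data[key] for key in data.files}
    num_examples = len(next(iter(arrays.values())))
    for i in range(num_examples):
        yield {key: arr[i] for key, arr in arrays.items()}


examples = list(iter_examples(buffer))
```

Each yielded dict keeps the per-feature dtypes (`float32` and `int16` here), which is the property the request above cares about.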
@yongtang The PR looks great, it supports everything I had in mind when I wrote the comment. Thank you
@yongtang can this be closed?
@kvignesh1420 Ah yes thanks for the reminder 👍
Related Issues (20)
- TF 2.14.0 support
- Arrow dataset does not support named tensors.
- versions >0.31.0 do not work with TensorFlow versions > 2.10.0 using poetry
- Colab: 'tensorflow._api.v2.io' has no attribute 'image' for tf.io.image.decode_dicom_image
- [Question] Reading parquet files with numpy arrays in columns as a tfio.IODataset results in error
- Tensorflow 2.15 support
- Missing ARM64 wheels for v0.35.0 on PyPI
- [v0.35.0] Build Failure on "Analysis of target '@bazel_tools//platforms:windows' failed"
- S3 filesystem pure virtual method called; terminate called without an active exception
- DICOM `scale=preserve` not working as intended and performance consideration
- Is the windows support dropped?
- Inefficient Write+Copy+Delete pattern when writing to S3.
- S3 read throughput slow down after hit prefix limit
- extra not provided
- Tensorflow version-pinning should be reflected in setup.py.
- Bug in Reading Compressed String Column in Parquet Dataset
- Unable to use tensorflow_io.audio.resample on Mac M1
- tensorflow-io-gcs-filesystem==0.36.0 not available via pip
- Missing files for Python 3.12
- Tensorflow 2.16 support