Comments (5)

bmaranville commented on May 21, 2024

The jsfive and h5wasm packages have different approaches for opening files: jsfive is built around working with ArrayBuffer objects (including internally), while h5wasm is built on the HDF5 C API and uses a virtual filesystem (the native filesystem for nodejs, and a virtual MEMFS filesystem in the browser).

The h5wasm README has directions for loading an HDF5 file from an ArrayBuffer. You have to "save" it to the virtual filesystem first:

let response = await fetch("https://ncnr.nist.gov/pub/ncnrdata/vsans/202003/24845/data/sans59510.nxs.ngv");
let ab = await response.arrayBuffer();

hdf5.FS.writeFile("sans59510.nxs.ngv", new Uint8Array(ab));

// use mode "r" for reading.  All modes can be found in hdf5.ACCESS_MODES
let f = new hdf5.File("sans59510.nxs.ngv", "r");
// File {path: "/", file_id: 72057594037927936n, filename: "sans59510.nxs.ngv", mode: "r"}

The constructor for h5wasm could be modified so that it automatically creates a backing file if an ArrayBuffer is passed as the first argument, and closes it when the File object is closed (and then deletes it?). Alternatively I could look into exposing the HDF5 API function H5LTopen_file_image for loading a file image directly from memory.
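
The backing-file idea can be approximated in user code. The following is a minimal sketch, not part of h5wasm: the helper `openFromBuffer` and its `name` parameter are hypothetical, the already-initialized `hdf5` module is passed in as an argument, and it assumes the browser build, where `hdf5.FS` is the Emscripten virtual filesystem and `FS.unlink` only removes the in-memory copy.

```javascript
// Hypothetical helper (not an h5wasm API): write an ArrayBuffer to the
// virtual filesystem, open it read-only, and clean up the backing file on close.
function openFromBuffer(hdf5, buffer, name = "upload.h5") {
  // "Save" the bytes to the virtual filesystem, as in the README example.
  hdf5.FS.writeFile(name, new Uint8Array(buffer));
  const file = new hdf5.File(name, "r");
  return {
    file,
    close() {
      // Close the HDF5 handle first, then delete the backing file.
      file.close();
      hdf5.FS.unlink(name);
    },
  };
}
```

A caller would use `const h = openFromBuffer(hdf5, ab); ... h.close();`. With the nodejs build the same calls would touch the real filesystem, so the sketch is only intended for the browser case.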

from h5wasm.

bmaranville commented on May 21, 2024

The short answer is yes, you can load slices efficiently without loading the entire file into memory, but only if you use the nodejs version of h5wasm that directly accesses the hdf5 file from the filesystem combined with the Dataset.slice function.

If you use the browser version, it by necessity loads the entire file into memory first. You will still see performance benefits from using Dataset.slice() in this case, as you don't have to decode the entire dataset before using parts of it. This could be important for Compound datatypes, for example, which are expensive to decode.

There is another ticket #4 where a request was made for random access to files over a network - this is not easy to implement and may be done in the future if the next version of the emscripten filesystem supports this directly.
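
The slicing approach described above can be sketched as a loop over row blocks. This is an illustration only: `iterateRowBlocks` and `blockSize` are hypothetical names, and the sketch assumes a dataset object with a `shape` array and a `slice` method that takes `[start, stop]` ranges, in the form used elsewhere in this thread.

```javascript
// Hypothetical helper: yield a dataset block by block along its first axis,
// so that only one block is decoded at a time.
function* iterateRowBlocks(dataset, blockSize) {
  const nRows = dataset.shape[0];
  for (let start = 0; start < nRows; start += blockSize) {
    // Clamp the final block to the end of the dataset.
    const stop = Math.min(start + blockSize, nRows);
    yield dataset.slice([[start, stop]]);
  }
}
```

For example, `for (const block of iterateRowBlocks(f.get("data/tsg8n0ki"), 1000)) { ... }` would process 1000 rows per iteration. In the browser the whole file is still held in memory; only the decoding is incremental.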

alexpreynolds commented on May 21, 2024

Thanks, and sorry for my confusion about the API.

Can I ask briefly if the entire container is loaded into memory before any processing can be done? I'd like to know if I can progressively load chunks or slices of a data matrix, if I have a larger container.

alexpreynolds commented on May 21, 2024

Thanks for your help.

I think I am having trouble opening compound data, both in the browser and in the nodejs version (v0.1.8):

$ node
Welcome to Node.js v16.13.1.
Type ".help" for more information.
> const hdf5 = require('h5wasm')
> let f = new hdf5.File("/Users/areynolds/Desktop/data.h5", "r")
> let t = f.get('data/tsg8n0ki')

When I print out the first row:

> t.slice([[0,1]])
[
  [
    Uint8Array(12) [
      231,  27, 202,  64, 195,
      182, 103,  64, 139,  50,
      161,  64
    ],
    0
  ]
]

In reality, this row is a compound of three 32-bit floats (np.float32) and one unsigned 32-bit integer (np.uint32). From the Python script used to make data.h5:

ds_dtype = [('xyz', np.float32, (3, )), ('label_idx', np.uint32)]

The size is right (twelve bytes holds three 32-bit floats), but I'm getting a raw array of those bytes rather than the original floats.

Am I doing something wrong to access this data, or would it help to open a new issue? I should probably close this one up, as well.
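
As a stopgap, the raw bytes shown above can be reinterpreted by hand. This is a minimal sketch, not an h5wasm feature: `decodeFloat32LE` is a hypothetical helper, and it assumes the field is stored as packed little-endian 32-bit floats, as the Python dtype suggests.

```javascript
// Hypothetical helper: reinterpret a Uint8Array as little-endian float32 values.
// A DataView is used so the result does not depend on buffer alignment
// or on the host's byte order.
function decodeFloat32LE(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(Math.floor(bytes.byteLength / 4));
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getFloat32(i * 4, true); // true = little-endian
  }
  return out;
}

// The twelve bytes returned for the 'xyz' field of the first row.
const raw = new Uint8Array([231, 27, 202, 64, 195, 182, 103, 64, 139, 50, 161, 64]);
const xyz = decodeFloat32LE(raw);
console.log(Array.from(xyz)); // three float32 values
```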

bmaranville commented on May 21, 2024

I don't have good version notes at the moment - but decoding compound datasets should work in h5wasm >= 0.1.8 (this was a recent feature addition)
