aplbrain / npyjs

72.0 9.0 20.0 15.43 MB

Read numpy .npy files in JavaScript

Home Page: https://aplbrain.github.io/npyjs/

License: Apache License 2.0

JavaScript 91.86% Python 8.14%

npy npy-files javascript numpy nodejs jhuapl 3d

npyjs's Issues

Feature: Save() function for TypedArrays and regular arrays

Title says it all.

Would it be possible to include a save() function for TypedArrays and regular arrays, both for the Node.js filesystem and as an in-browser download? That way I could export my JavaScript arrays to Python/NumPy.
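
As a rough illustration of what such a save() could involve, here is a minimal sketch that serializes a Float32Array into .npy bytes (format version 1.0). The function name encodeNpyFloat32 is hypothetical and not part of npyjs, and the sketch assumes a little-endian host, since typed arrays use the platform byte order.

```javascript
// Hypothetical helper (not part of npyjs): serialize a Float32Array as .npy v1.0 bytes.
// Assumes a little-endian host, because the raw typed-array bytes are copied as-is.
function encodeNpyFloat32(data, shape) {
  // Magic string "\x93NUMPY" followed by major/minor version 1.0.
  const magic = [0x93, 0x4e, 0x55, 0x4d, 0x50, 0x59, 0x01, 0x00];
  // Python tuple syntax: a one-element tuple needs a trailing comma.
  const shapeText = shape.length === 1 ? `(${shape[0]},)` : `(${shape.join(", ")})`;
  let header = `{'descr': '<f4', 'fortran_order': False, 'shape': ${shapeText}, }`;
  // Pad with spaces and end with a newline so the data starts on a 64-byte boundary.
  const unpadded = magic.length + 2 + header.length + 1;
  header += " ".repeat((64 - (unpadded % 64)) % 64) + "\n";

  const out = new Uint8Array(magic.length + 2 + header.length + data.byteLength);
  out.set(magic, 0);
  // Header length is stored as a little-endian uint16 at byte offset 8.
  new DataView(out.buffer).setUint16(8, header.length, true);
  for (let i = 0; i < header.length; i++) out[10 + i] = header.charCodeAt(i);
  out.set(new Uint8Array(data.buffer, data.byteOffset, data.byteLength), 10 + header.length);
  return out;
}
```

The returned Uint8Array could then be written with fs.writeFileSync in Node.js or wrapped in a Blob for a browser download; both steps are left out of the sketch.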

This is extremely cool, please make it work with floats and implement some unit tests. <eom>

Functionality to allow ArrayBuffer inputs along with file path

Hey, @j6k4m8 I've used your package in a few packages and I think it is a life-saver.

I faced an issue when I wanted to parse and open a .npy file from an ArrayBuffer in TypeScript.
I identified that the load function internally creates an ArrayBuffer which stores the result of the fetch request.

It would be really helpful if we could bypass this part of the code when the passed argument is already an ArrayBuffer.

Although I am not as experienced as my peers, I'm submitting a PR as it might be a useful addition.
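
The bypass described above could look roughly like the sketch below. The helper name toArrayBuffer and its fetchArgs parameter are assumptions made for illustration; they are not the npyjs API, and the sketch relies on a global fetch being available.

```javascript
// Hypothetical helper, not part of the npyjs API: resolve the input to an ArrayBuffer.
async function toArrayBuffer(source, fetchArgs) {
  if (source instanceof ArrayBuffer) {
    return source; // already in memory, so no request is needed
  }
  if (ArrayBuffer.isView(source)) {
    // Typed arrays and Node Buffers: copy out only the bytes the view covers.
    return source.buffer.slice(source.byteOffset, source.byteOffset + source.byteLength);
  }
  // Anything else is treated as a URL; extra options are forwarded to fetch.
  const response = await fetch(source, fetchArgs);
  if (!response.ok) {
    throw new Error(`Request for ${source} failed with status ${response.status}`);
  }
  return response.arrayBuffer();
}
```

The result can then be handed to the parsing step directly, so the fetch only happens when the caller passes a URL.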

Bring npm package up to date with Github repo

The current version of the npyjs npm package seems to be behind the GitHub repo version; updating the npm package to match the repo would be helpful.

You could consider automating the process of publishing to npm through GitHub Actions (e.g. when pushing code or when accepting pull requests).

TypeError: Only absolute URLs are supported

how to convert dtype: 'float64',
data: Float64Array(256) into Array
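
For the conversion asked about here, Array.from works on any typed array. The sketch below also includes a hypothetical reshape helper (not part of npyjs) that rebuilds nested arrays from the flat data and a row-major shape.

```javascript
// Stand-in for the `data` member of a loaded result.
const data = new Float64Array([0.5, 1.5, 2.5]);
const plain = Array.from(data); // plain JavaScript array with the same values

// Hypothetical helper: rebuild nesting from flat data and a row-major (C order) shape.
function reshape(flat, shape) {
  if (shape.length <= 1) return Array.from(flat);
  const [rows, ...rest] = shape;
  const step = flat.length / rows; // number of elements per outermost row
  const out = [];
  for (let r = 0; r < rows; r++) {
    out.push(reshape(flat.slice(r * step, (r + 1) * step), rest));
  }
  return out;
}
```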

File failing to load.

When trying to open a file I get the following. Tried with both the local and absolute paths as well as with file:/// and without.

TypeError: Only absolute URLs are supported
    at getNodeRequestOption

Various issues in code

Hello -- some things I noticed in the code which I think may be worth mentioning.

Errors for headers larger than 255 bytes: In parse(), when reading the header length, a uint8 is being read from the DataView. It should actually be a uint16 in little-endian ordering. Existing code does not work for files with headers >255 bytes. Not a common situation but one that I've run into, especially when logging arrays of structures with named fields.

Fix would be replacing getUint8( offset ) with getUint16( offset, true ), where true indicates little-endian ordering.

Errors when dtype description has more than one "(": When translating the header content from 'Python dict' format to 'JSON' format, there are three calls to String.replace() to substitute double quotes for single quotes and square brackets for parentheses. I'm guessing this is for py-tuple to js-array conversion. Two of the calls use regular expressions as the match argument, and one uses a string. Unfortunately, String.replace() does not behave consistently for these inputs. When the argument is a regular expression with the g (global) flag, String.replace() replaces all instances of the match argument. When the argument is a string, it replaces only the first. In this case, what happens is that ALL ")" characters are replaced with "]", but only the FIRST "(" is converted to a "[". I suspect this is not intended.

Errors for data segments not aligned to word boundaries: Various arrayConstructor function values expect data samples to be aligned to 4-byte word boundaries ("<i4", for example). When this is not the case, an error is thrown. If you want to read the data into arrays using these constructor functions, to accommodate situations where the first data address is not at a word boundary, the data should first be sliced into a new ArrayBuffer before being passed into the constructor.
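
The three points above can be sketched as follows. The function names are hypothetical and only illustrate the fixes; this is not the npyjs source, and the last helper assumes a little-endian host.

```javascript
// 1. Header length: a little-endian uint16 at byte offset 8 (format version 1.0).
function readHeaderLength(buffer) {
  return new DataView(buffer).getUint16(8, true);
}

// 2. A string pattern replaces only the first match; a regex with /g replaces all.
const descr = "[('name', 'U10'), ('age', 'i4')]";
const firstOnly = descr.replace("(", "[");
const everyParen = descr.replace(/\(/g, "[");

// 3. Typed array views need a byteOffset that is a multiple of the element size.
//    Slicing copies the data segment into a fresh buffer that starts at offset 0.
function readInt32Unaligned(buffer, byteOffset) {
  return new Int32Array(buffer.slice(byteOffset));
}
```

The slice in step 3 costs one copy of the data segment, which is the trade-off for not depending on how long the header happens to be.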

Browser Version

This works great in the browser without the fetch import. I had to do a little fiddling to figure that out, and it looks like @Fil did too.

Have you considered publishing the decoding function alone so this library can be used directly in the browser?

Int16 dtype

Thanks for your code! I found, however, that Int16 files were not supported. This can be fixed easily enough by including the following dtype

"<i2": {
name: "int16",
size: 16,
arrayConstructor: Int16Array,
},

Thanks

How to load a local npy file using npyjs in React?

Loading a 2D array does not seem to work despite the code compiling. In the console I get the error -

Uncaught (in promise) SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data

The code that I have is -

import ndarray from "ndarray";
import npyjs from "npyjs";

const NP = function () {
    let n = new npyjs();

    n.load("./embeddings.npy").then(res => {
        // res has { data, shape, dtype } members.
        const npyArray = ndarray(res.data, res.shape);
        console.log(npyArray);
    });
};
export default NP;

If I replace the file path with https://rawcdn.githack.com/aplbrain/npyjs/ba60a3a529f3210dd07d2ed05ab628939e18b6a7/test/data/4x4x4x4x4-float32.npy then it seems to work...is there a way to load from local file paths?
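
One possible explanation, stated here as an assumption rather than something confirmed in this thread, is that the development server answers the relative path with an HTML page instead of the .npy bytes, so the header parse fails on its first character. A quick way to check is to look for the .npy magic bytes before parsing; looksLikeNpy below is a hypothetical helper, not part of npyjs.

```javascript
// Hypothetical check: does the buffer start with the .npy magic string "\x93NUMPY"?
function looksLikeNpy(buffer) {
  const magic = [0x93, 0x4e, 0x55, 0x4d, 0x50, 0x59];
  const bytes = new Uint8Array(buffer, 0, Math.min(buffer.byteLength, magic.length));
  return bytes.length === magic.length && magic.every((b, i) => bytes[i] === b);
}
```

If the check fails for the fetched response, the file is probably not being served at that path. In many React setups, static files need to live in the folder the dev server exposes (often public/) and be requested by URL rather than by a source-relative path.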

Does not work with structured arrays

.npy files can contain structured arrays, as described here, but these fail to open with the error:

Uncaught SyntaxError: Unexpected token ( in JSON at position 28

Example, in python create the structured array:

np.save('test/out.npy', np.array([('Rex', 9, 81.0), ('Fido', 3, 27.0)],
             dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]))

Serve the out.npy file:

cd test
npx serve

Try to load that out.npy file:

> np.load("http://localhost:3000/out.npy")
Promise {
  <pending>,
  [Symbol(async_id_symbol)]: 4454,
  [Symbol(trigger_async_id_symbol)]: 5
}
> Uncaught SyntaxError: Unexpected token ( in JSON at position 28

Allow passing arbitrary fetch args

Support for all numpy dtypes

NumPy supports the dtype kinds listed below, as per the docs. However, npyjs currently supports only some of them:

character	description	supported
'?'	boolean	NO
'b'	(signed) byte	NO
'B'	unsigned byte	NO
'i'	(signed) integer	YES
'u'	unsigned integer	YES
'f'	floating-point	YES
'c'	complex-floating point	NO
'm'	timedelta	NO
'M'	datetime	NO
'O'	(Python) objects	N/A
'S', 'a'	zero-terminated bytes (not recommended)	NO
'U'	Unicode string	NO
'V'	raw data (void)	NO

Would be great to add these in.
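
As a starting point for broader dtype support, a descriptor string such as "<f4" can be split into byte order, kind character, and size before looking up a constructor. The sketch below is a hypothetical helper, not part of npyjs, and it only covers simple descriptors; structured dtypes and unit suffixes such as "<M8[ns]" are out of scope.

```javascript
// Hypothetical helper: split a simple dtype descriptor like "<f4" or "|u1" into parts.
function parseDescr(descr) {
  const match = /^([<>|=])([?bBiufcmMOSaUV])(\d*)$/.exec(descr);
  if (!match) throw new Error(`Unrecognized dtype descriptor: ${descr}`);
  const [, byteOrder, kind, size] = match;
  // `size` is a byte count for numeric kinds and a character count for 'U'.
  return { byteOrder, kind, size: size === "" ? null : Number(size) };
}
```

For example, parseDescr("<f4") yields byte order "<", kind "f", and size 4, which a loader could map to Float32Array.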

Throw Errors from Load Function

Looking to use this library as I'll be loading in npy files. I was looking at the source code and noticed that you're catching potential errors in the load function and just printing them to console.error. I would like to suggest that those errors are allowed to propagate out of the function, so the developer can handle them instead of having them go to the console.

The callback might be an issue. My ideas on that would be to either add a second errorCallback, or a second error argument to the first callback. I'd be happy to put together a PR as well if that helps!

Edit: I should clarify, the second argument method I think should follow the error-first callback schema, so the result would then be in the second argument. This would be a change in the way the module works, and would possibly warrant a minor version bump if accepted. The errorCallback method would not change existing functionality as it could simply be omitted, which may be preferable.
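
A minimal sketch of the error-first variant is shown below. The wrapper name loadWithCallback and the loadPromise argument are hypothetical stand-ins for the library's loading step; this is not how npyjs is currently written.

```javascript
// Hypothetical wrapper: `loadPromise` stands in for whatever returns the parsed array.
function loadWithCallback(loadPromise, callback) {
  const pending = loadPromise();
  if (typeof callback !== "function") {
    return pending; // promise style: rejections propagate to the caller
  }
  // Error-first style: the error (or null) comes first, the result second.
  pending.then(
    (result) => callback(null, result),
    (error) => callback(error)
  );
  return undefined;
}
```

With this shape, nothing is swallowed by console.error: promise users attach their own catch handler, and callback users inspect the first argument.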

dtype '|u1'?

Someone sent me a npy with '|u1', which results in an error. I had to add it in dtypes so that it would parse.

hexdump -C

00000000  93 4e 55 4d 50 59 01 00  76 00 7b 27 64 65 73 63  |.NUMPY..v.{'desc|
00000010  72 27 3a 20 27 7c 75 31  27 2c 20 27 66 6f 72 74  |r': '|u1', 'fort|
00000020  72 61 6e 5f 6f 72 64 65  72 27 3a 20 46 61 6c 73  |ran_order': Fals|
00000030  65 2c 20 27 73 68 61 70  65 27 3a 20 28 34 39 39  |e, 'shape': (499|
00000040  35 30 30 2c 29 2c 20 7d  20 20 20 20 20 20 20 20  |500,), }        |
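
For reference, the first ten bytes of the dump above can be decoded as follows. This is only a worked reading of the bytes shown, with variable names chosen for illustration.

```javascript
// First ten bytes from the hexdump: magic, version, and header length.
const prefix = new Uint8Array([0x93, 0x4e, 0x55, 0x4d, 0x50, 0x59, 0x01, 0x00, 0x76, 0x00]);
const view = new DataView(prefix.buffer);

const magic = String.fromCharCode(...prefix.subarray(1, 6)); // "NUMPY"
const version = `${view.getUint8(6)}.${view.getUint8(7)}`;   // "1.0"
const headerLength = view.getUint16(8, true);                 // 0x0076 = 118
const dataOffset = 10 + headerLength;                         // 128
```

So the header dictionary occupies bytes 10 to 127, and the 499500 uint8 values described by the shape start at byte 128.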

Raw access for float16

Hi,

I am using your library for fast prototyping.

It was very convenient to use, but I had to patch it to read float16.

float16 is not supported by JS, but it is supported by WebGL. So this allows the data to be fetched in an opaque manner: the shape remains the same, and the result is still typed with dtype === 'float16'.

I also use a wrapper to read float16 values in JS, but I think it is not really needed at this stage.

Anyway, here is the setup.

import _npyjs from 'npyjs';

const npyjs = new _npyjs();

// Supports float16 as uint16
npyjs.dtypes['<f2'] = {
  name: 'float16',
  size: 16,
  arrayConstructor: Uint16Array,
};

Tell me if you prefer to have a PR.
Regards,
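
For readers who do need the values in JavaScript, a wrapper of the kind mentioned above could decode each uint16 as an IEEE 754 half-precision number. The function below is a hypothetical sketch, not the poster's wrapper and not part of npyjs.

```javascript
// Hypothetical decoder: interpret a uint16 bit pattern as an IEEE 754 half-precision float.
function float16BitsToNumber(bits) {
  const sign = (bits & 0x8000) ? -1 : 1;
  const exponent = (bits >> 10) & 0x1f; // 5 exponent bits
  const fraction = bits & 0x03ff;       // 10 fraction bits
  if (exponent === 0) {
    // Zero and subnormals: no implicit leading 1, fixed scale of 2^-24.
    return sign * fraction * Math.pow(2, -24);
  }
  if (exponent === 0x1f) {
    return fraction ? NaN : sign * Infinity;
  }
  // Normal numbers: implicit leading 1 and an exponent bias of 15.
  return sign * (1 + fraction / 1024) * Math.pow(2, exponent - 15);
}
```

Given the Uint16Array produced by the setup above, Float32Array.from(raw, float16BitsToNumber) would yield a numeric copy, where raw is a placeholder for the loaded data.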

Stating changes in accordance with Apache License

/*
 * Copyright 2023 aplbrain/npyjs
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/*
 * This file includes code adapted from the npyjs project,
 * which is licensed under the Apache License, Version 2.0.
 * The original code can be found at: https://github.com/aplbrain/npyjs/blob/master/index.js
 *
 * Modifications:
 * Added U1 support and fixed U1 -> u1 bug, added fortran comment and made a functional interface
 */

const dTypeMapping: Record<
  string,
  // "<u1" | "|u1" | "<u2" | "|i1" | "<i2" | "<u4" | "<i4" | "<u8" | "<i8" | "<f4" | "<f8" | "<U1",
  {
    name: string;
    size: number;
    arrayConstructor:
      | Uint8ArrayConstructor
      | Uint16ArrayConstructor
      | Int8ArrayConstructor
      | Int16ArrayConstructor
      | Int32ArrayConstructor
      | BigUint64ArrayConstructor
      | BigInt64ArrayConstructor
      | Float32ArrayConstructor
      | Float64ArrayConstructor
      | Uint32ArrayConstructor;
  }
> = {
  "<u1": {
    name: "uint8",
    size: 8,
    arrayConstructor: Uint8Array,
  },
  "|u1": {
    name: "uint8",
    size: 8,
    arrayConstructor: Uint8Array,
  },
  "<u2": {
    name: "uint16",
    size: 16,
    arrayConstructor: Uint16Array,
  },
  "|i1": {
    name: "int8",
    size: 8,
    arrayConstructor: Int8Array,
  },
  "<i2": {
    name: "int16",
    size: 16,
    arrayConstructor: Int16Array,
  },
  "<u4": {
    name: "uint32",
    size: 32,
    arrayConstructor: Uint32Array,
  },
  "<i4": {
    name: "int32",
    size: 32,
    arrayConstructor: Int32Array,
  },
  "<u8": {
    name: "uint64",
    size: 64,
    arrayConstructor: BigUint64Array,
  },
  "<i8": {
    name: "int64",
    size: 64,
    arrayConstructor: BigInt64Array,
  },
  "<f4": {
    name: "float32",
    size: 32,
    arrayConstructor: Float32Array,
  },
  "<f8": {
    name: "float64",
    size: 64,
    arrayConstructor: Float64Array,
  },
  "<U1": {
    name: "<U1", // no way to know when to use ucs2 vs ucs4
    size: 32,
    arrayConstructor: Uint32Array,
  },
};

export const parseNpy = (arrayBufferContents: ArrayBuffer) => {
  // const version = arrayBufferContents.slice(6, 8); // Uint8-encoded
  // The header length is a little-endian uint16 (format version 1.0).
  const headerLength = new DataView(arrayBufferContents.slice(8, 10)).getUint16(
    0,
    true
  );
  const offsetBytes = 10 + headerLength;

  const hcontents = new TextDecoder("utf-8").decode(
    new Uint8Array(arrayBufferContents.slice(10, 10 + headerLength))
  );
  const header = JSON.parse(
    hcontents
      // .toLowerCase() // True -> true
      .replace(/True/g, "true")
      .replace(/False/g, "false")
      .replace(/'/g, '"')
      .replace(/\(/g, "[") // global flag, so every "(" is converted
      .replace(/,*\),*/g, "]")
  );
  const shape = header.shape;
  const dtype = dTypeMapping[header.descr];
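  // Slice first so the typed array starts at offset 0 of its own buffer; a view at
  // offsetBytes throws when the offset is not a multiple of the element size.
  const nums = new dtype["arrayConstructor"](arrayBufferContents.slice(offsetBytes));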

  // if fortran_order:
  //     array.shape = shape[::-1]
  //     array = array.transpose()

  return {
    dtype: dtype.name,
    data: nums,
    shape,
    fortranOrder: header.fortran_order,
  };
};
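
The commented-out fortran_order note above leaves the data untouched and only reports the flag. One way a caller could honour it without transposing is to compute the flat index per layout, as in this sketch; flatIndex is a hypothetical helper and not part of the adapted code.

```javascript
// Hypothetical helper: map N-dimensional indices to an offset in the flat data array.
function flatIndex(indices, shape, fortranOrder) {
  let offset = 0;
  let stride = 1;
  const n = shape.length;
  for (let k = 0; k < n; k++) {
    // Fortran order: first axis varies fastest. C order: last axis varies fastest.
    const axis = fortranOrder ? k : n - 1 - k;
    offset += indices[axis] * stride;
    stride *= shape[axis];
  }
  return offset;
}
```

For a shape of [2, 3], the element at [1, 0] sits at flat offset 3 in C order and at offset 1 in Fortran order.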

Release including types

Thanks for this awesome package!

The current release doesn't include any of the types found here. A quick release that includes these types would be great. Currently we just copy-paste that file into our project.

TypeScript Support

I'm using the code from this library in a TypeScript/Deno environment. It would be nice to have a TS implementation. If there isn't one, I can take a shot at it! I have a rough, typed implementation. Let me know what you think and how I can help test and implement this.
