aplbrain / npyjs
Read numpy .npy files in JavaScript
Home Page: https://aplbrain.github.io/npyjs/
License: Apache License 2.0
Title says it all.
Would it be possible to include a save() function for TypedArrays and regular arrays, both for the Node.js filesystem and as an in-browser download? This way I could export my JavaScript arrays to Python/NumPy.
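To make the request concrete, here is a minimal sketch of what such a save() could produce. `float32ToNpy` is a hypothetical helper, not part of npyjs. It only handles Float32Array input, writes the version 1.0 layout (magic string, version bytes, 2-byte little-endian header length, padded header dict, raw data), and assumes a little-endian host.

```javascript
// Hypothetical helper (not part of npyjs): serialize a Float32Array into
// .npy version 1.0 bytes. Assumes the host is little-endian, because typed
// arrays expose their data in platform byte order.
function float32ToNpy(data, shape) {
  // NumPy writes one-element shapes as "(n,)", so keep the trailing comma.
  const shapeText =
    shape.length === 1 ? `(${shape[0]},)` : `(${shape.join(", ")})`;
  const dict = `{'descr': '<f4', 'fortran_order': False, 'shape': ${shapeText}, }`;

  // 6 magic bytes + 2 version bytes + 2 header-length bytes come first.
  const preambleLength = 10;
  // Pad with spaces and end with "\n" so the data starts on a 64-byte boundary.
  const unpadded = preambleLength + dict.length + 1;
  const padding = (64 - (unpadded % 64)) % 64;
  const header = dict + " ".repeat(padding) + "\n";

  const out = new Uint8Array(preambleLength + header.length + data.byteLength);
  out.set([0x93, 0x4e, 0x55, 0x4d, 0x50, 0x59, 0x01, 0x00], 0); // "\x93NUMPY", v1.0
  new DataView(out.buffer).setUint16(8, header.length, true);   // little-endian
  for (let i = 0; i < header.length; i++) {
    out[preambleLength + i] = header.charCodeAt(i);             // header is ASCII
  }
  out.set(
    new Uint8Array(data.buffer, data.byteOffset, data.byteLength),
    preambleLength + header.length
  );
  return out;
}

const npyBytes = float32ToNpy(new Float32Array([1, 2, 3, 4]), [2, 2]);
```

In Node.js the resulting bytes could be written with `fs.writeFileSync`, and in the browser they could be wrapped in a `Blob` and offered as a download.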
Hey, @j6k4m8 I've used your package in a few packages and I think it is a life-saver.
I faced an issue when I wanted to parse and open a .npy file from an ArrayBuffer in TypeScript.
I identified that the load function internally creates an ArrayBuffer which stores the result of the fetch request.
It would be really helpful if we could bypass this part of the code when the passed argument is already an ArrayBuffer.
Although I am not as experienced as my peers, I'm submitting a PR as it might be a useful addition.
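The bypass described above could look roughly like the sketch below. `loadNpy` is a hypothetical wrapper, and `parse` and `fetchImpl` are placeholders supplied by the caller. None of these names come from the npyjs source.

```javascript
// Hypothetical wrapper: `parse` turns an ArrayBuffer into a result object and
// `fetchImpl` is any fetch-compatible function. Both are caller-supplied.
async function loadNpy(source, parse, fetchImpl) {
  // Already binary data: skip the network request entirely.
  if (source instanceof ArrayBuffer) {
    return parse(source);
  }
  // Otherwise treat `source` as a URL and download it first.
  const response = await fetchImpl(source);
  return parse(await response.arrayBuffer());
}
```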
The current version of the npyjs npm package seems to be behind the GitHub repo version. Updating the npm package to match the repo version would be helpful.
You could consider automating the process of publishing to npm through GitHub Actions (e.g. when pushing code or when accepting pull requests).
How do I convert the result (dtype: 'float64', data: Float64Array(256)) into a regular Array?
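One way to do this uses only standard JavaScript, since typed arrays are iterable. The `loaded` object below is a stand-in for the result described in the question, with made-up values.

```javascript
// Stand-in for a loaded result; the values are invented for illustration.
const loaded = { dtype: "float64", data: new Float64Array([0.5, 1.5, 2.5]) };

// Array.from copies the typed array into a plain Array of numbers.
const plainArray = Array.from(loaded.data);

// 64-bit integer arrays hold BigInt values; map them to Number if the
// values are known to fit safely.
const bigValues = new BigInt64Array([1n, 2n]);
const asNumbers = Array.from(bigValues, Number);
```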
When trying to open a file I get the following. I tried both local and absolute paths, with and without file:///.
TypeError: Only absolute URLs are supported
at getNodeRequestOption
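The error appears to come from the fetch layer, which only accepts absolute URLs and cannot resolve a filesystem path. A possible workaround in Node.js is to read the file yourself and hand the bytes to whatever parsing routine accepts an ArrayBuffer. The sketch below shows the conversion step. `toArrayBuffer` is a hypothetical helper name.

```javascript
// Copy exactly the bytes that a Node.js Buffer (or any Uint8Array) refers to
// into a standalone ArrayBuffer. Small Buffers can share one pooled
// ArrayBuffer, so `bytes.buffer` on its own may contain unrelated data.
function toArrayBuffer(bytes) {
  return bytes.buffer.slice(
    bytes.byteOffset,
    bytes.byteOffset + bytes.byteLength
  );
}

// Demonstration with a view into the middle of a larger buffer.
const pooled = new Uint8Array([9, 9, 1, 2, 3, 9]);
const fileBytes = pooled.subarray(2, 5);
const standalone = toArrayBuffer(fileBytes);
```

With a real file, `fileBytes` would come from something like `fs.readFileSync("./data.npy")`, where the path is a placeholder.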
Hello -- some things I noticed in the code which I think may be worth mentioning.
Fix would be replacing getUint8( offset ) with getUint16( offset, true ), where true indicates little-endian ordering.
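For context, a version 1.0 .npy file stores the header length as a 2-byte little-endian unsigned integer at byte offset 8. The sketch below uses made-up preamble bytes to show how the two calls differ once the header is longer than 255 bytes.

```javascript
// Made-up preamble: magic string, version 1.0, header length 0x0176 (374).
const preamble = new Uint8Array([
  0x93, 0x4e, 0x55, 0x4d, 0x50, 0x59, 0x01, 0x00, 0x76, 0x01,
]);
const preambleView = new DataView(preamble.buffer);

// Reads only the low byte, so the high byte of the length is lost.
const truncatedLength = preambleView.getUint8(8);

// Reads both bytes; `true` selects little-endian byte order.
const fullLength = preambleView.getUint16(8, true);
```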
Errors when dtype description has more than one "(": When translating the header content from 'Python dict' format to 'JSON' format, there are three calls to String.replace() to substitute single quotes for double quotes, and, parentheses to square brackets. I'm guessing this is for py-tuple to js-array conversion. Two of the calls use regular expressions as the match argument, and one uses a string. Unfortunately, String.replace() does not behave consistently for these inputs. When the argument is a regular expression, String.replace() replaces all instances of the match argument. When the argument is a string, it replaces only the first. In this case, what happens is that ALL ")" characters are replaced with "]", but, only the FIRST "(" is converted to a "[". I suspect this is not intended.
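The difference is easy to reproduce in isolation. The string below is an arbitrary nested tuple, not a real header.

```javascript
const nestedTuple = "((1, 2), (3, 4))";

// String pattern: only the first "(" is replaced.
const firstOnly = nestedTuple.replace("(", "[");

// Regular expression with the g flag: every "(" is replaced.
const everyOpen = nestedTuple.replace(/\(/g, "[");

// split/join is an alternative that avoids regex escaping.
const everyOpenAlt = nestedTuple.split("(").join("[");
```

In environments that support it, `String.prototype.replaceAll` is another option.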
Errors for data segments not aligned to word boundaries: Various arrayConstructor function values expect data samples to be aligned to 4-byte word boundaries ("<i4", for example). When this is not the case, an error is thrown. If you want to read the data into arrays using these constructor functions, to accommodate situations where the first data address is not at a word boundary, the data should first be sliced into a new ArrayBuffer before being passed into the constructor.
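A small reproduction of the alignment problem and the suggested slice workaround, using an invented 18-byte buffer (10 bytes standing in for a header, followed by two int32 values). Reading the values back assumes a little-endian host.

```javascript
// Invented layout: 10 placeholder header bytes, then two little-endian int32s.
const rawFile = new ArrayBuffer(18);
new DataView(rawFile).setInt32(10, 7, true);
new DataView(rawFile).setInt32(14, -3, true);

// Offset 10 is not a multiple of 4, so the constructor throws a RangeError.
let misalignedThrew = false;
try {
  new Int32Array(rawFile, 10);
} catch (err) {
  misalignedThrew = err instanceof RangeError;
}

// slice() copies the data into a fresh buffer that starts at offset 0,
// which is always aligned.
const alignedValues = new Int32Array(rawFile.slice(10));
```

The copy costs extra memory, so a reader might only want to slice when the offset is not already a multiple of the element size.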
This works great in the browser without the fetch import. I had to do a little fiddling to figure that out, and it looks like @Fil did too.
Have you considered publishing the decoding function alone so this library can be used directly in the browser?
Thanks for your code! I found, however, that Int16 files were not supported. This can be fixed easily enough by including the following dtype
"<i2": {
name: "int16",
size: 16,
arrayConstructor: Int16Array,
},
Thanks
Loading a 2D array does not seem to work despite the code compiling. In the console I get the error -
Uncaught (in promise) SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data
The code that I have is -
import ndarray from "ndarray";
import npyjs from "npyjs";
const NP = function () {
  let n = new npyjs();
  n.load("./embeddings.npy").then(res => {
    // res has { data, shape, dtype } members.
    const npyArray = ndarray(res.data, res.shape);
    console.log(npyArray);
  });
};
export default NP;
If I replace the file path with https://rawcdn.githack.com/aplbrain/npyjs/ba60a3a529f3210dd07d2ed05ab628939e18b6a7/test/data/4x4x4x4x4-float32.npy
then it seems to work...is there a way to load from local file paths?
.npy files that contain structured arrays, as described here, fail to open with the error:
Uncaught SyntaxError: Unexpected token ( in JSON at position 28
Example, in python create the structured array:
np.save('test/out.npy', np.array([('Rex', 9, 81.0), ('Fido', 3, 27.0)],
dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]))
Serve the out.npy file:
cd test
npx serve
Try to load that out.npy file:
> np.load("http://localhost:3000/out.npy")
Promise {
<pending>,
[Symbol(async_id_symbol)]: 4454,
[Symbol(trigger_async_id_symbol)]: 5
}
> Uncaught SyntaxError: Unexpected token ( in JSON at position 28
NumPy supports the following dtypes as per the docs. However, npyjs only supports some of them:
character | description | supported |
---|---|---|
'?' | boolean | NO |
'b' | (signed) byte | NO |
'B' | unsigned byte | NO |
'i' | (signed) integer | YES |
'u' | unsigned integer | YES |
'f' | floating-point | YES |
'c' | complex-floating point | NO |
'm' | timedelta | NO |
'M' | datetime | NO |
'O' | (Python) objects | N/A |
'S', 'a' | zero-terminated bytes (not recommended) | NO |
'U' | Unicode string | NO |
'V' | raw data (void) | NO |
Would be great to add these in.
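As a rough illustration, the single-byte kinds in the table ('?', 'b', 'B') map directly onto existing typed arrays. The entries below are a hypothetical sketch in the { name, size, arrayConstructor } shape used for other dtypes in these issues. `extraDtypes` and `decodeBytes` are invented names, and whether npyjs would accept the entries as-is is an assumption.

```javascript
// Hypothetical entries keyed by the descr strings NumPy writes for
// single-byte types.
const extraDtypes = {
  "|b1": { name: "bool", size: 8, arrayConstructor: Uint8Array },  // '?' stored as 0/1 bytes
  "|i1": { name: "int8", size: 8, arrayConstructor: Int8Array },   // 'b'
  "|u1": { name: "uint8", size: 8, arrayConstructor: Uint8Array }, // 'B'
};

// Invented helper: wrap raw bytes in the typed array for a given descr.
function decodeBytes(descr, buffer) {
  const entry = extraDtypes[descr];
  if (!entry) {
    throw new Error(`Unsupported dtype: ${descr}`);
  }
  return new entry.arrayConstructor(buffer);
}

// Booleans arrive as bytes; convert to true/false if needed.
const flags = Array.from(decodeBytes("|b1", new Uint8Array([1, 0, 1]).buffer), Boolean);
```

Complex, datetime, string, and void types need more than a constructor swap, since they have no direct typed-array equivalent.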
Looking to use this library as I'll be loading in npy files. I was looking at the source code and noticed that you're catching potential errors in the load function and just printing them to console.error. I would like to suggest that those errors are allowed to propagate out of the function, so the developer can handle them instead of having them go to the console.
The callback might be an issue. My ideas on that would be to either add a second errorCallback, or a second error argument to the first callback. I'd be happy to put together a PR as well if that helps!
Edit: I should clarify, the second argument method I think should follow the error-first callback schema, so the result would then be in the second argument. This would be a change in the way the module works, and would possibly warrant a minor version bump if accepted. The errorCallback method would not change existing functionality as it could simply be omitted, which may be preferable.
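The two callback options could be sketched as follows. Both functions are hypothetical and take the parsing step as a `parse` argument so the example stays self-contained. They are not the npyjs API.

```javascript
// Option 1: error-first callback. The error (or null) is the first
// argument and the result is the second.
function parseErrorFirst(parse, buffer, callback) {
  let result;
  try {
    result = parse(buffer);
  } catch (err) {
    callback(err, undefined);
    return;
  }
  callback(null, result);
}

// Option 2: separate, optional error callback. Existing callers that pass
// only onSuccess keep working; without onError the error is rethrown.
function parseWithErrorCallback(parse, buffer, onSuccess, onError) {
  let result;
  try {
    result = parse(buffer);
  } catch (err) {
    if (onError) {
      onError(err);
      return;
    }
    throw err;
  }
  onSuccess(result);
}
```

In both sketches the success callback runs outside the try block, so an exception thrown by the caller's own callback is not mistaken for a parse error.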
Someone sent me an .npy file with '|u1', which results in an error. I had to add it to dtypes so that it would parse.
hexdump -C
00000000 93 4e 55 4d 50 59 01 00 76 00 7b 27 64 65 73 63 |.NUMPY..v.{'desc|
00000010 72 27 3a 20 27 7c 75 31 27 2c 20 27 66 6f 72 74 |r': '|u1', 'fort|
00000020 72 61 6e 5f 6f 72 64 65 72 27 3a 20 46 61 6c 73 |ran_order': Fals|
00000030 65 2c 20 27 73 68 61 70 65 27 3a 20 28 34 39 39 |e, 'shape': (499|
00000040 35 30 30 2c 29 2c 20 7d 20 20 20 20 20 20 20 20 |500,), } |
Hi,
I am using your library for fast prototyping.
It was very convenient to use, but I had to patch it to read float16.
float16 is not supported by JS, but it is supported by WebGL. This patch lets the data be fetched opaquely: the shape remains the same, and the result is still tagged with dtype === 'float16'.
I also use a wrapper to read float16 values in JS, but I don't think it is really needed at this stage.
Anyway, here is the setup.
import _npyjs from 'npyjs';
const npyjs = new _npyjs();
// Supports float16 as uint16
npyjs.dtypes['<f2'] = {
name: 'float16',
size: 16,
arrayConstructor: Uint16Array,
};
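For readers who do need the values in JS, a wrapper of the kind mentioned above could decode each 16-bit pattern by hand. This is a sketch of the standard IEEE 754 half-precision layout (1 sign bit, 5 exponent bits, 10 fraction bits), not the poster's actual wrapper. `float16BitsToNumber` is an invented name.

```javascript
// Decode one IEEE 754 half-precision bit pattern (as stored in a
// Uint16Array element) into a regular JS number.
function float16BitsToNumber(bits) {
  const sign = bits & 0x8000 ? -1 : 1;
  const exponent = (bits >> 10) & 0x1f;
  const fraction = bits & 0x03ff;

  if (exponent === 0) {
    // Zero and subnormal values: no implicit leading 1.
    return sign * fraction * 2 ** -24;
  }
  if (exponent === 0x1f) {
    // All-ones exponent encodes infinities and NaN.
    return fraction ? NaN : sign * Infinity;
  }
  // Normal values: implicit leading 1 and an exponent bias of 15.
  return sign * (1 + fraction / 1024) * 2 ** (exponent - 15);
}

// Example: convert a whole Uint16Array of float16 bit patterns.
const halfBits = new Uint16Array([0x3c00, 0xc000, 0x3800]);
const decodedHalves = Array.from(halfBits, (b) => float16BitsToNumber(b));
```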
Tell me if you prefer to have a PR.
Regards,
/*
* Copyright 2023 aplbrain/npyjs
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* This file includes code adapted from the npyjs project,
* which is licensed under the Apache License, Version 2.0.
* The original code can be found at: https://github.com/aplbrain/npyjs/blob/master/index.js
*
* Modifications:
* Added U1 support and fixed U1 -> u1 bug, added fortran comment and made a functional interface
*/
const dTypeMapping: Record<
string,
// "<u1" | "|u1" | "<u2" | "|i1" | "<i2" | "<u4" | "<i4" | "<u8" | "<i8" | "<f4" | "<f8" | "<U1",
{
name: string;
size: number;
arrayConstructor:
| Uint8ArrayConstructor
| Uint16ArrayConstructor
| Int8ArrayConstructor
| Int16ArrayConstructor
| Int32ArrayConstructor
| BigUint64ArrayConstructor
| BigInt64ArrayConstructor
| Float32ArrayConstructor
| Float64ArrayConstructor
| Uint32ArrayConstructor;
}
> = {
"<u1": {
name: "uint8",
size: 8,
arrayConstructor: Uint8Array,
},
"|u1": {
name: "uint8",
size: 8,
arrayConstructor: Uint8Array,
},
"<u2": {
name: "uint16",
size: 16,
arrayConstructor: Uint16Array,
},
"|i1": {
name: "int8",
size: 8,
arrayConstructor: Int8Array,
},
"<i2": {
name: "int16",
size: 16,
arrayConstructor: Int16Array,
},
"<u4": {
name: "uint32",
size: 32,
arrayConstructor: Uint32Array,
},
"<i4": {
name: "int32",
size: 32,
arrayConstructor: Int32Array,
},
"<u8": {
name: "uint64",
size: 64,
arrayConstructor: BigUint64Array,
},
"<i8": {
name: "int64",
size: 64,
arrayConstructor: BigInt64Array,
},
"<f4": {
name: "float32",
size: 32,
arrayConstructor: Float32Array,
},
"<f8": {
name: "float64",
size: 64,
arrayConstructor: Float64Array,
},
"<U1": {
name: "<U1", // no way to know when to use ucs2 vs ucs4
size: 32,
arrayConstructor: Uint32Array,
},
};
export const parseNpy = (arrayBufferContents: ArrayBuffer) => {
// const version = arrayBufferContents.slice(6, 8); // Uint8-encoded
// The header length is a 2-byte little-endian unsigned integer.
const headerLength = new DataView(arrayBufferContents.slice(8, 10)).getUint16(
0,
true
);
const offsetBytes = 10 + headerLength;
const hcontents = new TextDecoder("utf-8").decode(
new Uint8Array(arrayBufferContents.slice(10, 10 + headerLength))
);
const header = JSON.parse(
hcontents
// .toLowerCase() // True -> true
.replace(/True/g, "true")
.replace(/False/g, "false")
.replace(/'/g, '"')
.replace(/\(/g, "[") // regex with g flag so every "(" is replaced, not just the first
.replace(/,*\),*/g, "]")
);
const shape = header.shape;
const dtype = dTypeMapping[header.descr];
const nums = new dtype["arrayConstructor"](arrayBufferContents, offsetBytes);
// if fortran_order:
// array.shape = shape[::-1]
// array = array.transpose()
return {
dtype: dtype.name,
data: nums,
shape,
fortranOrder: header.fortran_order,
};
};
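The commented-out Fortran-order note in the function above leaves the data untransposed. One way a caller could still index such data is to compute strides from the shape and the fortranOrder flag. The helpers below are a hypothetical sketch with invented names, written in plain JavaScript.

```javascript
// Compute element strides for a flat data array.
// C order: the last axis varies fastest. Fortran order: the first axis does.
function stridesFor(shape, fortranOrder) {
  const strides = new Array(shape.length);
  let step = 1;
  if (fortranOrder) {
    for (let axis = 0; axis < shape.length; axis++) {
      strides[axis] = step;
      step *= shape[axis];
    }
  } else {
    for (let axis = shape.length - 1; axis >= 0; axis--) {
      strides[axis] = step;
      step *= shape[axis];
    }
  }
  return strides;
}

// Convert a multi-dimensional index into a position in the flat data.
function flatIndex(index, strides) {
  return index.reduce((sum, k, axis) => sum + k * strides[axis], 0);
}

// The 2x3 matrix [[1, 2, 3], [4, 5, 6]] as it would be stored in each order.
const cOrderData = [1, 2, 3, 4, 5, 6];
const fortranOrderData = [1, 4, 2, 5, 3, 6];
const cValue = cOrderData[flatIndex([0, 1], stridesFor([2, 3], false))];
const fValue = fortranOrderData[flatIndex([0, 1], stridesFor([2, 3], true))];
```

Both lookups address row 0, column 1 of the same logical matrix, so they should return the same element.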
Thanks for this awesome package!
The current release doesn't include any of the types found here. A quick release that includes these types would be great. Currently we just copy-pasted that file into our project.
I'm using the code from this library in a TypeScript/Deno environment. It'd be nice if there was a TS implementation. If not, I can take a shot at it! I have a rough, typed implementation. Let me know what you think and how I can test or help implement this.