Comments (8)
As I point out in the discussion in nodejs/node#33263 there are a couple of reasons for this that need to be looked into more on the Node.js side but... the more general issue here is that passing Buffer
instances via the transferList
the way hasha
is doing is generally not safe. The reason is because it is not known if those Buffer
instances own their underlying ArrayBuffer
and BackingStore
or not. When a Buffer
(or any TypedArray
) is transferred like this, there are a range of issues that can occur that we're just now starting to see. As for why this only started happening in v14.x, that's because there was a change to the way the backing stores are handled at the v8 level.
Essentially, what needs to happen to fix this in the hasha
code is to change https://github.com/sindresorhus/hasha/blob/master/thread.js#L15-L16 such that it either:
a) creates a new Uint8Array
that is a copy of the data, and transfers that new Uint8Array
for example.. from the example I posted in nodejs/node#33263:
const fs = require('fs')
const crypto = require('crypto')
const { parentPort } = require('worker_threads')
parentPort.on('message', (message) => {
const hasher = crypto.createHash('sha256')
fs.createReadStream('example.txt')
.pipe(hasher)
.on('finish', () => {
const { buffer } = hasher.read()
const buf = new Uint8Array(buffer); // Create a copy
parentPort.postMessage({ value: buf }, [buf.buffer])
})
})
b) Avoids transferring the buffer at all and instead returns the hex-encoded string:
const fs = require('fs')
const crypto = require('crypto')
const { parentPort } = require('worker_threads')
const { pipeline } = require('stream')
parentPort.on('message', (message) => {
const input = fs.createReadStream('example.txt')
const hasher = crypto.createHash('sha256')
pipeline(input, hasher, (err) => {
if (err) {
// handle the error appropriately
return;
}
// Pass a hex of the hash rather than the buffer
parentPort.postMessage({ value: hasher.digest().toString('hex')});
});
})
Overall, however, given what thread.js in hasha is doing, I really have to question the value of using the worker thread in this way. Because the file read and hashing operations are already async, there is little performance gain to be had here with the current approach.
@sindresorhus ... if you'd allow me a little bit of self promotion, it might be worth taking a look at piscina ... it could make working with worker threads here a bit easier overall.
from hasha.
if you'd allow me a little bit of self promotion, it might be worth taking a look at piscina ... it could make working with worker threads here a bit easier overall.
I wish that had existed when we first added worker support here. It looks great.
from hasha.
// @stroncium
from hasha.
Ok, first of all, sorry for the delays.
@timsuchanek Thanks for all the work reporting and reproducing this.
@jasnell The point there is to move hashing out of main thread, not reading. I might be mistaken about original implementation or not up to date with latest changes, but as far as I know crypto
's hashes will do the hashing on the main thread.
@sindresorhus Definitely me missing the part where hash might make it's own choices regarding creating a buffer. #34 fixes the issue using buffer clone approach.
from hasha.
but as far as I know crypto's hashes will do the hashing on the main thread.
For the moment, yes. But just a heads up, Web Crypto API support is about to land in master, which does implement crypto functions off the main thread.
from hasha.
@jasnell Nice to know. Though I'm kinda suspicious it won't work too well for file streams(each chunk = some sync between threads as I believe it's unfeasible to somehow unload IO to the same thread crypto processing will run on).
from hasha.
yes, definitely. There's a definite overhead there. I've been thinking about possible alternatives that will be more feasible once the Web Crypto pr lands. It changes many of the internals. For instance, I've been thinking about an async job that'll take an FD as input and digest the entire thing without pulling any of the chunks of data into JS-land
from hasha.
@jasnell Well, that's what we're doing here(except in js-land). Regarding non-js version, adding it to standard library feels like a bit of overkill to me. I'd expect it to be a relatively rare case for users to hash(/other crypto primitives) big enough files for js overhead to really matter and when it does matter doing it outside std lib is pretty straightforward(separate process or native library).
from hasha.
Related Issues (20)
- Any chance for unhasha? HOT 3
- Hash multiple files? HOT 5
- Add support for doing the hashing in a worker thread HOT 13
- Typescript import broken for `fromFile`? HOT 3
- Add Salt? HOT 2
- New version (5.1.0) hasha.fromFile makes Electron application crash (unallowed memory allocation for worker threads) HOT 1
- New version (5.1.0) throws webpack errors HOT 2
- [warning] Module not found: Error: Can't resolve 'worker_threads' HOT 1
- Image hash length HOT 2
- hasha.fromFile throws error with message "The V8 platform used by this instance of Node does not support creating Workers" when updated to version 5.2.0 HOT 2
- Browser support HOT 2
- Why does `hasha.fromFile` returns `Promise<string | null>` instead of `Promise<string>`?
- What would be a recommended concurrency for hashing multiple files asynchronously? HOT 4
- Fatal error with Promise.all() HOT 3
- Support for AbortSignal
- convert to esm HOT 1
- It takes a long time when hasha from a large file, provider a bloom filter to speed up the hasha? HOT 6
- add support to calculate hash of first or last nkb of the file HOT 3
- create-hash instead crypto.createHash HOT 5
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from hasha.