Code Monkey home page Code Monkey logo

Comments (8)

jasnell avatar jasnell commented on June 22, 2024 3

As I point out in the discussion in nodejs/node#33263 there are a couple of reasons for this that need to be looked into more on the Node.js side but... the more general issue here is that passing Buffer instances via the transferList the way hasha is doing is generally not safe. The reason is because it is not known if those Buffer instances own their underlying ArrayBuffer and BackingStore or not. When a Buffer (or any TypedArray) is transferred like this, there are a range of issues that can occur that we're just now starting to see. As for why this only started happening in v14.x, that's because there was a change to the way the backing stores are handled at the v8 level.

Essentially, what needs to happen to fix this in the hasha code is to change https://github.com/sindresorhus/hasha/blob/master/thread.js#L15-L16 such that it either:

a) creates a new Uint8Array that is a copy of the data, and transfers that new Uint8Array

for example.. from the example I posted in nodejs/node#33263:

const fs = require('fs')
const crypto = require('crypto')
const { parentPort } = require('worker_threads')

parentPort.on('message', (message) => {
  const hasher = crypto.createHash('sha256')
  fs.createReadStream('example.txt')
    .pipe(hasher)
    .on('finish', () => {
      const { buffer } = hasher.read()
      const buf = new Uint8Array(buffer);  // Create a copy
      parentPort.postMessage({ value: buf }, [buf.buffer])
    })
})

b) Avoids transferring the buffer at all and instead returns the hex-encoded string:

const fs = require('fs')
const crypto = require('crypto')
const { parentPort } = require('worker_threads')
const { pipeline } = require('stream')

parentPort.on('message', (message) => {
  const input = fs.createReadStream('example.txt')
  const hasher = crypto.createHash('sha256')
  pipeline(input, hasher, (err) => {
    if (err) {
      // handle the error appropriately
      return;
    }
    // Pass a hex of the hash rather than the buffer
    parentPort.postMessage({ value: hasher.digest().toString('hex')});
  });
})

Overall, however, given what thread.js in hasha is doing, I really have to question the value of using the worker thread in this way. Because the file read and hashing operations are already async, there is little performance gain to be had here with the current approach.

@sindresorhus ... if you'd allow me a little bit of self promotion, it might be worth taking a look at piscina ... it could make working with worker threads here a bit easier overall.

from hasha.

sindresorhus avatar sindresorhus commented on June 22, 2024 2

if you'd allow me a little bit of self promotion, it might be worth taking a look at piscina ... it could make working with worker threads here a bit easier overall.

I wish that had existed when we first added worker support here. It looks great.

from hasha.

sindresorhus avatar sindresorhus commented on June 22, 2024

// @stroncium

from hasha.

stroncium avatar stroncium commented on June 22, 2024

Ok, first of all, sorry for the delays.

@timsuchanek Thanks for all the work reporting and reproducing this.

@jasnell The point there is to move hashing out of main thread, not reading. I might be mistaken about original implementation or not up to date with latest changes, but as far as I know crypto's hashes will do the hashing on the main thread.

@sindresorhus Definitely me missing the part where hash might make it's own choices regarding creating a buffer. #34 fixes the issue using buffer clone approach.

from hasha.

jasnell avatar jasnell commented on June 22, 2024

but as far as I know crypto's hashes will do the hashing on the main thread.

For the moment, yes. But just a heads up, Web Crypto API support is about to land in master, which does implement crypto functions off the main thread.

from hasha.

stroncium avatar stroncium commented on June 22, 2024

@jasnell Nice to know. Though I'm kinda suspicious it won't work too well for file streams(each chunk = some sync between threads as I believe it's unfeasible to somehow unload IO to the same thread crypto processing will run on).

from hasha.

jasnell avatar jasnell commented on June 22, 2024

yes, definitely. There's a definite overhead there. I've been thinking about possible alternatives that will be more feasible once the Web Crypto pr lands. It changes many of the internals. For instance, I've been thinking about an async job that'll take an FD as input and digest the entire thing without pulling any of the chunks of data into JS-land

from hasha.

stroncium avatar stroncium commented on June 22, 2024

@jasnell Well, that's what we're doing here(except in js-land). Regarding non-js version, adding it to standard library feels like a bit of overkill to me. I'd expect it to be a relatively rare case for users to hash(/other crypto primitives) big enough files for js overhead to really matter and when it does matter doing it outside std lib is pretty straightforward(separate process or native library).

from hasha.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.