Comments (6)

sindresorhus commented on July 21, 2024

Which method are you using?

from hasha.

mankeheaven commented on July 21, 2024

I wrote this in my project. Could you provide more options for hasha.fromFile?
If it did, I would be glad to use hasha.
Here is my function:

import { createHash } from 'crypto';
import { close, open, read, readFile, stat } from 'fs-extra';

interface HashOptions {
  fileSize: number; // files smaller than this are hashed in full
  bufSize: number; // size of the reusable read buffer
  sampleSize: number; // bytes read per sample
  sampleCount: number; // number of evenly spaced samples
}

const defaultHashOptions: HashOptions = {
  fileSize: 10 * 1024,
  bufSize: 2 * 1024,
  sampleSize: 1 * 1024,
  sampleCount: 5,
};

const getHashFromFilePath = async (
  filePath: string,
  options: HashOptions = defaultHashOptions,
) => {
  const hash = createHash('sha256');
  // Let stat errors propagate; a missing file cannot be hashed anyway.
  const { size } = await stat(filePath);

  // Small files: hash the whole content.
  if (size < options.fileSize) {
    hash.update(await readFile(filePath));
    return hash.digest('hex');
  }

  // Large files: hash sampleCount evenly spaced samples.
  const fd = await open(filePath, 'r');
  try {
    const buf = Buffer.alloc(options.bufSize);
    for (let i = 0; i < options.sampleCount; i++) {
      const pos = Math.floor((size * i) / options.sampleCount);
      // Never read past the end of the file or the buffer.
      const length = Math.min(options.sampleSize, options.bufSize, size - pos);
      const { bytesRead } = await read(fd, buf, 0, length, pos);
      if (bytesRead > 0) {
        // Hash only the bytes actually read, not the whole buffer.
        hash.update(buf.subarray(0, bytesRead));
      }
    }
  } finally {
    // Close the descriptor even if a read fails.
    await close(fd);
  }
  return hash.digest('hex');
};

export { getHashFromFilePath };

mankeheaven commented on July 21, 2024

I have to hash a large number of large files in a few seconds. Do you have any good ideas?

papb commented on July 21, 2024

And bloom filter resolves it.

Nice, how did you do it?

mankeheaven commented on July 21, 2024

@papb This is the code, but it has a false-positive rate. Do you have any ideas for improving it?

for (let i = 0; i < options.sampleCount; i++) {
  const pos = Math.floor((size * i) / options.sampleCount);
  // Never read past the end of the file.
  const length = Math.min(options.sampleSize, size - pos);
  const { bytesRead } = await read(fd, buf, offset, length, pos);
  if (bytesRead > 0) {
    // Hash only the bytes actually read, not the whole buffer.
    hash.update(buf.subarray(0, bytesRead));
  }
}
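
One possible way to lower the false-positive rate of a sampling loop like the one above, offered as a sketch rather than a recommendation from the thread: mix the file size into the hash and spread the samples so that the last one ends exactly at the end of the file. The names `samplePositions` and `sampledHashSync` are hypothetical and not part of hasha, and the synchronous `node:fs` calls are an assumption chosen to keep the example short.

```typescript
import { createHash } from 'node:crypto';
import { closeSync, fstatSync, openSync, readSync } from 'node:fs';

// Hypothetical helper: evenly spaced sample windows whose last window
// ends at the end of the file, so changes in the tail are seen.
export function samplePositions(
  size: number,
  sampleSize: number,
  sampleCount: number,
): Array<{ pos: number; length: number }> {
  const lastStart = Math.max(0, size - sampleSize);
  const positions: Array<{ pos: number; length: number }> = [];
  for (let i = 0; i < sampleCount; i++) {
    const pos = sampleCount > 1 ? Math.floor((lastStart * i) / (sampleCount - 1)) : 0;
    positions.push({ pos, length: Math.min(sampleSize, size - pos) });
  }
  return positions;
}

// Hypothetical helper: hashes the file size plus the sampled windows.
export function sampledHashSync(filePath: string, sampleSize = 1024, sampleCount = 5): string {
  const fd = openSync(filePath, 'r');
  try {
    const size = fstatSync(fd).size;
    const hash = createHash('sha256');
    // Files of different sizes get different digests even if samples match.
    hash.update(`${size}:`);
    const buf = Buffer.alloc(sampleSize);
    for (const { pos, length } of samplePositions(size, sampleSize, sampleCount)) {
      if (length <= 0) continue; // empty file: nothing to read
      const bytesRead = readSync(fd, buf, 0, length, pos);
      // Hash only the bytes actually read, not the whole buffer.
      if (bytesRead > 0) hash.update(buf.subarray(0, bytesRead));
    }
    return hash.digest('hex');
  } finally {
    closeSync(fd);
  }
}
```

This still cannot detect a change that falls entirely between samples. Raising `sampleCount` or `sampleSize` trades speed for fewer missed differences.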

sindresorhus commented on July 21, 2024

You're using .fromFile, which comes with some initial overhead because it has to spawn a new worker_thread. This only happens once, though, so subsequent calls should be cheaper. Depending on your use case, .fromFileSync is probably a lot faster (but it blocks, so it is not a good fit for servers).
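
To illustrate the blocking versus non-blocking trade-off described above, here is a minimal sketch that uses only Node's built-in `crypto` and `fs` modules. It does not reproduce hasha's worker-thread implementation. The names `digestBuffer`, `digestFileSync`, and `digestFileStream` are hypothetical, and `sha256` is an assumed default.

```typescript
import { createHash } from 'node:crypto';
import { createReadStream, readFileSync } from 'node:fs';

// Hypothetical helper: hash data that is already in memory.
export function digestBuffer(data: Buffer | string, algorithm = 'sha256'): string {
  return createHash(algorithm).update(data).digest('hex');
}

// Blocking variant: no setup cost, but the event loop is stalled
// until the whole file has been read and hashed.
export function digestFileSync(filePath: string, algorithm = 'sha256'): string {
  return digestBuffer(readFileSync(filePath), algorithm);
}

// Non-blocking variant: streams the file in chunks so that other work
// can run between reads; the hashing itself still runs on the main thread.
export function digestFileStream(filePath: string, algorithm = 'sha256'): Promise<string> {
  return new Promise((resolve, reject) => {
    const hash = createHash(algorithm);
    createReadStream(filePath)
      .on('error', reject)
      .on('data', (chunk) => hash.update(chunk))
      .on('end', () => resolve(hash.digest('hex')));
  });
}
```

For a one-off script that hashes many files in sequence, the blocking variant is usually the simpler choice. In a server, the streaming variant avoids stalling other requests during file reads.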
