Comments (6)
Which method are you using?
from hasha.
I wrote this for my project. Could you provide more options for hasha.fromFile?
If it did, I would be glad to use hasha.
Here is my function:
import { close, open, read, readFile, stat } from 'fs-extra';
import { createHash } from 'crypto';

const defaultHashOptions: HashOptions = {
  fileSize: 10 * 1024, // files smaller than this are hashed in full
  bufSize: 2 * 1024, // size of the reusable read buffer
  sampleSize: 1 * 1024, // bytes read per sample
  sampleCount: 5, // number of evenly spaced samples
};

interface HashOptions {
  fileSize: number;
  bufSize: number;
  sampleSize: number;
  sampleCount: number;
}

const getHashFromFilePath = async (
  filePath: string,
  options: HashOptions = defaultHashOptions,
) => {
  const hash = createHash('sha256');
  // Let stat errors propagate: a file that cannot be stat-ed cannot be hashed.
  const { size } = await stat(filePath);

  // Small files are hashed in full.
  if (size < options.fileSize) {
    hash.update(await readFile(filePath));
    return hash.digest('hex');
  }

  // Large files: hash sampleCount evenly spaced samples of sampleSize bytes.
  const fd = await open(filePath, 'r');
  try {
    const buf = Buffer.alloc(options.bufSize);
    for (let i = 0; i < options.sampleCount; i++) {
      const pos = Math.floor((size * i) / options.sampleCount);
      // Never read more than the buffer holds or past the end of the file.
      const length = Math.min(options.sampleSize, options.bufSize, size - pos);
      const { bytesRead } = await read(fd, buf, 0, length, pos);
      if (bytesRead > 0) {
        // Hash only the bytes just read, not the whole buffer.
        hash.update(buf.subarray(0, bytesRead));
      }
    }
  } finally {
    // Close the descriptor even if a read fails.
    await close(fd);
  }
  return hash.digest('hex');
};

export { getHashFromFilePath };
from hasha.
I have to hash a large number of large files in a few seconds. Do you have any good ideas?
from hasha.
And a Bloom filter solved it.
Nice, how did you do it?
from hasha.
@papb This is the code, but it has a false-positive rate. Do you have any ideas for improving it?
for (let i = 0; i < options.sampleCount; i++) {
  const pos = Math.floor((size * i) / options.sampleCount);
  // Clamp the last sample so it does not read past the end of the file.
  const length = Math.min(options.sampleSize, size - pos);
  const { bytesRead } = await read(fd, buf, offset, length, pos);
  if (bytesRead > 0) {
    // Hash only the bytes just read, not the whole buffer.
    hash.update(buf.subarray(0, bytesRead));
  }
}
from hasha.
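One possible way to lower the false-positive rate of the sampled hash above, offered as a sketch and not as something hasha provides: mix the file size into the digest, and spread the samples so the last one always covers the end of the file. The helper name `sampledHashSync` and its parameters are hypothetical, and the sketch uses synchronous Node built-ins only to stay short. Changes that fall between samples are still missed; only hashing the whole file removes false positives.

```typescript
import { closeSync, fstatSync, mkdtempSync, openSync, readSync, writeFileSync } from 'fs';
import { createHash } from 'crypto';
import { tmpdir } from 'os';
import { join } from 'path';

// Hypothetical helper (not part of hasha): hashes the file size plus evenly
// spaced samples, with the last sample ending exactly at the end of the file.
const sampledHashSync = (filePath: string, sampleSize = 1024, sampleCount = 5): string => {
  const fd = openSync(filePath, 'r');
  try {
    const size = fstatSync(fd).size;
    const hash = createHash('sha256');
    // Mixing in the size separates files whose samples match but whose lengths differ.
    hash.update(`${size}:`);
    const buf = Buffer.alloc(sampleSize);
    const lastStart = Math.max(0, size - sampleSize);
    let previousPos = -1;
    for (let i = 0; i < sampleCount; i++) {
      // Spread sample starts from offset 0 to the start of the final window.
      const pos = sampleCount > 1 ? Math.floor((lastStart * i) / (sampleCount - 1)) : 0;
      const length = Math.min(sampleSize, size - pos);
      // Skip empty reads and repeated offsets (both happen for small files).
      if (length <= 0 || pos === previousPos) continue;
      previousPos = pos;
      const bytesRead = readSync(fd, buf, 0, length, pos);
      if (bytesRead > 0) {
        hash.update(buf.subarray(0, bytesRead));
      }
    }
    return hash.digest('hex');
  } finally {
    closeSync(fd);
  }
};

// Demo: two 64 KiB files that differ only in their final byte.
const dir = mkdtempSync(join(tmpdir(), 'sampled-hash-'));
const fileA = join(dir, 'a.bin');
const fileB = join(dir, 'b.bin');
const contentA = Buffer.alloc(64 * 1024, 0x61);
const contentB = Buffer.from(contentA);
contentB[contentB.length - 1] = 0x62;
writeFileSync(fileA, contentA);
writeFileSync(fileB, contentB);
const hashA = sampledHashSync(fileA);
const hashB = sampledHashSync(fileB);
console.log('tail change detected:', hashA !== hashB);
```

With the original offsets (`size * i / sampleCount`), the last sample of a 64 KiB file starts at 80% of the file and reads 1 KiB, so a change in the final bytes would go unnoticed. Raising `sampleCount` or `sampleSize` trades speed for fewer missed changes.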
You're using .fromFile which comes with some initial overhead as it has to spawn a new worker_thread. This is only done once though, so multiple calls should be cheaper. Depending on your use-case, .fromFileSync is probably a lot faster (but it's blocking, so not good for servers).
from hasha.
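To illustrate the "spawned once, reused afterwards" point, here is a minimal sketch of that pattern using only Node's built-in worker_threads module. It is not hasha's actual implementation: `hashInWorker`, the message shape, and the inline worker source are all assumptions made for illustration. The worker is created on the first call and every later call posts to the same thread.

```typescript
import { Worker } from 'worker_threads';
import { createHash } from 'crypto';
import { mkdtempSync, writeFileSync } from 'fs';
import { tmpdir } from 'os';
import { join } from 'path';

// Worker body as a plain JavaScript string so the sketch fits in one file.
const workerSource = `
  const { parentPort } = require('worker_threads');
  const { createHash } = require('crypto');
  const { readFileSync } = require('fs');
  parentPort.on('message', ({ id, filePath }) => {
    try {
      const digest = createHash('sha256').update(readFileSync(filePath)).digest('hex');
      parentPort.postMessage({ id, digest });
    } catch (error) {
      parentPort.postMessage({ id, error: String(error) });
    }
  });
`;

type Reply = { id: number; digest?: string; error?: string };
type Pending = { resolve: (digest: string) => void; reject: (error: Error) => void };

let worker: Worker | undefined;
let nextId = 0;
const pending = new Map<number, Pending>();

// Start the worker on first use only; this is where the one-time cost is paid.
const getWorker = (): Worker => {
  if (worker) return worker;
  const created = new Worker(workerSource, { eval: true });
  created.on('message', (reply: Reply) => {
    const task = pending.get(reply.id);
    pending.delete(reply.id);
    // With no tasks left, stop the idle worker from keeping the process alive.
    if (pending.size === 0) created.unref();
    if (!task) return;
    if (reply.digest !== undefined) task.resolve(reply.digest);
    else task.reject(new Error(reply.error));
  });
  worker = created;
  return created;
};

// Hypothetical helper: hash a file off the main thread, reusing one worker.
const hashInWorker = (filePath: string): Promise<string> => {
  const active = getWorker();
  const id = nextId++;
  active.ref(); // keep the process alive while this task is pending
  return new Promise<string>((resolve, reject) => {
    pending.set(id, { resolve, reject });
    active.postMessage({ id, filePath });
  });
};

// Demo: both calls share the same worker thread.
const demoPath = join(mkdtempSync(join(tmpdir(), 'worker-hash-')), 'demo.txt');
writeFileSync(demoPath, 'hello');
const expectedDigest = createHash('sha256').update('hello').digest('hex');
const demoDigests = Promise.all([hashInWorker(demoPath), hashInWorker(demoPath)]);
demoDigests.then((digests) => {
  console.log('matches in-process digest:', digests.every((d) => d === expectedDigest));
});
```

The sketch leaves out handling for a worker that crashes, in which case pending promises would never settle. For a short script that hashes a handful of files, a synchronous call avoids the worker entirely, at the price of blocking the event loop.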