cbreeden / fxhash
A fast, non-secure, hashing algorithm derived from an internal hasher in FireFox.
I've noticed that FxHash(Map/Set) is also available in https://github.com/rust-lang/rustc-hash. How do the two crates relate? It looks like the same name/description. Which one should be preferred?
It would help distro integration.
Thanks.
Hello, we found a bug. There is this code in lib.rs:
impl Default for FxHasher32 {
    #[inline]
    fn default() -> FxHasher32 {
        FxHasher32 { hash: 0 }
    }
}
When we write zero into FxHasher32, the resulting hash is also zero. We would expect a normal hash value to be nonzero.
When we change the default value to
impl Default for FxHasher32 {
    #[inline]
    fn default() -> FxHasher32 {
        FxHasher32 { hash: 41 }
    }
}
the result is fine.
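For anyone reading along, here is a minimal sketch of why a zero initial state maps a zero input to zero. It does not use the crate itself: fx_step is a hypothetical helper, and the ROTATE and SEED32 constants are assumed to mirror the crate's 32-bit variant, so treat the exact numbers as illustrative.

```rust
// Constants assumed to match the crate's 32-bit variant; illustrative only.
const ROTATE: u32 = 5;
const SEED32: u32 = 0x9e37_79b9;

/// One Fx-style mixing step (hypothetical helper):
/// rotate the state, xor in the word, multiply by the seed.
fn fx_step(hash: u32, word: u32) -> u32 {
    (hash.rotate_left(ROTATE) ^ word).wrapping_mul(SEED32)
}

fn main() {
    // State 0 and word 0: rotate(0) ^ 0 is 0, and 0 * seed is still 0.
    println!("{}", fx_step(0, 0));
    // A nonzero word moves the state away from 0.
    println!("{}", fx_step(0, 1));
    // A nonzero initial state (41, as in the change above) also avoids 0 here.
    println!("{}", fx_step(41, 0));
}
```

Under these assumptions the zero result comes from the mixing step rather than from anything specific to FxHasher32's Default impl. Whether it matters depends on how the hash is used.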
Currently https://lib.rs/crates/fxhash and https://crates.io/crates/fxhash both show the line let mut hashmap = FxHashMap::new(); which is wrong (#1).
The README here has been fixed, but it would be great if a new version could remove this mistake from other sources.
OS: FreeBSD 11.1-RELEASE
rust version: 1.19.0 stable and 1.21.0-nightly
https://gist.github.com/metsuke0/ed4adcaf35c399f2856f614fb141e57b
error[E0599]: no function or associated item named `new` found for type `std::collections::HashMap<_, _, std::hash::BuildHasherDefault<fxhash::FxHasher>>` in the current scope
 --> src/main.rs:5:19
  |
5 | let mut hashmap = FxHashMap::new();
  |                   ^^^^^^^^^^^^^^
error: aborting due to previous error
error: Could not compile `testing`.
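The error above is what std gives for any HashMap with a non-default hasher: HashMap::new is only defined for the default RandomState hasher, while Default is implemented for any hasher builder that is itself Default. A minimal sketch using only std follows. CustomMap and build_map are hypothetical names, and DefaultHasher is a stand-in for the crate's hasher so the example compiles without the crate.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

// Stand-in alias with the same shape as the type in the error message.
type CustomMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

fn build_map() -> CustomMap<&'static str, u32> {
    // `CustomMap::new()` would fail with E0599 here;
    // `default()` works because the hasher builder implements `Default`.
    let mut map = CustomMap::default();
    map.insert("answer", 42);
    map
}

fn main() {
    let map = build_map();
    println!("{:?}", map.get("answer"));
}
```

The same pattern should apply to the crate's alias, i.e. FxHashMap::default() in place of FxHashMap::new().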
Currently the crate fails to compile with edition 2018. It would be nice to update it.
I tried using FxHashMap to improve performance of internal hash maps, but I noticed that it takes several times longer to deserialize a large hash map that we need to persist. I couldn't find anything about it on Google, so I created a minimal example and am reporting it here.
The behavior can be reproduced using a simple set of random elements. For example, on my machine this code takes 0.08s to serialize and 1.2s to deserialize a HashSet, as reported by cargo run --release:
use std::collections::HashSet;
use std::time::Instant;
use rand::Rng;

fn main() {
    let mut rnd = rand::thread_rng();
    let h: HashSet<u64> = (0..10_000_000).map(|_| rnd.gen::<u64>()).collect();
    let t0 = Instant::now();
    let mut out = vec![];
    bincode::serialize_into(&mut out, &h).unwrap();
    let t1 = Instant::now();
    println!("serialize: {}", (t1 - t0).as_secs_f64());
    let h2: HashSet<u64> = bincode::deserialize_from(&*out).unwrap();
    let t2 = Instant::now();
    println!("deserialize: {}", (t2 - t1).as_secs_f64());
    println!("{}", h2.len());
}
Trivially changing HashSet to fxhash::FxHashSet increases deserialization time to 5.95s (almost 5x slower), while serialization is unchanged. In our actual use case the original deserialization takes on the order of 2 minutes, so the slowdown visibly affects our total processing times. Code:
use fxhash::FxHashSet;
use std::time::Instant;
use rand::Rng;

fn main() {
    let mut rnd = rand::thread_rng();
    let h: FxHashSet<u64> = (0..10_000_000).map(|_| rnd.gen::<u64>()).collect();
    let t0 = Instant::now();
    let mut out = vec![];
    bincode::serialize_into(&mut out, &h).unwrap();
    let t1 = Instant::now();
    println!("serialize: {}", (t1 - t0).as_secs_f64());
    let h2: FxHashSet<u64> = bincode::deserialize_from(&*out).unwrap();
    let t2 = Instant::now();
    println!("deserialize: {}", (t2 - t1).as_secs_f64());
    println!("{}", h2.len());
}
I thought that perhaps the issue is in pathological behavior when rebuilding a hashmap from elements extracted in the same order, so I modified the code to serialize a Vec and deserialize by building an FxHashSet out of the Vec. That results in slightly slower serialization of 0.12s, but deserializes in just 0.15s, which includes both the time to deserialize the vector and the time to collect it into a new FxHashSet. (Applying this to ordinary HashSet didn't speed it up, it takes 0.11s to serialize and 1.3s to deserialize.) Code:
use fxhash::FxHashSet;
use std::time::Instant;
use rand::Rng;

fn main() {
    let mut rnd = rand::thread_rng();
    let h: FxHashSet<u64> = (0..10_000_000).map(|_| rnd.gen::<u64>()).collect();
    let t0 = Instant::now();
    let mut out = vec![];
    let hack = h.iter().copied().collect::<Vec<u64>>();
    bincode::serialize_into(&mut out, &hack).unwrap();
    let t1 = Instant::now();
    println!("serialize: {}", (t1 - t0).as_secs_f64());
    let hack2: Vec<u64> = bincode::deserialize_from(&*out).unwrap();
    let h2: FxHashSet<u64> = hack2.into_iter().collect();
    let t2 = Instant::now();
    println!("deserialize: {}", (t2 - t1).as_secs_f64());
    println!("{}", h2.len());
}
Is this expected behavior for an FxHashMap? Is there a way to fix it without going through a custom (and space-inefficient) serialization/deserialization?
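One thing the Vec route does differently is that collecting from a Vec knows the element count up front, so the set can be allocated at its final size before any insert. Whether that is what makes it fast here is an assumption and has not been confirmed in this thread. A minimal sketch of pre-sizing explicitly, using only std: FixedSet and rebuild_presized are hypothetical names, and DefaultHasher stands in for the crate's hasher so the example compiles without the crate.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::BuildHasherDefault;

// Stand-in for a set with a fixed (unseeded) hasher, in place of fxhash::FxHashSet.
type FixedSet<T> = HashSet<T, BuildHasherDefault<DefaultHasher>>;

/// Rebuild a set from already-decoded elements,
/// reserving room for all of them before the first insert.
fn rebuild_presized(items: Vec<u64>) -> FixedSet<u64> {
    let mut set = FixedSet::with_capacity_and_hasher(items.len(), Default::default());
    set.extend(items);
    set
}

fn main() {
    let set = rebuild_presized(vec![3, 1, 2, 3]);
    println!("{}", set.len());
}
```

This still needs the elements in an intermediate Vec, so it does not answer the question about avoiding a custom format. It only isolates the pre-sizing step so it can be timed separately.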