Comments (11)
Hmm. That is the PR activating AES on ARM on stable.
My first thought was that the instruction was slow. But according to Arm, the instruction should have a latency of 2 cycles and a throughput of 1 per cycle.
I will have to look at the generated assembly. It may be the case that an #[inline] is missing somewhere critical.
from ahash.
Looking into this, I found the code was not using the proper fast path. I will correct this in the next major release, though it will take some time. However, this is not the cause of the regression, as the old version was also using the slower path.
@rodya-mirov Can you run cargo asm with each of the versions and grep the result for "aes"? I suspect that this change may have affected the build: https://github.com/tkaitchuck/aHash/pull/183/files
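As a complement to grepping the assembly, here is a minimal sketch (not part of aHash) that reports whether the AES target feature was enabled at compile time and whether the CPU reports it at run time. The function names are hypothetical, and the sketch only assumes the standard library's feature-detection macros; it does not show which code path aHash actually selects.

```rust
/// True if the compiler was allowed to emit AES instructions for this build
/// (for example via `-C target-cpu=native` or `-C target-feature=+aes`).
fn aes_enabled_at_compile_time() -> bool {
    cfg!(target_feature = "aes")
}

/// Run-time detection on aarch64.
#[cfg(target_arch = "aarch64")]
fn aes_detected_at_runtime() -> Option<bool> {
    Some(std::arch::is_aarch64_feature_detected!("aes"))
}

/// Run-time detection on x86 / x86_64.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn aes_detected_at_runtime() -> Option<bool> {
    Some(std::arch::is_x86_feature_detected!("aes"))
}

/// Architectures this sketch does not cover.
#[cfg(not(any(target_arch = "aarch64", target_arch = "x86", target_arch = "x86_64")))]
fn aes_detected_at_runtime() -> Option<bool> {
    None
}

fn main() {
    println!("target arch:           {}", std::env::consts::ARCH);
    println!("aes at compile time:   {}", aes_enabled_at_compile_time());
    println!("aes detected at run:   {:?}", aes_detected_at_runtime());
}
```

If the compile-time line prints false while the run-time line prints Some(true), the hardware supports AES but the build was not told it could rely on it, which would be worth knowing before comparing versions.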
from ahash.
Hi, thanks for looking. I felt quite bad about dumping such a non-minimal example. Here is a better one (I didn't realize it would be this easy).
Source code:
use std::time::Instant;
use ahash::HashSet;

fn main() {
    let start = Instant::now();
    const FLAG: usize = !5324;
    let mut s: HashSet<usize> = HashSet::default();
    for i in 0..500_000_000 {
        s.insert(i & FLAG);
    }
    println!("Got {} rows (avoid optimizing away the whole exercise)", s.len());
    println!("Took {:?}", start.elapsed());
}
On 0.8.6, took 1.86s
On 0.8.7, took 3.06s
Timings are relatively stable.
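For anyone wanting to repeat the measurement over several trials, here is a rough sketch of the same workload wrapped in a small timing helper. It is an illustration under assumptions, not the benchmark used above: the standard library's HashSet with RandomState stands in for ahash::HashSet so the snippet has no dependencies, the loop count is reduced, and the names workload and median_time are made up. Since 5324 has six bits set, the mask clears six bits of each key, so only roughly 1 in 64 inserts adds a new element.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashSet;
use std::hash::BuildHasher;
use std::hint::black_box;
use std::time::{Duration, Instant};

const FLAG: usize = !5324;

/// Inserts `i & FLAG` for every i in 0..n and returns the number of
/// distinct keys. `S` is the hasher under test (swap in ahash's
/// RandomState here when ahash is a dependency).
fn workload<S: BuildHasher + Default>(n: usize) -> usize {
    let mut s: HashSet<usize, S> = HashSet::default();
    for i in 0..n {
        s.insert(i & FLAG);
    }
    s.len()
}

/// Runs `f` `trials` times and returns the median wall-clock time.
fn median_time<F: FnMut() -> usize>(trials: usize, mut f: F) -> Duration {
    assert!(trials > 0, "need at least one trial");
    let mut times: Vec<Duration> = (0..trials)
        .map(|_| {
            let start = Instant::now();
            // black_box keeps the result alive so the work is not optimized away.
            black_box(f());
            start.elapsed()
        })
        .collect();
    times.sort();
    times[times.len() / 2]
}

fn main() {
    let n = 5_000_000; // reduced from 500_000_000 to keep the sketch quick
    let t = median_time(3, || workload::<RandomState>(n));
    println!("median of 3 trials: {:?}", t);
}
```

Build with --release when comparing versions; a debug build measures something quite different.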
I'm not sure I'm using cargo asm correctly; I ran cargo asm for both versions (no arguments) and got 326 lines of ... I have no idea what. I put in cargo asm ahash_bench::main (the name of my test crate is ahash_bench) and got 360 lines of what looks more like assembly language, but it was the same for both versions. I don't think it's what you meant. In neither case did I get any mention of aes.
Happy to try with different arguments but it's an unfamiliar tool and I'm not sure what I'm looking for.
from ahash.
If cargo asm ahash_bench::main gives the same output for both versions, then there is no difference in the code between versions.
That leaves two possibilities. One is a difference in flags between cargo asm and cargo run. Assuming that is not the case, the other is code alignment: I have found there are sometimes large perf differences in tiny benchmarks depending on a totally unrelated change in the same binary, as it shifts code around.
I'll try this code on a few computers to see if I can reproduce it.
from ahash.
Code alignment seems possible but dubious; these are fairly large benchmarks, and I was able to reproduce the behavior in the original project and the toy project with approximately the same proportional difference in runtime, suggesting there is really something there. I'm definitely not an expert, but I've also been told alignment "can account for up to 10% performance differences" (for instance). 40% seems like an awful lot.
My money is on me not using cargo asm correctly, unfortunately (maybe I need to pass some flags, or something).
from ahash.
@rodya-mirov Can you check if this is fixed on the 0.9 prerelease branch?
from ahash.
@tkaitchuck The problem seems to be much, much worse with that branch.
Some measurements I just took on my computer (which is now running macOS Sonoma):
0.8.6 - 1.87s (3 trials)
0.8.7 - 2.97s (3 trials)
0.9.0 prerelease - 25.84s (3 trials)
That is to say, more than a 10x slowdown from the 0.8.6 branch.
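To make the ratios explicit, here is a small sketch that divides each timing by the 0.8.6 baseline. The numbers are the ones listed above; the helper name slowdown_factor is made up for illustration.

```rust
/// How many times longer `new_secs` is than `base_secs`.
fn slowdown_factor(base_secs: f64, new_secs: f64) -> f64 {
    new_secs / base_secs
}

fn main() {
    let base = 1.87; // 0.8.6, seconds
    let runs = [("0.8.7", 2.97), ("0.9.0 prerelease", 25.84)];
    for (label, secs) in runs {
        println!(
            "{}: {:.2}x the 0.8.6 runtime",
            label,
            slowdown_factor(base, secs)
        );
    }
}
```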
This is so catastrophically bad that I spent some time trying to figure out whether I had done something wrong with git dependencies, but I don't think there's a lot of room for error here.
My Cargo.toml:
[package]
name = "ahash_bench"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
# ahash = "=0.8.6"
# ahash = "=0.8.7"
ahash = { git = "https://github.com/tkaitchuck/aHash.git", branch = "0.9-prerelease" }
from ahash.
I dug into the commit history a bit:
At revision 8c3f257f3c62debd37ec0fc99069c2872d24e710 the performance is fine, equal to 0.8.6, at 1.87s or so.
Then at revision b424dc4e6c37585c9109442e75796ceb4e0ab645 (this PR: https://github.com/tkaitchuck/aHash/pull/217/files) the performance craters, all the way to the 20+ second range (the performance is fairly volatile, but consistently bad).
Note I'm using Rust 1.76.
from ahash.
I appreciate you prioritizing this.
from ahash.
There was a pretty serious regression that was missed because the GitHub Actions script was actually running on Intel Macs instead of on ARM. Fortunately it was never merged to master, so there should be no harm.
With that fixed, running the above code I do see a perf regression, but it is 15%, not 10x.
from ahash.
Hmm, there is some other problem I am seeing now. It is somehow even slower...
from ahash.