Comments (12)
Yes, on Intel systems a major optimization introduced in Skylake reduced the latency of the AES instruction from 7 to 4 cycles. See: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=aes&expand=227
It is worth noting that this affected the latency but not the throughput, so if the instructions are properly aligned and pipelined (assuming there is other work to do), the delay should not be an issue. If it's showing up in macrobenchmarks, that obviously isn't happening.
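To make the latency-versus-throughput point concrete, here is a rough cycle model. It is a sketch under stated assumptions, not a measurement: it assumes one AES round can issue per cycle and ignores every other instruction. The function names are hypothetical.

```rust
/// Cycles for `rounds` AES rounds where each round needs the previous result,
/// so every round pays the full instruction latency.
pub fn dependent_chain_cycles(rounds: u64, latency: u64) -> u64 {
    rounds * latency
}

/// Cycles for `chains` independent chains of `rounds` rounds each, interleaved,
/// ASSUMING one AES round can issue per cycle (simplified model).
pub fn interleaved_cycles(chains: u64, rounds: u64, latency: u64) -> u64 {
    if chains == 0 || rounds == 0 {
        return 0;
    }
    // Each "step" of all chains costs whichever is larger: waiting for the
    // previous result (latency) or issuing one round per chain (throughput).
    let step = latency.max(chains);
    (rounds - 1) * step + (chains - 1) + latency
}
```

Under this model a single hash of 4 dependent rounds costs 28 cycles at 7-cycle latency and 16 cycles at 4-cycle latency, while four independent hashes interleaved at 7-cycle latency finish in 31 cycles rather than 112. A short-key hash inside a map lookup looks like the first case: there is little independent work to overlap with.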
In the raw results above, strings shorter than 8 characters are slower than longer ones. I've seen that before and it came down to code alignment. It is possible to force alignments, and I did some experimenting with that but couldn't find anything that consistently gave better results than letting the compiler figure it out.
But that's not likely the same issue you are seeing in the macrobenchmarks (or at least not entirely). It is possible to switch the algorithm based on the type on nightly, using specialization. However, it is not possible to dispatch based on the size. So I cannot change which algorithm is used; I can only change the update function within the algorithm. There is already a code path for < 8 byte strings in the AES variant. It should be possible to replace this with an alternate update function, but it's not as straightforward as copying the code from the fallback, because the fallback is designed to work with a 64-bit state rather than a 128-bit one, and the order of the update is different.
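As an illustration of "changing the update function" rather than the algorithm: the length is only known at run time, so the dispatch has to be a branch inside `write`. The sketch below uses a hypothetical `ToyHasher` with made-up mixing constants. It is not aHash's code and makes no claim about hash quality.

```rust
use std::hash::Hasher;

/// Hypothetical toy hasher used only to show a length-based branch.
pub struct ToyHasher {
    state: u64,
}

impl ToyHasher {
    pub fn new(seed: u64) -> Self {
        ToyHasher { state: seed }
    }

    /// Illustrative mixing step (xor, odd multiply, rotate); not a vetted mixer.
    fn mix(&mut self, word: u64) {
        self.state = (self.state ^ word)
            .wrapping_mul(0x9E37_79B9_7F4A_7C15)
            .rotate_left(23);
    }
}

impl Hasher for ToyHasher {
    fn write(&mut self, data: &[u8]) {
        if data.len() < 8 {
            // Short path: pack the bytes and the length into one word,
            // then do a single mixing round.
            let mut buf = [0u8; 8];
            buf[..data.len()].copy_from_slice(data);
            buf[7] = data.len() as u8;
            self.mix(u64::from_le_bytes(buf));
        } else {
            // General path: mix the length, then every 8-byte chunk.
            self.mix(data.len() as u64);
            for chunk in data.chunks(8) {
                let mut buf = [0u8; 8];
                buf[..chunk.len()].copy_from_slice(chunk);
                self.mix(u64::from_le_bytes(buf));
            }
        }
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

/// Convenience wrapper: hash one byte slice with the given seed.
pub fn toy_hash(seed: u64, data: &[u8]) -> u64 {
    let mut hasher = ToyHasher::new(seed);
    hasher.write(data);
    hasher.finish()
}
```

The short path costs one round instead of two or more, which is the kind of saving being discussed, but the branch itself is still paid on every call.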
I have to think more about how to deal with this. If you have any ideas, let me know.
from ahash.
I thought of a way to reduce it to 3 AES rounds instead of 4. If we want to go beyond that we might need more information.
@as-com You mentioned "cases where the values are known to be short", are they also known to be of fixed length? Because I could make it a LOT faster if I knew the exact length.
Also, I never asked: are you OK with a nightly-only solution?
from ahash.
The macrobenchmark involves handling JSON documents with keys that are of variable lengths using IndexMap, but based on the performance numbers (10-20% performance decrease with AES enabled), and based on examination of the documents, I would presume that the key lengths skew a lot shorter.
A nightly-only solution would be fine for my use-case, but probably not for most people using this library.
from ahash.
@as-com Can you run a test with the short-string branch and let me know how that performs?
from ahash.
Testing on the same machine with Rust Nightly (1.51, 2021-01-14), the performance appears to be significantly improved, but still slightly slower than fallback:
aeshash/u8 time: [838.63 ps 843.78 ps 849.01 ps]
aeshash/u16 time: [859.63 ps 863.63 ps 867.29 ps]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
aeshash/u32 time: [859.04 ps 862.68 ps 866.58 ps]
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
aeshash/u64 time: [879.67 ps 885.40 ps 892.14 ps]
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) high mild
6 (6.00%) high severe
aeshash/u128 time: [673.11 ps 675.10 ps 677.13 ps]
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe
aeshash/string/"1" time: [1.8148 ns 1.8236 ns 1.8322 ns]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
aeshash/string/"123" time: [1.8979 ns 1.9030 ns 1.9082 ns]
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
1 (1.00%) high mild
2 (2.00%) high severe
aeshash/string/"1234" time: [1.8842 ns 1.8959 ns 1.9090 ns]
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe
aeshash/string/"1234567"
time: [1.8842 ns 1.8890 ns 1.8940 ns]
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe
aeshash/string/"12345678"
time: [1.8769 ns 1.8836 ns 1.8901 ns]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
aeshash/string/"123456789012345"
time: [2.2297 ns 2.2353 ns 2.2407 ns]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe
aeshash/string/"1234567890123456"
time: [2.2013 ns 2.2107 ns 2.2207 ns]
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) low mild
3 (3.00%) high mild
1 (1.00%) high severe
aeshash/string/"123456789012345678901234"
time: [2.4521 ns 2.4714 ns 2.4917 ns]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
aeshash/string/"123456789012345678901234567890123"
time: [3.8210 ns 3.8404 ns 3.8598 ns]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
aeshash/string/"12345678901234567890123456789012345678901234567890123456789012345678"
time: [8.8334 ns 8.9073 ns 9.0093 ns]
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe
aeshash/string/"123456789012345678901234567890123456789012345678901234567890123456789012345678901234...
time: [12.059 ns 12.151 ns 12.244 ns]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
aeshash/string/"123456789012345678901234567890123456789012345678901234567890123456789012345678901234... #2
time: [47.253 ns 47.345 ns 47.438 ns]
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe
fallback/u8 time: [877.39 ps 879.56 ps 881.71 ps]
Found 9 outliers among 100 measurements (9.00%)
6 (6.00%) low mild
3 (3.00%) high mild
fallback/u16 time: [859.76 ps 863.24 ps 866.81 ps]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high severe
fallback/u32 time: [852.12 ps 854.74 ps 857.55 ps]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
fallback/u64 time: [848.59 ps 851.24 ps 853.99 ps]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
fallback/u128 time: [652.45 ps 654.71 ps 657.22 ps]
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
fallback/string/"1" time: [1.5380 ns 1.5453 ns 1.5529 ns]
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
fallback/string/"123" time: [1.6662 ns 1.6714 ns 1.6774 ns]
Found 6 outliers among 100 measurements (6.00%)
5 (5.00%) high mild
1 (1.00%) high severe
fallback/string/"1234" time: [1.6522 ns 1.6641 ns 1.6777 ns]
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
fallback/string/"1234567"
time: [1.6455 ns 1.6512 ns 1.6574 ns]
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
fallback/string/"12345678"
time: [1.6974 ns 1.7238 ns 1.7596 ns]
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) high mild
5 (5.00%) high severe
fallback/string/"123456789012345"
time: [1.9137 ns 1.9221 ns 1.9307 ns]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
fallback/string/"1234567890123456"
time: [1.9110 ns 1.9195 ns 1.9281 ns]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
fallback/string/"123456789012345678901234"
time: [3.0018 ns 3.0198 ns 3.0391 ns]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
fallback/string/"123456789012345678901234567890123"
time: [4.1376 ns 4.1557 ns 4.1749 ns]
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
fallback/string/"12345678901234567890123456789012345678901234567890123456789012345678"
time: [6.4612 ns 6.4912 ns 6.5258 ns]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe
fallback/string/"12345678901234567890123456789012345678901234567890123456789012345678901234567890123...
time: [10.594 ns 10.627 ns 10.665 ns]
Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) low mild
6 (6.00%) high mild
2 (2.00%) high severe
fallback/string/"12345678901234567890123456789012345678901234567890123456789012345678901234567890123... #2
time: [71.164 ns 71.403 ns 71.668 ns]
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) high mild
4 (4.00%) high severe
In the JSON-handling macrobenchmark, performance appears to be mostly unchanged compared to the master branch with AES enabled (i.e. still slower by 10% or so).
I suspect there is some weirdness going on with AES instructions stalling something, or something else.
from ahash.
Well, there is also the size of the state. The AES version needs to create keys which consist of three 128-bit values. The fallback only needs four 64-bit values. So the cost of instantiating the hasher may also differ.
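For a sense of scale, the two state sizes work out to 48 and 32 bytes. The structs below are hypothetical stand-ins that only mirror the counts mentioned above; they are not aHash's actual types.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a state built from three 128-bit values.
#[allow(dead_code)]
pub struct AesLikeState {
    keys: [u128; 3],
}

/// Hypothetical stand-in for a state built from four 64-bit values.
#[allow(dead_code)]
pub struct FallbackLikeState {
    words: [u64; 4],
}

/// Returns (AES-like size, fallback-like size) in bytes.
pub fn state_sizes() -> (usize, usize) {
    (size_of::<AesLikeState>(), size_of::<FallbackLikeState>())
}
```

That is 50% more state to initialize and copy per hasher, which matters most when each hasher only processes a handful of bytes.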
from ahash.
I had to restructure the way specialization worked, and the approach I had earlier won't work.
I have pushed an update to the branch, which on my computer brings them to parity. I'm guessing there still might be a gap on Broadwell. I am not sure if there is a way to improve it further.
@as-com Let me know how this performs for you.
from ahash.
On Rust Nightly (1.51 2021-01-22), performance is either on-par or improved in the benchmark:
aeshash/u8 time: [879.92 ps 889.53 ps 902.41 ps]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
aeshash/u16 time: [841.33 ps 855.18 ps 871.20 ps]
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
aeshash/u32 time: [837.23 ps 845.40 ps 854.08 ps]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
aeshash/u64 time: [839.42 ps 846.51 ps 853.51 ps]
aeshash/u128 time: [640.35 ps 646.60 ps 653.21 ps]
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe
aeshash/string/"1" time: [2.1187 ns 2.1650 ns 2.2251 ns]
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) high mild
4 (4.00%) high severe
aeshash/string/"123" time: [2.0345 ns 2.0509 ns 2.0680 ns]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
aeshash/string/"1234" time: [2.0101 ns 2.0278 ns 2.0466 ns]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
aeshash/string/"1234567"
time: [1.9864 ns 1.9976 ns 2.0096 ns]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
aeshash/string/"12345678"
time: [1.9647 ns 1.9807 ns 1.9974 ns]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
aeshash/string/"123456789012345"
time: [1.9440 ns 1.9554 ns 1.9676 ns]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
aeshash/string/"1234567890123456"
time: [1.9643 ns 1.9782 ns 1.9924 ns]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
aeshash/string/"123456789012345678901234"
time: [2.2632 ns 2.2803 ns 2.2979 ns]
aeshash/string/"123456789012345678901234567890123"
time: [3.6303 ns 3.6548 ns 3.6815 ns]
aeshash/string/"12345678901234567890123456789012345678901234567890123456789012345678"
time: [8.8100 ns 8.8864 ns 8.9678 ns]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
aeshash/string/"123456789012345678901234567890123456789012345678901234567890123456789012345678901234...
time: [11.733 ns 11.808 ns 11.889 ns]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
aeshash/string/"123456789012345678901234567890123456789012345678901234567890123456789012345678901234... #2
time: [46.283 ns 46.689 ns 47.089 ns]
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low mild
9 (9.00%) high mild
Running target/release/deps/map-603ff0ff8955ff4d
aes_words time: [5.8660 ms 6.0004 ms 6.1419 ms]
Found 13 outliers among 100 measurements (13.00%)
13 (13.00%) high mild
fallback/u8 time: [830.33 ps 836.74 ps 843.80 ps]
fallback/u16 time: [824.05 ps 829.57 ps 835.45 ps]
fallback/u32 time: [830.78 ps 836.95 ps 844.24 ps]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
fallback/u64 time: [846.38 ps 852.69 ps 859.27 ps]
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
fallback/u128 time: [639.81 ps 644.63 ps 649.74 ps]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
fallback/string/"1" time: [2.0946 ns 2.1116 ns 2.1292 ns]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
fallback/string/"123" time: [1.9612 ns 1.9743 ns 1.9881 ns]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
fallback/string/"1234" time: [2.0912 ns 2.1086 ns 2.1261 ns]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
fallback/string/"1234567"
time: [2.0701 ns 2.0833 ns 2.0978 ns]
fallback/string/"12345678"
time: [2.0771 ns 2.0920 ns 2.1085 ns]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
fallback/string/"123456789012345"
time: [2.0916 ns 2.1102 ns 2.1296 ns]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
fallback/string/"1234567890123456"
time: [2.0967 ns 2.1132 ns 2.1310 ns]
fallback/string/"123456789012345678901234"
time: [3.9467 ns 3.9883 ns 4.0304 ns]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
fallback/string/"123456789012345678901234567890123"
time: [4.3666 ns 4.3950 ns 4.4253 ns]
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
fallback/string/"12345678901234567890123456789012345678901234567890123456789012345678"
time: [6.6940 ns 6.7876 ns 6.8927 ns]
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
fallback/string/"12345678901234567890123456789012345678901234567890123456789012345678901234567890123...
time: [10.969 ns 11.039 ns 11.115 ns]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
fallback/string/"12345678901234567890123456789012345678901234567890123456789012345678901234567890123... #2
time: [73.150 ns 73.804 ns 74.545 ns]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
In the JSON macrobenchmark, performance appears to be unchanged compared to the master branch. Disabling AES support appears to cause about a 6% increase in performance. Seems the larger state of the AES hash is the real problem here.
from ahash.
@as-com Can you try the JSON benchmark with hashbrown 0.10.0 on nightly? (I know it's yanked and am attempting to sort that out.)
from ahash.
Running the JSON benchmark with hashbrown's master branch (feature nightly enabled), aHash 0.7, and Rust Nightly (1.51 2021-01-30), the performance with AES is 135.15 ops/sec, and performance without AES is 147.04 ops/sec.
Compared to hashbrown 0.9.1 with aHash 0.7, performance is unchanged.
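For reference, the gap between those two throughput figures can be computed directly. The helper name below is hypothetical.

```rust
/// Percentage by which `other_ops` exceeds `baseline_ops` (both in ops/sec).
pub fn relative_gain_percent(baseline_ops: f64, other_ops: f64) -> f64 {
    (other_ops / baseline_ops - 1.0) * 100.0
}
```

With the numbers above, `relative_gain_percent(135.15, 147.04)` comes to roughly 8.8%, i.e. the run without AES is about 8.8% faster in this benchmark.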
from ahash.
Note to self: run tests again in light of rust-lang/rust#83027 and rust-lang/rust#83084
from ahash.
Update: the performance regression from using -C target-cpu=native to enable AES support on Broadwell appears to have disappeared on the latest Rust Nightly, and performance compared to disabling AES support is improved by a few percentage points. I'll consider this issue resolved.
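When re-running this kind of comparison, it can help to confirm whether a given build was actually compiled with AES enabled. A minimal sketch using only standard-library facilities follows. The function names are hypothetical, and this says nothing about how aHash itself selects its code path.

```rust
/// True if this binary was compiled with the `aes` target feature enabled,
/// e.g. via `-C target-cpu=native` on a CPU that supports AES, or
/// `-C target-feature=+aes`.
pub fn aes_enabled_at_compile_time() -> bool {
    cfg!(target_feature = "aes")
}

/// True if the CPU running this code reports AES support (x86 / x86_64 only).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn aes_detected_at_runtime() -> bool {
    is_x86_feature_detected!("aes")
}

/// On other architectures this sketch does no detection and reports false.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn aes_detected_at_runtime() -> bool {
    false
}
```

Printing both values at the start of a benchmark run makes it clear which configuration produced a given set of numbers.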
from ahash.