
Comments (12)

tkaitchuck commented on May 27, 2024

Yes, on Intel systems there was a major optimization introduced in Skylake which reduced the latency of the AES instruction from 7 to 4 cycles. See: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=aes&expand=227

It is worth noting that this affected the latency but not the throughput, so if the instructions are properly aligned and pipelined (assuming there is other work to do), the delay should not be an issue. If it's showing up in macrobenchmarks, that obviously isn't happening.
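
To make the latency/throughput distinction concrete, here is a rough back-of-the-envelope model. The latencies of 7 and 4 cycles are the figures quoted above; the issue rate of one AES round per cycle is an assumption made only for this sketch, and the function names are made up for illustration.

```rust
/// Cycles to finish `rounds` AES rounds when each round consumes the
/// previous round's output, so no two rounds can overlap.
fn dependent_chain_cycles(rounds: u32, latency: u32) -> u32 {
    rounds * latency
}

/// Cycles to finish `rounds` independent AES rounds, assuming (for this
/// model only) that one round can be issued every cycle.
fn pipelined_cycles(rounds: u32, latency: u32) -> u32 {
    if rounds == 0 {
        0
    } else {
        // The first result arrives after `latency` cycles, and each
        // further round completes one cycle after the one before it.
        latency + rounds - 1
    }
}
```

Under this model, four dependent rounds cost 28 cycles at a latency of 7 and 16 cycles at a latency of 4, while four independent rounds cost 10 and 7 cycles. The latency change matters most when each round has to wait for the previous one.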

In the raw results above, strings shorter than 8 characters are slower than longer ones. I've seen that before, and it came down to code alignment. It is possible to force alignments, and I did some experimenting with that, but I couldn't find anything that consistently gave better results than letting the compiler figure it out.

But that's not likely the same issue you are seeing in the macrobenchmarks (or at least not entirely). It is possible to switch the algorithm based on the type on nightly, using specialization. However, it is not possible to dispatch based on the size. So I cannot change which algorithm is used; I can only change the update function within the algorithm. There is already a code path for < 8 byte strings in the AES variant. It should be possible to replace this with an alternate update function, but it's not as straightforward as copying the code from the fallback, because the fallback is designed to work with a 64-bit state, not a 128-bit one, and the order of the update is different.
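
As a rough illustration of the type-versus-size distinction, here is a toy hasher on stable Rust. It is not aHash's code: `ToyHasher`, `MULTIPLIER`, the mixing step, and `toy_hash` are all hypothetical, and no specialization is used. It only shows that the integer method is chosen from the type at compile time, while the short-input path for byte slices has to be a run-time branch inside the update function.

```rust
use std::hash::Hasher;

/// Toy 64-bit hasher used only to illustrate the branching structure.
/// The constant and the mixing step are hypothetical, not aHash's.
struct ToyHasher {
    state: u64,
}

const MULTIPLIER: u64 = 0x9E37_79B9_7F4A_7C15;

impl ToyHasher {
    fn new() -> Self {
        ToyHasher { state: 0 }
    }

    /// Fold one 64-bit word into the state.
    fn mix(&mut self, word: u64) {
        self.state = (self.state ^ word).wrapping_mul(MULTIPLIER).rotate_left(23);
    }

    /// Read up to 8 bytes as a zero-padded little-endian word.
    fn read_padded(bytes: &[u8]) -> u64 {
        let mut buf = [0u8; 8];
        buf[..bytes.len()].copy_from_slice(bytes);
        u64::from_le_bytes(buf)
    }
}

impl Hasher for ToyHasher {
    /// Byte slices: the length is a run-time value, so the short-input
    /// path is an ordinary branch inside the update function.
    fn write(&mut self, bytes: &[u8]) {
        self.mix(bytes.len() as u64);
        if bytes.len() < 8 {
            // Short path: a single padded word.
            self.mix(Self::read_padded(bytes));
        } else {
            // Long path: one word per 8-byte chunk.
            for chunk in bytes.chunks(8) {
                self.mix(Self::read_padded(chunk));
            }
        }
    }

    /// Fixed-width integers: the size is part of the type, so this
    /// method is selected at compile time with no length check.
    fn write_u64(&mut self, value: u64) {
        self.mix(value);
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

/// Hash a byte slice with a fresh `ToyHasher`.
fn toy_hash(bytes: &[u8]) -> u64 {
    let mut hasher = ToyHasher::new();
    hasher.write(bytes);
    hasher.finish()
}
```

Calling `toy_hash(b"123")` takes the short branch and `toy_hash(b"1234567890123456")` takes the chunked one; both go through the same `write` entry point, which is why only the update function can vary with length.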

I have to think more about how to deal with this. If you have any ideas, let me know.

from ahash.

tkaitchuck commented on May 27, 2024

I thought of a way to reduce it to 3 AES rounds instead of 4. If we want to go beyond that, we might need more information.
@as-com You mentioned "cases where the values are known to be short". Are they also known to be of fixed length? I could make it a LOT faster if I knew the exact length.
Also, I never asked: are you OK with a nightly-only solution?


as-com commented on May 27, 2024

The macrobenchmark involves handling JSON documents with keys that are of variable lengths using IndexMap, but based on the performance numbers (10-20% performance decrease with AES enabled), and based on examination of the documents, I would presume that the key lengths skew a lot shorter.

A nightly only solution would be fine for my use-case, but probably not for most people using this library.


tkaitchuck commented on May 27, 2024

@as-com Can you run a test with the short-string branch and let me know how that performs?


as-com commented on May 27, 2024

Testing on the same machine with Rust Nightly (1.51, 2021-01-14), the performance appears to be significantly improved, but still slightly slower than fallback:

aeshash/u8              time:   [838.63 ps 843.78 ps 849.01 ps]

aeshash/u16             time:   [859.63 ps 863.63 ps 867.29 ps]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

aeshash/u32             time:   [859.04 ps 862.68 ps 866.58 ps]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

aeshash/u64             time:   [879.67 ps 885.40 ps 892.14 ps]
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe

aeshash/u128            time:   [673.11 ps 675.10 ps 677.13 ps]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

aeshash/string/"1"      time:   [1.8148 ns 1.8236 ns 1.8322 ns]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
aeshash/string/"123"    time:   [1.8979 ns 1.9030 ns 1.9082 ns]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe
aeshash/string/"1234"   time:   [1.8842 ns 1.8959 ns 1.9090 ns]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
aeshash/string/"1234567"
                        time:   [1.8842 ns 1.8890 ns 1.8940 ns]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
aeshash/string/"12345678"
                        time:   [1.8769 ns 1.8836 ns 1.8901 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
aeshash/string/"123456789012345"
                        time:   [2.2297 ns 2.2353 ns 2.2407 ns]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe
aeshash/string/"1234567890123456"
                        time:   [2.2013 ns 2.2107 ns 2.2207 ns]
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe
aeshash/string/"123456789012345678901234"
                        time:   [2.4521 ns 2.4714 ns 2.4917 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
aeshash/string/"123456789012345678901234567890123"
                        time:   [3.8210 ns 3.8404 ns 3.8598 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
aeshash/string/"12345678901234567890123456789012345678901234567890123456789012345678"
                        time:   [8.8334 ns 8.9073 ns 9.0093 ns]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
aeshash/string/"123456789012345678901234567890123456789012345678901234567890123456789012345678901234...
                        time:   [12.059 ns 12.151 ns 12.244 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
aeshash/string/"123456789012345678901234567890123456789012345678901234567890123456789012345678901234... #2
                        time:   [47.253 ns 47.345 ns 47.438 ns]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
fallback/u8             time:   [877.39 ps 879.56 ps 881.71 ps]
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) low mild
  3 (3.00%) high mild

fallback/u16            time:   [859.76 ps 863.24 ps 866.81 ps]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe

fallback/u32            time:   [852.12 ps 854.74 ps 857.55 ps]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

fallback/u64            time:   [848.59 ps 851.24 ps 853.99 ps]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

fallback/u128           time:   [652.45 ps 654.71 ps 657.22 ps]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

fallback/string/"1"     time:   [1.5380 ns 1.5453 ns 1.5529 ns]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
fallback/string/"123"   time:   [1.6662 ns 1.6714 ns 1.6774 ns]
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
fallback/string/"1234"  time:   [1.6522 ns 1.6641 ns 1.6777 ns]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
fallback/string/"1234567"
                        time:   [1.6455 ns 1.6512 ns 1.6574 ns]
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
fallback/string/"12345678"
                        time:   [1.6974 ns 1.7238 ns 1.7596 ns]
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe
fallback/string/"123456789012345"
                        time:   [1.9137 ns 1.9221 ns 1.9307 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
fallback/string/"1234567890123456"
                        time:   [1.9110 ns 1.9195 ns 1.9281 ns]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
fallback/string/"123456789012345678901234"
                        time:   [3.0018 ns 3.0198 ns 3.0391 ns]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
fallback/string/"123456789012345678901234567890123"
                        time:   [4.1376 ns 4.1557 ns 4.1749 ns]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
fallback/string/"12345678901234567890123456789012345678901234567890123456789012345678"
                        time:   [6.4612 ns 6.4912 ns 6.5258 ns]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe
fallback/string/"12345678901234567890123456789012345678901234567890123456789012345678901234567890123...
                        time:   [10.594 ns 10.627 ns 10.665 ns]
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  2 (2.00%) high severe
fallback/string/"12345678901234567890123456789012345678901234567890123456789012345678901234567890123... #2
                        time:   [71.164 ns 71.403 ns 71.668 ns]
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

In the JSON-handling macrobenchmark, performance appears to be mostly unchanged compared to the master branch with AES enabled (i.e. still slower by 10% or so).

I suspect the AES instructions are stalling something, or that something else odd is going on.


tkaitchuck commented on May 27, 2024

Well, there is also the size of the state. The AES version needs to create keys which consist of three 128-bit values. The fallback only needs four 64-bit values. So the cost of instantiating the hasher may also differ.
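
For a sense of scale, the two state sizes can be compared with a tiny sketch. `AesState` and `FallbackState` are hypothetical stand-ins that only mirror the counts and widths mentioned above, not aHash's actual structs.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for the AES variant's state: three 128-bit values.
#[allow(dead_code)]
struct AesState {
    keys: [u128; 3],
}

/// Hypothetical stand-in for the fallback's state: four 64-bit values.
#[allow(dead_code)]
struct FallbackState {
    words: [u64; 4],
}

/// Returns (AES state size, fallback state size) in bytes.
fn state_sizes() -> (usize, usize) {
    (size_of::<AesState>(), size_of::<FallbackState>())
}
```

That is 48 bytes against 32 bytes, so the AES-style state is half again as large to set up and carry around for every hasher instance.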


tkaitchuck commented on May 27, 2024

I had to restructure the way specialization worked, and the approach I had earlier won't work.
I have pushed an update to the branch, which on my computer brings them to parity. I'm guessing there still might be a gap on Broadwell. I am not sure if there is a way to improve it further.

@as-com Let me know how this performs for you.


as-com commented on May 27, 2024

On Rust Nightly (1.51 2021-01-22), performance is either on-par or improved in the benchmark:

aeshash/u8              time:   [879.92 ps 889.53 ps 902.41 ps]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

aeshash/u16             time:   [841.33 ps 855.18 ps 871.20 ps]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

aeshash/u32             time:   [837.23 ps 845.40 ps 854.08 ps]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

aeshash/u64             time:   [839.42 ps 846.51 ps 853.51 ps]

aeshash/u128            time:   [640.35 ps 646.60 ps 653.21 ps]
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

aeshash/string/"1"      time:   [2.1187 ns 2.1650 ns 2.2251 ns]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe
aeshash/string/"123"    time:   [2.0345 ns 2.0509 ns 2.0680 ns]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
aeshash/string/"1234"   time:   [2.0101 ns 2.0278 ns 2.0466 ns]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
aeshash/string/"1234567"
                        time:   [1.9864 ns 1.9976 ns 2.0096 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
aeshash/string/"12345678"
                        time:   [1.9647 ns 1.9807 ns 1.9974 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
aeshash/string/"123456789012345"
                        time:   [1.9440 ns 1.9554 ns 1.9676 ns]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
aeshash/string/"1234567890123456"
                        time:   [1.9643 ns 1.9782 ns 1.9924 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
aeshash/string/"123456789012345678901234"
                        time:   [2.2632 ns 2.2803 ns 2.2979 ns]
aeshash/string/"123456789012345678901234567890123"
                        time:   [3.6303 ns 3.6548 ns 3.6815 ns]
aeshash/string/"12345678901234567890123456789012345678901234567890123456789012345678"
                        time:   [8.8100 ns 8.8864 ns 8.9678 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
aeshash/string/"123456789012345678901234567890123456789012345678901234567890123456789012345678901234...
                        time:   [11.733 ns 11.808 ns 11.889 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
aeshash/string/"123456789012345678901234567890123456789012345678901234567890123456789012345678901234... #2
                        time:   [46.283 ns 46.689 ns 47.089 ns]
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  9 (9.00%) high mild

     Running target/release/deps/map-603ff0ff8955ff4d
aes_words               time:   [5.8660 ms 6.0004 ms 6.1419 ms]
Found 13 outliers among 100 measurements (13.00%)
  13 (13.00%) high mild
fallback/u8             time:   [830.33 ps 836.74 ps 843.80 ps]

fallback/u16            time:   [824.05 ps 829.57 ps 835.45 ps]

fallback/u32            time:   [830.78 ps 836.95 ps 844.24 ps]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

fallback/u64            time:   [846.38 ps 852.69 ps 859.27 ps]
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

fallback/u128           time:   [639.81 ps 644.63 ps 649.74 ps]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

fallback/string/"1"     time:   [2.0946 ns 2.1116 ns 2.1292 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
fallback/string/"123"   time:   [1.9612 ns 1.9743 ns 1.9881 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
fallback/string/"1234"  time:   [2.0912 ns 2.1086 ns 2.1261 ns]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
fallback/string/"1234567"
                        time:   [2.0701 ns 2.0833 ns 2.0978 ns]
fallback/string/"12345678"
                        time:   [2.0771 ns 2.0920 ns 2.1085 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
fallback/string/"123456789012345"
                        time:   [2.0916 ns 2.1102 ns 2.1296 ns]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
fallback/string/"1234567890123456"
                        time:   [2.0967 ns 2.1132 ns 2.1310 ns]
fallback/string/"123456789012345678901234"
                        time:   [3.9467 ns 3.9883 ns 4.0304 ns]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
fallback/string/"123456789012345678901234567890123"
                        time:   [4.3666 ns 4.3950 ns 4.4253 ns]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
fallback/string/"12345678901234567890123456789012345678901234567890123456789012345678"
                        time:   [6.6940 ns 6.7876 ns 6.8927 ns]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
fallback/string/"12345678901234567890123456789012345678901234567890123456789012345678901234567890123...
                        time:   [10.969 ns 11.039 ns 11.115 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
fallback/string/"12345678901234567890123456789012345678901234567890123456789012345678901234567890123... #2
                        time:   [73.150 ns 73.804 ns 74.545 ns]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

In the JSON macrobenchmark, performance appears to be unchanged compared to the master branch. Disabling AES support appears to cause about a 6% increase in performance. Seems the larger state of the AES hash is the real problem here.


tkaitchuck commented on May 27, 2024

@as-com can you try the json benchmark with HashBrown 0.10.0 on nightly? (I know it's yanked and am attempting to sort that out)


as-com commented on May 27, 2024

Running the JSON benchmark with hashbrown's master branch (feature nightly enabled), aHash 0.7, and Rust Nightly (1.51 2021-01-30), the performance with AES is 135.15 ops/sec, and the performance without AES is 147.04 ops/sec.
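
To put those two figures side by side, a small helper (hypothetical, just for the arithmetic) expresses the difference as a fraction of the AES number:

```rust
/// Relative change of `candidate` over `baseline`, as a fraction of `baseline`.
fn relative_gain(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline
}
```

`relative_gain(135.15, 147.04)` comes out at roughly 0.088, i.e. the run without AES handles about 8.8% more operations per second.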

Compared to hashbrown 0.9.1 with aHash 0.7, performance is unchanged.


as-com commented on May 27, 2024

Note to self: run tests again in light of rust-lang/rust#83027 and rust-lang/rust#83084


as-com commented on May 27, 2024

Update: the performance regression from using -C target-cpu=native to enable AES support on Broadwell appears to have disappeared on the latest Rust Nightly, and performance compared to disabling AES support is improved by a few percentage points. I'll consider this issue resolved.
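
For anyone repeating this comparison, it can help to confirm whether a given build actually sees AES. The sketch below uses only the standard `cfg!` and `is_x86_feature_detected!` macros; the two function names are made up here, and this is not how aHash itself selects its implementation.

```rust
/// True when this binary was compiled for x86/x86_64 with the `aes`
/// target feature enabled (for example via `-C target-cpu=native` on a
/// CPU that has it).
fn aes_enabled_at_compile_time() -> bool {
    cfg!(all(
        any(target_arch = "x86", target_arch = "x86_64"),
        target_feature = "aes"
    ))
}

/// True when the CPU running this binary reports AES support.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn aes_available_at_run_time() -> bool {
    std::is_x86_feature_detected!("aes")
}

/// On non-x86 targets this sketch simply reports no x86 AES support.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn aes_available_at_run_time() -> bool {
    false
}
```

Printing both values from a build with and without `-C target-cpu=native` shows which of the two configurations is really being benchmarked.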

