iai's Issues

Share benchmark code between iai and criterion?

Is there a way to share benchmark code between criterion and iai? I'm wondering if it would be possible for it to just be a switch for how the benchmarks are run, or a macro, etc. It seems like criterion is useful for getting real timings, while iai is useful in CI, so I could see a case for supporting both, and running one locally and the other in CI.

I guess you could create a library with the benchmark code, and then have criterion and iai main benchmark programs that just call into that, but I was wondering if it would be feasible to avoid the repetition?
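
That thin-wrapper layout is probably the least bad option today. A minimal sketch, with the function under test living in the library crate (crate and function names are illustrative; both [[bench]] targets need harness = false in Cargo.toml):

// benches/bench_iai.rs
use my_crate::fib;

fn bench_fib() {
    iai::black_box(fib(iai::black_box(20)));
}

iai::main!(bench_fib);

// benches/bench_criterion.rs
use criterion::{criterion_group, criterion_main, Criterion};
use my_crate::fib;

fn bench_fib(c: &mut Criterion) {
    c.bench_function("fib 20", |b| b.iter(|| fib(criterion::black_box(20))));
}

criterion_group!(benches, bench_fib);
criterion_main!(benches);

You could then run the criterion target locally and the iai target in CI; the duplication is reduced to the wrappers themselves.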

Is iai still maintained?

Critical bug-fix pull requests like #35 have not been merged since June. Is this project still active?
If not, can we help somehow?

Does iai support groups like criterion::group?

It seems that, currently, we can only put benchmarks into one Rust file when using iai. Is there any way to separate benchmarks into submodules, like we can with criterion?
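
As far as I can tell, iai 0.1's main! macro takes a flat list of function names and has no group concept. One workaround (names illustrative) is to organize the benchmarks into modules within the single bench file and use-import them for the macro:

mod parsing {
    pub fn bench_parse() {
        iai::black_box("123".parse::<u32>().unwrap());
    }
}

mod hashing {
    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    pub fn bench_hash() {
        let mut h = DefaultHasher::new();
        iai::black_box(42u64).hash(&mut h);
        iai::black_box(h.finish());
    }
}

use hashing::bench_hash;
use parsing::bench_parse;

iai::main!(bench_parse, bench_hash);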

Question: Current Working Directory for iai benchmarks?

Hi there! I have an operation I want to benchmark which accesses a directory within the environment. What is the current working directory the benchmarks are run in?

Also, I seem to get errors when running my benchmark code (presumably while trying to write to stdout) – is there a way to get backtraces on panic from the benchmarked code?

Exclude setup/teardown code from measurements

Hi,

It would be great to exclude setup/teardown code from benchmark measurements.

Using callgrind rather than cachegrind looks like it would allow you to zero and dump the counters at particular points in the program. This can be done on the command line with --zero-before=my_fn and --dump-after=my_fn. It looks like counting can be turned on/off programmatically as well.

callgrind also supports cache simulation with the same command line options and has a similar output file format, in particular with summary and events lines.

It looked to me like this would allow you to skip setup and teardown code.
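
For concreteness, an invocation along these lines (binary path and function name are illustrative; callgrind matches on demangled symbol names):

valgrind --tool=callgrind \
  --cache-sim=yes \
  --zero-before='my_bench::iai_wrappers::bench_fn' \
  --dump-after='my_bench::iai_wrappers::bench_fn' \
  target/release/deps/my_bench --iai-run 0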

Iai hangs when the benchmark calls JoinHandle::join

This benchmark currently hangs when run with $n = 0, $threads = 2 and RESEEDING_THRESHOLD = 1024.

In addition to the two threads spawned by the benchmark code, BenchmarkSharedBufferRng::new spawns another thread (at minimum priority) that reads from OsRng in batches and writes to a bounded crossbeam channel. new_standard_rng creates ReseedingRng instances that try_recv from that channel, only calling OsRng themselves if the channel is empty. The seed-sending thread exits when the rngs are both dropped. Full source code is at https://github.com/Pr0methean/shared_buffer_rng/tree/03e0448033644bce2cd1e8d2f769fd8a8c926681. A version of this benchmark that uses fire-and-forget threads (by replacing ITERATIONS_LEFT with a function-scoped Arc) runs fine, at least when the process runs only one such benchmark.

            // Excerpt from a macro body: $n and $threads are macro parameters,
            // and [< ... >] pastes the benchmark name together.
            fn [< contended_bench_ $n _shared_buffer >]() {
                ITERATIONS_LEFT.store(2 * RESEEDING_THRESHOLD * $n.max(1), SeqCst);
                let root = BenchmarkSharedBufferRng::<$n>::new(OsRng::default());
                let rngs: Vec<_> = (0..$threads)
                    .map(|_| root.new_standard_rng(RESEEDING_THRESHOLD))
                    .collect();
                drop(root);
                let background_threads: Vec<_> = rngs.into_iter()
                    .map(|mut rng| {
                        spawn(move || {
                            while ITERATIONS_LEFT.fetch_sub(1, SeqCst) > 0 {
                                black_box(rng.next_u64());
                            }
                        })
                    })
                    .collect();
                background_threads
                    .into_iter()
                    .for_each(|handle| handle.join().unwrap());
            }

Best practices for use in CI

This crate is amazing! It does exactly what I'm looking for.

I'm wondering if you have any pointers on best practices for using this in CI - specifically, in a GitHub workflow.

My requirements would be:

  • run the benchmarks on pull requests
  • fail the workflow if significant regression from main/master

How best to achieve that?
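
Not authoritative, but one low-tech sketch leans on iai's built-in comparison (it diffs each run against the cachegrind.out files saved by the previous run, which is where the "(No change)"/percentage annotations visible elsewhere on this page come from): benchmark the merge target first, then the PR branch, and grep the deltas. BASE_SHA/HEAD_SHA stand for values the workflow would supply.

# Baseline run on the merge target; iai saves its counters under target/iai/.
git checkout "$BASE_SHA"
cargo bench --bench my_iai_bench

# PR branch; iai now prints percentage deltas against the baseline.
git checkout "$HEAD_SHA"
cargo bench --bench my_iai_bench | tee result.txt

# Fail the job if any counter grew by 10% or more (regex illustrative).
! grep -E '\(\+[0-9]{2,}\.' result.txt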

`failed to allocate a guard page` on FreeBSD

Whenever I try to run any Rust program with Cachegrind on FreeBSD, it panics with the message thread '<unnamed>' panicked at 'failed to allocate a guard page', library/std/src/sys/unix/thread.rs:364:17. It seems that Cachegrind's memory-layout trickery interferes with the guard pages Rust places below thread stacks. Is there a way at compile time to disable the guard-page allocation? If so, iai should use it.

Cachegrind failure in non-privileged docker container (e.g. CircleCI)

I've got an issue where my benchmarks are not failing locally (Ubuntu 20.04) but are failing in CI (Debian Buster). I've got valgrind installed there and have confirmed that it's possible to run it directly, like:

cargo bench --no-run --all-features
exc=$(ls target/release/deps/ | grep -e '^iai[^.]\+$')
valgrind \
  -d \
  -v \
  --tool=cachegrind \
  --I1=32768,8,64 \
  --D1=32768,8,64 \
  --LL=8388608,16,64 \
  --cachegrind-out-file=cachegrind.out \
  "target/release/deps/$exc" \
  --iai-run 0 

However, when I run cargo bench, I get a failure like:

Running `/home/circleci/project/target/release/deps/iai-b3e03a1f9e4644b7 iai --bench`
thread 'main' panicked at 'Failed to run benchmark in cachegrind. Exit code: exit code: 1', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/iai-0.1.1/src/lib.rs:118:9

the interesting portion of the backtrace is

  15:     0x558f5004b227 - iai::run_bench::h77107b12d80265f1
  16:     0x558f5004ccd7 - iai::runner::hf910ff229467010c
  17:     0x558f500457f3 - std::sys_common::backtrace::__rust_begin_short_backtrace::hd07c56481eb04e03
  18:     0x558f500457b9 - std::rt::lang_start::{{closure}}::h11bcbb207c0366c3
  19:     0x558f50070a07 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h527fb2333ede305e
                               at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/core/src/ops/function.rs:259:13
  20:     0x558f50070a07 - std::panicking::try::do_call::h309d8aee8149866c
                               at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/panicking.rs:379:40
  21:     0x558f50070a07 - std::panicking::try::h75a60c31fd16bfc6
                               at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/panicking.rs:343:19
  22:     0x558f50070a07 - std::panic::catch_unwind::h1f9892423e99bc00
                               at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/panic.rs:431:14
  23:     0x558f50070a07 - std::rt::lang_start_internal::hd5b67df56ca01dae
                               at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/rt.rs:51:25
  24:     0x558f50045132 - main
  25:     0x7f7cda2b009b - __libc_start_main
  26:     0x558f5004502a - _start
  27:                0x0 - <unknown>

I've tried getting more out of valgrind by running with the VALGRIND_OPTS environment variable set to "-v" and "-d -v", but it doesn't appear to be useful, in that there's still no stdout, and the target/iai directory doesn't exist.

I'd really appreciate any suggestions on how to debug this further!

Current status of the project / performance?

Hi, thanks for the lib! I wonder what "experimental" means - e.g., will it output wrong results? In addition, I wonder how slow the benchmark will be: for example, 10x slower or 100x? (My code runs for 1-10 seconds per run, so time is a constraint.) Thanks!

iai doesn't use the correct target dir in multi-crate workspaces or respect --target-dir

iai uses a hardcoded relative path "target/iai/cachegrind.out.{}" for outputs, which isn't the correct target directory in some cases - particularly when you are using a multi-crate workspace (which uses a shared target directory for all crates), or when cargo bench's --target-dir argument is set. I was going to just PR a fix (and I'm happy to do so if we agree on a good way to do it), but I couldn't find an elegant way to do it. The best I've thought of so far is popping three segments (release, deps, and the iai- binary) off the binary path, but I'm not confident that is correct in all cases.
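
As a partial mitigation (not a full fix - it doesn't help the shared-workspace case unless the variable is set), honoring the CARGO_TARGET_DIR environment variable before falling back to the hardcoded default would at least cover configurations that set the target dir via the environment. A sketch:

use std::path::PathBuf;

// Where to write cachegrind output: $CARGO_TARGET_DIR/iai when the
// variable is set, else the current hardcoded relative default.
fn iai_output_dir() -> PathBuf {
    std::env::var_os("CARGO_TARGET_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from("target"))
        .join("iai")
}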

Set up CI

Probably take the opportunity to learn GitHub Actions.

Use hardware performance counters instead of cachegrind

Iai is very exciting! I love the idea of benchmarks that are fast and deterministic. But relying on Cachegrind has some drawbacks:

  • Limited OS support
  • Requires the user to install valgrind
  • Executing binaries is slow
  • Valgrind alters the program's normal execution. This reduces its accuracy, and leads to bugs like #8

Modern CPUs contain hardware performance counters that can be used for nearly zero-cost profiling. Using those in Iai instead of Cachegrind would have several benefits:

  • No dependency on Valgrind
  • Much faster to execute
  • The counters can be paused and restarted mid-process. This would allow Iai to skip setup and teardown sections, as requested in #7.
  • Wider OS support
  • More accurate and detailed reports.

On FreeBSD, pmc(3) provides access to the counters, and there is already a nascent Rust crate for them: pmc-rs. On Linux, I think the perfcnt and perf crates provide the same functionality.
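
To give a sense of how lightweight this is on Linux, here is a minimal sketch using the perf-event crate (yet another binding, chosen purely for illustration; I believe its Builder counts retired instructions by default):

use perf_event::Builder;

fn main() -> std::io::Result<()> {
    let mut counter = Builder::new().build()?;

    counter.enable()?;
    // The measured region; counters can be paused and resumed around
    // setup/teardown, unlike a whole-process cachegrind run.
    let sum: u64 = (0..1_000u64).sum();
    counter.disable()?;

    println!("sum = {sum}, instructions retired = {}", counter.read()?);
    Ok(())
}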

Allow setup code to run outside of benchmark

Sorry if this is something that already exists, but I'm unable to find it in the docs:

Let's say I want to benchmark a few methods of a struct, but instantiating and setting up the struct is expensive. There should be a feature where I can instantiate and set up the struct outside of the benchmark, and then pass it into the benchmark so only the relevant methods are measured.
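
As far as I can tell, no such feature exists in iai 0.1; everything inside the benchmark function is measured. Purely as a sketch of the shape such an API could take - iai::with_setup below is hypothetical, not a real function - the setup closure would run before counting starts, and only the second closure would be measured:

// Hypothetical API; `iai::with_setup` and `ExpensiveStruct` do not exist.
fn bench_expensive_method() {
    iai::with_setup(
        || ExpensiveStruct::new(),               // setup, excluded from counts
        |s| iai::black_box(s.relevant_method()), // the only measured region
    );
}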

Unexpected Results

To start, I want to thank @bheisler for all of the effort you've put into criterion and iai!!

I've been experimenting with iai and really like the notion of "one-shot" measuring for low-level benchmarks. I've played around with it, but sometimes I get unexpected results. This could definitely be an error on my part - that is usually the case - but I've been unable to track it down, and thus I've created this issue.

Of note, I get very consistent results if I do multiple runs of a single configuration. But sometimes I run into problems when I change something or run a slightly different command. I then can get results that look wrong to me.

First off, I use an Arch Linux system for development:

$ uname -a
Linux 3900x 6.0.12-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 08 Dec 2022 11:03:38 +0000 x86_64 GNU/Linux

$ inxi -c
CPU: 12-Core AMD Ryzen 9 3900X (-MT MCP-) speed/min/max: 2200/2200/3800 MHz Kernel: 6.0.12-arch1-1 x86_64 Up: 9h 37m 
Mem: 6392.5/32019.1 MiB (20.0%) Storage: 465.76 GiB (136.4% used) Procs: 481 Shell: bash 5.1.16 inxi: 3.1.03 

I've created exper-iai with cargo new --lib, which creates a lib with a fn add and a test it_works:

$ cat src/lib.rs
pub fn add(left: usize, right: usize) -> usize {
    left + right
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn it_works() {
        let result = add(2, 2);
        assert_eq!(result, 4);
    }
}

I then added a simple fn main:

$ cat src/main.rs
use exper_iai::add;

fn main() {
    let r = add(3, 3);
    assert_eq!(r, 6);
    println!("{r}");
}

And the iai benchmark is:

$ cat benches/bench_iai.rs
use iai::black_box;
use exper_iai::add;

fn bench_iai_add() {
    black_box(add(2, 2));
}

iai::main!(bench_iai_add,);

I also created gen_asm.sh so I could see the generated assembler.
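
(The real gen_asm.sh is in the exper-iai repo; a one-liner along these lines - the flags are my guess, not the script's contents - produces similar Intel-syntax listings under target/release/deps/:)

cargo rustc --release -- --emit asm -C llvm-args=-x86-asm-syntax=intel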

$ cat asm/add.txt
.section .text.exper_iai::add,"ax",@progbits
        .globl  exper_iai::add
        .p2align        4, 0x90
        .type   exper_iai::add,@function
exper_iai::add:

        .cfi_startproc
        lea rax, [rdi + rsi]
        ret

        .size   exper_iai::add, .Lfunc_end0-exper_iai::add

$ cat asm/bench_iai_add.txt
.section .text.bench_iai::iai_wrappers::bench_iai_add,"ax",@progbits
        .p2align        4, 0x90
        .type   bench_iai::iai_wrappers::bench_iai_add,@function
bench_iai::iai_wrappers::bench_iai_add:

        .cfi_startproc
        push rax
        .cfi_def_cfa_offset 16

        mov edi, 2
        mov esi, 2
        call qword ptr [rip + exper_iai::add@GOTPCREL]
        mov qword ptr [rsp], rax

        mov rax, qword ptr [rsp]

        pop rax
        .cfi_def_cfa_offset 8
        ret

        .size   bench_iai::iai_wrappers::bench_iai_add, .Lfunc_end5-bench_iai::iai_wrappers::bench_iai_add

Also, in Cargo.toml I added [profile.dev] and [profile.release], and I added rust-toolchain.toml to keep the toolchain consistent:

$ cat Cargo.toml
[package]
name = "exper_iai"
authors = [ "Wink Saville <[email protected]" ]
license = "MIT OR Apache-2.0"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dev-dependencies]
criterion = "0.4.0"
iai = "0.1.1"

[[bench]]
name = "bench_iai"
path = "benches/bench_iai.rs"
harness = false


[features]

# From: https://doc.rust-lang.org/cargo/reference/profiles.html#dev
[profile.dev]
opt-level = 0
debug = true
#split-debuginfo = '...'  # Platform-specific.
debug-assertions = true
overflow-checks = true
lto = false
panic = 'unwind'
incremental = true
codegen-units = 256
rpath = false

# From: https://doc.rust-lang.org/cargo/reference/profiles.html#release
[profile.release]
opt-level = 3
debug = false
#split-debuginfo = '...'  # Platform-specific.
debug-assertions = false
overflow-checks = false
lto = false
panic = 'unwind'
incremental = false
codegen-units = 1
rpath = false

$ cat rust-toolchain.toml
[toolchain]
channel = "stable"
#channel = "nightly"

Running main and test work as expected:

$ cargo run
   Compiling exper_iai v0.1.0 (/home/wink/prgs/rust/myrepos/exper-iai)
    Finished dev [unoptimized + debuginfo] target(s) in 0.33s
     Running `target/debug/exper_iai`
6
$ cargo test
   Compiling autocfg v1.1.0
   Compiling proc-macro2 v1.0.47
...
   Compiling tinytemplate v1.2.1
   Compiling criterion v0.4.0
   Compiling exper_iai v0.1.0 (/home/wink/prgs/rust/myrepos/exper-iai)
    Finished test [unoptimized + debuginfo] target(s) in 8.58s
     Running unittests src/lib.rs (target/debug/deps/exper_iai-854898c18c69642d)

running 1 test
test tests::it_works ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/main.rs (target/debug/deps/exper_iai-6092fd66897760dc)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests exper_iai

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

And running cargo bench yields a more or less expected result:

$ cargo clean
$ cargo bench
   Compiling autocfg v1.1.0
   Compiling proc-macro2 v1.0.47
...
   Compiling tinytemplate v1.2.1
   Compiling criterion v0.4.0
    Finished bench [optimized] target(s) in 20.33s
     Running unittests src/lib.rs (target/release/deps/exper_iai-e0c596df81667934)

running 1 test
test tests::it_works ... ignored

test result: ok. 0 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/main.rs (target/release/deps/exper_iai-bbf641b3842b4eea)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/bench_iai.rs (target/release/deps/bench_iai-e75a6910d1576500)
bench_iai_add
  Instructions:                   8
  L1 Accesses:                   12
  L2 Accesses:                    2
  RAM Accesses:                   2
  Estimated Cycles:              92

And here are two more runs of just bench_iai showing the expected consistency:

$ cargo bench --bench bench_iai
    Finished bench [optimized] target(s) in 0.02s
     Running benches/bench_iai.rs (target/release/deps/bench_iai-e75a6910d1576500)
bench_iai_add
  Instructions:                   8 (No change)
  L1 Accesses:                   12 (No change)
  L2 Accesses:                    2 (No change)
  RAM Accesses:                   2 (No change)
  Estimated Cycles:              92 (No change)

$ cargo bench --bench bench_iai
    Finished bench [optimized] target(s) in 0.02s
     Running benches/bench_iai.rs (target/release/deps/bench_iai-e75a6910d1576500)
bench_iai_add
  Instructions:                   8 (No change)
  L1 Accesses:                   12 (No change)
  L2 Accesses:                    2 (No change)
  RAM Accesses:                   2 (No change)
  Estimated Cycles:              92 (No change)

Here is my first unexpected result: if I change the command line by adding taskset -c 0, I wouldn't expect significantly different results, but Instructions is 0, which is unexpected:

$ taskset -c 0 cargo bench --bench bench_iai
    Finished bench [optimized] target(s) in 0.02s
     Running benches/bench_iai.rs (target/release/deps/bench_iai-e75a6910d1576500)
bench_iai_add
  Instructions:                   0 (-100.0000%)
  L1 Accesses:      18446744073709551615 (+153722867280912908288%)
  L2 Accesses:                    2 (No change)
  RAM Accesses:                   3 (+50.00000%)
  Estimated Cycles:             114 (+23.91304%)

$ taskset -c 0 cargo bench --bench bench_iai
    Finished bench [optimized] target(s) in 0.02s
     Running benches/bench_iai.rs (target/release/deps/bench_iai-e75a6910d1576500)
bench_iai_add
  Instructions:                   0 (No change)
  L1 Accesses:      18446744073709551615 (No change)
  L2 Accesses:                    2 (No change)
  RAM Accesses:                   3 (No change)
  Estimated Cycles:             114 (No change)

$ taskset -c 0 cargo bench --bench bench_iai
    Finished bench [optimized] target(s) in 0.02s
     Running benches/bench_iai.rs (target/release/deps/bench_iai-e75a6910d1576500)
bench_iai_add
  Instructions:                   0 (No change)
  L1 Accesses:      18446744073709551615 (No change)
  L2 Accesses:                    2 (No change)
  RAM Accesses:                   3 (No change)
  Estimated Cycles:             114 (No change)

But a bigger problem arises if I rename bench_iai.rs to iai.rs and the bench within that file from bench_iai_add to iai_add:

$ cat benches/iai.rs
use iai::black_box;
use exper_iai::add;

fn iai_add() {
    black_box(add(2, 2));
}

iai::main!(iai_add,);

And then I make the necessary changes to get things working again; see exper-iai branch rename-bench_iai_add-to-iai_add. In my opinion only "labels" have changed, and none of the actual assembler code has changed.

But now I get really unexpected results: I switch branches, clean, and rerun the bench, and now Instructions has a value of 22:

$ git switch main
Switched to branch 'main'
Your branch is up to date with 'origin/main'.
$ git switch rename-bench_iai_add-to-iai-add 
Switched to branch 'rename-bench_iai_add-to-iai-add'
$ cargo clean
$ cargo bench
   Compiling autocfg v1.1.0
   Compiling proc-macro2 v1.0.47
...
   Compiling tinytemplate v1.2.1
   Compiling criterion v0.4.0
    Finished bench [optimized] target(s) in 20.60s
     Running unittests src/lib.rs (target/release/deps/exper_iai-e0c596df81667934)

running 1 test
test tests::it_works ... ignored

test result: ok. 0 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/main.rs (target/release/deps/exper_iai-bbf641b3842b4eea)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/iai.rs (target/release/deps/iai-1d6df879cc9849e1)
iai_add
  Instructions:                  22
  L1 Accesses:                   34
  L2 Accesses:                    2
  RAM Accesses:                   2
  Estimated Cycles:             114

$ cargo bench --bench iai
    Finished bench [optimized] target(s) in 0.02s
     Running benches/iai.rs (target/release/deps/iai-1d6df879cc9849e1)
iai_add
  Instructions:                  22 (No change)
  L1 Accesses:                   34 (No change)
  L2 Accesses:                    2 (No change)
  RAM Accesses:                   2 (No change)
  Estimated Cycles:             114 (No change)

$ cargo bench --bench iai
    Finished bench [optimized] target(s) in 0.02s
     Running benches/iai.rs (target/release/deps/iai-1d6df879cc9849e1)
iai_add
  Instructions:                  22 (No change)
  L1 Accesses:                   34 (No change)
  L2 Accesses:                    2 (No change)
  RAM Accesses:                   2 (No change)
  Estimated Cycles:             114 (No change)

Run benchmarks in parallel

Hey, awesome project! I actually hand-rolled a similar thing, and now I'm using iai instead.

One nice thing about using Cachegrind is that saturating the CPU shouldn't affect the benchmark results! So iai should be able to run the benchmarks in parallel. This could really help mitigate the slowness that comes from using Cachegrind.
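
Since each benchmark already runs as its own cachegrind process, the parallelism could be as simple as spawning the processes concurrently and waiting for all of them. A rough sketch (paths illustrative; --iai-run follows the convention visible elsewhere on this page, and a real implementation would also pass a distinct --cachegrind-out-file per run, as iai already does):

use std::process::Command;

fn main() {
    // Launch one cachegrind process per benchmark index...
    let children: Vec<_> = (0..4)
        .map(|i| {
            Command::new("valgrind")
                .arg("--tool=cachegrind")
                .arg("target/release/deps/my_bench")
                .arg("--iai-run")
                .arg(i.to_string())
                .spawn()
                .expect("failed to spawn cachegrind")
        })
        .collect();

    // ...then wait for all of them to finish.
    for mut child in children {
        assert!(child.wait().expect("wait failed").success());
    }
}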

Example showing how to use iai_macro

I played a bit with iai yesterday and was really impressed by the stable results. Then I also made a quick attempt to use the iai_macro but couldn't figure out what to do. Could you add a simple example?

Add chapter to the Criterion-rs user guide about Iai

I suppose in theory I could write a separate guide, but nah. They're supposed to be complementary and used together, and that will only get more true when I get around to adding integration with cargo-criterion.

[QUESTION] Why are L2 accesses not taken into account in estimation?

I know the approximation comes from an article used to estimate times in Python code; IIRC it is empirical. What I don't understand is why that formula ignores L2 accesses - I would expect them to produce a bigger hit than L1, as they are slower.

I'm asking because some code of mine produces a big (20%) increase in L2 accesses without significantly changing RAM or L1 accesses: the change simply turns some small (two-word) values in function arguments and returns from references into plain value copies. I would expect that to show up as a slowdown (I expect it to be slower than the original program), but instead the speed estimate is more or less unchanged - actually a tiny negative number.

I haven't compared wall times yet, though.
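
For reference, the weights consistent with the outputs elsewhere on this page appear to be:

  Estimated Cycles = (L1 Accesses) + 5 × (L2 Accesses) + 35 × (RAM Accesses)

e.g. 34 + 5·2 + 35·2 = 114 in the "Unexpected Results" issue above. So L2 accesses do carry a 5× weight; it is just small next to the 35× RAM weight, which may be why a 20% increase in L2 accesses barely moves the estimate.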

Panics with `no entry found for key` since valgrind 3.21.0

❯ RUST_BACKTRACE=1 cargo bench
    Finished bench [optimized] target(s) in 0.01s
     Running benches/benches.rs (target/release/deps/benches-0f90e9e90a9ae8e1)
thread 'main' panicked at 'no entry found for key', /home/niklas/.cargo/registry/src/index.crates.io-6f17d22bba15001f/iai-0.1.1/src/lib.rs:162:40
stack backtrace:
   0: rust_begin_unwind
             at /rustc/8bdcc62cb0362869f0e7b43a6ae4f96b953d3cbc/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/8bdcc62cb0362869f0e7b43a6ae4f96b953d3cbc/library/core/src/panicking.rs:67:14
   2: core::panicking::panic_display
             at /rustc/8bdcc62cb0362869f0e7b43a6ae4f96b953d3cbc/library/core/src/panicking.rs:150:5
   3: core::panicking::panic_str
             at /rustc/8bdcc62cb0362869f0e7b43a6ae4f96b953d3cbc/library/core/src/panicking.rs:134:5
   4: core::option::expect_failed
             at /rustc/8bdcc62cb0362869f0e7b43a6ae4f96b953d3cbc/library/core/src/option.rs:1952:5
   5: iai::parse_cachegrind_output
   6: iai::run_bench
   7: iai::runner
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
error: bench failed, to rerun pass `--bench benches`

Works again after downgrading to valgrind 3.19.0.
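
The timing is consistent with valgrind 3.21 turning cachegrind's cache simulation off by default (--cache-sim=no), so the output contains only the instruction-count event and the cache keys iai indexes into are absent - see also the "Does not work with valgrind 3.21" issue below, where the parsed events map is just {"Ir": 442824}. A sketch of the kind of defensive lookup that would turn the panic into zeros (the helper is illustrative, not iai's actual code):

use std::collections::HashMap;

// Read a cachegrind event by name, treating a missing key as 0 instead
// of panicking; keys such as "I1mr" or "DLmr" are absent when cache
// simulation is disabled.
fn event(events: &HashMap<String, u64>, key: &str) -> u64 {
    events.get(key).copied().unwrap_or(0)
}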

[idea] Running this on MacOS / without valgrind dependency

I'm thinking of ways to run this on platforms where valgrind is not easily available. My current thinking is that iai could launch a Docker container with valgrind preinstalled, run its benchmarks there, and then delete the container. Docker (and the Linux VM that comes with Docker for Mac) is already a "priced-in" cost in my developer experience.

I don't think it makes a ton of sense to do all Rust development in Docker, though - after all, compilation is still faster on M1, and most other things work.
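
For completeness, the container round trip could be as small as mounting the project into a Linux image and installing valgrind there (image, tag, and flow are illustrative; compilation inside the container will be slower than native):

docker run --rm -v "$PWD":/work -w /work rust:slim \
    sh -c 'apt-get update && apt-get install -y valgrind && cargo bench'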

Does not work with valgrind 3.21

I'm using Fedora 38, and when I install valgrind I get version 3.21. Iai does not seem to work with it; if I run the example in the README I get:

thread 'main' panicked at 'no entry found for key', /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/iai-0.1.1/src/lib.rs:163:40

If I print events at line 160 of lib.rs I get: Events: {"Ir": 442824}

Downgrading to valgrind 3.18 solves the issue.

Integrate with cargo-criterion

I don't want too much complexity in Iai itself, so it won't do things like HTML reporting. On the other hand, cargo-criterion already has most of that stuff, so it probably makes sense to have cargo-criterion connect to Iai benchmarks like it already does for Criterion-rs benchmarks and handle configuration and reporting that way.

Is this project still in development?

Hello, just wondering if this project is still active or in development?

It seemed really promising for repeatable benchmarks in CI, so it'd be good to know if it's unlikely to be developed further.
