
critcmp's Introduction

critcmp

A command line tool for comparing benchmarks run by Criterion. It supports comparing benchmarks both across and within baselines, where a "baseline" is a collection of benchmark data produced by Criterion for a single run.


Dual-licensed under MIT or the UNLICENSE.

Installation

Since this tool is primarily for use with the Criterion benchmark harness for Rust, you should install it with Cargo:

$ cargo install critcmp

critcmp's minimum supported Rust version is the current stable release.

WARNING: This tool explicitly reads undocumented internal data emitted by Criterion, which means this tool can break at any point if Criterion's internal data format changes.

critcmp is known to work with Criterion 0.3.3. This project will track the latest release of Criterion if breaking changes to Criterion's internal format occur, but will also attempt to keep working on older versions within reason.

Example

[A screenshot of a critcmp example]

Usage

critcmp works by slurping up all benchmark data from Criterion's target directory, in addition to extra data supplied as positional parameters. The primary unit that critcmp works with is Criterion's baselines. That is, the simplest way to use critcmp is to save two baselines with Criterion's benchmark harness and then compare them. For example:

$ cargo bench -- --save-baseline before
$ cargo bench -- --save-baseline change
$ critcmp before change

Filtering can be done with the -f/--filter flag to limit comparisons based on a regex:

$ critcmp before change -f 'foo.*bar'

Comparisons with very small differences can also be filtered out. For example, this hides comparisons with differences of 5% or less:

$ critcmp before change -t 5

Comparisons are not limited to only two baselines. Many can be used:

$ critcmp before change1 change2

The list of available baselines known to critcmp can be printed:

$ critcmp --baselines

A baseline can be exported to a single JSON file for more permanent storage outside of Criterion's target directory:

$ critcmp --export before > before.json
$ critcmp --export change > change.json

Baselines saved this way can be used by passing their file path instead of just a name:

$ critcmp before.json change.json

Benchmarks within the same baseline can be compared as well. Normally, benchmarks are compared based on their name. That is, given two baselines, the correspondence between benchmarks is established by their name. Sometimes, however, you'll want to compare benchmarks that don't have the same name. This can be done by expressing the matching criteria via a regex. For example, given benchmarks 'optimized/input1' and 'naive/input1' in the baseline 'benches', the following will show a comparison between the two benchmarks despite the fact that they have different names:

$ critcmp benches -g '\w+/(input1)'

That is, the matching criteria is determined by the values matched by all of the capturing groups in the regex. All benchmarks with equivalent capturing groups will be included in one comparison. There is no limit on the number of benchmarks that can appear in a single comparison.
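To illustrate further (with hypothetical benchmark names), the comparison key is formed from all of the capturing groups together. Given benchmarks 'optimized/sort/input1' and 'naive/sort/input1' in the baseline 'benches', the following puts them in one comparison because both capturing groups match the same values ('sort' and 'input1'):

$ critcmp benches -g '\w+/(\w+)/(input\d+)'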

Finally, if comparisons grow too large to read in the default column-oriented display, the results can be flattened into lists:

$ critcmp before change1 change2 change3 change4 change5 --list

Motivation

This tool is similar to cargo-benchcmp, but it works on data gathered by Criterion.

In particular, Criterion emits loads of useful data, but its facilities for interactively comparing benchmarks and analyzing benchmarks in the aggregate are exceedingly limited. Criterion does provide the ability to save benchmark results as a "baseline," and this is primarily the data that critcmp works with. While Criterion will show changes between a saved baseline and the current benchmark, there is no way to do further comparative analysis by looking at benchmark results in different views.


critcmp's Issues

Incorrect alignment in case of big differences in nanoseconds

Hi @BurntSushi
Thanks for critcmp.

I was just going through the code and wondering what happens in the case of big differences in nanoseconds. As you can see below, the alignment gets broken a bit.

Is this something that should be addressed? (At the end of the day, the data itself is still consistent.)

The example uses f64::MAX in one column.

group                                    base                                                                                                                                                                                                                                                                                                                                         new
-----                                    ----                                                                                                                                                                                                                                                                                                                                         ---
test_const_table/cli_table/1             1.00  179769313486231564793826746329356487460061771816610493194377291867566634083253736182911671780280864445928163680987122391750825462330354250895282439122322875506826024599142533926918074193061745122574500020189880363468340637347674643851875759782894318316386198487970256787451014597457079993094755057664.0±0.00s        ? ?/sec    1.00      2.3±0.04µs        ? ?/sec
test_const_table/cli_table/128           1.00      8.1±0.19ms        ? ?/sec                                                                                                                                                                                                                                                                                                          1.00      8.1±0.19ms        ? ?/sec
test_const_table/cli_table/32            1.00    596.6±5.45µs        ? ?/sec                                                                                                                                                                                                                                                                                                          1.00    596.6±5.45µs        ? ?/sec
test_const_table/cli_table/512           1.00    153.1±3.24ms        ? ?/sec                                                                                                                                                                                                                                                                                                          1.00    153.1±3.24ms        ? ?/sec
test_const_table/cli_table/8             1.00     47.8±0.81µs        ? ?/sec                                                                                                                                                                                                                                                                                                          1.00     47.8±0.81µs        ? ?/sec
test_const_table/comfy_table/1           1.00      5.0±0.15µs        ? ?/sec                                                                                                                                                                                                                                                                                                          1.00      5.0±0.15µs        ? ?/sec

Thanks.
Take care.
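For context, the giant figure above is just f64::MAX rendered in seconds. A minimal sketch (not critcmp's actual code) of unit-scaled duration formatting shows why: once scaling stops at seconds, the mantissa is printed in full decimal, producing hundreds of digits and blowing out the column width.

// Hypothetical unit-scaled formatter, for illustration only.
fn humanize(nanos: f64) -> String {
    // Scale to the largest unit that keeps the value >= 1.
    let units = [("ns", 1.0), ("µs", 1e3), ("ms", 1e6), ("s", 1e9)];
    let (suffix, div) = units
        .iter()
        .rev()
        .find(|&&(_, div)| nanos >= div)
        .copied()
        .unwrap_or(("ns", 1.0));
    format!("{:.1}{}", nanos / div, suffix)
}

fn main() {
    println!("{}", humanize(2300.0)); // "2.3µs"
    // f64::MAX / 1e9 is ~1.8e299, and {:.1} prints it in full decimal,
    // so the "seconds" string is roughly 300 characters wide.
    println!("{}", humanize(f64::MAX));
}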

benchmark.json: No such file or directory

Description
critcmp terminates with the message:

/home/user/rust-prometheus/target/criterion/concurrent_observe_and_collect/new: benchmark.json: No such file or directory (os error 2)

Steps to reproduce

  1. git clone https://github.com/tikv/rust-prometheus
  2. cd rust-prometheus
  3. cargo +stable bench --bench text_encoder -- --save-baseline baseline1
  4. cargo +stable bench --bench text_encoder -- --save-baseline baseline2
  5. critcmp baseline1 baseline2

Versions
criterion 0.3.3
critcmp 0.1.4 (installed today)

Inclusion in criterion.

Hi @BurntSushi,

I'd like for criterion to support this style of workflow but I'm not sure how to go about it. I see a few different options:

  1. Write a specification for the baseline data format and let external tools parse it on their own.
  2. Write a criterion-baseline library for loading and storing baselines.
  3. Merge critcmp into criterion or cargo-criterion.

I'm leaning towards option 2. What are your thoughts?

Also, is there a better name for a collection of benchmark results than "baseline"?

Capitalization issues in estimates.json -- "mean" instead of "Mean"

Hi, I'm having a great time using critcmp and really appreciate the tool. I'm running into an issue setting it up on a new computer.

I'm on criterion 0.3.0 and critcmp 0.1.3. My criterion bench outputs the following JSON:

{"mean":{"confidence_interval":{"confidence_level":0.95,"lower_bound":98271422.4,"upper_bound":100928456.67},"point_estimate":99591700.17,"standard_error":688684.1673855595},"median":{"confidence_interval":{"confidence_level":0.95,"lower_bound":97666752.0,"upper_bound":101350337.0},"point_estimate":99418969.0,"standard_error":1107511.4378777498},"median_abs_dev":{"confidence_interval":{"confidence_level":0.95,"lower_bound":5462559.542620182,"upper_bound":8502608.17899852},"point_estimate":7297799.626538157,"standard_error":777802.4484694396},"slope":null,"std_dev":{"confidence_interval":{"confidence_level":0.95,"lower_bound":5741768.515991534,"upper_bound":7863115.64418664},"point_estimate":6824751.598088472,"standard_error":548632.199575996}}

critcmp seems to be looking for capitalized field names, though, and outputs this:

estimates.json: missing field `Mean` at line 1 column 753

If I then change mean to Mean in the above JSON, critcmp does this:

estimates.json: missing field `Median` at line 1 column 753

This continues until I've renamed all the necessary fields. Is there an option I'm missing in either Criterion or critcmp? I'm not sure what's causing this.

Thanks,
Alex
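For reference, serde can accept either capitalization of a field through aliases, which is presumably how a tool could stay compatible with both formats. A minimal sketch (hypothetical, not critcmp's actual code), assuming serde with the derive feature and serde_json:

use serde::Deserialize;

// Hypothetical mirror of estimates.json entries. `alias` lets each field
// deserialize from either the Rust field name ("mean") or the alias ("Mean").
#[derive(Debug, Deserialize)]
struct Estimates {
    #[serde(alias = "Mean")]
    mean: serde_json::Value,
    #[serde(alias = "Median")]
    median: serde_json::Value,
}

fn main() {
    let old = r#"{"Mean": {"point_estimate": 1.0}, "Median": {"point_estimate": 2.0}}"#;
    let new = r#"{"mean": {"point_estimate": 1.0}, "median": {"point_estimate": 2.0}}"#;
    assert!(serde_json::from_str::<Estimates>(old).is_ok());
    assert!(serde_json::from_str::<Estimates>(new).is_ok());
}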

Internal file format changes

Hey, this is just a heads-up that the internal file formats used by Criterion.rs will be changing in the next release. Specifically, sample.json and estimates.json. Since they are internal files I don't consider this a breaking change, but I thought I'd let you know so you can update critcmp.

New sample.json format contains one SavedSample:

pub(crate) enum ActualSamplingMode {
    Linear,
    Flat,
}
pub(crate) struct SavedSample {
    sampling_mode: ActualSamplingMode,
    iters: Vec<f64>,
    times: Vec<f64>,
}

New estimates.json format contains one Estimates:

pub struct ConfidenceInterval {
    pub confidence_level: f64,
    pub lower_bound: f64,
    pub upper_bound: f64,
}
pub struct Estimate {
    pub confidence_interval: ConfidenceInterval,
    pub point_estimate: f64,
    pub standard_error: f64,
}
pub struct Estimates {
    pub mean: Estimate,
    pub median: Estimate,
    pub median_abs_dev: Estimate,
    pub slope: Option<Estimate>,
    pub std_dev: Estimate,
}

I should also mention that cargo-criterion has its own completely separate file structure using CBOR instead of JSON. It's getting pretty late here so I'll let you get the structure for that from the code.
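For reference, the new estimates.json layout above maps directly onto derived serde deserializers. A minimal sketch (assuming serde with the derive feature and serde_json; the file path is hypothetical), where slope must be an Option because Criterion can emit null there:

use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct ConfidenceInterval {
    confidence_level: f64,
    lower_bound: f64,
    upper_bound: f64,
}

#[derive(Debug, Deserialize)]
struct Estimate {
    confidence_interval: ConfidenceInterval,
    point_estimate: f64,
    standard_error: f64,
}

#[derive(Debug, Deserialize)]
struct Estimates {
    mean: Estimate,
    median: Estimate,
    median_abs_dev: Estimate,
    slope: Option<Estimate>, // serialized as null when no slope was computed
    std_dev: Estimate,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical path; Criterion writes estimates.json under target/criterion.
    let raw = std::fs::read_to_string("target/criterion/my_bench/new/estimates.json")?;
    let estimates: Estimates = serde_json::from_str(&raw)?;
    println!("mean: {} ns", estimates.mean.point_estimate);
    Ok(())
}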

output group order is different from cli

Currently, it seems that the output results from critcmp are sorted by time, with faster groups appearing first. However, this might not be consistent with the order specified in the CLI, which could be confusing. Perhaps there could be an option to follow the order from the CLI or place the fastest group first.

critcmp ~/Downloads/benchcmp2.json ~/Downloads/master.json
group                               benchcmp2_cecf96b9                     master
-----                               ------------------                     ------
bufvec/manual_chunk                 1.01      4.3±0.02µs        ? ?/sec    1.00      4.3±0.02µs        ? ?/sec

critcmp ~/Downloads/benchcmp2.json ~/Downloads/0.0.1.json
group                               0.0.1                                  benchcmp2_cecf96b9
-----                               -----                                  ------------------
bufvec/manual_chunk                 1.00      4.3±0.03µs        ? ?/sec    1.00      4.3±0.02µs        ? ?/sec


Strip throughput rates if all are `?`

Thanks for critcmp. My benches don't tend to have throughput figures, so I get two columns of ? B/sec.

It's not the worst thing having them, but maybe these columns could be stripped automatically when every entry is a ?.

Example:

group                                                 master                                 new
-----                                                 ------                                 ---
no_cache_render_3_medium_sections_fully               1.29  1647.1±18.17µs        ? B/sec    1.00  1277.4±17.63µs        ? B/sec
no_cache_render_v_bottom_1_large_section_partially    1.62      7.6±0.02ms        ? B/sec    1.00      4.7±0.02ms        ? B/sec
no_cache_render_v_center_1_large_section_partially    1.66      7.7±0.02ms        ? B/sec    1.00      4.7±0.04ms        ? B/sec

Throughput::Elements shown as "? B/sec"

I have a benchmark using Throughput::Elements, but that isn't accounted for in the output; instead it is displayed as "? B/sec", as if throughput weren't used.

missing throughput causes a deserialization error

critcmp from crates.io currently seems to have problems parsing the output of the latest released criterion.

/home/lu_zero/rav1e/target/criterion/get_sad/BLOCK_64X128/new/estimates.json: invalid type: null, expected a string at line 1 column 40

The file:

{"Mean":{"confidence_interval":{"confidence_level":0.95,"lower_bound":147.33967490751667,"upper_bound":147.43358882620052},"point_estimate":147.3849476164319,"standard_error":0.023965715727952808},"Median":{"confidence_interval":{"confidence_level":0.95,"lower_bound":147.30833994178107,"upper_bound":147.4066363541041},"point_estimate":147.34522830111064,"standard_error":0.025997038369799854},"MedianAbsDev":{"confidence_interval":{"confidence_level":0.95,"lower_bound":0.1509735509193759,"upper_bound":0.24146680313029364},"point_estimate":0.1924932858498119,"standard_error":0.024073251188416186},"Slope":{"confidence_interval":{"confidence_level":0.95,"lower_bound":147.31067985782533,"upper_bound":147.3925789529354},"point_estimate":147.35072958201718,"standard_error":0.020952924148138635},"StdDev":{"confidence_interval":{"confidence_level":0.95,"lower_bound":0.17501272137980164,"upper_bound":0.3100491684607925},"point_estimate":0.24054982719442242,"standard_error":0.035463059217927154}}
