
hdrhistogram_rust's Introduction

HdrHistogram_rust


HdrSample is a port of Gil Tene's HdrHistogram to native Rust. It provides recording and analyzing of sampled data value counts across a large, configurable value range with configurable precision within the range. The resulting "HDR" histogram allows for fast and accurate analysis of the extreme ranges of data with non-normal distributions, like latency.

HdrHistogram

What follows is a description from the HdrHistogram website. Users are encouraged to read the documentation from the original Java implementation, as most of the concepts translate directly to the Rust port.

HdrHistogram supports the recording and analyzing of sampled data value counts across a configurable integer value range with configurable value precision within the range. Value precision is expressed as the number of significant digits in the value recording, and provides control over value quantization behavior across the value range and the subsequent value resolution at any given level.

For example, a Histogram could be configured to track the counts of observed integer values between 0 and 3,600,000,000 while maintaining a value precision of 3 significant digits across that range. Value quantization within the range will thus be no larger than 1/1,000th (or 0.1%) of any value. This example Histogram could be used to track and analyze the counts of observed response times ranging between 1 microsecond and 1 hour in magnitude, while maintaining a value resolution of 1 microsecond up to 1 millisecond, a resolution of 1 millisecond (or better) up to one second, and a resolution of 1 second (or better) up to 1,000 seconds. At its maximum tracked value (1 hour), it would still maintain a resolution of 3.6 seconds (or better).

HDR Histogram is designed for recording histograms of value measurements in latency and performance sensitive applications. Measurements show value recording times as low as 3-6 nanoseconds on modern (circa 2014) Intel CPUs. The HDR Histogram maintains a fixed cost in both space and time. A Histogram's memory footprint is constant, with no allocation operations involved in recording data values or in iterating through them. The memory footprint is fixed regardless of the number of data value samples recorded, and depends solely on the dynamic range and precision chosen. The amount of work involved in recording a sample is constant, and directly computes storage index locations such that no iteration or searching is ever involved in recording data values.

If you are looking for FFI bindings to HdrHistogram_c, you want the hdrhistogram_c crate instead.

Interacting with the library

HdrSample's API follows that of the original HdrHistogram Java implementation, with some modifications to make its use more idiomatic in Rust. The description in this section has been adapted from that given by the Python port, as it gives a nicer first-time introduction to the use of HdrHistogram than the Java docs do.

HdrSample is generally used in one of two modes: recording samples, or querying for analytics. In distributed deployments, the recording may be performed remotely (and possibly in multiple locations), to then be aggregated later in a central location for analysis.

Recording samples

A histogram instance is created using the ::new methods on the Histogram struct. These come in three variants: new, new_with_max, and new_with_bounds. The first of these only sets the required precision of the sampled data, but leaves the value range open such that any value may be recorded. A Histogram created this way (or one where auto-resize has been explicitly enabled) will automatically resize itself if a value that is too large to fit in the current dataset is encountered. new_with_max sets an upper bound on the values to be recorded, and disables auto-resizing, thus preventing any re-allocation during recording. If the application attempts to record a larger value than this maximum bound, the record call will return an error. Finally, new_with_bounds restricts the lowest representable value of the dataset, such that a smaller range needs to be covered (thus reducing the overall allocation size).

For example, the code below creates a Histogram that can count values in the [1..3600000] range with 1% precision, which could be used to track latencies in the range [1 msec..1 hour].

use hdrhistogram::Histogram;
let mut hist = Histogram::<u64>::new_with_bounds(1, 60 * 60 * 1000, 2).unwrap();

// samples can be recorded using .record, which will error if the value is too small or large
hist.record(54321).expect("value 54321 should be in range");

// for ergonomics, samples can also be recorded with +=
// this call will panic if the value is out of range!
hist += 54321;

// if the code that generates the values is subject to Coordinated Omission,
// the self-correcting record method should be used instead.
// for example, if the expected sampling interval is 10 msec:
hist.record_correct(54321, 10).expect("value 54321 should be in range");

Note the u64 type: it is the counter type used for the histogram bins. It can be changed to a narrower type to reduce the storage overhead for all the bins, at the cost of risking count saturation if a large number of samples end up in the same bin.

Querying samples

At any time, the histogram can be queried to return interesting statistical measurements, such as the total number of recorded samples, or the value at a given quantile:

use hdrhistogram::Histogram;
let hist = Histogram::<u64>::new(2).unwrap();
// ...
println!("# of samples: {}", hist.len());
println!("99.9'th percentile: {}", hist.value_at_quantile(0.999));

Several useful iterators are also provided for quickly getting an overview of the dataset. The simplest one is iter_recorded(), which yields one item for every non-empty sample bin. All the HdrHistogram iterators are supported in HdrSample, so look for the *Iterator classes in the Java documentation.

use hdrhistogram::Histogram;
let hist = Histogram::<u64>::new(2).unwrap();
// ...
for v in hist.iter_recorded() {
    println!("{}'th percentile of data is {} with {} samples",
        v.percentile(), v.value_iterated_to(), v.count_at_value());
}

Panics and error handling

As long as you're using safe, non-panicking functions (see below), this library should never panic. Any panics you encounter are a bug; please file them in the issue tracker.

A few functions have their functionality exposed via AddAssign and SubAssign implementations. These alternate forms are equivalent to simply calling unwrap() on the normal functions, so the normal rules of unwrap() apply: view with suspicion when used in production code, etc.

Returns Result    Panics on error    Functionality
h.record(v)       h += v             Increment count for value v
h.add(h2)         h += h2            Add h2's counts to h
h.subtract(h2)    h -= h2            Subtract h2's counts from h

Other than the panicking forms of the above functions, everything will return Result or Option if it can fail.

usize limitations

Depending on the configured number of significant digits and maximum value, a histogram's internal storage may have hundreds of thousands of cells. Systems with a 16-bit usize cannot represent pointer offsets that large, so relevant operations (creation, deserialization, etc) will fail with a suitable error (e.g. CreationError::UsizeTypeTooSmall). If you are using such a system and hitting these errors, reducing the number of significant digits will greatly reduce memory consumption (and therefore the need for large usize values). Lowering the max value may also help as long as resizing is disabled.

Systems with a 32-bit or larger usize will not have any such issues, as all possible histograms fit within a 32-bit index.

Floating point accuracy

Some calculations inherently involve floating point values, like value_at_quantile, and are therefore subject to the precision limits of IEEE754 floating point calculations. The user-visible consequence of this is that in certain corner cases, you might end up with a bucket (and therefore value) that is higher or lower than it would be if the calculation had been done with arbitrary-precision arithmetic. However, double-precision IEEE754 (i.e. f64) is very good at its job, so these cases should be rare. Also, we haven't seen a case that was off by more than one bucket.

To minimize FP precision losses, we favor working with quantiles rather than percentiles. A quantile represents a portion of a set with a number in [0, 1]. A percentile is the same concept, except it uses the range [0, 100]. Working just with quantiles means we can skip an FP operation in a few places, and therefore avoid opportunities for precision loss to creep in.

Limitations and Caveats

As with all the other HdrHistogram ports, the latest features and bug fixes from the upstream HdrHistogram implementations may not be available in this port. A number of features have also not (yet) been implemented:

  • Concurrency support (AtomicHistogram, ConcurrentHistogram, …).
  • DoubleHistogram.
  • The Recorder feature of HdrHistogram.
  • Value shifting ("normalization").
  • Textual output methods. These seem almost orthogonal to HdrSample, though it might be convenient if we implemented some relevant traits (CSV, JSON, and possibly simple fmt::Display).

Most of these should be fairly straightforward to add, as the code aligns pretty well with the original Java/C# code. If you do decide to implement one and send a PR, please make sure you also port the test cases, and try to make sure you implement appropriate traits to make the use of the feature as ergonomic as possible.

Usage

Add this to your Cargo.toml:

[dependencies]
hdrhistogram = "7"

and, if you are using Rust 2015, this to your crate root (editions 2018 and later do not need it):

extern crate hdrhistogram;

License

Dual-licensed to be compatible with the Rust project.

Licensed under the Apache License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0 or the MIT license http://opensource.org/licenses/MIT, at your option. This file may not be copied, modified, or distributed except according to those terms.

hdrhistogram_rust's People

Contributors

anderender, arthurprs, carreau, danburkert, davidpdrsn, dependabot[bot], geal, hdevalence, jonhoo, marshallpierce, payasr, petersp, serprex, soro, the8472, tspiteri, tudyx, wasabi375, wcassels


hdrhistogram_rust's Issues

Publish 7.0.0 to crates.io

Could you publish version 7.0.0 to crates.io? (I'm using the new Error types in 7.0.0.) Thanks for your work on this library!

Consider allowing serialization into a Vec<u8> directly

Currently, serialization output is directed to a Write in both the V2 and V2 + DEFLATE serializers. This is a very flexible abstraction, but the serialization formats dictate a length prefix, which means that output must be buffered before writing to the writer so that we can drop in the final length in the appropriate spot.

This buffering is especially sad in the DEFLATE case, since we already have a Vec<u8> that will hold the uncompressed serialization, but we can only expose it to the underlying V2 serializer as a Write, so the uncompressed bytes get written to V2's buffer, then copied into V2 + DEFLATE's buffer, then compressed into another buffer, then written to the user-provided writer.

Some options:

  • Functions for Write or Vec<u8>, where the former buffers into an internal Vec<u8>
  • Functions for Write or Write + Seek. Need to benchmark this to see if it's measurably different from the Vec<u8> case. If it isn't, this seems preferable as it is more general.
  • Give up on Write and only serialize into either Write + Seek or a Vec<u8>. Maybe supporting I/O streams that don't seek is a sufficiently niche case that it's not worth creating an easy path for? On one hand, I really want to encourage people to use these formats as a wire protocol in a monitoring system. On the other hand, maybe such a protocol would use a container format like protobuf around the serialized histogram anyway, and it's a waste of API complexity budget to support simple Write usage.

Move to nom 5.0

We're currently on 4.2.3, whereas upstream has been bumped to 5.0. While I think the changes are mainly cosmetic/mechanical between the two, it still requires some work.

No way to append to an interval log?

Hey, great work on this library - enjoying working with it. I was wondering, was it intentional to design the API such that you can't append to interval logs? I had assumed that's what I'd be doing, but eventually realized it would not work. Even if there were a stand-alone function or impl to serialize one Histogram into a Writer in the interval format, I think that would allow a lot of flexibility. Thanks!

Test failures on Debian i386

I recently updated the hdrhistogram package in Debian, and as part of that I resolved the issues that were blocking the tests from running on Debian's test infrastructure.

Unfortunately, when running the CI tests on i386, two tests failed; I can also reproduce this locally.

failures:

---- iter_quantiles_saturated_count_before_max_value stdout ----
thread 'iter_quantiles_saturated_count_before_max_value' panicked at 'capacity overflow', library/alloc/src/raw_vec.rs:518:5
stack backtrace:
   0: rust_begin_unwind
             at /usr/src/rustc-1.59.0/library/std/src/panicking.rs:498:5
   1: core::panicking::panic_fmt
             at /usr/src/rustc-1.59.0/library/core/src/panicking.rs:116:14
   2: core::panicking::panic
             at /usr/src/rustc-1.59.0/library/core/src/panicking.rs:48:5
   3: alloc::raw_vec::capacity_overflow
             at /usr/src/rustc-1.59.0/library/alloc/src/raw_vec.rs:518:5
   4: alloc::raw_vec::handle_reserve
             at /usr/src/rustc-1.59.0/library/alloc/src/raw_vec.rs:489:34
   5: alloc::raw_vec::RawVec<T,A>::reserve::do_reserve_and_handle
             at /usr/src/rustc-1.59.0/library/alloc/src/raw_vec.rs:287:13
   6: alloc::raw_vec::RawVec<T,A>::reserve
             at /usr/src/rustc-1.59.0/library/alloc/src/raw_vec.rs:291:13
   7: alloc::vec::Vec<T,A>::reserve
             at /usr/src/rustc-1.59.0/library/alloc/src/vec/mod.rs:809:9
   8: alloc::vec::Vec<T,A>::extend_desugared
             at /usr/src/rustc-1.59.0/library/alloc/src/vec/mod.rs:2642:17
   9: <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
             at /usr/src/rustc-1.59.0/library/alloc/src/vec/spec_extend.rs:18:9
  10: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
             at /usr/src/rustc-1.59.0/library/alloc/src/vec/spec_from_iter_nested.rs:37:9
  11: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
             at /usr/src/rustc-1.59.0/library/alloc/src/vec/spec_from_iter.rs:33:9
  12: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
             at /usr/src/rustc-1.59.0/library/alloc/src/vec/mod.rs:2541:9
  13: core::iter::traits::iterator::Iterator::collect
             at /usr/src/rustc-1.59.0/library/core/src/iter/traits/iterator.rs:1745:9
  14: iterators::iter_quantiles_saturated_count_before_max_value
             at ./tests/iterators.rs:569:55
  15: iterators::iter_quantiles_saturated_count_before_max_value::{{closure}}
             at ./tests/iterators.rs:561:1
  16: core::ops::function::FnOnce::call_once
             at /usr/src/rustc-1.59.0/library/core/src/ops/function.rs:227:5
  17: core::ops::function::FnOnce::call_once
             at /usr/src/rustc-1.59.0/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

---- iter_quantiles_iterates_to_quantile_10_as_it_reaches_last_bucket stdout ----
thread 'iter_quantiles_iterates_to_quantile_10_as_it_reaches_last_bucket' panicked at 'capacity overflow', library/alloc/src/raw_vec.rs:518:5
stack backtrace:
   0: rust_begin_unwind
             at /usr/src/rustc-1.59.0/library/std/src/panicking.rs:498:5
   1: core::panicking::panic_fmt
             at /usr/src/rustc-1.59.0/library/core/src/panicking.rs:116:14
   2: core::panicking::panic
             at /usr/src/rustc-1.59.0/library/core/src/panicking.rs:48:5
   3: alloc::raw_vec::capacity_overflow
             at /usr/src/rustc-1.59.0/library/alloc/src/raw_vec.rs:518:5
   4: alloc::raw_vec::handle_reserve
             at /usr/src/rustc-1.59.0/library/alloc/src/raw_vec.rs:489:34
   5: alloc::raw_vec::RawVec<T,A>::reserve::do_reserve_and_handle
             at /usr/src/rustc-1.59.0/library/alloc/src/raw_vec.rs:287:13
   6: alloc::raw_vec::RawVec<T,A>::reserve
             at /usr/src/rustc-1.59.0/library/alloc/src/raw_vec.rs:291:13
   7: alloc::vec::Vec<T,A>::reserve
             at /usr/src/rustc-1.59.0/library/alloc/src/vec/mod.rs:809:9
   8: alloc::vec::Vec<T,A>::extend_desugared
             at /usr/src/rustc-1.59.0/library/alloc/src/vec/mod.rs:2642:17
   9: <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
             at /usr/src/rustc-1.59.0/library/alloc/src/vec/spec_extend.rs:18:9
  10: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
             at /usr/src/rustc-1.59.0/library/alloc/src/vec/spec_from_iter_nested.rs:37:9
  11: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
             at /usr/src/rustc-1.59.0/library/alloc/src/vec/spec_from_iter.rs:33:9
  12: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
             at /usr/src/rustc-1.59.0/library/alloc/src/vec/mod.rs:2541:9
  13: core::iter::traits::iterator::Iterator::collect
             at /usr/src/rustc-1.59.0/library/core/src/iter/traits/iterator.rs:1745:9
  14: iterators::iter_quantiles_iterates_to_quantile_10_as_it_reaches_last_bucket
             at ./tests/iterators.rs:717:55
  15: iterators::iter_quantiles_iterates_to_quantile_10_as_it_reaches_last_bucket::{{closure}}
             at ./tests/iterators.rs:703:1
  16: core::ops::function::FnOnce::call_once
             at /usr/src/rustc-1.59.0/library/core/src/ops/function.rs:227:5
  17: core::ops::function::FnOnce::call_once
             at /usr/src/rustc-1.59.0/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


failures:
    iter_quantiles_iterates_to_quantile_10_as_it_reaches_last_bucket
    iter_quantiles_saturated_count_before_max_value

test result: FAILED. 18 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out; finished in 39.15s

Debian i386 uses the x87 FPU, and I strongly suspect its quirky floating point is involved in these failures, but I have no idea how to debug this further.

Strange behaviour of percentile iterator

I am trying to create hgrm text output. For this I am using this function: https://gist.github.com/algermissen/9f86fe6051a5e4f89ef92d8f5cfd9637 which aims to be a port of https://github.com/HdrHistogram/HdrHistogram_py/blob/5b9b94a1827a88c4782100dff52dfc7e2578ee87/hdrh/histogram.py#L576

The histogram is created in Rust code with let hist = Histogram::<u32>::new_with_bounds(1, 60 * 60 * 1000, 2), exported as V2+DEFLATE and turned to base64.

The hgrm outputs below are 1. Generated from the Rust histogram using the function in the gist and 2. created by pasting the base64 into https://hdrhistogram.github.io/HdrHistogramJSDemo/decoding-demo.html

What strikes me as odd (besides that they are not the same) is that the Rust version seems to be 'off by one' in some sense. It is as if the last iteration should not be there (the percentile 1.0 appears twice). In the gist I check whether v.quantile() < 1.0, and the iterator does indeed seem to return 1.0 twice.

      951.00 1.000000000000         10
      951.00 1.000000000000         10

Is this an indicator of a bug or am I doing something very wrong here?

I also wonder why the percentile column in the Rust output is not shown with better precision. Is this my formatting, or is some unwanted rounding going on? (Note: I am still new to Rust, so this might entirely be my mistake.)

Rust-direct-output with Gist-Function:

       Value     Percentile TotalCount 1/(1-Percentile)

       79.00 0.100000000000          1           1.11
       79.00 0.100000000000          1           1.11
       85.00 0.300000000000          3           1.43
       89.00 0.400000000000          4           1.67
       89.00 0.400000000000          4           1.67
       98.00 0.500000000000          5           2.00
      115.00 0.600000000000          6           2.50
      133.00 0.700000000000          7           3.33
      133.00 0.700000000000          7           3.33
      212.00 0.800000000000          8           5.00
      212.00 0.800000000000          8           5.00
      212.00 0.800000000000          8           5.00
      373.00 0.900000000000          9          10.00
      373.00 0.900000000000          9          10.00
      373.00 0.900000000000          9          10.00
      373.00 0.900000000000          9          10.00
      373.00 0.900000000000          9          10.00
      951.00 1.000000000000         10
      951.00 1.000000000000         10
#[Mean    =       221.90, StdDeviation   =       257.55]
#[Max      =          951, TotalCount   =           10]
#[Buckets      =           15, SubBuckets   =         2048]

Converted from base64 using JS-demo page:

       Value     Percentile TotalCount 1/(1-Percentile)

       79.00 0.000000000000          1           1.00
       79.00 0.100000000000          1           1.11
       85.00 0.200000000000          3           1.25
       85.00 0.300000000000          3           1.43
       89.00 0.400000000000          4           1.67
       98.00 0.500000000000          5           2.00
      115.00 0.550000000000          6           2.22
      115.00 0.600000000000          6           2.50
      133.00 0.650000000000          7           2.86
      133.00 0.700000000000          7           3.33
      212.00 0.750000000000          8           4.00
      212.00 0.775000000000          8           4.44
      212.00 0.800000000000          8           5.00
      373.00 0.825000000000          9           5.71
      373.00 0.850000000000          9           6.67
      373.00 0.875000000000          9           8.00
      373.00 0.887500000000          9           8.89
      373.00 0.900000000000          9          10.00
      951.00 0.912500000000         10          11.43
      951.00 1.000000000000         10
#[Mean    =       221.90, StdDeviation   =       257.55]
#[Max     =       951.00, Total count    =           10]
#[Buckets =           15, SubBuckets     =          256]

Make public deserialize methods `deser_v2` & `deser_v2_compressed`

Specifically, I'm interested in calling the deser_v2 method directly. My motivation is that I'm using hdrhistogram in a WASM module where I only need to deserialize uncompressed histograms. By using the deserialize method, the compiler cannot statically determine that I will not be deserializing compressed histograms, and flate2 gets included in my WASM.

I only included deser_v2_compressed in the title because I figured if you make one of them public, then perhaps it makes sense to do both(??).

Error when iterating over histogram with only zeros

If a histogram recorded only zeros, then iter_recorded returns an empty iterator.

Reproduction:

use hdrhistogram::Histogram;
fn main() {
    let mut h = Histogram::<u64>::new(1).unwrap();
    h += 0;
    // Uncomment and suddenly the iterator returns 2 elements
    // h += 1;
    println!("Histogram: {h:?}");
    println!("Zeros: {}", h.count_at(0));
    // I expect this to yield one element (0 with count 1), or two if the line above is uncommented
    for d in h.iter_recorded() {
        println!("{d:?}");
    }
}

It looks like the problem is with the iterator, as count_at returns the right number.

Criterion data in repo

Is it right that .criterion is checked in? It seems like it's using that as a baseline to compare against or something, which isn't applicable on my hardware. Also, presumably if you were benchmarking against a baseline, wouldn't you want to set the baseline on a particular revision, then apply the changes under consideration?

iter_recorded() method fails to iterate if all the values are zero.

iter_recorded() won't work when all the values recorded by the Histogram instance are 0.
Sample code to hit the issue:

let mut tracker = Histogram::<u64>::new(3).unwrap();
tracker += 0;
tracker += 0;
tracker += 0;

// prints 3, as expected
println!("{}", tracker.len());

let mut res = String::new();
for v in tracker.iter_recorded() {
    res.push_str(format!("value {} and count {}", v.value_iterated_to(), v.count_at_value()).as_str());
}

// nothing is printed. Expected: value 0 and count 3
println!("{}", res);

We use the iterator to print out a certain format and emit to our log file. Is there a better way to do that for the logging purpose?

Express auto resize functionality as a type?

While working on #10 I'm ending up with error options that can only occur if resize is disabled. This is unfortunate as it means that someone using resize will now have to handle error variants that won't ever occur at runtime.

The way that auto resize interacts with the contracts of the methods is a little regrettable to begin with. I wonder if there's a way we could reflect this difference in the type system rather than with comments here and there describing what will or won't happen when auto resize is enabled. Maybe a trait with associated types for errors and implementations for both auto resize and plain histograms? Almost all the code could be re-used between the two, I think, with just some different error enums to give each style a very precise error mapping.

Pure rust compression

flate2 is fast, but it would be nice to have a default pure rust compression library so we wouldn't need to have the serialization feature stuff.

Presumably this would go along with allowing the user to specify which compression option to use, perhaps via generifying on the compressor.

Support deserializing into existing histogram

The current deserialization example is a little sad, in that it needs to allocate a new histogram for each of the intermediate deserialized histograms. It would be awesome if there were also a way to deserialize directly into an existing histogram (essentially deserialize + add without an intermediate allocation). I don't know if this is doable given the serialization format, though.

f32 or i32 support

First, thanks for your contribution. Your crate is really well documented and easy to use. However, I am not able to use it because it only supports u64.

It would be nice if there were support for negative numbers (i32) and floating point numbers (f32). For f32, I can just multiply by scaling factors and use i32, but support for negative numbers is a feature I really need.

Clarify tracking of highest and lowest trackable values

See #74 (comment) for more context.

We save in fields (and write when serializing) the requested limits, not the actual limits that result from the underlying encoding (which will encompass at least as much as what the user requested, and maybe more). Perhaps we should expose the actual limits of what a particular histogram can do, rather than just regurgitate the limits that the user requested? This would be useful when, say, storing metadata about histograms, since the data actually in the histogram is likely more interesting than the particular subset of values that were initially requested as trackable.

Strawman:

  • configured_low() for what the user requested when creating the histogram
  • actual_low() for what the histogram can support
  • configured_high(), actual_high()

u128 support

Should we move things like the total number of counts to u128? That may have a detrimental effect on performance, especially on lesser hardware; I haven't benchmarked it yet to see.

Add method for clamped recording

In some benchmarking scenarios, I am running into extreme outliers, often due to a bug in the system or due to hitting a scaling wall. When these happen, my benchmark binary crashes, since I record using something like:

let us = elapsed.as_secs() * 1_000_000 + elapsed.subsec_nanos() as u64 / 1_000;
hist.record(us).unwrap();

Operations generally take ~10µs, but the outliers can easily grow to tens of seconds. So even with liberal bounds, the record will fail. When I hit these outliers, I don't actually care about how much of an outlier they are, since I don't expect any "regular" samples beyond, say, ~1ms. In this case, what I really want is the ability to record values clamped to the range of the histogram (which should then never fail). I end up writing code like:

if hist.record(us).is_err() {
    let m = hist.high();
    hist.record(m).unwrap();
}

It'd be really nice if I could instead write:

hist.record_clamped(us);

@marshallpierce thoughts?

Restrict counts to only u8-u64

We don't seem to be gaining much by allowing, say, f64, and it leads to some questionable things like:

total_count = total_count + count.to_u64().unwrap();

Meaningfully handling the case where that doesn't work seems iffy.

`fmt::Display` trait for errors

Hi and thanks for a great library!

As the topic states, could it be possible to implement Display or even Error trait(s) for the crate's error types? I believe it will improve user experience and generally make it easier to incorporate those errors into the end-user's error type system :)

Thanks!

Feature request: support floating-point numbers


See #91.

Floating point values can be supported, but they are basically a completely separate implementation that wraps an inner integer-based histogram. It'd be neat to port DoubleHistogram to Rust as well, but unfortunately it's a larger effort that I don't have time for myself at the moment. If you want to give it a try, I'd be happy to review! It should be a fairly straightforward translation of the Java version I linked above. I wonder if we may even be able to have a single Histogram type by providing inherent implementations for Histogram<f64>... Not entirely clear. Initially I'd just have a separate DoubleHistogram type.

If possible, I suggest implementing a type-generic histogram using num-traits.

[Breaking change] rename `more()` to something more explicit.

See #124 and #126: the name more() is confusing, as it indicates that we want to keep iterating over buckets/records even if they will all be empty. Maybe rename it to something like more_with_zero_count, or more_empty? Though neither conveys that all the subsequent ones will be empty.

index_for uses isize, and may panic

It seems like we should be able to guarantee that those indexes are positive, which should hopefully also make us confident that it won't panic.

SyncHistogram::refresh() freezes

hdrhistogram = { version = "7.5.2", default-features = false, features = ["sync"] }

I'm getting recorders within a tokio::spawn closure:

time_per_one_recorder
    .get_or(|| {
        let time_per_one = time_per_one.lock().expect("time_per_one lock on updater");
        std::cell::RefCell::from(time_per_one.recorder())
    })
    .borrow_mut()
    .record(duration_ms)
    .expect("record to histogram");

When all tasks (1 million of them, i.e. that many record() calls) were processed and there's nothing else to record, I'm doing a refresh:

let mut time_per_one = time_per_one.lock().unwrap();
println!("=> Merging histograms...");
time_per_one.refresh();
println!("=> Merging done");

My program hangs forever on refresh(): the merging message is the last thing I see on stdout. According to top, my process isn't doing anything. I've never tried debugging (as in gdb) a Rust program, so I can't tell more at this point; I'll try it later.

By the way, everything worked fine when I called refresh() roughly once a second during task processing. The freeze started happening when I decided that one big merge after everything is done would be better, since it avoids blocking recorders in the middle of task processing.

Is this a crate bug, or maybe I'm using the crate wrong?

Thanks.

Merge multiple Histograms into 1.

We are going to use this in mqttwrk, where we launch a bunch of clients. We plan to have a Histogram instance for each connection to measure metrics. Afterwards we would like to merge these histograms to get an aggregate across all of them.

I was looking at the Python docs and found something similar, which supports merging multiple histograms. Our docs don't mention this explicitly. Do we support this feature?

If not, would you be open to a PR where we serialise multiple Histograms, send them over a channel, then deserialise and merge each of them, like a fan-in? It would take a bit of time as I would have to familiarise myself with the code-base.

Audit `usize` usage for 16-bit safety

Inspired by the looming feasibility of Rust on AVR, in the serialization code I've started to use checked pointer math to make sure we don't overflow usize. There are other places where I'm pretty sure we assume that usize is at least 32 bits, so we should find those and add appropriate checking.
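For illustration, the kind of checked arithmetic involved (`offset` is a made-up helper, not crate code):

```rust
// On a 16-bit target, usize arithmetic that is fine on 32/64-bit can
// overflow, so index/offset math should use checked_* and surface failure
// as an error rather than assume the result fits.
fn offset(base: usize, count: usize, elem_size: usize) -> Option<usize> {
    count.checked_mul(elem_size)?.checked_add(base)
}

fn main() {
    assert_eq!(offset(8, 100, 8), Some(808));
    // Would overflow a 16-bit usize (and overflows any usize at the extreme):
    assert_eq!(offset(0, usize::MAX, 2), None);
}
```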

Feature Request: Adding Sum to histograms

I have a use case where I would like to get an (approximate) sum of all the values in a histogram. Specifically I am looking at tracking the latency of calls in a hdrhistogram and then doing something like histogram.sum(). Although it looks like this can be achieved through iter_recorded I was wondering if there might be interest in having a dedicated method for this? (Happy to pick this up myself if there's interest!)

Apply rustfmt everywhere

The default rustfmt config is maybe a little "fluffy" for my taste in its application of newlines but I really don't care that much. Maybe not even enough to make a rustfmt.toml. ;)

Add histogram log format support

Generally useful, but also when we develop a corpus of sample serialized histograms with pre-calculated metadata to test an implementation against, it would be good to do the same for the log format -- which means we need the log format first.

Deprecate non-deflate serialization

As far as I can tell, the V2 log format in the Java implementation does not support uncompressed histograms. Should we remove support for that? Should we at least update the examples to prefer the more "standard" serializer that will interoperate with other hdrhistogram libraries?

cc @marshallpierce

Allow customizing handling of out-of-range counts when serializing

We allow u64 counts, but the serialization format only allows i64::max_value() as the largest count. So, you can end up with an un-serializable histogram.

It would be nice if the user could choose what to do in such a situation. Right now we always error, which means I needed to do things like https://github.com/jonhoo/hdrsample/pull/40/files#diff-c761d465047faa0efb455f492b2f1e07R528 when testing serialization with random counts.

Options that come to mind:

  • Error (we've got this one now)
  • Squish oversized counts down to i64::max_value()
  • ???
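A sketch of the "squish" option as a plain helper (`squish_count` is a made-up name, not a proposed API):

```rust
// Clamp any count above i64::MAX down to i64::MAX before encoding, instead
// of failing the whole serialization.
fn squish_count(count: u64) -> i64 {
    count.min(i64::MAX as u64) as i64
}

fn main() {
    assert_eq!(squish_count(42), 42);
    assert_eq!(squish_count(i64::MAX as u64), i64::MAX);
    assert_eq!(squish_count(u64::MAX), i64::MAX);
}
```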
