hdrhistogram_rust's Introduction

HdrHistogram

HdrHistogram: A High Dynamic Range (HDR) Histogram

This repository currently includes a Java implementation of HdrHistogram. C, C#/.NET, Python, JavaScript, Rust, Erlang, and Go ports can be found in other repositories, all of which share common concepts and data representation capabilities. Look at repositories under the HdrHistogram organization for various implementations and useful tools.

Note: The below is an excerpt from a Histogram JavaDoc. While much of it generally applies to other language implementations as well, some details may vary by implementation (e.g. iteration and synchronization), so you should consult the documentation or header information of the specific API library you intend to use.


HdrHistogram supports the recording and analyzing of sampled data value counts across a configurable integer value range with configurable value precision within the range. Value precision is expressed as the number of significant digits in the value recording, and provides control over value quantization behavior across the value range and the subsequent value resolution at any given level.

For example, a Histogram could be configured to track the counts of observed integer values between 0 and 3,600,000,000 while maintaining a value precision of 3 significant digits across that range. Value quantization within the range will thus be no larger than 1/1,000th (or 0.1%) of any value. This example Histogram could be used to track and analyze the counts of observed response times ranging between 1 microsecond and 1 hour in magnitude, while maintaining a value resolution of 1 microsecond up to 1 millisecond, a resolution of 1 millisecond (or better) up to one second, and a resolution of 1 second (or better) up to 1,000 seconds. At its maximum tracked value (1 hour), it would still maintain a resolution of 3.6 seconds (or better).
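
A minimal sketch of this configuration using the Rust port (the hdrhistogram crate); the constructor and equivalent-range methods below are that crate's API, and the loop values are illustrative:

 use hdrhistogram::Histogram;

 fn main() {
     // 1 to 3,600,000,000 (1 usec to 1 hour, in usec), 3 significant digits.
     let h = Histogram::<u64>::new_with_bounds(1, 3_600_000_000, 3).unwrap();

     // The width of a value's equivalent range shows the quantization at
     // that magnitude; it stays within 0.1% of the value, per the text above.
     for v in [1_000_u64, 1_000_000, 3_600_000_000] {
         let width = h.highest_equivalent(v) - h.lowest_equivalent(v) + 1;
         println!("around {}: values are quantized in ranges of {}", v, width);
     }
 }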

The HdrHistogram package includes the Histogram implementation, which tracks value counts in long fields, and is expected to be the commonly used Histogram form. IntHistogram and ShortHistogram, which track value counts in int and short fields respectively, are provided for use cases where smaller count ranges are practical and smaller overall storage is beneficial.

HdrHistogram is designed for recording histograms of value measurements in latency and performance sensitive applications. Measurements show value recording times as low as 3-6 nanoseconds on modern (circa 2012) Intel CPUs. AbstractHistogram maintains a fixed cost in both space and time. A Histogram's memory footprint is constant, with no allocation operations involved in recording data values or in iterating through them. The memory footprint is fixed regardless of the number of data value samples recorded, and depends solely on the dynamic range and precision chosen. The amount of work involved in recording a sample is constant, and directly computes storage index locations such that no iteration or searching is ever involved in recording data values.

A combination of high dynamic range and precision is useful for collection and accurate post-recording analysis of sampled value data distribution in various forms. Whether it's calculating or plotting arbitrary percentiles, iterating through and summarizing values in various ways, or deriving mean and standard deviation values, the fact that the recorded data information is kept in high resolution allows for accurate post-recording analysis with low [and ultimately configurable] loss in accuracy when compared to performing the same analysis directly on the potentially infinite series of sourced data values samples.

A common use example of HdrHistogram would be to record response times in units of microseconds across a dynamic range stretching from 1 usec to over an hour, with a good enough resolution to support later performing post-recording analysis on the collected data. Analysis can include computing, examining, and reporting of distribution by percentiles, linear or logarithmic value buckets, mean and standard deviation, or by any other means that can be easily added by using the various iteration techniques supported by the Histogram. In order to facilitate the accuracy needed for various post-recording analysis techniques, this example can maintain a resolution of ~1 usec or better for times ranging to ~2 msec in magnitude, while at the same time maintaining a resolution of ~1 msec or better for times ranging to ~2 sec, and a resolution of ~1 second or better for values up to 2,000 seconds. This sort of example resolution can be thought of as "always accurate to 3 decimal points." Such an example Histogram would simply be created with a highestTrackableValue of 3,600,000,000, and a numberOfSignificantValueDigits of 3, and would occupy a fixed, unchanging memory footprint of around 185KB (see "Footprint estimation" below).
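
A minimal sketch of this response-time use case in the Rust port; the recorded values are illustrative, and the method names are the hdrhistogram crate's API:

 use hdrhistogram::Histogram;

 fn main() {
     // 1 usec to 1 hour (values recorded in usec), 3 significant digits.
     let mut h = Histogram::<u64>::new_with_bounds(1, 3_600_000_000, 3).unwrap();

     for latency_us in [120_u64, 450, 700, 1_300, 25_000] {
         h.record(latency_us).unwrap();
     }

     // Post-recording analysis: percentiles, mean, standard deviation.
     println!("p50  = {} us", h.value_at_quantile(0.50));
     println!("p99  = {} us", h.value_at_quantile(0.99));
     println!("max  = {} us", h.max());
     println!("mean = {:.2} us, stdev = {:.2}", h.mean(), h.stdev());
 }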

Histogram variants and internal representation

The HdrHistogram package includes multiple implementations of the AbstractHistogram class:

  • Histogram, which is the commonly used Histogram form and tracks value counts in long fields.
  • IntHistogram and ShortHistogram, which track value counts in int and short fields respectively, are provided for use cases where smaller count ranges are practical and smaller overall storage is beneficial (e.g. systems where tens of thousands of in-memory histograms are being tracked).
  • AtomicHistogram and SynchronizedHistogram (see 'Synchronization and concurrent access' below)

Internally, data in HdrHistogram variants is maintained using a concept somewhat similar to that of floating point number representation: an exponent and a (non-normalized) mantissa support a wide dynamic range at a high but varying (by exponent value) resolution. AbstractHistogram uses exponentially increasing bucket value ranges (the parallel of the exponent portion of a floating point number), with each bucket containing a fixed number of linear sub-buckets (the parallel of a non-normalized mantissa portion of a floating point number). Both dynamic range and resolution are configurable, with highestTrackableValue controlling dynamic range, and numberOfSignificantValueDigits controlling resolution.
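
For example, at 3 significant value digits the sub-bucket count rounds up to 2,048 (see "Footprint estimation" below): bucket 0 covers values 0 through 2,047 at a resolution of 1, bucket 1 covers values up to 4,095 at a resolution of 2, bucket 2 covers values up to 8,191 at a resolution of 4, and so on, with each successive bucket doubling both its value range and its quantization unit.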

Synchronization and concurrent access

In the interest of keeping value recording cost to a minimum, the commonly used Histogram class and its IntHistogram and ShortHistogram variants are NOT internally synchronized, and do NOT use atomic variables. Callers wishing to make potentially concurrent, multi-threaded updates or queries against Histogram objects should either take care to externally synchronize and/or order their access, or use the ConcurrentHistogram, AtomicHistogram, or SynchronizedHistogram variants.

A common pattern seen in histogram value recording involves recording values in a critical path (multi-threaded or not), coupled with a non-critical path reading the recorded data for summary/reporting purposes. When such continuous non-blocking recording operation (concurrent or not) is desired even when sampling, analyzing, or reporting operations are needed, consider using the Recorder and SingleWriterRecorder recorder variants that were specifically designed for that purpose. Recorders provide a recording API similar to Histogram, and internally maintain and coordinate active/inactive histograms such that recording remains wait-free in the presence of accurate and stable interval sampling.
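
In the Rust port, this pattern is exposed through the optional sync feature's SyncHistogram and Recorder types (SingleWriterRecorder is Java-specific). A minimal sketch, assuming that API; note that refresh() blocks until every live Recorder has observed the swap (on its next record() call or on drop), so recorders should be actively recording or dropped when it is called:

 use hdrhistogram::Histogram;
 use std::thread;

 fn main() {
     // Wrap a plain histogram; recording goes through Recorder handles.
     let mut hist = Histogram::<u64>::new(3).unwrap().into_sync();

     let mut recorder = hist.recorder();
     let writer = thread::spawn(move || {
         // The critical path records through its own Recorder handle.
         for v in 1..=1_000u64 {
             recorder.record(v).unwrap();
         }
         // `recorder` is dropped here, allowing refresh() to complete.
     });

     writer.join().unwrap();
     hist.refresh(); // fold all recorder-side counts into `hist`
     println!("recorded {} samples", hist.len());
 }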

It is worth mentioning that since Histogram objects are additive, it is common practice to use per-thread non-synchronized histograms or SingleWriterRecorders, and use a summary/reporting thread to perform histogram aggregation math across time and/or threads.
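
A minimal sketch of that per-thread aggregation using the Rust crate's add (which fails only if the source histogram's values cannot fit in the target); the worker values are illustrative:

 use hdrhistogram::Histogram;

 fn main() {
     // One unsynchronized histogram per worker thread...
     let mut per_thread: Vec<Histogram<u64>> =
         (0..4).map(|_| Histogram::<u64>::new(3).unwrap()).collect();

     for (i, h) in per_thread.iter_mut().enumerate() {
         h.record((i as u64 + 1) * 100).unwrap();
     }

     // ...folded together by a summary/reporting thread.
     let mut total = Histogram::<u64>::new(3).unwrap();
     for h in &per_thread {
         total.add(h).expect("histograms have compatible ranges");
     }
     println!("total count: {}", total.len());
 }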

Iteration

Histograms support multiple convenient forms of iterating through the histogram data set, including linear, logarithmic, and percentile iteration mechanisms, as well as means for iterating through each recorded value or each possible value level. The iteration mechanisms are accessible through the HistogramData available through getHistogramData(). Iteration mechanisms all provide HistogramIterationValue data points along the histogram's iterated data set, and are available for the default (corrected) histogram data set via the following HistogramData methods:

  • percentiles: An Iterable<HistogramIterationValue> through the histogram using a PercentileIterator
  • linearBucketValues: An Iterable<HistogramIterationValue> through the histogram using a LinearIterator
  • logarithmicBucketValues: An Iterable<HistogramIterationValue> through the histogram using a LogarithmicIterator
  • recordedValues: An Iterable<HistogramIterationValue> through the histogram using a RecordedValuesIterator
  • allValues: An Iterable<HistogramIterationValue> through the histogram using an AllValuesIterator

Iteration is typically done with a for-each loop statement. E.g.:

 for (HistogramIterationValue v :
      histogram.getHistogramData().percentiles(ticksPerHalfDistance)) {
     ...
 }

or

 for (HistogramIterationValue v :
      histogram.getRawHistogramData().linearBucketValues(unitsPerBucket)) {
     ...
 }

The iterators associated with each iteration method are resettable, such that a caller that would like to avoid allocating a new iterator object for each iteration loop can re-use an iterator to repeatedly iterate through the histogram. This iterator re-use usually takes the form of a traditional for loop using the Iterator's hasNext() and next() methods.

So to avoid allocating a new iterator object for each iteration loop:

 PercentileIterator iter =
    histogram.getHistogramData().percentiles(ticksPerHalfDistance).iterator();
 ...
 iter.reset(ticksPerHalfDistance);
 while (iter.hasNext()) {
     HistogramIterationValue v = iter.next();
     ...
 }
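
In the Rust port, the corresponding iterators are inherent methods on Histogram rather than living on a separate HistogramData object. A minimal sketch, assuming the hdrhistogram crate's names:

 use hdrhistogram::Histogram;

 fn main() {
     let mut h = Histogram::<u64>::new(3).unwrap();
     for v in [10_u64, 20, 30, 1_000] {
         h.record(v).unwrap();
     }

     // Percentile iteration, like percentiles(ticksPerHalfDistance).
     for x in h.iter_quantiles(1) {
         println!(
             "quantile {}: value {} ({} samples)",
             x.quantile(),
             x.value_iterated_to(),
             x.count_at_value()
         );
     }

     // Linear buckets 100 units wide, like linearBucketValues(100).
     for x in h.iter_linear(100) {
         println!("<= {}: {} samples", x.value_iterated_to(), x.count_since_last_iteration());
     }
 }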

Equivalent Values and value ranges

Due to the finite (and configurable) resolution of the histogram, multiple adjacent integer data values can be "equivalent". Two values are considered "equivalent" if samples recorded for both are always counted in a common total count due to the histogram's resolution level. HdrHistogram provides methods for determining the lowest and highest equivalent values for any given value, as well as for determining whether two values are equivalent, and for finding the next non-equivalent value for a given value (useful when looping through values, in order to avoid double-counting).
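
A minimal sketch of these queries in the Rust port (method names per the hdrhistogram crate; the probe value is illustrative):

 use hdrhistogram::Histogram;

 fn main() {
     let h = Histogram::<u64>::new_with_bounds(1, 3_600_000_000, 3).unwrap();

     let v = 1_000_000;
     println!("lowest equivalent:   {}", h.lowest_equivalent(v));
     println!("highest equivalent:  {}", h.highest_equivalent(v));
     println!("equivalent to v+1?   {}", h.equivalent(v, v + 1));
     // The first value guaranteed to land in a different count:
     println!("next non-equivalent: {}", h.next_non_equivalent(v));
 }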

Corrected vs. Raw value recording calls

In order to support a common use case needed when histogram values are used to track response time distribution, Histogram provides for the recording of corrected histogram values via a recordValueWithExpectedInterval() variant. This value recording form is useful in [common latency measurement] scenarios where response times may exceed the expected interval between issuing requests, leading to "dropped" response time measurements that would typically correlate with "bad" results.

When a value recorded in the histogram exceeds the expectedIntervalBetweenValueSamples parameter, recorded histogram data will reflect an appropriate number of additional values, linearly decreasing in steps of expectedIntervalBetweenValueSamples, down to the last value that would still be higher than expectedIntervalBetweenValueSamples.

To illustrate why this corrective behavior is critically needed in order to accurately represent value distribution when large value measurements may lead to missed samples, imagine a system for which response time samples are taken once every 10 msec to characterize response time distribution. The hypothetical system behaves "perfectly" for 100 seconds (10,000 recorded samples), with each sample showing a 1 msec response time value. The system then encounters a 100 sec pause during which only a single sample is recorded (with a 100 second value). The raw data histogram collected for such a hypothetical system (over the 200 second scenario above) would show ~99.99% of results at 1 msec or below, which is obviously "not right". The same histogram, corrected with the knowledge of an expectedIntervalBetweenValueSamples of 10 msec, will correctly represent the response time distribution: only ~50% of results will be at 1 msec or below, with the remaining 50% coming from the auto-generated value records covering the missing increments spread between 10 msec and 100 sec.
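
The Rust port's counterpart to recordValueWithExpectedInterval() is record_correct(). A minimal sketch of the scenario above (values recorded in msec, so the correction shows up in the upper percentiles):

 use hdrhistogram::Histogram;

 fn main() {
     let mut h = Histogram::<u64>::new_with_bounds(1, 3_600_000, 3).unwrap();

     // 100 seconds of "perfect" 1 msec samples, one every 10 msec...
     for _ in 0..10_000 {
         h.record_correct(1, 10).unwrap();
     }
     // ...then a single 100-second sample. The correction back-fills
     // 99_990, 99_980, ... down to just above the 10 msec interval.
     h.record_correct(100_000, 10).unwrap();

     // Uncorrected, ~99.99% of counts sit at 1 msec; corrected, roughly
     // half the counts are spread between 10 msec and 100 sec.
     println!("p75 = {} msec", h.value_at_quantile(0.75)); // roughly 50 sec
 }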

Data sets recorded with and without an expectedIntervalBetweenValueSamples parameter will differ only if at least one value recorded with the recordValue method was greater than its associated expectedIntervalBetweenValueSamples parameter. Data sets recorded with an expectedIntervalBetweenValueSamples parameter will be identical to ones recorded without it if all values recorded via the recordValue calls were smaller than their associated (and optional) expectedIntervalBetweenValueSamples parameters.

When used for response time characterization, the recording with the optional expectedIntervalBetweenValueSamples parameter will tend to produce data sets that would much more accurately reflect the response time distribution that a random, uncoordinated request would have experienced.

Footprint estimation

Due to its dynamic range representation, Histogram is relatively efficient in memory space requirements given the accuracy and dynamic range it covers. Still, it is useful to be able to estimate the memory footprint involved for a given highestTrackableValue and numberOfSignificantValueDigits combination. Beyond a relatively small fixed-size footprint used for internal fields and stats (which can be estimated as "fixed at well less than 1KB"), the bulk of a Histogram's storage is taken up by its data value recording counts array. The total footprint can be conservatively estimated by:

 largestValueWithSingleUnitResolution =
        2 * (10 ^ numberOfSignificantValueDigits);
 subBucketSize =
        roundedUpToNearestPowerOf2(largestValueWithSingleUnitResolution);

 expectedHistogramFootprintInBytes = 512 +
      ({primitive type size} / 2) *
      (log2RoundedUp((highestTrackableValue) / subBucketSize) + 2) *
      subBucketSize
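
Plugging the earlier example (highestTrackableValue = 3,600,000,000, numberOfSignificantValueDigits = 3, 8-byte long counts) into the formula:

 largestValueWithSingleUnitResolution = 2 * (10 ^ 3) = 2,000
 subBucketSize = roundedUpToNearestPowerOf2(2,000) = 2,048
 log2RoundedUp(3,600,000,000 / 2,048) = 21     // since 2^21 = 2,097,152

 expectedHistogramFootprintInBytes = 512 + (8 / 2) * (21 + 2) * 2,048
                                   = 188,928 bytes

which is the ~185KB footprint quoted above.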

A conservative (high) estimate of a Histogram's footprint in bytes is available via the getEstimatedFootprintInBytes() method.

hdrhistogram_rust's Issues

index_for uses isize, and may panic

It seems like we should be able to guarantee that those indexes are positive, which should hopefully also make us confident that it won't panic.

No way to append to an interval log?

Hey, great work on this library - enjoying working with it. I was wondering, was it intentional to design the API such that you can't append to interval logs? I had assumed that's what I'd be doing but eventually realized that would not work. Even if there were a stand-alone function or impl to serialize one Histogram into a Writer in the interval format, I think that would allow a lot of flexibility. Thanks!

Consider allowing serialization into a Vec<u8> directly

Currently, serialization output is directed to a Write in both the V2 and V2 + DEFLATE serializers. This is a very flexible abstraction, but the serialization formats dictate a length prefix, which means that output must be buffered before writing to the writer so that we can drop in the final length in the appropriate spot.

This buffering is especially sad in the DEFLATE case, since we already have a Vec<u8> that will hold the uncompressed serialization, but we can only expose it to the underlying V2 serializer as a Write, so the uncompressed bytes get written to V2's buffer, then copied into V2 + DEFLATE's buffer, then compressed into another buffer, then written to the user-provided writer. (A sketch of the basic Write-based pattern appears after the list below.)

Some options:

  • Functions for Write or Vec<u8>, where the former buffers into an internal Vec<u8>
  • Functions for Write or Write + Seek. Need to benchmark this to see if it's measurably different from the Vec<u8> case. If it isn't, this seems preferable as it is more general.
  • Give up on Write and only serialize into either Write + Seek or a Vec<u8>. Maybe supporting I/O streams that don't seek is a sufficiently niche case that it's not worth creating an easy path for? On one hand, I really want to encourage people to use these formats as a wire protocol in a monitoring system. On the other hand, maybe such a protocol would use a container format like protobuf around the serialized histogram anyway, and it's a waste of API complexity budget to support simple Write usage.
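
For reference, a minimal sketch of the Write-based pattern discussed above, serializing into a Vec<u8> (which implements Write), using the crate's V2 serializer:

use hdrhistogram::Histogram;
use hdrhistogram::serialization::V2Serializer;

fn main() {
    let mut h = Histogram::<u64>::new(3).unwrap();
    h.record(42).unwrap();

    let mut buf: Vec<u8> = Vec::new(); // buffered so the length prefix can be filled in
    V2Serializer::new()
        .serialize(&h, &mut buf)
        .expect("serialization failed");
    println!("serialized {} bytes", buf.len());
}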

Merge multiple Histograms into 1.

We are going to use this in mqttwrk, wherein we launch a bunch of clients. We plan to have a Histogram instance for each connection to measure metrics. After that, we would like to merge these histograms to get an aggregation of all of them.

I was looking at the Python docs and found something similar, which supports merging multiple histograms. Our docs don't mention this explicitly. Do we support this feature?

If not, would you be open to a PR where we serialise multiple Histograms, send them over a channel, then deserialise and merge each of them, like a fan-in? It would take a bit of time, as I would have to familiarise myself with the code-base.

Criterion data in repo

Is it right that .criterion is checked in? It seems like it's using that as a baseline to compare against or something, which isn't applicable on my hardware. Also, presumably if you were benchmarking against a baseline, wouldn't you want to set the baseline on a particular revision, then apply the changes under consideration?

Add histogram log format support

Generally useful, but also when we develop a corpus of sample serialized histograms with pre-calculated metadata to test an implementation against, it would be good to do the same for the log format -- which means we need the log format first.

Express auto resize functionality as a type?

While working on #10 I'm ending up with error variants that can only occur if resize is disabled. This is unfortunate, as it means that someone using resize will now have to handle error variants that won't ever occur at runtime.

The way that auto resize interacts with the contracts of the methods is a little regrettable to begin with. I wonder if there's a way we could reflect this difference in the type system rather than with comments here and there describing what will or won't happen when auto resize is enabled. Maybe a trait with associated types for errors and implementations for both auto resize and plain histograms? Almost all the code could be re-used between the two, I think, with just some different error enums to give each style a very precise error mapping.

Error when iterating over histogram with only zeros

If a histogram has recorded only zeros, then iter_recorded returns an empty iterator.

Reproduction:

use hdrhistogram::Histogram;
fn main() {
    let mut h = Histogram::<u64>::new(1).unwrap();
    h += 0;
    // Uncomment and suddenly the iterator returns 2 elements
    // h += 1;
    println!("Histogram: {h:?}");
    println!("Zeros: {}", h.count_at(0));
    // I expect that to return one element - 0 with count 1, or two if uncommented above
    for d in h.iter_recorded() {
        println!("{d:?}");
    }
}

It looks like the problem is with the iterator, as count_at returns the right number.

Publish 7.0.0 to crates.io

Could you publish version 7.0.0 to crates.io? (I'm using the new Error types in 7.0.0.) Thanks for your work on this library!

Clarify tracking of highest and lowest trackable values

See #74 (comment) for more context.

We save in fields (and write when serializing) the requested limits, not the actual limits that result from the underlying encoding (which will encompass at least as much as what the user requested, and maybe more). Perhaps we should expose the actual limits of what a particular histogram can do, rather than just regurgitate the limits that the user requested? This would be useful when, say, storing metadata about histograms, since the data actually in the histogram is likely more interesting than the particular subset of values that were initially requested as trackable.

Strawman:

  • configured_low() for what the user requested when creating the histogram
  • actual_low() for what the histogram can support
  • configured_high(), actual_high()

u128 support

Should we move things like the total number of counts to u128? That may have a detrimental effect on performance, especially on lesser hardware; I haven't benchmarked it yet to see.

Strange behaviour of percentile iterator

I am trying to create hgrm text output. For this I am using this function: https://gist.github.com/algermissen/9f86fe6051a5e4f89ef92d8f5cfd9637 which aims to be a port of https://github.com/HdrHistogram/HdrHistogram_py/blob/5b9b94a1827a88c4782100dff52dfc7e2578ee87/hdrh/histogram.py#L576

The histogram is created in Rust code with let hist = Histogram::<u32>::new_with_bounds(1, 60 * 60 * 1000, 2), exported as V2+DEFLATE and turned to base64.

The hgrm outputs below were (1) generated from the Rust histogram using the function in the gist, and (2) created by pasting the base64 into https://hdrhistogram.github.io/HdrHistogramJSDemo/decoding-demo.html

What strikes me as odd (besides that they are not the same) is that the Rust version seems to be off by one in some sense. It is as if the last iteration should not be there (the percentile 1.0 appears twice). In the gist I check whether v.quantile() < 1.0, and the iteration value does indeed seem to return 1.0 twice.

      951.00 1.000000000000         10
      951.00 1.000000000000         10

Is this an indicator of a bug or am I doing something very wrong here?

I also wonder why there is not better precision in the percentile column of the Rust output - is this my formatting, or is some unwanted rounding going on? (Note: I am still new to Rust, so it might entirely be me.)

Rust-direct-output with Gist-Function:

       Value     Percentile TotalCount 1/(1-Percentile)

       79.00 0.100000000000          1           1.11
       79.00 0.100000000000          1           1.11
       85.00 0.300000000000          3           1.43
       89.00 0.400000000000          4           1.67
       89.00 0.400000000000          4           1.67
       98.00 0.500000000000          5           2.00
      115.00 0.600000000000          6           2.50
      133.00 0.700000000000          7           3.33
      133.00 0.700000000000          7           3.33
      212.00 0.800000000000          8           5.00
      212.00 0.800000000000          8           5.00
      212.00 0.800000000000          8           5.00
      373.00 0.900000000000          9          10.00
      373.00 0.900000000000          9          10.00
      373.00 0.900000000000          9          10.00
      373.00 0.900000000000          9          10.00
      373.00 0.900000000000          9          10.00
      951.00 1.000000000000         10
      951.00 1.000000000000         10
#[Mean    =       221.90, StdDeviation   =       257.55]
#[Max      =          951, TotalCount   =           10]
#[Buckets      =           15, SubBuckets   =         2048]

Converted from base64 using JS-demo page:

       Value     Percentile TotalCount 1/(1-Percentile)

       79.00 0.000000000000          1           1.00
       79.00 0.100000000000          1           1.11
       85.00 0.200000000000          3           1.25
       85.00 0.300000000000          3           1.43
       89.00 0.400000000000          4           1.67
       98.00 0.500000000000          5           2.00
      115.00 0.550000000000          6           2.22
      115.00 0.600000000000          6           2.50
      133.00 0.650000000000          7           2.86
      133.00 0.700000000000          7           3.33
      212.00 0.750000000000          8           4.00
      212.00 0.775000000000          8           4.44
      212.00 0.800000000000          8           5.00
      373.00 0.825000000000          9           5.71
      373.00 0.850000000000          9           6.67
      373.00 0.875000000000          9           8.00
      373.00 0.887500000000          9           8.89
      373.00 0.900000000000          9          10.00
      951.00 0.912500000000         10          11.43
      951.00 1.000000000000         10
#[Mean    =       221.90, StdDeviation   =       257.55]
#[Max     =       951.00, Total count    =           10]
#[Buckets =           15, SubBuckets     =          256]

Move to nom 5.0

We're currently on 4.2.3, whereas upstream has been bumped to 5.0. While I think the changes are mainly cosmetic/mechanical between the two, it still requires some work.

Deprecate non-deflate serialization

As far as I can tell, the V2 log format in the Java implementation does not support not compressing the histograms. Should we remove support for that? Should we at least update the examples to prefer the more "standard" serializer that will interoperate with other hdrhistogram libraries?

cc @marshallpierce

`fmt::Display` trait for errors

Hi and thanks for a great library!

As the topic states, would it be possible to implement the Display or even Error trait(s) for the crate's error types? I believe it would improve user experience and generally make it easier to incorporate those errors into the end-user's error type system :)

Thanks!

Feature request: support floating-point numbers

See #91.

Floating point values can be supported, but they are basically a completely separate implementation that wraps an inner integer-based histogram. It'd be neat to port DoubleHistogram to Rust as well, but unfortunately it's a larger effort that I don't have time for myself at the moment. If you want to give it a try, I'd be happy to review! It should be a fairly straightforward translation of the Java version linked above. I wonder if we may even be able to have a single Histogram type by providing inherent implementations for Histogram<f64>... it's not entirely clear. Initially I'd just have a separate DoubleHistogram type.

If possible, I suggest implementing a type-generic histogram using num-traits.

Support deserializing into existing histogram

The current deserialization example is a little sad, in that it needs to allocate a new histogram for each of the intermediate deserialized histograms. It would be awesome if there was also a way to deserialize directly into an existing histogram (essentially deserialize + add without an intermediate allocation). I don't know if this is doable given the serialization format, though.
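
A minimal sketch of the pattern the issue describes, where each deserialize() call allocates a fresh intermediate Histogram that is then folded in via add() (names per the crate's serialization module):

use hdrhistogram::Histogram;
use hdrhistogram::serialization::{Deserializer, V2Serializer};
use std::io::Cursor;

fn main() {
    // Produce a serialized histogram to read back.
    let mut src = Histogram::<u64>::new(3).unwrap();
    src.record(42).unwrap();
    let mut bytes = Vec::new();
    V2Serializer::new().serialize(&src, &mut bytes).unwrap();

    // deserialize() allocates an intermediate histogram per call...
    let interval: Histogram<u64> = Deserializer::new()
        .deserialize(&mut Cursor::new(bytes))
        .expect("well-formed input");

    // ...which is then merged into the long-lived one.
    let mut total = Histogram::<u64>::new(3).unwrap();
    total.add(&interval).unwrap();
    assert_eq!(total.len(), 1);
}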

Audit `usize` usage for 16-bit safety

Inspired by the looming feasibility of Rust on AVR, in the serialization code I've started to use checked arithmetic to make sure we don't overflow usize. There are other places where I'm pretty sure we assume that usize is at least 32 bits, so we should find those and add appropriate checking.

Make public deserialize methods `deser_v2` & `deser_v2_compressed`

Specifically, I'm interested in calling the deser_v2 method directly. My motivation is that I'm using hdrhistogram in a WASM module where I only need to deserialize uncompressed histograms. By using the deserialize method, the compiler cannot statically determine that I will not be deserializing compressed histograms, and flate2 gets included in my WASM.

I only included deser_v2_compressed in the title because I figured if you make public one of them then perhaps it makes sense to do both(??).

Add method for clamped recording

In some benchmarking scenarios, I am running into extreme outliers, often due to a bug in the system or due to hitting a scaling wall. When these happen, my benchmark binary crashes, since I record using something like:

let us = elapsed.as_secs() * 1_000_000 + elapsed.subsec_nanos() as u64 / 1_000;
hist.record(us).unwrap();

Operations generally take ~10µs, but the outliers can easily grow to tens of seconds. So even with liberal bounds, the record will fail. When I hit these outliers, I don't actually care about how much of an outlier they are, since I don't expect any "regular" samples beyond, say, ~1ms. In this case, what I really want is the ability to record values clamped to the range of the histogram (which should then never fail). I end up writing code like:

if hist.record(us).is_err() {
    let m = hist.high();
    hist.record(m).unwrap();
}

It'd be really nice if I could instead write:

hist.record_clamped(us);

@marshallpierce thoughts?

Allow customizing handling of out-of-range counts when serializing

We allow u64 counts, but the serialization format only allows i64::max_value() as the largest count. So, you can end up with an un-serializable histogram.

It would be nice if the user could choose what to do in such a situation. Right now we always error, which means I needed to do things like https://github.com/jonhoo/hdrsample/pull/40/files#diff-c761d465047faa0efb455f492b2f1e07R528 when testing serialization with random counts.

Options that come to mind:

  • Error (we've got this one now)
  • Squish oversized counts down to i64::max_value()
  • ???

SyncHistogram::refresh() freezes

hdrhistogram = { version = "7.5.2", default-features = false, features = ["sync"] }

I'm getting recorders within a tokio::spawn closure:

time_per_one_recorder
    .get_or(|| {
        let time_per_one = time_per_one.lock().expect("time_per_one lock on updater");
        std::cell::RefCell::from(time_per_one.recorder())
    })
    .borrow_mut()
    .record(duration_ms)
    .expect("record to histogram");

When all tasks (1 million of them, i.e. that many record() calls) have been processed and there's nothing else to record, I do a refresh:

let mut time_per_one = time_per_one.lock().unwrap();
println!("=> Merging histograms...");
time_per_one.refresh();
println!("=> Merging done");

My program hangs forever on the refresh(); the merging message is the last thing I see on stdout. According to top, my process isn't doing anything. I've never tried debugging (as in gdb) a Rust program, so I can't tell more at this point; I'll try it later.

By the way, everything used to work fine when I called refresh() roughly once a second during task processing. The freeze started happening when I decided that one big merge at the end would be better, because it stops blocking recorders in the middle of task processing.

Is this a crate bug, or maybe I'm using the crate wrong?

Thanks.

Test failures on Debian i386

I recently updated the hdrhistogram package in Debian, and as part of that I resolved the issues that were blocking the tests from running on Debian's test infrastructure.

Unfortunately, when running the CI tests on i386, two tests failed; I can also reproduce this locally.

failures:

---- iter_quantiles_saturated_count_before_max_value stdout ----
thread 'iter_quantiles_saturated_count_before_max_value' panicked at 'capacity overflow', library/alloc/src/raw_vec.rs:518:5
stack backtrace:
   0: rust_begin_unwind
             at /usr/src/rustc-1.59.0/library/std/src/panicking.rs:498:5
   1: core::panicking::panic_fmt
             at /usr/src/rustc-1.59.0/library/core/src/panicking.rs:116:14
   2: core::panicking::panic
             at /usr/src/rustc-1.59.0/library/core/src/panicking.rs:48:5
   3: alloc::raw_vec::capacity_overflow
             at /usr/src/rustc-1.59.0/library/alloc/src/raw_vec.rs:518:5
   4: alloc::raw_vec::handle_reserve
             at /usr/src/rustc-1.59.0/library/alloc/src/raw_vec.rs:489:34
   5: alloc::raw_vec::RawVec<T,A>::reserve::do_reserve_and_handle
             at /usr/src/rustc-1.59.0/library/alloc/src/raw_vec.rs:287:13
   6: alloc::raw_vec::RawVec<T,A>::reserve
             at /usr/src/rustc-1.59.0/library/alloc/src/raw_vec.rs:291:13
   7: alloc::vec::Vec<T,A>::reserve
             at /usr/src/rustc-1.59.0/library/alloc/src/vec/mod.rs:809:9
   8: alloc::vec::Vec<T,A>::extend_desugared
             at /usr/src/rustc-1.59.0/library/alloc/src/vec/mod.rs:2642:17
   9: <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
             at /usr/src/rustc-1.59.0/library/alloc/src/vec/spec_extend.rs:18:9
  10: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
             at /usr/src/rustc-1.59.0/library/alloc/src/vec/spec_from_iter_nested.rs:37:9
  11: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
             at /usr/src/rustc-1.59.0/library/alloc/src/vec/spec_from_iter.rs:33:9
  12: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
             at /usr/src/rustc-1.59.0/library/alloc/src/vec/mod.rs:2541:9
  13: core::iter::traits::iterator::Iterator::collect
             at /usr/src/rustc-1.59.0/library/core/src/iter/traits/iterator.rs:1745:9
  14: iterators::iter_quantiles_saturated_count_before_max_value
             at ./tests/iterators.rs:569:55
  15: iterators::iter_quantiles_saturated_count_before_max_value::{{closure}}
             at ./tests/iterators.rs:561:1
  16: core::ops::function::FnOnce::call_once
             at /usr/src/rustc-1.59.0/library/core/src/ops/function.rs:227:5
  17: core::ops::function::FnOnce::call_once
             at /usr/src/rustc-1.59.0/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

---- iter_quantiles_iterates_to_quantile_10_as_it_reaches_last_bucket stdout ----
thread 'iter_quantiles_iterates_to_quantile_10_as_it_reaches_last_bucket' panicked at 'capacity overflow', library/alloc/src/raw_vec.rs:518:5
stack backtrace:
   0: rust_begin_unwind
             at /usr/src/rustc-1.59.0/library/std/src/panicking.rs:498:5
   1: core::panicking::panic_fmt
             at /usr/src/rustc-1.59.0/library/core/src/panicking.rs:116:14
   2: core::panicking::panic
             at /usr/src/rustc-1.59.0/library/core/src/panicking.rs:48:5
   3: alloc::raw_vec::capacity_overflow
             at /usr/src/rustc-1.59.0/library/alloc/src/raw_vec.rs:518:5
   4: alloc::raw_vec::handle_reserve
             at /usr/src/rustc-1.59.0/library/alloc/src/raw_vec.rs:489:34
   5: alloc::raw_vec::RawVec<T,A>::reserve::do_reserve_and_handle
             at /usr/src/rustc-1.59.0/library/alloc/src/raw_vec.rs:287:13
   6: alloc::raw_vec::RawVec<T,A>::reserve
             at /usr/src/rustc-1.59.0/library/alloc/src/raw_vec.rs:291:13
   7: alloc::vec::Vec<T,A>::reserve
             at /usr/src/rustc-1.59.0/library/alloc/src/vec/mod.rs:809:9
   8: alloc::vec::Vec<T,A>::extend_desugared
             at /usr/src/rustc-1.59.0/library/alloc/src/vec/mod.rs:2642:17
   9: <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
             at /usr/src/rustc-1.59.0/library/alloc/src/vec/spec_extend.rs:18:9
  10: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
             at /usr/src/rustc-1.59.0/library/alloc/src/vec/spec_from_iter_nested.rs:37:9
  11: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
             at /usr/src/rustc-1.59.0/library/alloc/src/vec/spec_from_iter.rs:33:9
  12: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
             at /usr/src/rustc-1.59.0/library/alloc/src/vec/mod.rs:2541:9
  13: core::iter::traits::iterator::Iterator::collect
             at /usr/src/rustc-1.59.0/library/core/src/iter/traits/iterator.rs:1745:9
  14: iterators::iter_quantiles_iterates_to_quantile_10_as_it_reaches_last_bucket
             at ./tests/iterators.rs:717:55
  15: iterators::iter_quantiles_iterates_to_quantile_10_as_it_reaches_last_bucket::{{closure}}
             at ./tests/iterators.rs:703:1
  16: core::ops::function::FnOnce::call_once
             at /usr/src/rustc-1.59.0/library/core/src/ops/function.rs:227:5
  17: core::ops::function::FnOnce::call_once
             at /usr/src/rustc-1.59.0/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


failures:
    iter_quantiles_iterates_to_quantile_10_as_it_reaches_last_bucket
    iter_quantiles_saturated_count_before_max_value

test result: FAILED. 18 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out; finished in 39.15s

Debian i386 uses the x87 FPU and I strongly suspect its quirky floating point is involved in these failures, but I have no idea how to debug this further.

Pure rust compression

flate2 is fast, but it would be nice to have a default pure rust compression library so we wouldn't need to have the serialization feature stuff.

Presumably this would go along with allowing the user to specify which compression option to use, perhaps via generifying on the compressor.

f32 or i32 support

First, thanks for your contribution. Your crate is really well documented and easy to use. However, I am not able to use it because it supports only u64.

It would be nice if there were support for negative numbers (i32) and floating point numbers (f32). For f32, I can just multiply by a scaling factor and use it as i32, but support for negative numbers is a feature I really need.

[Breaking change] rename `more()` to something more explicit.

See #124 and #126: the name more() is confusing, as it indicates that we want to keep iterating over buckets/records/whatever even if they will all be empty. Maybe rename it to something like more_with_zero_count or more_empty? Though neither conveys that all the subsequent ones will be empty.

Apply rustfmt everywhere

The default rustfmt config is maybe a little "fluffy" for my taste in its application of newlines but I really don't care that much. Maybe not even enough to make a rustfmt.toml. ;)

Restrict counts to only u8-u64

We don't seem to be gaining much by allowing, say, f64, and it leads to some questionable things like:

total_count = total_count + count.to_u64().unwrap();

Meaningfully handling the case where that doesn't work seems iffy.

iter_recorded() method fails to iterate if all the values are zero.

iter_recorded() won't work when all the values recorded by the Histogram instance are 0.
Sample code to hit the issue:

let mut tracker = Histogram::<u64>::new(3).unwrap();
tracker += 0;
tracker += 0;
tracker += 0;

// prints 3, as expected
println!("{}", tracker.len());

let mut res = String::new();
for v in tracker.iter_recorded() {
    res.push_str(format!("value {} and count {}", v.value_iterated_to(), v.count_at_value()).as_str());
}

// nothing is printed. Expected: value 0 and count 3
println!("{}", res);

We use the iterator to print a certain format and emit it to our log file. Is there a better way to do that for logging purposes?
