56quarters / cadence Goto Github PK

View Code? Open in Web Editor NEW

82.0 82.0 26.0 720 KB

An extensible Statsd client for Rust

Home Page: https://docs.rs/cadence/

License: Apache License 2.0

Rust 100.00%

metrics rust statsd udp

cadence's People

Contributors

Stargazers

Watchers

cadence's Issues

Replace UnixStream based sink with UnixDatagram based one

Per https://docs.datadoghq.com/developers/dogstatsd/unix_socket/#using-socat-as-a-proxy and https://github.com/DataDog/datadog-agent/wiki/Unix-Domain-Sockets-support it seems like the UnixStream based sink implemented in #86 should have been UnixDatagram based.

To that end, the existing sink should be fixed to use datagrams. The docs referenced above also have some guidance for error handing. This should be incorporated into the sink if it's a fit for the existing behavior of Cadence.

Automatically add tags to all metrics

It would be useful to be able to create a new "view" on a StatsdClient which automatically adds tags to all metrics reported through that client.

If I have multiple workers within an application, or multiple instances of that application running at the same time, the "metric reporting code" currently needs to know the application/worker IDs to add the correct tags.

Consider adding #[must_use] to MetricBuilder

I just updated my code to make use of tags (thanks for adding it, works great!). Now I have this code for example:

client.mark_with_tags("metric")
    .with_tag("key", "value")
    .send()
    .ok();

One thing that came up in the review was that it's easy to forget to call .send() on the MetricBuilder. And if you do, there's no warning.

So I was wondering, have you considered annotating MetricBuilder with #[must_use]? I gave it a quick try locally and it seems to work nicely, it results in a warning like this:

warning: unused cadence::builder::MetricBuilder which must be used

It prints that warning if you just have a mark_with_tags(...) and even when you have mark_with_tags(...).with_tag(...) (but forgot send).

Fix docs to reflect `from_udp_host` deprecation

The docs (lib.rs + README.md) still reference the method. They should be updated to use the from_sink constructor for the client.

Create example executables

Turns out Cargo can run executables in the examples/ directory. We should definitely have some examples. Question is, should these be more production-y than the code in the docs? Is it OK to still have code with anti-patterns like .unwrap() etc?

Implement Clone wherever possible

There's a decent chance that the async MetricSink from #23 won't be shareable between threads (since ThreadPool makes use of Sender and most solutions would use Sender as well). Because of this, we need to make sure that the client can be .clone()'d to be sent between threads.

Create sentinel for thread in QueuingMetricSink

Currently, if the wrapped sink panics, the thread will die and no more metrics will be emitted. This would be pretty bad if it happened. Let's get a sentinel to restart the worker thread if it panics before releasing any functionality with this sink.

MultiLineWriter len comparison doesn't include line ending

When deciding if the buffered writer should be bypassed or not, the comparison doesn't include the length of the line ending - only the buffer to write. This means that in some situations the buffer gets flushed twice for a single .write() call.

Additionally, the call to .flush() is only required when we are using the buffer - not when bypassing it.

https://github.com/tshlabs/cadence/blob/master/src/io.rs#L64

Docs around non blocking sockets

Some users of the library might want to use the UDP sink in with a non blocking socket.

Add docs to UdpMetricSink to mention how to do this
Add docs to StatsdClient to explain using a non blocking socket (and how from_udp_host is blocking)
Add integration tests that use a socket in non-blocking mode

Deprecate LogMetricSink

It adds an extra dependency and would be trivial for users to just write their own implementation. Deprecate it in 0.9 and remove in 0.10

Set up code coverage tool

Either local or hosted, doesn't really matter.

Consider adding `send`-like method to `MetricBuilder` that ignores results

Per #63, an attribute was added to the MetricBuilder to warn when it was unused. Thus, code like the following (incorrect, since the metric isn't sent) would generate a warning:

client.count_with_tags("some.key", 4)
    .with_tag("region", "us-east-1")
    .with_tag("host", "container-12345-abc");

However, the following (totally reasonable) code also generates a warning:

client.count_with_tags("some.key", 4)
    .with_tag("region", "us-east-1")
    .with_tag("host", "container-12345-abc")
    .send();

To avoid the warning, users need to do something with the Result from .send(), like...

client.count_with_tags("some.key", 4)
    .with_tag("region", "us-east-1")
    .with_tag("host", "container-12345-abc")
    .send()
    .ok();

To solve this, @robinst has suggested adding another method to MetricBuilder that sends the metric but does not return a result so that callers aren't forced to do something perfunctory with it.

Example (using the name send_quiet which I'm not at all attached to):

client.count_with_tags("some.key", 4)
    .with_tag("region", "us-east-1")
    .with_tag("host", "container-12345-abc")
    .send_quiet();

What are people's thoughts on this? Does adding another send-like method seem acceptable? If so, any thoughts on a name?

/cc @robinst @pjenvey

Deprecate ConsoleMetricSink

It's about two lines of code and clutters the global namespace.

Flush metrics with BufferedUdpMetricSink

In environments where metrics are constantly emitted, there is no need to flush the buffer. But in situations where the environment is only sparsely used, metrics can sit in the buffer for arbitrarily long durations.

A similar (but potentially more pressing) issue is that BufferedUdpMetricSink has no way to flush the contents of its buffer when dropped, making data loss inevitable.

Add default implementations for some client trait methods

Going through the process of creating a MetricClient implementation for #67 and it'd be nice to not have to implement every single method. Some of them should have default implementations that look pretty much like their real implementations for StatsdClient look.

Example for Counted

pub trait Counted {
    fn incr(&self, key: &str) -> MetricResult<Counter> {
        self.count(key, 1)
    }

    fn incr_with_tags<'a>(&'a self, key: &'a str) -> MetricBuilder<Counter> {
        self.count_with_tags(key, 1)
    }

    // etc.

    fn count(&self, key: &str, count: i64) -> MetricResult<Counter>;

    fn count_with_tags<'a>(&'a self, key: &'a str, count: i64) -> MetricBuilder<Counter>;
}

Remove uneeded Arc wrappers

Get rid of useless Arc wrappers in all the various sinks once the Clone trait isn't required to be used with AsyncMetricSink (because we're removing it, probably -- #34).

Improve docs around MetricsSink trait

Include better docs that specify how metrics are actually formatted when passed to the emit method.

Expose APIs for extensions via other libraries

Per #70, it would be helpful to expose parts of the Cadence API (that are currently internal to the lib) so that other libraries may add extra functionality. For example, in order to implement a full Datadog client, an external library would need access to the StatsdClient::send_metric and StatsdClient::consume_error methods in order to emit events.

/cc @goyox86 @berkus

Investigate what it would take to use Futures (or Tokio) idiomatically

Lots of Rust servers / apps are being created based on Tokio which uses Future (and the associated lib) as one its key abstractions. What would it take to expose something like the MetricClient API in a way that either uses Futures or the Tokio event loop?

Change UdpMetricSink to use UdpSocket::connect method?

Potentially have the sink call UdpSocket::connect to set the address that will be written too.

Pros:

We aren't doing the address parsing ourself
Our code becomes cleaner
Remove address from the struct

Cons:

We're modifying the socket we're passed (maybe this isn't an issue since we take ownership of it)
Cadence now won't compile with anything lower than Rust 1.9 (maybe also not an issue since we are planning on calling .set_nonblocking by default per #14 which is also Rust 1.9+)

Deprecate AsyncMetricSink

Seems like the QueuingMetricSink will have better performance and ergonomics. After #30 is proved out, this can be removed.

StatsdClient should implement `RefUnwindSafe`

All Sync types should also implement RefUnwindSafe as the former implies the latter.

The StatsdClient currently does not implement RefUnwindSafe because it stores two trait objects:

    sink: Arc<MetricSink + Sync + Send>,
    errors: Arc<Fn(MetricError) -> () + Sync + Send>,

The RefUnwindSafe bound should be added to both of these:

    sink: Arc<MetricSink + Sync + Send + RefUnwindSafe>,
    errors: Arc<Fn(MetricError) -> () + Sync + Send + RefUnwindSafe>,

And then the client will automatically become RefUnwindSafe itself.

Expand test coverage of cadence::types module

The MetricError struct + functionality has no tests at all right now. Implement some.

Updates for 2018 edition

Once we've updated deprecated method calls and established a minimum version in #80, start building with the 2018 edition (and update things that need to be updated).

Example in prelude should be 'no_run'

The code as written will try to create a UDP socket and to a DNS lookup. Code in docs probably shouldn't do that.

Back pressure for `QueuingMetricSink`

It currently uses an unbounded channel to talk to the worker thread. This was done to preserve behavior when switching from an older version of crossbeam to the newer crossbeam_channel. However, it's not a great idea to keep metrics in memory indefinitely (though this should never happen in practice since the worker thread will restart itself on panic).

Add a constructor for QueuingMetricSink that takes a size for the channel. When the channel is full, writes to the sink should fail and return an Err response (with appropriate metadata indicating the problem). Need to create some benchmarks and tests to determine if this could be used by default (the from() constructor).

Making this the default (if at all) may need to spread over two releases. TBD

Add sampling support

The client doesn't currently support sampling. Sampling allows callers to submit a metric every Nth time instead of every time (potentially for performance reasons). Cadence doesn't support this currently because adding a floating points to metric structs broke Eq (I think, can't quite remember). Let's fix this.

Support for Datadog-style Tags

Per #40, it would be nice to offer support for adding tags to metrics. As part of this, perhaps adding some sort of serialization or formatting layer between StatsdClient and the MetricSink implementations is in order. When / if this is implemented, let's do it in a way that doesn't exclude us from adding support to the Metrics 2.0 spec that's floating around.

References:
http://docs.datadoghq.com/guides/dogstatsd/#tags
http://metrics20.org/spec/#wire_format
https://docs.influxdata.com/influxdb/v1.1/write_protocols/line_protocol_tutorial/

/cc @robinst

Replace use of crossbeam `MsQueue`

We're quite a few versions behind in crossbeam releases and it looks like MsQueue has been removed in favor of SegQueue: https://stjepang.github.io/2019/01/29/lock-free-rust-crossbeam-in-2019.html#queues-revamped

Since we depend on the blocking pop operation for QueuingMetricSink, we'll likely have to switch to using crossbeam channels for this sink.

Convert license to Apache + MIT

Allow people to use Cadence under the Apache or MIT license. This is what many other Rust projects do and what Rust itself does.

Make StatsdClient more convenient to use

I'm using this crate to add metrics to a small server application that uses the Iron framework.

For a request handler, I have a struct that implements Iron's Handler trait (note that it needs to be Send + Sync because it's used for handling requests in different threads).

In the handler struct, I keep things that are needed when handling the requests. One of these things is a slog Logger. The nice thing about it is that I can construct it when starting up the application and clone() it when I put it in the struct. And because it's Send + Sync, it works nicely.

For metrics, I started out with putting a StatsdClient<QueuingMetricSink> into the handler struct.
But then ideally, for testing, I'd just want to use a StatsdClient<NopMetricSink> instead.

So what I ended up with in the end is a struct like this, with some methods that delegate to the client:

#[derive(Clone)]
pub struct Metrics {
    client: Arc<MetricClient + Send + Sync>,
}

Using that, I can easily construct a client that uses NopMetricSink if I want to disable metrics.

So I'm wondering if it would be generally more useful if StatsdClient itself didn't come with generics but instead wrapped its sink in an Arc (see the implementation of Logger).

That would also make it nicer when code needs to pass a StatsdClient around as an argument/field (which happens in my code base and I'd expect in most real-world uses of this).

What do you think?

Create trait that wraps all supported traits

Create a single trait that extends from Counted, Timed, Gauged, and Metered to allow users to put a client instance behind this trait in a Box. This new MetricClient trait could be used in many places where we currently recommend using each individual one.

Targeting 0.8.0 to limit each release to one major-ish change.

MultiLineWriter handles large writes incorrectly

Not sure how much of an issue this is in practice.

Currently, if callers of MultiLineWriter try to write a value larger than the underlying buffer, it writes directly the the underlying non-buffered Write implementation. This is always a wrapper around a UdpSocket in Cadence. However, it does two writes - one for the input, one for the line ending. When using a UdpSocket, this will end up sending two packets (one of which only contains a \n which is not useful).

The correct behavior here is to either:

Only write the input, not the newline because if we're just writing directly to the UdpSocket - there no value in the newline.
Concat the input and newline and write them in a single call to .write() at the cost of an extra allocation via the .concat() method.

Add optional support for Datadog extensions?

Datadog is a popular implementation of statsd, and it adds some useful extensions on top of the statsd protocol.

One of them is histograms. As can be seen here the type for a histogram is h.

It looks like this library is extensible enough to allow adding these myself by implementing Metric (if I read the sources correctly).

Having said that, I wanted to see if you already thought about adding them to the library or what your thoughts are about the Datadog extensions.

Attempt to reduce the number of allocations when writing a metric

Right now we're doing two String allocations for every metric that gets emitted.

One when creating the metric instance to combine the prefix and key

format!("{}.{}", prefix, key);

And one when generating the canonical representation of the metric.

format!("{}:{}|c", key, val);

It should be possible to reduce this to only a single allocation by passing the prefix, key, and value to the metric struct when instantiating it. This however would mean that we need to allocate the string representation (prefix.key:val|c) of the metric immediately when creating it. This may be an acceptable trade off to avoid the extra heap allocation (since when we create the metric, we always want the canonical string representation eventually).

Use benchmarks to confirm this is faster or at least even and not any worse to work with.

Convert uses of try! to use question mark operator

The ? operator is stable now so let's convert uses of try! to it

Replace deprecated stdlib calls and establish minimum Rust version

There are a number of warnings emitted now (1.33.0-nightly) about use of things in the standard library that are deprecated.

Instead of just replacing them, establish the minimum version of Rust required to use Cadence and document it. Perhaps establish a policy of supporting $current - 2 releases or something. Only once the minimum version is documented can we move on to replacing the deprecated methods.

Optional threadpool dependency isn't really optional

It's marked as optional in Cargo.toml but nothing will compile without the feature enabled. We should conditionally build the AsyncMetricSink based on the feature flag. Or just remove the feature flag and make the threadpool a hard dependency.

Remove `AsyncMetricSink`

Planning on deprecating in 0.10.0. This is for cleanup in 0.11.0.

Make use of thread pool for sending metrics async

Use the Rust thread pool library for writing metrics to the buffer asynchronously. The thread pool size would be user controlled and passed to the sink that makes use of it. Each call to MetricSink.emit would submit a task to the thread pool.

Questions:

Should this be a new sink or should this be part of the BufferedUdpMetricSink from #18?
Should there be some background thread that flushes the buffer no matter what every N milliseconds?

Change from_udp_host to create UDP socket in non-blocking mode by default

Like it says, if we create a socket in StatsdClient::from_udp_host, create it in non-blocking mode. We could also potentially support some sort of config object here to pass in. But, I don't think that's really needed since callers can always just use the StatsdClient::from_sink method for more control.

Add `time` method taking `Duration` in `Timed` trait?

Using time to record a timing, I've found myself having to write this to convert a Duration to milliseconds like this:

let start = Instant::now();
...
let duration = start.elapsed();
let ms = (duration.as_secs() * 1_000) + (duration.subsec_nanos() / 1_000_000) as u64;

Note that this can overflow. I decided to not care about this because this is about metrics (and if your timing overflows, your app has a bigger problem than incorrect metrics).

I wonder if you've considered adding an additional method to Timed that takes a Duration instead of a u64 and handles the conversion? It could return Err in case of overflow.

Cut and publish 0.11.0 release

Reminder to myself to create the 0.11.0 release.

Remove `LoggingMetricSink` and `ConsoleMetricSink`

They were deprecated in 0.9.0.

Investigate using Crossbeam lock-free queue + threads

There seems to be a fair amount of locking going on in the threadpool we're using. This, combined with how short the actual tasks to run are mean using a thread pool doesn't actually seem to make anything faster.

Let's investigate using the lock-free queue in Crossbeam along with some sort of custom thread worker. It's probably sufficient to only run a single thread worker as well.

http://aturon.github.io/crossbeam-doc/crossbeam/sync/struct.MsQueue.html

Add `prelude` module

Add a module for easy glob importing to make things easier for callers.

Example:

use cadence::prelude::*;
use cadence::Whatever;

Expand README and lib.rs

Expand the docs to include a blurb about what Statsd is, how it works, why you'd want to use it, and how Cadence fits in.

Investigate buffered MetricSink implementation

Per Statsd docs, it's possible to batch up metrics before sending. See if we can introduce some primitive in Cadence that makes this easy to do. Maybe a buffered sink that wraps another sink?

Convert MetricBuilder to trait and make use of 'impl Trait'

Once the Rust feature impl Trait has stabilized, we should start using it.

In particular, MetricBuilder should become a trait, with the current implementation becoming DefaultMetricBuilder or DatadogMetricBuilder or something.

As it currently stands, each trait we use for sending metrics (Counted, Timed, etc.) return the concrete type MetricBuilder but this struct can't be instantiated by any 3rd party code making alternate implementations of them impossible. This isn't the worst thing but it's not ideal.

Performance impact: 30% reduction

Thank you for implementing DataDog compliant statsd client. When I enabled metrics, I see a reduction in the performance by ~30%.

I am using ./dogstatsd-local -host 127.0.0.1 -port 8125 &> /dev/null as a server agent.

Here is code to create a client:

    let socket = match UdpSocket::bind("0.0.0.0:0") {
                Ok(s) => { s },
                Err(e) => {
                    error!("UdpSocket bind failed. Error:{:?}", e);
                    return Err(e.to_string());
                }
            };
            if let Err(e) = socket.set_nonblocking(true) {
                return Err(e.to_string());
            }

            let host = (conf.host.as_str(), conf.port);
            let udp_sink = match BufferedUdpMetricSink:: with_capacity(host, socket, 1024*1024) {
                Ok(s) => { s },
                Err(e) => {
                    return Err(e.to_string());
                }
            };
            let queuing_sink = QueuingMetricSink::from(udp_sink);
            let client = StatsdClient::from_sink(&conf.prefix, queuing_sink);

#[inline]
    pub fn incr<'a>(&'a self, name: &'a str)  {
        if self.conf.enabled {
            debug!("metrics:incr: {}", name);
            let _ = self.get_client().incr_with_tags(name).try_send();
        }
    }
    #[inline]
    pub fn decr<'a>(&'a self, name: &'a str)  {
        if self.conf.enabled {
            debug!("metrics:incr: {}", name);
            let _ = self.get_client().decr_with_tags(name).try_send();
        }
    }

 #[inline]
    pub fn timed_fn_db<'a, F, R>(&'a self, table_name: &'a [u8], operation: &'a str, callable: F) -> R
        where F: FnOnce() -> R
    {
        if self.conf.enabled {
            let start = time::Instant::now();
            let return_val = callable();
            let used = start.elapsed();
            let metric = format!("{}.{}", String::from_utf8_lossy(&table_name),operation);
            debug!("metrics:timed_fn_db: {}, Time:{:?}", metric, used);
            let _ = self.get_client().time_duration_with_tags(&metric, used).try_send();
            return_val
        }else {
            callable()
        }
    }

Code is compiled in the release mode:

[profile.release]
opt-level = 3
lto = true

Am I missing anything?

56quarters / cadence Goto Github PK

cadence's People

Contributors

Stargazers

Watchers

Forkers

cadence's Issues

Recommend Projects

Recommend Topics

Recommend Org