deadtrickster / prometheus.ex Goto Github PK

View Code? Open in Web Editor NEW

405.0 9.0 69.0 206 KB

Prometheus.io Elixir client

Elixir 98.18% Clean 0.01% Emacs Lisp 0.13% Shell 1.69%

prometheus elixir metrics instrumentation monitoring

prometheus.ex's Introduction

Prometheus.ex

Elixir Prometheus.io client based on Prometheus.erl.

Starting from v3.0.0 works with Elixir >=1.6 and Erlang >=20. For older versions, please use older tags.

Dashboard from Monitoring Elixir apps in 2016: Prometheus and Grafana by @skosch.

IRC: #elixir-lang on Freenode;
Slack: #prometheus channel - Browser or App(slack://elixir-lang.slack.com/messages/prometheus).

Example

defmodule ExampleInstrumenter do
  use Prometheus.Metric

  def setup do    
    Histogram.new([name: :http_request_duration_milliseconds,
                   labels: [:method],
                   buckets: [100, 300, 500, 750, 1000],
                   help: "Http Request execution time"])
  end

  def instrument(%{time: time, method: method}) do
    Histogram.observe([name: :http_request_duration_milliseconds, labels: [method]], time)
  end
end

defmodule ExampleInstrumenter do
  use Prometheus.Metric

  @histogram [name: :http_request_duration_milliseconds,
              labels: [:method],
              buckets: [100, 300, 500, 750, 1000],
              help: "Http Request execution time"]

  def instrument(%{time: time, method: method}) do
    Histogram.observe([name: :http_request_duration_milliseconds, labels: [method]], time)
  end
end

Here histogram will be declared in auto-generated @on_load callback, i.e. you don't have to call setup manually.

Please read how to measure durations correctly with prometheus.ex.

Integrations / Collectors / Instrumenters

Dashboards

Beam Dashboards.

Installation

Available in Hex, the package can be installed as:

Add prometheus_ex to your list of dependencies in mix.exs:
```
def deps do
  [{:prometheus_ex, "~> 3.1"}]
end
```

Ensure prometheus_ex is started before your application:

def application do
  [applications: [:prometheus_ex]]
end

prometheus.ex's People

Contributors

Stargazers

Watchers

Forkers

iamjarvo parroty vernomcrp maxdrift cybrox xfumihiro aweiker jarimatti seancribbs mallozup jfornoff progsmile jlgeering skosch icedragon200 feld silviurosu cjimison joladev kianmeng dylan-chong zeha svserge falood tellerhq sorliem ditointernet lanodan sabudaye duffelhq waltercompanies hazy ivanborisenk0 rozanov61 virkonomy bitobject trufan-poe rsignavong zraul123 athonet-open riverfinancial begateway sketch-hq replaygaming jobteaser anshul8951 ryochin astrokooxl xlgames-q5 iurimadeira team-telnyx wttj bluzky sumup shorelinesoftware hasan-tayyar-besik hasantayyar

prometheus.ex's Issues

Library Architecture Patterns

I am currently using this library and I quite enjoy it's use. I think that for our needs though a few interesting concepts may need to be addressed.

In Prometheus Client libraries, it seems that they provide 2 main bits of functionality regardless:

the ability to add metrics by simply importing the lib
the ability to start the webserver to export the metrics (they've moved towards text based only now)

This library requires the plug exporter library and I wonder if that should simply be included in this library?

Other concerns I have are around which application should "setup" the Metrics. I see that if I wish to create an instrumenter, I need to call the setup in my application boot loader. This seems like it should be handled by the prometheus client library.

I would love to hear your thoughts on this.

Quantile summary support

I see that in the sibling Erlang project that feature is already implemented, do you have an idea when it is available in this Elixir project (if it is planned to be available here)?

https://github.com/deadtrickster/prometheus.erl/blob/master/src/metrics/prometheus_quantile_summary.erl

How to approach debugging a dead metric pull request?

We use prometheus and prometheus-plugs for pulling metrics out of our elixir services, usually the amount of
data pulled is around 80-100 MB which we think it's too much.

We set a hard timeout of 10 mins for all our endpoints requests and on a daily basis we get failed prometheus pull requests due to timeouts (1 request exceeds 10 mins), we'd like to look more into debugging what's happening inside, whether it's an issue with re-formatting the payload or with its size. Is there a debugging mode that we can enable for better insights?

prometheus_ex on server with node_exporter - best practice

Just a question, … let say I have a server, with prometheus (node_exporter) already installed and operational. Without having elixir on this server. Now, I would like to deploy to this server a Elixir app, in which I have and would like to use Prometheus_ex package, to be able to get some elixir/erlang VM specifics. How is this done? what would be best practice? Keeping both node_exporter and elixirs’ prometheus_ex and both exporting under /metrics but on different ports? or what?

Compilation error with Elixir 1.14.0

== Compilation error in file lib/prometheus/buckets.ex ==
** (UndefinedFunctionError) function Kernel.Utils.defdelegate/2 is undefined or private. Did you mean:

      * defdelegate_all/3
      * defdelegate_each/2

    (elixir 1.14.0) Kernel.Utils.defdelegate({:new, [line: 18], [{:arg, [line: 18], nil}]}, [])
    lib/prometheus/buckets.ex:18: (module)

Prometheus.InvalidMetricArityError after phoenix Update

I updated phoenix to Version 1.6.
After that i get the following error in the log:

[error] Handler "telemetry_web__event_handler" has failed and has been detached. Class=:error
Reason=%Prometheus.InvalidMetricArityError{expected: 2, present: 3}
Stacktrace=[
  {:prometheus_metric, :check_mf_exists, 4,
   [file: 'src/prometheus_metric.erl', line: 149]},
  {:prometheus_histogram, :insert_placeholders, 3,
   [file: 'src/metrics/prometheus_histogram.erl', line: 443]},
  {:prometheus_histogram, :insert_metric, 5,
   [file: 'src/metrics/prometheus_histogram.erl', line: 431]},
  {:prometheus_histogram, :observe, 4,
   [file: 'src/metrics/prometheus_histogram.erl', line: 203]},
  {Prometheus.Metric.Histogram, :observe, 2,
   [file: 'lib/prometheus/metric/histogram.ex', line: 101]},
  {:telemetry, :"-execute/3-fun-0-", 4,
   [
     file: '/Users/david/projects/private/BetterTyping/deps/telemetry/src/telemetry.erl',
     line: 150
   ]},
  {:lists, :foreach, 2, [file: 'lists.erl', line: 1342]},
  {Plug.Telemetry, :"-call/2-fun-0-", 4,
   [file: 'lib/plug/telemetry.ex', line: 76]},
  {Enum, :"-reduce/3-lists^foldl/2-0-", 3, [file: 'lib/enum.ex', line: 2396]},
  {Plug.Conn, :run_before_send, 2, [file: 'lib/plug/conn.ex', line: 1690]},
  {Plug.Conn, :send_resp, 1, [file: 'lib/plug/conn.ex', line: 399]},
  {TyperacerWeb.LobbyController, :action, 2,
   [file: 'lib/typeracer_web/controllers/lobby_controller.ex', line: 1]},
  {TyperacerWeb.LobbyController, :phoenix_controller_pipeline, 2,
   [file: 'lib/typeracer_web/controllers/lobby_controller.ex', line: 1]},
  {Phoenix.Router, :__call__, 2, [file: 'lib/phoenix/router.ex', line: 355]},
  {TyperacerWeb.Endpoint, :plug_builder_call, 2,
   [file: 'lib/typeracer_web/endpoint.ex', line: 1]},
  {TyperacerWeb.Endpoint, :"call (overridable 3)", 2,
   [file: 'lib/plug/debugger.ex', line: 136]},
  {TyperacerWeb.Endpoint, :call, 2,
   [file: 'lib/typeracer_web/endpoint.ex', line: 1]},
  {Phoenix.Endpoint.Cowboy2Handler, :init, 4,
   [file: 'lib/phoenix/endpoint/cowboy2_handler.ex', line: 54]},
  {:cowboy_handler, :execute, 2,
   [
     file: '/Users/david/projects/private/BetterTyping/deps/cowboy/src/cowboy_handler.erl',
     line: 37
   ]},
  {:cowboy_stream_h, :execute, 3,
   [
     file: '/Users/david/projects/private/BetterTyping/deps/cowboy/src/cowboy_stream_h.erl',
     line: 306
   ]}
]

This is how i configure it:

application.ex:

...
 Typeracer.PhoenixInstrumenter.setup()
    Typeracer.PipelineInstrumenter.setup()
    Typeracer.RepoInstrumenter.setup()
    Prometheus.Registry.register_collector(:prometheus_process_collector)
    Typeracer.PrometheusExporter.setup()

    :ok =
      :telemetry.attach(
        "prometheus-ecto",
        [:typeracer, :repo, :query],
        &Typeracer.RepoInstrumenter.handle_event/4,
        %{}
      )

    PrometheusPhx.setup()
...

How to remove entry with certain label

Hey,

I am using the Boolean Metric to produce something like this:

test_value{foo="bar"} 0
test_value{foo="baz"} 1

Now I no longer need test_value{foo="baz"} because baz was removed from the system. How can I remove it from the output?

Return value from Prometheus.Metric.Histogram.observe_duration/2

Hi, we're migrating our platform to run on Kubernetes and as part of that we're transitioning from using Graphite to Prometheus for graphing our metrics. We currently use the library Statix to push metrics to our Graphite server and specifically we use https://hexdocs.pm/statix/Statix.html#c:measure/3 to instrument functions where we measure execution time. Statix.measure/3 allows the return value of the instrumented function to be returned, but unless I'm misunderstanding the correct usage, this doesn't appear to be the case for Prometheus.Metric.Histogram.observe_duration/2?

Could you clarify whether it is possible to return the value of the function passed to Prometheus.Metric.Histogram.observe_duration/2?

Startup Time issues

I recently tried to integrate this library, and found the startup time increased around 10 seconds. This 10 seconds even happens on mix test. Not sure why. Any ideas?

Compilation fails on Elixir 1.7.0-rc.1

Hi,

just to let you know: prometheus_ex (currently published version on Hex) seems to fail to compile on Elixir 1.7.0-rc.1 (OTP 20).

Might be related to elixir-lang/elixir#7309
Would love to contribute a fix but I'm not terribly familiar with what the purpose of the Macro is in prometheus_ex :-/

Using float values in Histogram?

Hi, thanks for creating this!

I am confused on how to use float values in Histogram. I want to have my time units in seconds, so I have to keep track float values.

Histogram.observe/2 doesn't accept Float values at all, and Histogram.dobserve/2 seems to be rounding down to zero. Am I doing something wrong?

Invalid value "unknown" for "erlang_vm_logical_processors_available"

When I go to /metrics in my Pheonix app, I get a response containing the following snippet.

# TYPE erlang_vm_logical_processors_available gauge
# HELP erlang_vm_logical_processors_available The detected number of logical processors available to the Erlang runtime system
erlang_vm_logical_processors_available unknown

Prometheus is unable to parse unknown and yields this error message: text format parsing error in line 9: expected float as value, got "unknown". From what I hear, the correct value in this case should be NaN.

Edit: I am using version 1.0.0-alpha9.

:mnesia warning when compiling

Hi! Thanks for the lib!

I'm getting this when compiling:

==> prometheus_ex
Compiling 19 files (.ex)
warning: :mnesia.system_info/1 defined in application :mnesia is used by the current application but the current application does not depend on :mnesia. To fix this, you must do one of:

  1. If :mnesia is part of Erlang/Elixir, you must include it under :extra_applications inside "def application" in your mix.exs

  2. If :mnesia is a dependency, make sure it is listed under "def deps" in your mix.exs

  3. In case you don't want to add a requirement to :mnesia, you may optionally skip this warning by adding [xref: [exclude: [:mnesia]]] to your "def project" in mix.exs

  lib/prometheus/contrib/mnesia.ex:22: Prometheus.Contrib.Mnesia.table_disk_size/2

Here are my versions:

Erlang/OTP 24 [erts-12.3.1] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]

Mix 1.13.4 (compiled with Erlang/OTP 22)

To fix this you need to add the code below to the mix.exs

  def application do
    [
      applications: [:logger, :prometheus],
      extra_applications: [:mnesia]
    ]
  end

I can open a PR if you don't mind.

Lack of documentation in `Prometheus.Inject`

I would like to provide one, however it seems quite hard to deduce what is happening there.

Reduce needless usage of macros

Such frivolous usage of macros causes problems for end users that wants to check project with dialyzer, instead it will be better to use functions that can be inlined later when needed.

Do not change API over Prometheus.erl

Currently API provided by prometheus_ex for no reason tries to use keyword list as a first argument and do some magic parsing before calling functions from prometheus (Erlang one). Why though? I find Erlang API much cleaner and easier to work with, only gripe is fact that it throws ErlangError instead of Elixir-like ones (but that isn't big problem IMHO).

Solutions that I see are 2:

deprecate prometheus_ex in favour of raw prometheus calls and just don't care about error wrapping
make prometheus_ex API 1:1 mapping of the Erlang one with graceful handling of the errors
make prometheus_ex addition to Erlang API instead of wrapping it, aka provide only new features instead of doing that and at the same time wrapping existing features

How to use in a cluster

Hey,

how would one use this in a cluster of nodes? I don't use it to collect operating system or vm metrics but metrics about my application state. That means every node should give me (kinda) the same metrics.

Has someone done this yet? What is the easiest approach?

fix inconsistency in duration unit since elixir 1.8.0

Since release of elixir 1.8.0 System uses singular for time units.

[System] :seconds, :milliseconds, etc. as time units is deprecated in favor of :second, :millisecond, etc.
https://github.com/elixir-lang/elixir/releases

Prometheus metrics are still using plural, is there a plan to adjust duration units to use singular as well ?

Microseconds getting automatically converted to milliseconds?

Is this expected behavior ?

I declare a metric with a microseconds suffix :

@histogram [
    name: :http_check_duration_microseconds,
    labels: [:target],
    buckets: :default,
    help: "Http check execution time"
  ]

And I then feed it a microseconds value (20481):

Histogram.observe(
      [name: :http_check_duration_microseconds, labels: [target]],
      time
    )

It is then silently converted to milliseconds - but the metric name still indicates microseconds.

http_check_duration_gauge_microseconds{target="http://google.com"} 20.481
# TYPE http_check_duration_microseconds histogram
# HELP http_check_duration_microseconds Http check execution time
http_check_duration_microseconds_bucket{target="http://google.com",le="0.005"} 0
http_check_duration_microseconds_bucket{target="http://google.com",le="0.01"} 0
http_check_duration_microseconds_bucket{target="http://google.com",le="0.025"} 0
http_check_duration_microseconds_bucket{target="http://google.com",le="0.05"} 0
http_check_duration_microseconds_bucket{target="http://google.com",le="0.1"} 0
http_check_duration_microseconds_bucket{target="http://google.com",le="0.25"} 0
http_check_duration_microseconds_bucket{target="http://google.com",le="0.5"} 0
http_check_duration_microseconds_bucket{target="http://google.com",le="1"} 0
http_check_duration_microseconds_bucket{target="http://google.com",le="2.5"} 0
http_check_duration_microseconds_bucket{target="http://google.com",le="5"} 0
http_check_duration_microseconds_bucket{target="http://google.com",le="10"} 0
http_check_duration_microseconds_bucket{target="http://google.com",le="+Inf"} 17
http_check_duration_microseconds_count{target="http://google.com"} 17
http_check_duration_microseconds_sum{target="http://google.com"} 364.862

The code in it's entirity can be seen here

broken link to prometheus.erl

The link for prometheus.erl points to a non-existent Hexdocs page: https://hexdocs.pm/prometheus

Prometheus.Metric breaks when `:application_controller` is busy

There can be other scenarios for why :application_controller would be busy, but the one that I've seen is where you're draining connections while shutting down using a library like https://hexdocs.pm/plug_cowboy/Plug.Cowboy.Drainer.html. Without connection draining when receiving a SIGTERM the endpoint will shut down immediately, killing any current connections. With connection draining listeners on the port are suspended, meaning no more connections are opened, but allow the existing connections to drain, and then (and only then) proceed with shutting down the endpoint.

This means :application_controller asks the application containing the endpoint to shut down and waits for it to be done. While it's waiting, it's completely blocked and can't respond to messages. Depending on how long your draining timeout is, this can be a long time. Prometheus.Metric uses Application.started_applications which sends a message to :application_controller and waits timeout (5 seconds) for a response. While draining connections this will always fail, causing Prometheus.Metric to blow up (this also means Prometheus.PlugExporter blows up when Prometheus tries to scrape). If it's helpful I can set up a repo that reproduces this.

Is it possible to avoid calling Application.started_applications? Or catching the failure?

I may also be missing something, but why is the on_load being called each time a request hits the Prometheus.PlugExporter?

Metrics declared via module attributes do not get registered

I have been running into an issue where I declare metrics via module attributes like @counter and @histogram, but when running my test suite, failures occur because metrics the code attempts to update are not registered. I have worked around this by adding a loop like so to my application startup:

for mod <- Application.spec(:my_app)[:modules] do
  if function_exported?(mod, :__declare_prometheus_metrics__, 0) do
    mod.__declare_prometheus_metrics__()
  end
end

Obviously this workaround is brittle because it relies on knowledge of code generated by the macros in Prometheus.Metric. I'd like to stop using it.

The core problem is that the generated code calls Application.started_applications(), but normal OTP startup loads all application code before starting any applications. This means that the @on_load function will be called before prometheus has started and created its ets tables, and so no metrics will be declared and the function will never run when the application has already started.

Proposed solution: The macros should generate entries to be inserted into the default_metrics env key of the prometheus application. When the application starts, the default metrics are automatically declared. This will work because the application_controller process that owns the application environment table will be started at the time each user application is loaded.

Does that solution sound reasonable? If so, I will send a PR.

Prefer already formatted time value instead of native time

Getting Histograms to work with time values is not straightforward. I never used or converted any time values to native time until I started using this library. I think this is counter intuitive and prone to error, since it is easy to mis-read the documentation.

It will be much more natural to do the conversion to native time inside the library by default, rather than asking the user to do so every time. This way the user can supply an integer/float and the library will do the conversion for you.

I realise this will be a breaking change but I think it will pay off. What do you think?

Missing blog post – move the content to this repo?

Hey Ilya & friends,

As I'm currently revamping my blog I realized that the old links don't work anymore, including the one to my blog post about this library to which you're linking to in the readme file. The current link is here, but even that may change (or disappear altogether). Do you want to grab a copy of it, strip it of the fluff, and put it in a docs folder (or the wiki) in this repo as an extended example/tutorial? I'm happy to PR a Markdown file if that helps.