Comments (7)
@ncabatoff this is an interesting idea. I can't say I have seen a requirement for a thread histogram like this. Maybe there are some use cases but I can't think of any. In my case (outlined in #98), I don't think this would work. Regarding cardinality, I agree this is a concern. In our case we run everything on Kubernetes and our Nginx PIDs have the same lifespan as pods, which we have labels for already because kube-state-metrics adds the pod name. And these pods typically last for 1 or 2 weeks. So it wouldn't be an issue. If you had an app with dynamic thread pools, then this could be an issue, but then I would say that's not a good use case for thread utilisation metrics. (Overall utilisation and thread pool size might be better.)
So for me, there are two use cases here:
- Individual thread metrics
  - Should only be enabled with low thread churn
  - Allows monitoring and alerting of resource limited threads
- Aggregate thread metrics
  - Could be used with high churn threads
  - I'm not sure what the use case is for doing this
from process-exporter.
Can you elaborate on your use case please? Specifically, how would you monitor and alert on resource limited threads? Some examples of the alert conditions?
from process-exporter.
Each Nginx worker is a single thread, and performance degrades severely if that thread is fully utilised. Nginx has a master process and N workers, so the intention is to alert if any worker consumes 100% CPU for, say, 10 seconds.
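The alert condition described above can be sketched in Python. This is only an illustration, assuming a hypothetical per-worker counter of cumulative CPU seconds sampled at a fixed interval; the function name `sustained_busy`, the sample layout, and the 0.99 threshold are assumptions and not part of process-exporter.

```python
from typing import Dict, List, Set


def sustained_busy(
    samples: Dict[str, List[float]],
    interval_s: float,
    threshold: float = 0.99,
    duration_s: float = 10.0,
) -> Set[str]:
    """Return workers whose utilisation stayed at or above `threshold`
    for at least `duration_s` consecutive seconds.

    `samples` maps a hypothetical worker id to cumulative CPU seconds
    (a counter), sampled every `interval_s` seconds.
    """
    # Number of consecutive busy intervals needed to cover the duration.
    needed = max(1, round(duration_s / interval_s))
    flagged: Set[str] = set()
    for worker, counter in samples.items():
        run = 0
        for prev, curr in zip(counter, counter[1:]):
            utilisation = (curr - prev) / interval_s  # fraction of one core
            run = run + 1 if utilisation >= threshold else 0
            if run >= needed:
                flagged.add(worker)
                break
    return flagged


# Two hypothetical workers sampled every 5 seconds.
example = {
    "worker-1": [0.0, 5.0, 10.0, 15.0],  # fully busy in every interval
    "worker-2": [0.0, 5.0, 6.0, 11.0],   # busy, idle, busy
}
busy = sustained_busy(example, interval_s=5.0)
```

Because each worker keeps its own counter, the check can tell that it is the same worker that stayed busy across consecutive intervals.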
from process-exporter.
In the context of the histogram proposal, assuming we had cpu buckets of say {0.1, 0.25, 0.5, 0.75, 1.0}, could you achieve what you want by alerting on the following?
increase(namedprocess_namegroup_threads_cpuseconds_bucket{le="1", cpumode="user", groupname="nginx"})
>
increase(namedprocess_namegroup_threads_cpuseconds_bucket{le="0.75", cpumode="user", groupname="nginx"})
In other words, alert if there's at least one thread in the nginx group that's consuming >75% of one core. As I understand it you don't really need to know for the alert which thread is misbehaving, just that one exists.
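The bucket comparison can be sketched in Python. This is a rough model, assuming each bucket counts the threads whose utilisation over one interval was at or below the bucket's upper bound; the proposal may define the buckets differently, and `cumulative_buckets` and `threads_above` are hypothetical helper names.

```python
from typing import Dict, List

BOUNDS = [0.1, 0.25, 0.5, 0.75, 1.0]  # bucket upper bounds from the comment above


def cumulative_buckets(utilisations: List[float]) -> Dict[float, int]:
    """Count, per upper bound, the threads at or below that bound
    (cumulative, in the style of Prometheus `le` buckets)."""
    counts = {bound: 0 for bound in BOUNDS}
    for u in utilisations:
        for bound in BOUNDS:
            if u <= bound:
                counts[bound] += 1
    return counts


def threads_above(buckets: Dict[float, int], lower: float = 0.75, upper: float = 1.0) -> int:
    """Threads in (lower, upper]: the difference between two cumulative buckets."""
    return buckets[upper] - buckets[lower]


# Three hypothetical threads: idle, moderate, and nearly pegged.
buckets = cumulative_buckets([0.05, 0.3, 0.9])
hot = threads_above(buckets)
```

A non-zero difference between the `le="1"` and `le="0.75"` buckets says that some thread was above 75% of a core in that interval, without saying which one.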
from process-exporter.
How would that work with a duration? If the alert had a span or for clause to alert if the CPU was high for X seconds, I don't think it would be possible to know it's the same thread with a histogram. I would like to alert and graph the utilisation per WP to check the balancing is ok.
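A small Python sketch of the ambiguity, using made-up utilisation values: the number of threads above a limit in each interval is the same whether one worker stays hot or the load rotates between workers. The worker names and the `hot_count_series` helper are hypothetical.

```python
from typing import Dict, List


def hot_count_series(per_interval: List[Dict[str, float]], limit: float = 0.75) -> List[int]:
    """For each interval, count the threads above `limit`.
    This is all an aggregate (histogram-style) view retains."""
    return [sum(1 for u in interval.values() if u > limit) for interval in per_interval]


# Scenario A: the same worker is hot in both intervals.
same_thread = [{"w1": 0.9, "w2": 0.1}, {"w1": 0.95, "w2": 0.2}]
# Scenario B: a different worker is hot in each interval.
rotating = [{"w1": 0.9, "w2": 0.1}, {"w1": 0.2, "w2": 0.95}]

series_a = hot_count_series(same_thread)
series_b = hot_count_series(rotating)
```

Both scenarios produce the same series of counts, so an alert that requires the condition to hold over a duration cannot tell them apart from the aggregate alone.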
We use cadvisor for container metrics: https://github.com/google/cadvisor/blob/master/docs/storage/prometheus.md. It's a simple counter for CPU, e.g. container_cpu_usage_seconds_total, the same as your cpu_seconds_total. I think that approach is more intuitive and useful in this case. Histograms scale well for high volume discrete data but they have drawbacks (e.g. you may have to change client code to add buckets if you want to tweak the alert thresholds, that's not ideal) and they don't really fit well with continuous data.
from process-exporter.
> I don't think it would be possible to know it's the same thread with a histogram
Ah, sorry, I didn't catch that aspect of your use case. So what I proposed will yield false positives, though it's an open question whether the volume of false positives would be excessive or not. Even if it's not always the same thread pegged at >75% (or whatever, let's assume the bins can be user configurable), it's possible that a useable alert could still be constructed based on these metrics.
I agree histograms have plenty of limitations and problems, but the only alternative I see is doing something I've always resisted, namely exposing the user to potentially unbounded cardinality. I realize that in your particular situation that won't normally happen, but there are enough footguns available here that I'd prefer not to provide this functionality.
I recently learned about https://github.com/zwopir/osquery_exporter, I wonder whether it could work for what you have in mind?
Finally, are you sure you want to be alerting based on an artificial metric that's a proxy for the real issue, rather than on the actual symptom? If you say performance degrades severely when this happens, why not instrument nginx performance and alert based on that, e.g. via https://github.com/hnlq715/nginx-vts-exporter ?
from process-exporter.
Ok, fair enough. Maybe I'll write an exporter instead. I'll take a look at osquery_exporter too, thanks. Yes, we do have alerts for symptoms, but of course we want to scale before having a problem so that our users don't suffer these symptoms. It's also a case of cost vs risk. I want good enough metrics to scale late, to save cost, but without risking performance too much.
from process-exporter.