Comments (12)
- Sounds simple, but I'm not sure I understand additional integration comment.
- Seems to still have an issue, in the case where
old_value < value
, it's going to increment by the wholevalue
each time, instead of the difference between the old and current value.
With Solution 1, we should be able to continue to treat the Gauge as a Counter in things like Grafana, rate
should work as before, decreases in the metric would still count as a reset. Counting total CPU time consumed by all jails with a given name should still be a case of sum by(name) (rate(metric[5m]))
, etc.
The HELP
and TYPE
comments for the metrics in the scrape would unfortunately be "wrong" with regards to how the metrics should be used, but we could document their special case.
I'll have a proper look at this some time tomorrow :)
from jail_exporter.
Could not reproduce this for now.
from jail_exporter.
Reopening this as it seems to be a real issue when Poudriere is running in bulk
mode (probably others too, but bulk
hits the resource counters enough to trigger it).
Debugging information thrown out is:
jail_exporter[22120]: thread 'tokio-runtime-worker-0' panicked at 'assertion failed: v >= P::T::from_i64(0)', /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/prometheus-0.4.1/src/counter.rs:70:9
Which seems to indicate that a counter somewhere is trying to be decremented.
The likely cause of this is the build jails coming and going, but having the same names each time. This means that their rctl
resource usage stats would reset to 0
and once we've done some math on the figures to find out how much we're incrementing by, the result would now be negative.
Lines 204 to 208 in fa4a7fc
A possible solution here is to just take the absolute value after doing that math, and increase the counter by that amount.
from jail_exporter.
Currently testing the absolute value approach.
from jail_exporter.
Taking the absolute values here seems to have solved the issue.
from jail_exporter.
I'm not sure the absolute value really makes sense here, as old_value - new_value
isn't really the amount of CPU time consumed. This becomes clear when we look at the following example:
- Jail
A
(1) consumed 100 seconds of CPU time:jail_cpu_time{name = "A"} 100
- Jail
A
(1) gets restarted. Let the new Jail be denoted asA
(2) - Jail
A
(2) consumes another 10 seconds of CPU time:This would result in the following counter value:// *value == 10 // series.get() == 100 let inc = *value - series.get(); // Inc.abs() is now 90, but should be 10
jail_cpu_time{name = "A"} 190
, while in total only110
seconds were consumed.
I see two solutions here:
- Make
jail_cpu_time
a gauge representing the CPU time consumed between scrapes. This would have the advantage of reflecting a Jail being restarted, but calculating the total CPU time consumed by all jails with a given name would require additional integration. - Handle jail restarts transparently:
This would be a lower bound on the CPU time consumed, as any CPU time consumed by Jail
let increment = match series.get() { old_value if old_value <= *value => *value - old_value, _ => *value, };
A
(1) between the last scrape while it was still running and the restart will be dropped, but at least it would be more accurate than the current solution.
from jail_exporter.
Hm, yeah. That's what I get for trying to think about that issue between meetings :)
from jail_exporter.
You are right, I think I mixed up the match arms. I'll fix it. The basic idea would be:
- if increment is non-negative (current > old value), just add it
- otherwise, the jail must have been resarted between scrapes. In this case, add the current time, as it is a lower bound on the CPU time consumed between the scrapes.
With "integration" I meant sum by(name)
.
from jail_exporter.
otherwise, the jail must have been resarted between scrapes. In this case, add the current time, as it is a lower bound on the CPU time consumed between the scrapes.
Ah, I see what you were getting at. This would only work for the very next scrape though. Imagine these numbers were scraped in this order, the jail_cpu_time
counter is already at 1000
in the exporter:
1020
: new total:2020
- jail is reset on the host,
cputime
value resets to0
on host, still2020
in exporter. 10
: new total:2030
50
: new total:2080
In the last scrape, the whole value of 50
was added, since we have no idea what the previous value was and didn't know we had to deduct 10
.
I'm thinking I'll have to add a value in the lazy_static!
to keep track of the old scrape value.
from jail_exporter.
This commit fixes the cputime
counter via a book keeping variable. A later commit fixes the wallclock
counter in the same way. Tests were added for both of them and the new behaviour passes the tests.
Going to let you check this approach out before (hopefully) closing the bug. :)
from jail_exporter.
You are absolutely right, you do need a bookkeeping variable 👍. Your approach does seem right to me :)
from jail_exporter.
👍
from jail_exporter.
Related Issues (20)
- Document code in Rustdoc compatible way HOT 2
- HTTP basic auth HOT 2
- Add {read,write}{bps,iops} HOT 1
- Support taking configuration from the environment
- HTTP logging via actix middleware HOT 1
- Use concrete ContentTypes from mime crate
- Update to a actix-web 1.0
- type_length_limit must be increased
- Release on crates.io HOT 3
- Document installation in README HOT 1
- Break CLI parsing out of main.rs into own file HOT 1
- Text file output mode HOT 1
- Switch from failure to thiserror
- Update to Actix Web 2.0
- RUSTSEC-2020-0036: failure is officially deprecated/unmaintained
- RUSTSEC-2020-0016: `net2` crate has been deprecated; use `socket2` instead
- RUSTSEC-2020-0056: stdweb is unmaintained
- Doesn't work on 13.0-RELEASE HOT 14
- RUSTSEC-2021-0139: ansi_term is Unmaintained
- RUSTSEC-2023-0040: `users` crate is unmaintained HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from jail_exporter.