The present behaviour is to publish the aggregated diagnostics only at the fixed rate.

Publish diagnostics_agg right away upon new warn/error diagnostics

ros commented on May 27, 2024

Publish diagnostics_agg right away upon new warn/error diagnostics

Comments (5)

heuristicus commented on May 27, 2024

This sounds like a good idea, but is there any particular advantage to working with /diagnostics_agg rather than /diagnostics? I guess the main thing would be fewer callbacks, and the filtering that you define through the analyzers would be applied so you only see the diagnostics you're interested in.

At the same time, it seems to me that the main purpose of the /diagnostics_agg right now is to provide information to a GUI, which is why the update rate is slow.

I guess what I'm wondering about is whether the purpose of the aggregator is to provide an aggregation of the most up-to-date diagnostics at any point in time, or only in periodic slices.

mikepurvis commented on May 27, 2024

The main consumer of diagnostics_agg is the GUI, but I think its purpose is to supply a single diagnostic snapshot which doesn't require opening a socket to every node which produces diagnostics.

Even for the GUI case though, IMO a new ERROR case should be made visible as soon as possible, rather than waiting as much as a second for the next report to arrive.

heuristicus commented on May 27, 2024

Definitely agree with making errors visible ASAP.

What about changing messages that are attached to the diagnostics? Does it matter that the cause of the error is different, but the level is still the same?

For example:

level: 2
name: ''
message: something went wrong...
hardware_id: ''
values: 
  - 
    key: ''
    value: ''
---
level: 2
name: ''
message: something else went wrong...
hardware_id: ''
values: 
  - 
    key: ''
    value: ''
---

Should both of these messages cause an instant update, or just the first one? Does that also apply to when the level is OK?

From my perspective, I don't think it makes sense to update instantly on changed messages in the OK or WARN levels, but it might if the diagnostic is in the ERROR level. It's a pretty small change to make though, in terms of implementation - just a check on an additional field of the message. Doing this might cause updates to be too frequent, for example if your message contains some floating point values which change a lot.
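
To make the rule in the paragraph above concrete, here is a minimal sketch in plain Python, not the aggregator's actual code. The `Status` class and `needs_immediate_publish` function are hypothetical names invented for illustration; only the numeric level values (OK=0, WARN=1, ERROR=2) come from diagnostic_msgs/DiagnosticStatus. It assumes an immediate publish is wanted when the level rises, or when the message text changes while the item is already in ERROR.

```python
from dataclasses import dataclass
from typing import Optional

# Numeric values match diagnostic_msgs/DiagnosticStatus.
OK, WARN, ERROR = 0, 1, 2


@dataclass
class Status:
    """Hypothetical stand-in for the fields of a DiagnosticStatus we compare."""
    name: str
    level: int
    message: str


def needs_immediate_publish(previous: Optional[Status], current: Status) -> bool:
    """Decide whether `current` should trigger an out-of-cycle publish."""
    if previous is None:
        # First report for this item: only WARN or ERROR is urgent.
        return current.level >= WARN
    if current.level > previous.level:
        # The item got worse (OK -> WARN, OK -> ERROR, WARN -> ERROR).
        return True
    # Same or lower level: only a changed message at ERROR counts.
    return current.level == ERROR and current.message != previous.message
```

With the two example messages above, both would trigger an update under this rule, since the level stays at 2 (ERROR) and the message text differs. As noted, a message that embeds rapidly changing values would make this fire on every report, so the message comparison is the part most likely to need tuning.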

mikepurvis commented on May 27, 2024

That seems reasonable to me.

trainman419 commented on May 27, 2024

From personal experience, I'd suggest a few adjustments to this:

- Publishing /diagnostics_agg on change doesn't reduce the latency if the diagnostic_updater library is limiting changes on /diagnostics to 1Hz, so to implement this properly, both would need to be updated to publish updates immediately when items go from OK to not OK.
- Industrial users will want to know the end-to-end latency between a node transitioning to an error state and the diagnostics_agg topic reflecting that.
- Most industrial systems that have a periodic update (1Hz) and immediate update on change also limit the max publishing rate on change, so that a status item that is oscillating doesn't overwhelm the system. I think this is a good idea here as well.
- I think different users will want different notification behaviors on WARN vs ERROR; it might make sense to parameterize this in the final implementation.
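
The last two suggestions (a cap on the on-change publishing rate, and a configurable trigger level) could be combined roughly as follows. This is a sketch under assumptions, not a proposal for the real API: `ImmediatePublishGate`, `trigger_level`, and `min_interval_s` are hypothetical names, and the caller passes in the current time so the logic stays independent of any ROS clock.

```python
from typing import Optional

# Numeric values match diagnostic_msgs/DiagnosticStatus.
OK, WARN, ERROR = 0, 1, 2


class ImmediatePublishGate:
    """Hypothetical gate deciding whether an escalation may publish right away."""

    def __init__(self, trigger_level: int = WARN, min_interval_s: float = 0.1):
        self.trigger_level = trigger_level    # lowest level that triggers at all
        self.min_interval_s = min_interval_s  # minimum spacing between immediate publishes
        self._last_publish: Optional[float] = None

    def allow(self, old_level: int, new_level: int, now: float) -> bool:
        """Return True if this level change should be published immediately."""
        escalated = new_level > old_level and new_level >= self.trigger_level
        if not escalated:
            return False
        too_soon = (
            self._last_publish is not None
            and now - self._last_publish < self.min_interval_s
        )
        if too_soon:
            # Suppressed here; the regular periodic publish would still report it.
            return False
        self._last_publish = now
        return True
```

A gate built with `trigger_level=ERROR` ignores OK to WARN transitions entirely, while the default reacts to both WARN and ERROR. An oscillating item can trigger at most one immediate publish per `min_interval_s`; anything suppressed is assumed to be picked up by the existing fixed-rate update.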
