ros / diagnostics

Packages related to gathering, viewing, and analyzing diagnostics data from robots.

Home Page: https://index.ros.org/p/diagnostics/

License: Other

C++ 65.92% Python 30.52% CMake 3.56%

diagnostics monitoring ros

diagnostics's Introduction

Overview

The diagnostics system collects information about hardware drivers and robot hardware to make them available to users and operators. The diagnostics system contains tools to collect and analyze this data.

The diagnostics system is built around the /diagnostics topic, which carries diagnostic_msgs/DiagnosticArray messages. These contain information about the device names, status, and values.

The diagnostics system contains the following packages:

diagnostic_aggregator: Aggregates diagnostic messages from different sources into a single message.
diagnostic_analysis: Not ported to ROS2 yet #contributions-welcome
diagnostic_common_diagnostics: Predefined nodes for monitoring the Linux and ROS system.
diagnostic_updater: Base classes for publishing custom diagnostic messages from Python and C++.
self_test: Tools to perform self tests on nodes.

Collecting diagnostic data

Diagnostic data is collected at the points of interest, i.e. the hardware drivers, and must be published on the /diagnostics topic. The diagnostic_updater package provides base classes that simplify the creation of diagnostic messages.
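To make the message layout concrete, here is a minimal sketch in plain Python that mirrors the fields of diagnostic_msgs/DiagnosticArray, DiagnosticStatus, and KeyValue. The level constants OK=0, WARN=1, ERROR=2, and STALE=3 follow the message definition. The dataclasses, the run_tasks helper, and battery_task are hypothetical stand-ins for illustration only; they are not the diagnostic_updater API, which builds and publishes these messages for you in a real node.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Level constants as defined in diagnostic_msgs/DiagnosticStatus.
OK, WARN, ERROR, STALE = 0, 1, 2, 3


@dataclass
class KeyValue:
    key: str
    value: str


@dataclass
class DiagnosticStatus:
    level: int = OK
    name: str = ""
    message: str = ""
    hardware_id: str = ""
    values: List[KeyValue] = field(default_factory=list)


@dataclass
class DiagnosticArray:
    # The real message carries a std_msgs/Header; only a stamp is kept here.
    stamp: float = 0.0
    status: List[DiagnosticStatus] = field(default_factory=list)


# Hypothetical task signature: a task fills in one status and returns it.
Task = Callable[[DiagnosticStatus], DiagnosticStatus]


def run_tasks(node_name: str, hardware_id: str, tasks: Dict[str, Task],
              stamp: float) -> DiagnosticArray:
    """Run every registered task and collect the results into one array."""
    array = DiagnosticArray(stamp=stamp)
    for task_name, task in tasks.items():
        status = DiagnosticStatus(name=f"{node_name}: {task_name}",
                                  hardware_id=hardware_id)
        array.status.append(task(status))
    return array


def battery_task(stat: DiagnosticStatus) -> DiagnosticStatus:
    voltage = 11.2  # hypothetical reading from a driver
    stat.level = WARN if voltage < 11.5 else OK
    stat.message = "Battery low" if stat.level == WARN else "Battery OK"
    stat.values.append(KeyValue("Voltage (V)", f"{voltage:.1f}"))
    return stat


msg = run_tasks("battery_driver", "none", {"Battery": battery_task}, stamp=0.0)
```

In a real node, diagnostic_updater's Updater plays the role of run_tasks and publishes the resulting array on /diagnostics.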

Aggregation

The diagnostic_aggregator package provides tools to aggregate diagnostic messages from different sources into a single message. It has a plugin system to define the aggregation rules.
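As a rough illustration of one such rule, the sketch below groups statuses whose names start with a configured prefix under a single path, strips the prefix from the child names, and gives the parent the worst (highest) child level. The function aggregate and its parameters are hypothetical and do not use the real plugin API. The real GenericAnalyzer supports more matching options (for example startswith, contains, name, expected, and regex) and handles stale and missing items, which this simplification ignores.

```python
from typing import Dict, List, Tuple

OK, WARN, ERROR = 0, 1, 2


def aggregate(statuses: List[Tuple[str, int]], path: str, prefix: str) -> Dict[str, int]:
    """Group (name, level) pairs under `path` using a simplified prefix rule.

    Names starting with `prefix` are matched; the prefix is removed and the
    remainder becomes a child of `path`. The parent takes the highest child level.
    """
    result: Dict[str, int] = {}
    for name, level in statuses:
        if not name.startswith(prefix):
            continue  # not handled by this rule
        child = name[len(prefix):].strip()
        result[f"{path}/{child}"] = level
    # Parent level: worst child level, or OK if nothing matched (simplification).
    result[path] = max(result.values(), default=OK)
    return result


statuses = [
    ("imu_node: Calibration Status", OK),
    ("imu_node: Frequency Status", WARN),
    ("laser_node: Scan Status", ERROR),  # not matched by this rule
]
imu = aggregate(statuses, "/Devices/IMU", "imu_node:")
```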

Visualization

Outside this repository, rqt_robot_monitor can visualize diagnostic messages that have been aggregated by diagnostic_aggregator.

Diagnostics messages that are not aggregated can be visualized by rqt_runtime_monitor.

Target Distribution

The ros2 branch targets

Humble Hawksbill
Iron Irwini

The ros2-jazzy branch targets

Jazzy Jalisco
Rolling Ridley

License

The source code is released under a BSD 3-Clause license.

diagnostics's People

Contributors

Stargazers

Watchers

Forkers

muratsevim bulwahn chadrockey vikingx shadow-robot mitchellwills scpeters mikepurvis hidof-forks requesttimedout jonbinney garaemon ospreyx heuristicus getcheve otamachan jvlahou codebot magazino yujinrobot caomw progtologist nlamprian guillaumeautran tailosinc familiarquark tno-ivs bponsler jin-myung clearpathrobotics junaidnaseer snrkiwi chenrui2014 mikaelarguedas lukefrasera moriarty maciejmatuszak nilshaukebussas-tomtom theunnamed2 ms-iot flyinskybtx liqi198786 eurogroep dfautomation cwecht vaibhavbhadade kishornaik10 noguchiyukiyasu rohita83 francescodelduchetto haraisao boschresearch g-gemignani karsten1987 twdragon gary-robotics wkhudgins92 roverrobotics-forks zhapupu sundermann ivanpauno ssh666 jacobperron whill nobleo kikass13 hhansen-bdai zhouzhuol densoadas ahcorde copel-bigdata synkar lucasw tobias-fischer awesomebytes uniqrobot peci1 skylerpan reinzor thekobithirdparty airballking mro47 mitsudome-r thomascent pandinosaurus basvolkers hirotaka001 onkelj deepakc-nicn vincentrou airyzf ppedro74 smilerobotics zjf-boy keisukeshima amilcarlucas tier4 kenji-miyake marc-404 wep21

diagnostics's Issues

Rewrite add_analyzer script to C++

As part of our Python 3 migration, the add_analyzer script has come up as a nuisance which would be nice to avoid. Given that the rest of this package is C++, how would we feel about rewriting the script to be a small binary and parameterizing the service name?

The main proviso is that it would still need to shell out to rosparam path/to/my.yaml /namespace to perform the yaml loading, so the startup cost of running that Python program would be paid, but not the memory cost of a long-running Python process.

If there's interest in this, I can send a PR.

Memory size should probably not be a major consideration here, but there is a slight saving: the Python process at idle has a vsize of around 1 MB, while the C++ version at idle is 400 kB.

some tests fail

Some tests were not building properly (linker issue). I fixed that in a69d762. But some of the tests fail:

[ERROR] [1381942884.706085587]: No analyzers initialzed in AnalyzerGroup /analyzer_loader/analyzers
/data/code/hydro_catkin_ws/src/diagnostics/diagnostic_aggregator/test/analyzer_loader.cpp:56: Failure
Value of: analyzer_group.init(path, nh)
  Actual: false
Expected: true
[  FAILED  ] AnalyzerLoader.analyzerLoading (355 ms)



-- run_tests.py: execute commands
  /usr/bin/cmake -E make_directory /data/code/hydro_catkin_ws/build/test_results/diagnostic_analysis
  /usr/bin/nosetests -P --process-timeout=60 /data/code/hydro_catkin_ws/src/diagnostics/diagnostic_analysis/test/bag_csv_test.py --with-xunit --xunit-file=/data/code/hydro_catkin_ws/build/test_results/diagnostic_analysis/nosetests-test.bag_csv_test.py.xml
E
======================================================================
ERROR: Failure: ImportError (No module named diagnostic_analysis.exporter)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/loader.py", line 390, in loadTestsFromName
    addr.filename, addr.module)
  File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 39, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 86, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/data/code/hydro_catkin_ws/src/diagnostics/diagnostic_analysis/test/bag_csv_test.py", line 51, in <module>
    from diagnostic_analysis.exporter import LogExporter
ImportError: No module named diagnostic_analysis.exporter
-------------------- >> begin captured logging << --------------------
rospy.topics: INFO: topicmanager initialized
--------------------- >> end captured logging << ---------------------

----------------------------------------------------------------------
Ran 1 test in 0.001s

FAILED (errors=1)


[ RUN      ] DiagnosticUpdater.testFrequencyStatus
/data/code/hydro_catkin_ws/src/diagnostics/diagnostic_updater/test/diagnostic_updater_test.cpp:141: Failure
Value of: stat[1].level
  Actual: '\x1' (1)
Expected: 0
within max frequency but reported error
/data/code/hydro_catkin_ws/src/diagnostics/diagnostic_updater/test/diagnostic_updater_test.cpp:142: Failure
Value of: stat[2].level
  Actual: '\x1' (1)
Expected: 0
within min frequency but reported error
[  FAILED  ] DiagnosticUpdater.testFrequencyStatus (521 ms)
[ RUN      ] DiagnosticUpdater.testTimeStampStatus
/data/code/hydro_catkin_ws/src/diagnostics/diagnostic_updater/test/diagnostic_updater_test.cpp:166: Failure
Value of: stat[2].level
  Actual: '\x1' (1)
Expected: 0
now not accepted
[  FAILED  ] DiagnosticUpdater.testTimeStampStatus (0 ms)

Finally, self_test/no_id_selftest never terminates, so I could not run the other tests.

Including gtest.h fails on OS X

When looking for the tuple class, gtest.h assumes that it should include <tr1/tuple>, although Clang implements C++11, where tuple is part of the standard library (so the include should simply be <tuple>).

Sorry for not gathering more info from the header file for now, I will look into it when I have time.

diagnostic_common_diagnostics/sensors_monitor: Additional sensor types

Is there interest in supporting additional sensor types in the sensors_monitor node? I have an implementation of a C++ libsensors-based node (https://github.com/RIVeR-Lab/computer_sensors) that publishes diagnostic data. Besides temp and fan, it supports other types by adding their properties (it does not currently support voltage explicitly). Using the API seems likely to be more reliable than parsing the output of sensors.

Would something like this be accepted in a pull request, or would I be better off releasing a standalone package?

Generic Analyser does not process starts_with and remove_prefix on the same name

The GenericAnalyzer matches startswith before slashes are replaced with spaces, but applies remove_prefix after they are replaced. This breaks the find_and_remove_prefix parameter: the name is matched using the slash, but the prefix is then not removed because the slash no longer matches the space.

For example, suppose a node in a namespace has the name /ns/mynode and generates a status message with the full name ns/mynode: myname.

You would expect the following to match and remove the prefix, but it does not:

find_and_remove_prefix: 'ns/mynode: '

Instead, you must use:

startswith: 'ns/mynode: '
remove_prefix: 'ns mynode: '

This can be demonstrated by viewing either the /diagnostics_agg topic or the rqt_gui plugin.
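The mismatch can be reproduced with a few lines of plain Python. The process helper below is a hypothetical model of the order of operations reported in this issue (match startswith on the raw name, replace slashes with spaces, then strip remove_prefix); it is not the aggregator's actual code.

```python
from typing import Optional


def process(name: str, startswith: str, remove_prefix: str) -> Optional[str]:
    """Model of the reported order: match first, then replace '/', then strip."""
    if not name.startswith(startswith):
        return None  # item not claimed by this analyzer
    display = name.replace("/", " ")
    if display.startswith(remove_prefix):
        display = display[len(remove_prefix):]
    return display


name = "ns/mynode: myname"

# find_and_remove_prefix sets both parameters to the same string,
# so the prefix matches but is never removed.
unchanged = process(name, "ns/mynode: ", "ns/mynode: ")

# The workaround from the issue: slash for matching, space for removal.
workaround = process(name, "ns/mynode: ", "ns mynode: ")
```

One possible fix would normalize the name and the configured prefixes the same way before comparing them, so a single find_and_remove_prefix value works for both steps.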

Rewrite how test nodes are exported

Given the sum total of issues #16, #24, #27 and #28, it sounds like the way testing nodes are exported to downstream packages needs to change.

Installing anything that is linked against gtest and meant to be used with user-compiled code is a bad idea and likely won't work, because gtest fails when different parts of it are compiled with different flags (i.e. the node, built on a build farm, and the user's code, built locally).
Users need to have a way to test their plugins. This was previously provided by the diagnostic_aggregator/analyzer_loader node. See #24
Users need to be able to run self tests from within their rostests. This was provided by the self_test/selftest_rostest node. See #16

The best proposed solution I've heard (thanks @wjwwood!) is to install the sources for analyzer_loader and selftest_rostest, and provide an explicit set of CMake macros that compile and run them as needed. Once that's written, the docs will need to be updated, at least http://wiki.ros.org/self_test, http://wiki.ros.org/diagnostics/Tutorials/Creating%20a%20Diagnostic%20Analyzer and http://wiki.ros.org/diagnostic_aggregator

I've exhausted my budget of employer-funded time to work on this for the next few months. If someone needs this fixed urgently, they'll have to provide a pull request.

diagnostic_aggregator: catkin_make -DCATKIN_ENABLE_TESTING=0 fails

When executing catkin_make -DCATKIN_ENABLE_TESTING=0, it fails with:

Linking CXX executable (...)/diagnostics_ws/devel/lib/diagnostic_aggregator/analyzer_loader
/usr/bin/ld: cannot find -lgtest
collect2: error: ld returned 1 exit status
make[2]: *** [(...)/diagnostics_ws/devel/lib/diagnostic_aggregator/analyzer_loader] Error 1

diagnostic_aggregator in version 1.8.4 needs the gtest library, but this is only discovered for linking when testing is enabled in catkin.

release a new version

Hey Austin,

now with #99 in, can you please do a new release for kinetic and melodic?

Thanks a lot!

Jochen

Stale items are not ignored with discard_stale: true when no status has been published at all

With a configuration like this:

analyzers:
  my_stale_item:
    type: diagnostic_aggregator/GenericAnalyzer
    path: Nonexistent1
    find_and_remove_prefix: 'nonexistent1'
  my_stale_item_that_should_be_ignored:
    type: diagnostic_aggregator/GenericAnalyzer
    path: Nonexistent2
    find_and_remove_prefix: 'nonexistent2'
    discard_stale: true
    timeout: 5.0

and with no messages published on /diagnostics, I would expect "Nonexistent1" to show up as stale and "Nonexistent2" to not show up in /diagnostics_agg.

However, both are always present in /diagnostics_agg.
Here's a minimal launch file to reproduce:

<launch>

  <node pkg="diagnostic_aggregator" type="aggregator_node" name="diagnostic_aggregator">
    <rosparam>
      analyzers:
        my_stale_item:
          type: diagnostic_aggregator/GenericAnalyzer
          path: Nonexistent1
          find_and_remove_prefix: 'nonexistent1'
        my_stale_item_that_should_be_ignored:
          type: diagnostic_aggregator/GenericAnalyzer
          path: Nonexistent2
          find_and_remove_prefix: 'nonexistent2'
          discard_stale: true
          timeout: 5.0
    </rosparam>
  </node>

  <node name="$(anon monitor)" pkg="rqt_robot_monitor" type="rqt_robot_monitor"/>

</launch>
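The expected behavior can be stated as a small filter. This is a hypothetical model of what the reporter expects, not the aggregator's implementation: an analyzer with discard_stale: true whose items are all stale (including the case where nothing was ever received) is omitted from the output, while other analyzers still appear as stale. AnalyzerConfig and expected_output are illustrative names, and timeout defaults to 5.0 here only to match the configuration above.

```python
from dataclasses import dataclass
from typing import Dict, List

OK, STALE = 0, 3


@dataclass
class AnalyzerConfig:
    path: str
    prefix: str
    discard_stale: bool = False
    timeout: float = 5.0


def expected_output(analyzers: List[AnalyzerConfig], last_seen: Dict[str, float],
                    now: float) -> Dict[str, int]:
    """Map each analyzer path the reporter expects to see to its level.

    `last_seen` maps received status names to the time they last arrived.
    """
    output: Dict[str, int] = {}
    for cfg in analyzers:
        fresh = [name for name, stamp in last_seen.items()
                 if name.startswith(cfg.prefix) and now - stamp <= cfg.timeout]
        if fresh:
            output[cfg.path] = OK  # simplification: real levels come from the items
        elif not cfg.discard_stale:
            output[cfg.path] = STALE  # nothing fresh: shown as stale
        # otherwise discard_stale is set and nothing is fresh, so the path is omitted
    return output


analyzers = [
    AnalyzerConfig("Nonexistent1", "nonexistent1"),
    AnalyzerConfig("Nonexistent2", "nonexistent2", discard_stale=True),
]
expected = expected_output(analyzers, last_seen={}, now=10.0)
```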

Use node interfaces in Updater class

This is necessary for diagnostic updater to work right in a managed ROS 2 node.

Similar to what was done in ros2/geometry2#108

self_test executables are not installed in hydro

The executable files run_selftest and selftest_rostest mentioned in (http://wiki.ros.org/self_test) should be installed in the CMakeLists.txt, so that they can be used via rosrun.

python binding of diagnostic_updater is not available on groovy

The groovy CMakeLists.txt of diagnostic_updater does not call the catkin_python_setup() macro.

The hydro version does call the catkin_python_setup() macro.

Travis build has been failing for about 2 months

Travis builds of this repo currently seem to fail with:

ImportError: "from catkin_pkg.package import parse_package" failed: No module named catkin_pkg.package

This Python module is provided by python-catkin-pkg, which is a dependency of python-catkin, which in turn is installed by the Travis configuration file.

Currently there are 2 PRs failing on that error:

#72 fails at https://travis-ci.org/ros/diagnostics/builds/335289179#L2299
#73 fails at https://travis-ci.org/ros/diagnostics/builds/336956455#L2162 (mine, which triggered me to check out other PRs as well).

Another recent PR, #70, does not hit this error because it already fails earlier.

Topics published on a frequency lower than 1.0/diagnostic_period are not supported

Looking at the TimeStampStatus and FrequencyStatus code (and testing with our hardware), if you use TopicDiagnostic to diagnose a slow topic (0.3 Hz in our case), the diagnostic fails.

In time windows where a message has arrived, it correctly reports the frequency and related values, but in windows where no message has arrived, the diagnostic reports ERROR with the message "No events recorded. No data since last update."

This is obviously true, but it is not an error state.
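To see why an empty window turns into an ERROR, here is a simplified, hypothetical model of a windowed frequency check. The function name, the fixed window boundaries, and the tolerance handling are assumptions, and the real FrequencyStatus computes its window differently. The sketch counts events in a window, reports ERROR when the count is zero, WARN when the rate falls outside the tolerated bounds, and OK otherwise. The strings "No events recorded." and "Desired frequency met" are the ones that appear on this page.

```python
from typing import List, Tuple


def frequency_check(event_times: List[float], window_start: float, window_end: float,
                    min_freq: float, max_freq: float,
                    tolerance: float = 0.1) -> Tuple[str, str]:
    """Classify the events that fall inside [window_start, window_end)."""
    events = sum(1 for t in event_times if window_start <= t < window_end)
    if events == 0:
        return "ERROR", "No events recorded."
    freq = events / (window_end - window_start)
    if freq < min_freq * (1 - tolerance):
        return "WARN", "Frequency too low."
    if freq > max_freq * (1 + tolerance):
        return "WARN", "Frequency too high."
    return "OK", "Desired frequency met"


# A 0.3 Hz topic checked with 2 s windows and deliberately wide bounds.
rate_hz = 0.3
event_times = [i / rate_hz for i in range(4)]  # about 0.0, 3.33, 6.67, 10.0 s
levels = [frequency_check(event_times, start, start + 2.0, min_freq=0.1, max_freq=1.0)[0]
          for start in (0.0, 2.0, 4.0, 6.0, 8.0)]
```

Even though 0.3 Hz lies within the accepted range, every window without a message reports ERROR. With perfectly periodic messages, any window at least as long as the topic's period (about 3.3 s here) contains at least one event.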

Exception in sensors_monitor / parse_sensor_line

I just got this running sensors_monitor.py on a ROS kinetic / Ubuntu Xenial machine:

[ERROR] [1507561973.045543]: Unable to process lm-sensors data
[ERROR] [1507561973.047346]: Traceback (most recent call last):
  File "/etc/robot/ros/lib/diagnostic_common_diagnostics/sensors_monitor.py", line 189, in monitor
    for sensor in parse_sensors_output(get_sensors()):
  File "/etc/robot/ros/lib/diagnostic_common_diagnostics/sensors_monitor.py", line 156, in parse_sensors_output
    s = parse_sensor_line(line)
  File "/etc/robot/ros/lib/diagnostic_common_diagnostics/sensors_monitor.py", line 109, in parse_sensor_line
    [sensor.name, sensor.type] = name.rsplit(" ",1)
ValueError: need more than 1 value to unpack

The output of the sensors command is:

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +36.0°C  (high = +84.0°C, crit = +100.0°C)
Core 0:         +33.0°C  (high = +84.0°C, crit = +100.0°C)
Core 1:         +34.0°C  (high = +84.0°C, crit = +100.0°C)
Core 2:         +32.0°C  (high = +84.0°C, crit = +100.0°C)
Core 3:         +30.0°C  (high = +84.0°C, crit = +100.0°C)

nct6106-isa-0290
Adapter: ISA adapter
in0:            +0.72 V  (min =  +0.00 V, max =  +1.74 V)
in1:            +1.66 V  (min =  +0.00 V, max =  +2.04 V)
in2:            +3.41 V  (min =  +0.00 V, max =  +4.08 V)
in3:            +3.33 V  (min =  +0.00 V, max =  +4.08 V)
in4:            +0.63 V  (min =  +0.00 V, max =  +2.04 V)
in5:            +1.66 V  (min =  +0.00 V, max =  +2.04 V)
in6:            +1.70 V  (min =  +0.00 V, max =  +2.04 V)
in7:            +3.07 V  (min =  +0.00 V, max =  +4.08 V)
in8:            +2.03 V  (min =  +0.00 V, max =  +4.08 V)
fan1:          4545 RPM  (min =    0 RPM)
fan2:          2760 RPM  (min =    0 RPM)
fan3:             0 RPM  (min =    0 RPM)
SYSTIN:         +38.0°C  (high =  +0.0°C, hyst =  +0.0°C)  ALARM
                         (crit low = +127.0°C, crit = +127.0°C)  sensor = thermal diode
AUXTIN:         -12.0°C  (high = +80.0°C, hyst = +75.0°C)
                         (crit low = +127.0°C, crit = +127.0°C)  sensor = thermal diode
PECI Agent 0:   +36.0°C  (high = +80.0°C, hyst = +75.0°C)
                         (crit low = +127.0°C, crit = +127.0°C)
PECI Agent 1:    +0.0°C  (high = +80.0°C, hyst = +75.0°C)
                         (crit low = +127.0°C, crit = +127.0°C)
PCH_CHIP_TEMP:   +0.0°C  
PCH_CPU_TEMP:    +0.0°C  
intrusion0:    ALARM
beep_enable:   disabled

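It looks like the parser cannot handle some of these lines. The failing call is name.rsplit(" ",1), which returns a single element for labels that contain no space (for example in0: or SYSTIN:), so unpacking it into sensor.name and sensor.type raises ValueError.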
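A more tolerant parser could skip lines it does not understand instead of raising. The sketch below is a hypothetical stand-alone function, not the upstream fix. It extracts only the label, the first numeric reading, and its unit (°C, V, or RPM), and returns None for chip headers, adapter lines, continuation lines, and flags such as ALARM or disabled.

```python
import re
from typing import NamedTuple, Optional


class Reading(NamedTuple):
    label: str
    value: float
    unit: str


# "<label>:  <signed number> <unit>", e.g. "in0:  +0.72 V" or "fan1:  4545 RPM".
_LINE = re.compile(r"^\s*([^:]+):\s+([+-]?\d+(?:\.\d+)?)\s*(°C|V|RPM)")


def parse_sensor_line(line: str) -> Optional[Reading]:
    """Return a Reading, or None for lines that carry no numeric reading."""
    match = _LINE.match(line)
    if match is None:
        return None
    label, value, unit = match.groups()
    return Reading(label.strip(), float(value), unit)
```

Because the label is kept whole, names with and without spaces (Core 0, in0, SYSTIN, PECI Agent 0) are handled the same way. Any split into name and type would be a separate, optional step.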

Publish diagnostics_agg right away upon new warn/error diagnostics

The present behaviour is to only publish the aggregated diagnostics at the fixed rate (default 1Hz):

https://github.com/ros/diagnostics/blob/03e0db006175c3e6157bb50ab38021ebb4995c5b/diagnostic_aggregator/src/aggregator_node.cpp#L50-56

I believe that it would be better to trigger an immediate publish of the aggregated topic when a diagnostic transitions to WARN or ERROR. We have some error reporting mechanisms that currently have to subscribe to /diagnostics, and could instead subscribe to /diagnostics_agg if we knew that a) no error reports would be missed, and b) new error reports would be passed through immediately.

Thoughts?
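The proposed trigger can be sketched independently of ROS. In this hypothetical model (EscalationTrigger and on_status are illustrative names, not the aggregator's API), the aggregator remembers the last level per status name, publishes immediately when a status escalates into WARN or ERROR, and otherwise leaves publishing to the fixed-rate timer.

```python
from typing import Callable, Dict

OK, WARN, ERROR, STALE = 0, 1, 2, 3


class EscalationTrigger:
    """Decide whether an incoming status should force an immediate publish."""

    def __init__(self, publish_now: Callable[[], None]) -> None:
        self._last_level: Dict[str, int] = {}
        self._publish_now = publish_now

    def on_status(self, name: str, level: int) -> bool:
        previous = self._last_level.get(name, OK)
        self._last_level[name] = level
        # Only escalations into WARN or ERROR trigger; recoveries wait for the timer.
        if level in (WARN, ERROR) and level > previous:
            self._publish_now()
            return True
        return False


published = []
trigger = EscalationTrigger(lambda: published.append("agg"))
results = [trigger.on_status("imu", lvl) for lvl in (OK, WARN, WARN, ERROR, OK)]
```

A status that flaps between OK and WARN would trigger a publish on every escalation, so some rate limiting may still be worth considering.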

[Feature request] Add new aggregation rule to diagnostic_aggregator

diagnostic_aggregator initializes its analyzers from rosparam and cannot add new analysis rules after initialization. Support for adding rules at runtime would be very useful.

As it stands, adding a new rule means running another aggregator and robot_monitor with a remapping such as /diagnostics -> /diagnostics_perception.

Updater raises std::bad_alloc sporadically on arm64 with O3 optimizations

The issue is a complicated one, but here goes.

I first noticed this issue when diagnostics messages from arm64 machines arrived only infrequently (between 0% and 10% of the time). Eventually, the following message comes from the Diagnostic Aggregator running on amd64, and all messages appear to be dropped:

[ERROR] [1564529964.093868659 /diag_agg] [/tmp/binarydeb/ros-kinetic-roscpp-1.12.14/src/libros/transport_publisher_link.cpp:TransportPublisherLink::onMessageLength:175]: a message of over a gigabyte was predicted in tcpros. that seems highly unlikely, so I'll assume protocol synchronization is lost.

At first, I thought it was endianness, but all machines are little-endian. There are also C++ nodes that communicate with each other properly on all machines. It also appears that this only happens with the Diagnostic Updater, not with any other topic.

Next, I ran a test with just roscore and a single test node containing the following code:

#include <ros/ros.h>
#include <diagnostic_updater/diagnostic_updater.h>
#include <diagnostic_updater/update_functions.h>

int main(int argc, char** argv)
{
  ros::init(argc, argv, "updater_node");
  double rate = 10.;
  diagnostic_updater::Updater updater;
  diagnostic_updater::FrequencyStatus frequency_status(
    diagnostic_updater::FrequencyStatusParam(&rate, &rate)
  );
  updater.setHardwareID("none");
  updater.add(frequency_status);

  ros::Rate r(rate);
  while (ros::ok())
  {
    frequency_status.tick();
    updater.update();
    ros::spinOnce();
    r.sleep();
  }
}

Everything works fine on amd64, but on arm64 the issues above occur, and the node eventually crashes with std::bad_alloc. Since the error appears to happen during serialization, here are the backtrace and the message that was being published:

#13 0x0000000000419084 in diagnostic_updater::Updater::publish (this=this@entry=0x7fffffecc0, status_vec=std::vector of length 1, capacity 1 = {...})
    at /opt/ros/kinetic/include/diagnostic_updater/diagnostic_updater.h:547
547	        publisher_.publish(msg);
(gdb) list
542	            node_name_.substr(1) + std::string(": ") + iter->name;
543	        }
544	        diagnostic_msgs::DiagnosticArray msg;
545	        msg.status = status_vec;
546	        msg.header.stamp = ros::Time::now(); // Add timestamp for ROS 0.10
547	        publisher_.publish(msg);
548	      }
549	
550	      /**
551	       * Publishes on /diagnostics and reads the diagnostic_period parameter.
(gdb) p msg
$7 = {header = {seq = 0, stamp = {<ros::TimeBase<ros::Time, ros::Duration>> = {sec = 1564612119, nsec = 323701618}, <No data fields>}, frame_id = ""}, status = std::vector of length 1, capacity 1 = {{level = 0 '\000', name = "updater_node_1564612117153179612: Frequency Status", message = "Desired frequency met", hardware_id = "none", 
      values = std::vector of length 7, capacity 7 = {{key = "Events in window", value = "22"}, {key = "Events since startup", value = "22"}, {key = "Duration of window (s)", value = "2.100290"}, {key = "Actual frequency (Hz)", value = "10.474742"}, {key = "Target frequency (Hz)", value = "10.000000"}, {
          key = "Minimum acceptable frequency (Hz)", value = "9.000000"}, {key = "Maximum acceptable frequency (Hz)", value = "11.000000"}}}}}
(gdb) bt
#0  0x0000007fb7a04528 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x0000007fb7a059e0 in __GI_abort () at abort.c:89
#2  0x0000007fb7bde254 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#3  0x0000007fb7bdbdc4 in ?? () from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#4  0x0000007fb7bdbe10 in std::terminate() () from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#5  0x0000007fb7bdc0d4 in __cxa_throw () from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#6  0x0000007fb7bdc6d8 in operator new(unsigned long) () from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#7  0x000000000040f840 in ros::serialization::serializeMessage<diagnostic_msgs::DiagnosticArray_<std::allocator<void> > > (message=...) at /opt/ros/kinetic/include/ros/serialization.h:795
#8  0x000000000040d60c in boost::_bi::list1<boost::reference_wrapper<diagnostic_msgs::DiagnosticArray_<std::allocator<void> > const> >::operator()<ros::SerializedMessage, ros::SerializedMessage (*)(diagnostic_msgs::DiagnosticArray_<std::allocator<void> > const&), boost::_bi::list0> (f=<optimized out>, a=<synthetic pointer>..., 
    this=<optimized out>) at /usr/include/boost/function/function_template.hpp:129
#9  boost::_bi::bind_t<ros::SerializedMessage, ros::SerializedMessage (*)(diagnostic_msgs::DiagnosticArray_<std::allocator<void> > const&), boost::_bi::list1<boost::reference_wrapper<diagnostic_msgs::DiagnosticArray_<std::allocator<void> > const> > >::operator() (this=<optimized out>) at /usr/include/boost/bind/bind.hpp:893
#10 boost::detail::function::function_obj_invoker0<boost::_bi::bind_t<ros::SerializedMessage, ros::SerializedMessage (*)(diagnostic_msgs::DiagnosticArray_<std::allocator<void> > const&), boost::_bi::list1<boost::reference_wrapper<diagnostic_msgs::DiagnosticArray_<std::allocator<void> > const> > >, ros::SerializedMessage>::invoke (
    function_obj_ptr=...) at /usr/include/boost/function/function_template.hpp:138
#11 0x0000007fb7efc8b0 in ros::TopicManager::publish(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<ros::SerializedMessage ()> const&, ros::SerializedMessage&) () from /opt/ros/kinetic/lib/libroscpp.so
#12 0x000000000040fe14 in ros::Publisher::publish<diagnostic_msgs::DiagnosticArray_<std::allocator<void> > > (this=this@entry=0x7fffffee88, message=...) at /usr/include/c++/8/new:169
#13 0x0000000000419084 in diagnostic_updater::Updater::publish (this=this@entry=0x7fffffecc0, status_vec=std::vector of length 1, capacity 1 = {...}) at /opt/ros/kinetic/include/diagnostic_updater/diagnostic_updater.h:547
#14 0x000000000040c5ac in diagnostic_updater::Updater::force_update (this=0x7fffffecc0) at /opt/ros/kinetic/include/diagnostic_updater/diagnostic_updater.h:440
#15 diagnostic_updater::Updater::update (this=0x7fffffecc0) at /opt/ros/kinetic/include/diagnostic_updater/diagnostic_updater.h:390
#16 main (argc=<optimized out>, argv=<optimized out>) at /ws/src/test/src/updater_node.cpp:20

It crashes after ~30 seconds, and seems to crash more quickly when several of these nodes are running.

As specified in the title, this only happens with GCC optimization level 3 (compiling with -O3 or CMAKE_BUILD_TYPE=Release). I tried with -O2 and it appears to work fine.

I doubt this can be easily fixed and I'm not 100% sure if the issue is within this repo or ros_comm, but any ideas would be greatly appreciated. We really would like to use -O3 throughout our code for improved performance.

Adding Key/Value pairs to diagnostic messages that are specified in the analyzer yaml file.

Is there any way to, or interest in, adding key/value pairs specified in the analyzer YAML file to diagnostic status messages? I basically want to "tag" a group of diagnostic messages and use that information in later processing.

Something like the following:

analyzers:
  battery:
    type: ...
    path: ...
    find_and_remove_prefix: ...
    tags:
      - key1: value1
      - key2: value2
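One hypothetical way to implement this would be to append the configured tags to the values of the analyzer's aggregated status. The sketch assumes the list-of-single-key-mappings shape shown above, already parsed into Python lists and dicts (so no YAML library is needed). apply_tags is an illustrative name, not an existing aggregator feature.

```python
from typing import Dict, List, Tuple


def apply_tags(values: List[Tuple[str, str]],
               tags: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return a copy of `values` with every tag appended as a (key, value) pair."""
    tagged = list(values)
    for entry in tags:
        for key, value in entry.items():
            tagged.append((key, str(value)))
    return tagged


# The `tags` list from the config above, as a YAML loader would return it.
tags = [{"key1": "value1"}, {"key2": "value2"}]
tagged = apply_tags([("Voltage (V)", "11.2")], tags)
```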

Release into Indigo

@bricerebsamen

Will you be able to release into Indigo soon? Looks like most of the hardware drivers are blocked here by diagnostic_updater.

Thanks!

Chad

diagnostic_updater swallows first character of name parameter

This bug cost me a few hours some years ago when I first ran into it (and didn't get to the bottom of it). It just cost me a few more, when a colleague reviewing my code that reads DiagnosticStatus messages in my custom DiagnosticAnalyzer asked why the name of the message I was matching to was missing the first character.

The problem is at line 542 of diagnostic_updater.h. It appears the original intent was to remove the leading "/" from the node name, which provides the default initializer for the first part of the message name. However, if I provide a custom name when initializing the diagnostic_updater, line 542 removes its first character.

One can ask, "why specify the name"? No special reason - I didn't realize the normal procedure is to just instantiate the updater without arguments and it would choose the node name for the first part of diagnostic_updater message names. There's no documentation that says how the message name is generated. Using the default constructor initializers will be my fix.

This is not a major bug, but at a minimum it ought to be documented. I think it's probably too dangerous to change the behavior, because it would break the hack I previously put in by adding a space in front of the updater name. A less dangerous fix might be to change the default initializer to ros::this_node::getName().substr(1) and remove the .substr at line 542, but that would still break hacks that pre-padded the name.
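The naming behavior can be modeled in a few lines. Both functions are hypothetical illustrations. The first mimics the unconditional substr(1) on line 542, and the second shows one less surprising alternative that drops the first character only when it is a "/". As noted above, any change like this would still break names that were padded to work around the current behavior.

```python
def current_status_name(updater_name: str, task_name: str) -> str:
    # Mirrors node_name_.substr(1) + ": " + task name: always drops one character.
    return updater_name[1:] + ": " + task_name


def proposed_status_name(updater_name: str, task_name: str) -> str:
    # Drops the first character only if it is the leading slash of a node name.
    prefix = updater_name[1:] if updater_name.startswith("/") else updater_name
    return prefix + ": " + task_name
```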

It is disappointing that there's no tutorial - only an example.cpp that is quite unprofessional in its variable/method names & data (Ex. stat.add("Stupidicity of this updater", 1000.);). I would be willing to turn example.cpp into a C++ tutorial, but would welcome a reviewer, or maybe a python co-contributor. example.cpp does seem to have most of the words you'd want in a wiki tutorial. This is where we could explain the bug/feature in the constructor. Also, I've seen somewhere where compilable code had tutorial text embedded in it such that you could check that it builds and also have it serve as a tutorial, but I don't remember where.

Thoughts?

self_test: check if catkin testing is enabled

Hi,

Can you update the hydro release of the self_test CMake configuration so that it checks whether catkin testing is enabled?

I need to cross-compile self_test for usb_cam for Angstrom, but I need it to skip the tests.

Thanks!

fix reference to example.cpp in manifest.dox file

Currently http://docs.ros.org/groovy/api/diagnostic_updater/html/index.html links to the example.cpp file from rosconsole.

The reference to example.cpp is not unique since other packages which this package depends on also contain a file with that name. Therefore you need to reference the file with its relative location from the manifest.dox file: https://github.com/ros/diagnostics/blob/groovy-devel/diagnostic_updater/mainpage.dox#L25

Example uses of these classes can be found in \ref src/example.cpp.

ros-kinetic-diagnostics-aggregator deb missing from xenial 16.04 on shadow-fixed

I'm running some pre-release tests and noticed it is missing for some reason.

http://packages.ros.org/ros-shadow-fixed/ubuntu/pool/main/r/ros-kinetic-diagnostic-aggregator/

specifically,

amd64.deb for xenial

HeaderlessTopicDiagnostic should clean up after itself

The HeaderlessTopicDiagnostic object registers a callback with the Updater class that is not removed when the HeaderlessTopicDiagnostic goes out of scope. The next update() then leads to a segfault.

Here is code to reproduce the issue:

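#include <diagnostic_updater/diagnostic_updater.h>
#include <diagnostic_updater/publisher.h>  // TopicDiagnostic
#include <limits>
#include <memory>

using namespace diagnostic_updater;
Updater updater;
double min_freq{0.}, max_freq{std::numeric_limits<double>::infinity()};
auto diag = std::make_unique<TopicDiagnostic>("topic_name", updater,
                    FrequencyStatusParam(&min_freq, &max_freq, 0, 5), TimeStampStatusParam(0, 1));
updater.force_update(); // ok
diag.reset();           // destroys the diagnostic, but its task stays registered
updater.force_update(); // segfault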
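One common way to avoid this class of bug is to tie a registration's lifetime to the object that created it, so that destroying the diagnostic also removes its callback. The sketch below shows the idea in plain Python with a context manager. TaskRegistry, registered, and the task names are hypothetical and do not correspond to the diagnostic_updater API; in C++ the equivalent would be removing the task in the diagnostic's destructor.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List


class TaskRegistry:
    """Hypothetical stand-in for an updater's list of diagnostic tasks."""

    def __init__(self) -> None:
        self._tasks: Dict[str, Callable[[], str]] = {}

    def add(self, name: str, task: Callable[[], str]) -> None:
        self._tasks[name] = task

    def remove(self, name: str) -> None:
        self._tasks.pop(name, None)

    def force_update(self) -> List[str]:
        return [task() for task in self._tasks.values()]


@contextmanager
def registered(registry: TaskRegistry, name: str,
               task: Callable[[], str]) -> Iterator[None]:
    """Register `task` for the duration of the `with` block, then remove it."""
    registry.add(name, task)
    try:
        yield
    finally:
        registry.remove(name)  # the cleanup step missing in the reported bug


registry = TaskRegistry()
with registered(registry, "topic_name", lambda: "Frequency OK"):
    inside = registry.force_update()
after = registry.force_update()  # no dangling task left to call
```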

Operation level from diagnostic_aggregator not consistent

Occasionally, though very rarely, I've seen /diagnostics_agg messages like the one below. Look at "level": the level of /Devices/IMU is 2 (i.e. ERROR), but all of its sub-devices show level 0 (OK).

(At Willow, you may be able to reproduce with prl).

    level: 2
    name: /Devices/IMU
    message: Expected 4, found 3
    hardware_id: ''
    values: 
      - 
        key: imu_node: Calibration Status
        value: Gyro is calibrated
      - 
        key: imu_node: Frequency Status
        value: Desired frequency met
      - 
        key: imu_node: IMU Status
        value: IMU is running
  - 
    level: 0
    name: /Devices/IMU/Calibration Status
    message: Gyro is calibrated
    hardware_id: Inertia-Link_4200-4132
    values: 
      - 
        key: X bias
        value: -0.0105521
      - 
        key: Y bias
        value: -0.0115087
      - 
        key: Z bias
        value: 0.00218787
  - 
    level: 0
    name: /Devices/IMU/Frequency Status
    message: Desired frequency met
    hardware_id: Inertia-Link_4200-4132
    values: 
      - 
        key: Events in window
        value: 503
      - 
        key: Events since startup
        value: 4802787
      - 
        key: Duration of window (s)
        value: 5.029089
      - 
        key: Actual frequency (Hz)
        value: 100.018110
      - 
        key: Target frequency (Hz)
        value: 100.000000
      - 
        key: Minimum acceptable frequency (Hz)
        value: 95.000000
      - 
        key: Maximum acceptable frequency (Hz)
        value: 105.000000
  - 
    level: 0
    name: /Devices/IMU/IMU Status
    message: IMU is running
    hardware_id: Inertia-Link_4200-4132
    values: 
      - 
        key: Device
        value: /etc/ros/sensors/imu
      - 
        key: TF frame
        value: imu_link
      - 
        key: Error count
        value: 0
      - 
        key: Excessive delay
        value: 0

Observed with the Debian package ros-groovy-diagnostic-aggregator for precise, up to date at version 1.7.7-0precise-20121205-0830-+0000.

FrequencyStatusParam is not thread safe

The struct stores pointers to min_freq_ and max_freq_, which the FrequencyStatus class reads under scoped_locks. Those locks protect only the FrequencyStatus object itself, not the pointed-to data, which can still be modified through the struct.
Also, the pointers are not checked before being dereferenced, but I guess this is less important: a bad pointer would most likely cause problems at startup, whereas the thread-safety issue could cause problems at any point in the application's lifetime.
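Here is a minimal sketch of one way to close this gap, assuming the bounds are owned by a shared object instead of being passed as raw pointers. Every reader and writer goes through the same lock, readers take a consistent snapshot of both values, and the values are validated when they are set. FrequencyBounds and its methods are hypothetical names, not part of diagnostic_updater.

```python
import threading
from typing import Tuple


class FrequencyBounds:
    """Min/max frequency shared between a configuring thread and a checker."""

    def __init__(self, min_freq: float, max_freq: float) -> None:
        self._lock = threading.Lock()
        self._min, self._max = self._validated(min_freq, max_freq)

    @staticmethod
    def _validated(min_freq: float, max_freq: float) -> Tuple[float, float]:
        if min_freq > max_freq:
            raise ValueError("min_freq must not exceed max_freq")
        return min_freq, max_freq

    def set(self, min_freq: float, max_freq: float) -> None:
        new_min, new_max = self._validated(min_freq, max_freq)
        with self._lock:
            self._min, self._max = new_min, new_max

    def snapshot(self) -> Tuple[float, float]:
        # Read both values under the lock so a check never sees a half-update.
        with self._lock:
            return self._min, self._max


bounds = FrequencyBounds(9.0, 11.0)


def reconfigure() -> None:
    for _ in range(1000):
        bounds.set(19.0, 21.0)
        bounds.set(9.0, 11.0)


writer = threading.Thread(target=reconfigure)
writer.start()
pairs = [bounds.snapshot() for _ in range(1000)]
writer.join()
```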

Release into melodic

It looks like all of the dependencies for diagnostics are available in Melodic, so it would be great to get this released. Thanks in advance!

diagnostic_updater needs to check if the diagnostic_period parameter exists

The current implementation of diagnostic_updater::Updater reads the diagnostic_period parameter from the parameter server without checking whether it exists. If it does not exist, the getParamCached method returns 0.0 and the update runs on every call.
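A sketch of the check the issue asks for. It assumes a plain `std::map` stands in for the parameter server; the `ParamStore` alias and the `readDiagnosticPeriod` helper are hypothetical. In roscpp itself, `ros::NodeHandle::param(name, value, default)` offers the same fallback-to-default behavior.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the parameter server.
using ParamStore = std::map<std::string, double>;

// Returns the configured period, or `fallback` when the parameter is missing
// or not positive (a period of 0.0 would make the update run on every call).
double readDiagnosticPeriod(const ParamStore& params, double fallback = 1.0) {
  auto it = params.find("diagnostic_period");
  if (it == params.end() || !(it->second > 0.0)) {
    return fallback;
  }
  return it->second;
}
```

Rejecting non-positive values, not only missing ones, also covers an explicitly configured 0.0, which would otherwise trigger the same every-call behavior.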

GenericAnalyzer also removes prefix from path

I am not sure if this is a real issue or something I am doing wrong.

My config is as follows:

pub_rate: 1.0
base_path: ""
analyzers:
   lasers:
     type: GenericAnalyzer
     path: Laser
     find_and_remove_prefix: 'Laser'

What I see in my robot_monitor is that all information about laser is published inside the '/' namespace.

When I then change the config to the following (note the change from Laser to laser):

pub_rate: 1.0
base_path: ""
analyzers:
   lasers:
     type: GenericAnalyzer
     path: Laser
     find_and_remove_prefix: 'laser'

Everything gets published as expected: the category is called Laser and all messages are trimmed of the laser prefix.

Can anybody confirm that?
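One detail that may be relevant when comparing the two configs is that a prefix comparison built from plain string functions is case-sensitive, so 'Laser' and 'laser' select different status names. The helper below is a hypothetical illustration, not the GenericAnalyzer implementation, and the status names in its test are made up.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: if `name` starts with `prefix` (exact case), strip the
// prefix plus any following spaces or slashes; otherwise return `name` unchanged.
std::string removePrefix(const std::string& name, const std::string& prefix) {
  if (name.compare(0, prefix.size(), prefix) != 0) {
    return name;  // no exact-case match
  }
  std::string rest = name.substr(prefix.size());
  std::size_t start = rest.find_first_not_of(" /");
  return start == std::string::npos ? std::string() : rest.substr(start);
}
```

Note that with this sketch a name consisting only of the prefix is reduced to an empty string.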

AnalyzerGroup is using deprecated pluginlib code

/home/jbohren/versioned/ros/maintain_catkin/ws/src/diagnostics/diagnostic_aggregator/src/analyzer_group.cpp: In member function ‘virtual bool diagnostic_aggregator::AnalyzerGroup::init(std::string, const ros::NodeHandle&)’:
/home/jbohren/versioned/ros/maintain_catkin/ws/src/diagnostics/diagnostic_aggregator/src/analyzer_group.cpp:122:62: warning: ‘T* pluginlib::ClassLoader<T>::createClassInstance(const string&, bool) [with T = diagnostic_aggregator::Analyzer, std::string = std::basic_string<char>]’ is deprecated (declared at /opt/ros/groovy/include/pluginlib/class_loader_imp.h:236) [-Wdeprecated-declarations]

pure virtual call in ros::ServicePublication::drop()

I tried a modified version of the official self_test example, but a pure virtual call occurs during destruction of the self_test object.

The modified main function is:

185│ main(int argc, char** argv)
186│ {
187│   ros::init(argc, argv, "my_node");
188│
189│   MyNode *n = new MyNode();
190│
191│   //n.spin();
192│
193├>  delete n;
194│
195│   return(0);
196│ }

The problem occurs at line 193.

(gdb) 
pure virtual method called
terminate called without an active exception

Program received signal SIGABRT, Aborted.
0x00007ffff16e7cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff16e7cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff16eb0d8 in __GI_abort () at abort.c:89
#2  0x00007ffff1ff36b5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff1ff1836 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff1ff1863 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff1ff233f in __cxa_pure_virtual () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff4715adc in ros::ServicePublication::drop() () from /opt/ros/indigo/lib/libroscpp.so
#7  0x00007ffff4782d20 in ros::ServiceManager::unadvertiseService(std::string const&) () from /opt/ros/indigo/lib/libroscpp.so
#8  0x00007ffff4728fe4 in ros::ServiceServer::Impl::unadvertise() () from /opt/ros/indigo/lib/libroscpp.so
#9  0x00007ffff4729087 in ros::ServiceServer::Impl::~Impl() () from /opt/ros/indigo/lib/libroscpp.so
#10 0x00007ffff4729662 in boost::detail::sp_counted_impl_p<ros::ServiceServer::Impl>::dispose() () from /opt/ros/indigo/lib/libroscpp.so
#11 0x00007ffff4729569 in ros::ServiceServer::~ServiceServer() () from /opt/ros/indigo/lib/libroscpp.so
#12 0x00000000004264ee in self_test::TestRunner::~TestRunner (this=0x6583d0) at /opt/ros/indigo/include/self_test/self_test.h:68
#13 0x0000000000423dee in MyNode::~MyNode (this=0x6583d0) at /home/krz/catkin_ws/src/ethon/ethon_node/src/rower_test.cpp:41
#14 0x0000000000422efa in main (argc=1, argv=0x7fffffffd9b8) at /home/krz/catkin_ws/src/ethon/ethon_node/src/rower_test.cpp:193

Access to analyzers in a subclass of AnalyzerGroup

For advanced use of the AnalyzerGroup class, I need to reimplement some of its public virtual methods. However, reimplementing them without access to the main class member, std::vector<boost::shared_ptr<Analyzer> > analyzers_, makes little sense and is practically impossible.
Is there any specific reason for defining analyzers_ as private and not protected (or a getter function for it)?
Currently, as a quick-fix, I reimplemented the addAnalyzer method and stored the analyzers in an additional std::vector<boost::weak_ptr<Analyzer>>.
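A self-contained sketch of the quick fix described above. `Analyzer` and `AnalyzerGroup` here are hypothetical minimal stand-ins for the diagnostic_aggregator classes, and `std::shared_ptr`/`std::weak_ptr` replace the Boost pointers. The subclass overrides `addAnalyzer`, forwards to the base class, and keeps its own non-owning references.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal stand-in for diagnostic_aggregator::Analyzer.
struct Analyzer {
  explicit Analyzer(std::string n) : name(std::move(n)) {}
  virtual ~Analyzer() = default;
  std::string name;
};

// Hypothetical stand-in for AnalyzerGroup: the owning vector is private.
class AnalyzerGroup {
 public:
  virtual ~AnalyzerGroup() = default;
  virtual bool addAnalyzer(std::shared_ptr<Analyzer> analyzer) {
    analyzers_.push_back(std::move(analyzer));
    return true;
  }

 private:
  std::vector<std::shared_ptr<Analyzer>> analyzers_;
};

// Workaround: mirror every added analyzer as a non-owning weak_ptr.
class MyAnalyzerGroup : public AnalyzerGroup {
 public:
  bool addAnalyzer(std::shared_ptr<Analyzer> analyzer) override {
    std::weak_ptr<Analyzer> observer = analyzer;  // copy before moving
    if (!AnalyzerGroup::addAnalyzer(std::move(analyzer))) {
      return false;
    }
    observed_.push_back(observer);
    return true;
  }

  // Names of the analyzers that are still alive.
  std::vector<std::string> names() const {
    std::vector<std::string> out;
    for (const auto& weak : observed_) {
      if (auto strong = weak.lock()) {
        out.push_back(strong->name);
      }
    }
    return out;
  }

 private:
  std::vector<std::weak_ptr<Analyzer>> observed_;
};
```

Using `weak_ptr` leaves ownership with the base class, so the mirror never keeps an analyzer alive after the group drops it. Making `analyzers_` protected, or adding a protected getter, would remove the need for this duplicate bookkeeping.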

discard_stale parameter ignored in rqt_robot_monitor

I'm using ROS Indigo (Debian packages) on Ubuntu 14.04 and I'm trying to use the discard_stale parameter for a number of GenericAnalyzers. However, even when an analyzer has the discard_stale parameter set to true, the Stale status message appears and persists in the rqt_robot_monitor display.

I have attached a screenshot of rqt_robot_monitor showing the Stale state of the (non-existent) Base Controller. Here is my diagnostics.yaml file:

pub_rate: 1.0 # Optional
base_path: '' # Optional, prepended to all diagnostic output

analyzers:
  pub_frequency:
    type: GenericAnalyzer
    path: 'Pub Frequency'
    discard_stale: true
    timeout: 5.0
    contains: 'freq'
  sensors:
    type: GenericAnalyzer
    path: 'Sensors'
    discard_stale: true
    timeout: 5.0
    contains: '_sensor'
  joints:
    type: GenericAnalyzer
    path: 'Joints'
    discard_stale: true
    timeout: 1.0
    regex: '.*_joint$'
  base_controller:
    type: GenericAnalyzer
    path: 'Base Controller'
    discard_stale: true
    timeout: 1.0
    contains: 'base_controller'
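For reference, here is a sketch of the behavior one would expect from `discard_stale: true` combined with `timeout`. Items not updated within the timeout are dropped from the report instead of being shown as Stale. The `StatusItem` type and `report` function are hypothetical and only model the idea; they are not the GenericAnalyzer code. The level values follow the diagnostic_msgs constants.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of one tracked status.
struct StatusItem {
  std::string name;
  int level;           // 0 = OK, 1 = WARN, 2 = ERROR, 3 = STALE (diagnostic_msgs values)
  double last_update;  // seconds
};

constexpr int kStale = 3;

// Items older than `timeout` are marked STALE, or removed when discard_stale is set.
std::vector<StatusItem> report(const std::vector<StatusItem>& items, double now,
                               double timeout, bool discard_stale) {
  std::vector<StatusItem> out;
  for (StatusItem item : items) {
    bool stale = timeout > 0.0 && now - item.last_update > timeout;
    if (stale) {
      if (discard_stale) {
        continue;  // drop it from the output entirely
      }
      item.level = kStale;
    }
    out.push_back(item);
  }
  return out;
}
```

Under this model, a Base Controller that never reports would vanish from the output rather than remain as Stale.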

docs are broken for hydro

hydro: Cannot load information on name: diagnostic_updater, distro: hydro, which means that it is not yet in our index. Please see this page for information on how to submit your repository to our index.

hardware id and task name disappear

Hi,

In a callback registered with diagnostic_updater::Updater::add, if I assign a value to the diagnostic_updater::DiagnosticStatusWrapper &stat argument as shown below, the hardware ID and task name are empty in rqt_robot_monitor.
I have this problem on Ubuntu 14.04 with ROS Indigo.

Example:

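#include <ros/ros.h>
#include <diagnostic_updater/diagnostic_updater.h>

class DummyClass
{
public:
  DummyClass()
  {
    // hardware_id is a public field inherited from diagnostic_msgs::DiagnosticStatus
    my_stat.hardware_id = "none";
    my_stat.add("test", 1);
  }

  void produce_diagnostics(diagnostic_updater::DiagnosticStatusWrapper &stat)
  {
    // Replaces the whole status, including the name and hardware_id
    // fields the updater filled in before calling this callback.
    stat = my_stat;
  }

  diagnostic_updater::DiagnosticStatusWrapper my_stat;
};

int main(int argc, char** argv)
{
  ros::init(argc, argv, "dummy_node");
  ros::NodeHandle nh;
  DummyClass dc;

  diagnostic_updater::Updater updater;
  updater.add("Method updater", &dc, &DummyClass::produce_diagnostics);
  while (nh.ok())
  {
    ros::Duration(0.1).sleep();
    updater.update();
  }

  return 0;
}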
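A self-contained model of what the assignment in the callback does, assuming the updater fills in `name` and `hardware_id` before invoking the callback. `Status`, `runTask`, and `mergeInto` are hypothetical stand-ins for `DiagnosticStatusWrapper` and the updater loop. Copying only the level, message, and values leaves the updater's fields intact. With the real wrapper, the same idea would be `stat.summary(my_stat.level, my_stat.message)` followed by copying `my_stat.values`.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for DiagnosticStatusWrapper.
struct Status {
  int level = 0;
  std::string name;
  std::string message;
  std::string hardware_id;
  std::vector<std::pair<std::string, std::string>> values;
};

// Mimics the updater: it fills in the name and hardware ID, then runs the callback.
template <typename Callback>
Status runTask(const std::string& task_name, const std::string& hwid, Callback cb) {
  Status stat;
  stat.name = task_name;
  stat.hardware_id = hwid;
  cb(stat);
  return stat;
}

// Copies only the fields the callback owns, leaving name and hardware_id untouched.
void mergeInto(Status& stat, const Status& src) {
  stat.level = src.level;
  stat.message = src.message;
  stat.values.insert(stat.values.end(), src.values.begin(), src.values.end());
}
```

In this model, `s = mine` inside the callback wipes the task name, while `mergeInto(s, mine)` keeps it.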

rosdistro does not point to this repo (github.com/ros/diagnostics/)

See https://github.com/ros/rosdistro/blob/master/releases/groovy.yaml#L71
If it's outdated or simply the wrong repo, I can open a pull request; I just need confirmation.

diagnostics_common_diagnostics doesn't build on OSX

As seen on ROS answers: http://answers.ros.org/question/199874/no-definition-of-libsensors4-dev-for-os-osx/

Doesn't build on OSX because libsensors4-dev isn't available on OSX.

@mitchellwills thoughts on how to fix this? It's easy enough to make the build process skip the libsensors node if libsensors isn't available, but we'd still have to figure out how to specify the dependency properly, because ROS doesn't support the notion of system-specific or optional dependencies.

Thoughts?

Lunar release

Hi @trainman419 and diagnostics maintainers,

As you may know the next ROS release Lunar Loggerhead is around the corner 🎉
Is it possible to release diagnostics into ROS Lunar? As a low-level package, it is currently preventing many repositories from being released.

If you don't have time to make a new release, please release the current kinetic version into Lunar by running bloom-release diagnostics -r lunar -t lunar --new-track.
Thanks!

diagnostic_analysis analyzer_loader is not installed

As seen on ROS answers: http://answers.ros.org/question/185155/error-in-tutorial-creating-a-diagnostic-analyzer/

It isn't possible to follow this tutorial: http://wiki.ros.org/diagnostics/Tutorials/Creating%20a%20Diagnostic%20Analyzer because the test node isn't built or included in the binary package.

Division by zero if time does not change between updates

To determine the update frequency, the diagnostic_updater divides by the time since the last update here: https://github.com/ros/diagnostics/blob/indigo-devel/diagnostic_updater/include/diagnostic_updater/update_functions.h#L174

If the time has not changed since the last update (this happened to me when playing a rosbag with use_sim_time), we divide by zero.
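A minimal guard, assuming the frequency is computed as an event count divided by elapsed seconds; the `frequencyHz` helper is hypothetical and not the code at the linked line. When no time has passed (for example while sim time is paused), it returns no estimate instead of dividing by zero.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper: events per second over the elapsed interval.
// Returns std::nullopt when the interval is not positive, e.g. when
// use_sim_time is set and the clock has not advanced since the last update.
std::optional<double> frequencyHz(int events, double elapsed_s) {
  if (!(elapsed_s > 0.0)) {
    return std::nullopt;
  }
  return events / elapsed_s;
}
```

The caller can then keep the previous estimate, or report that no new measurement is available, for that cycle.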

diagnostic_analysis nodes are not installed

The export_csv.py and sparse_csv.py nodes are not installed with the apt builds of the diagnostic_analysis package.

The add_analyzer fails to clean up...sometimes

When add_analyzer exits, it uses the bond mechanism to shut down the bond, which triggers unloading of the analyzers on the aggregator side.

However, the bondpy mechanism does not seem to trigger the aggregator reliably. This results in an error message when you reload the same analyzers that were not unloaded:

[ERROR] [WallTime: 1472626077.441982] add_analyzers did not add any analyzers to diagnostic aggregator: Requested load from namespace /diagnostics/navi_common_diagnostic_analyzers which is already in use

It is difficult to reproduce with small tests. Right now I only get it on a robot with a lot of software running, and even there it does not occur every time.

diagnostics_updater does not support diagnostic_period < 1

I'm not sure about the Python version, but the C++ diagnostics_updater crashes with

std::runtime_error "Time is out of dual 32-bit range"

if the diagnostic_period parameter is less than 1.
The reason lies in diagnostics_updater::update_diagnostic_period() where ros::Duration overflows if period_ < old_period:
https://github.com/ros/diagnostics/blob/groovy-devel/diagnostic_updater/include/diagnostic_updater/diagnostic_updater.h#L520
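A sketch of rescheduling the next update when the period changes, with time modeled as non-negative seconds. It assumes, without verifying against the linked line, that the exception comes from shifting the next update time by the negative difference `period_ - old_period` into a negative value, which a `ros::Time` cannot hold. The `rescheduleNextUpdate` helper is hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: shift the next scheduled update by the change in period,
// but never let the result fall below zero or before the current time.
double rescheduleNextUpdate(double next_time_s, double old_period_s,
                            double new_period_s, double now_s) {
  double shifted = next_time_s + (new_period_s - old_period_s);
  return std::max({shifted, now_s, 0.0});
}
```

Clamping to the current time means a shortened period simply triggers the next update right away instead of producing an invalid timestamp.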

`rosdiagnostic echo` command line

Viewing aggregated diagnostics can be done with the rqt_robot_monitor application. This works well but requires launching RQT in an X Window session. Instead, the rosdiagnostic command lets you quickly view the active diagnostics on any robot without requiring heavy X Window libraries.
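A rough idea of the terminal output such a command could produce, written as a hypothetical formatter over a minimal status struct. This is not an existing rosdiagnostic implementation, and the line format is an assumption. The level names follow the diagnostic_msgs constants OK = 0, WARN = 1, ERROR = 2, STALE = 3.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical minimal status: level, name, and message only.
struct Entry {
  int level;
  std::string name;
  std::string message;
};

// Maps a diagnostic_msgs level value to a label.
std::string levelName(int level) {
  switch (level) {
    case 0: return "OK";
    case 1: return "WARN";
    case 2: return "ERROR";
    case 3: return "STALE";
    default: return "UNKNOWN";
  }
}

// One line per status: "[LEVEL] name: message".
std::string formatEcho(const std::vector<Entry>& entries) {
  std::ostringstream out;
  for (const Entry& e : entries) {
    out << '[' << levelName(e.level) << "] " << e.name << ": " << e.message << '\n';
  }
  return out.str();
}
```

Plain-text output like this can be read over SSH or piped through grep, which is the main advantage over a GUI monitor.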

missing dependency causes broken parallel build

see
ros/catkin#264

diagnostic_updater python API is not exported

catkin_python_setup() and the corresponding setup.py are missing.

Diagnostics in Nodelets

The C++ diagnostic_updater does not accept a node handle in the constructor so it cannot be used correctly in a nodelet.
The same applies to the self_test package (the constructor takes a node handle, but then it is ignored).

Both packages should take a node handle and a private node handle so that names can be resolved correctly in a nodelet.
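A toy model of why the handles matter. Names are resolved against whatever namespace the injected handle carries, so a nodelet can pass its own handles instead of relying on globally constructed defaults. `NamespaceHandle` and `UpdaterSketch` are hypothetical stand-ins, not the roscpp or diagnostic_updater types, and the resolution rule is deliberately simplified.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, simplified handle: resolves relative names against its namespace.
struct NamespaceHandle {
  std::string ns;  // e.g. "/manager/camera"
  std::string resolve(const std::string& name) const {
    if (!name.empty() && name[0] == '/') {
      return name;  // absolute names are used unchanged
    }
    return (ns == "/" ? "" : ns) + "/" + name;
  }
};

// Hypothetical updater that takes both handles instead of creating its own.
class UpdaterSketch {
 public:
  UpdaterSketch(NamespaceHandle nh, NamespaceHandle pnh)
      : nh_(std::move(nh)), pnh_(std::move(pnh)) {}

  // Private parameters resolve against the private handle's namespace.
  std::string periodParam() const { return pnh_.resolve("diagnostic_period"); }
  std::string topic() const { return nh_.resolve("/diagnostics"); }

 private:
  NamespaceHandle nh_;
  NamespaceHandle pnh_;
};
```

Injecting the private handle is what lets each nodelet instance read its own diagnostic_period, even though all nodelets share one process.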

Bond broken prevents adding diagnostics

When adding diagnostics at runtime using either the manual_diag.py script given here http://wiki.ros.org/diagnostics/Tutorials/Adding%20Analyzers%20at%20Runtime or add_analyzers, the aggregator fails to update the group properly and leaves the added diagnostics under /Other/, even after waiting for some time. add_analyzers reports that the service call succeeded.

Looking at the logs, the aggregator gives:
[ WARN] [1519235533.166823508]: Bond for namespace /startup_analyzers was broken
[ WARN] [1519235533.171578828]: Broken bond tried to remove an analyzer which didn't exist.

I don't see any reason for this, as the node adding diags (/startup_analyzers) is still running and I can see the /diagnostics_agg/bond publisher and subscriber with rostopic info. I'm running all of this on localhost.
The problem can be reproduced when following the tutorial http://wiki.ros.org/action/fullsearch/diagnostics/Tutorials/Adding%20Analyzers%20at%20Runtime?action=fullsearch&context=180&value=linkto%3A%22diagnostics%2FTutorials%2FAdding+Analyzers+at+Runtime%22#Overview at least on Kinetic@Ubuntu 16.04

There is only one node adding diags. Any ideas, or is this directly a bond issue?

creation of new branch for ros 2 development

Can you please create a new branch for ROS 2 development so that we can open pull requests for the packages migrated to ROS 2?
