tracetools_analysis

Analysis tools for ros2_tracing.

Note: make sure to use the right branch, depending on the ROS 2 distro: use rolling for Rolling, humble for Humble, etc.

Trace analysis

After generating a trace (see ros2_tracing), we can analyze it to extract useful execution data.

Commands

We can then process a trace to create a data model that can be queried for analysis.

$ ros2 trace-analysis process /path/to/trace/directory

Note that this command simply outputs lightly processed ROS 2 trace data, split into a number of pandas DataFrames. This is useful for quickly checking the trace data; for real data processing and trace analysis, see Analysis.

Since CTF traces (the output format of the LTTng tracer) are very slow to read, the trace is first converted into a single file which can be read much faster and can be re-used to run many analyses. This is done automatically, but if the trace changed after the file was generated, it can be re-generated using the --force-conversion option. Run with --help to see all options.
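The convert-once-and-reuse behaviour described above follows a common caching pattern. The following is a minimal sketch of that pattern, not the package's implementation. `load_events`, the `convert_trace` callback and the pickle-based cache file are hypothetical stand-ins for the real CTF conversion.

```python
import os
import pickle
from typing import Callable, List


def load_events(
    trace_dir: str,
    cache_path: str,
    convert_trace: Callable[[str], List[dict]],
    force_conversion: bool = False,
) -> List[dict]:
    """Return events for trace_dir, converting only when needed (hypothetical helper)."""
    if force_conversion or not os.path.exists(cache_path):
        # Slow path: read the original trace once and cache the result.
        events = convert_trace(trace_dir)
        with open(cache_path, 'wb') as f:
            pickle.dump(events, f)
        return events
    # Fast path: reuse the previously converted file.
    with open(cache_path, 'rb') as f:
        return pickle.load(f)
```

Passing `force_conversion=True` mirrors the intent of `--force-conversion`: it rebuilds a stale cache after the trace has changed.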

Analysis

The command above only processes the trace and outputs raw data models. To actually analyze the data and display results, we recommend using a Jupyter Notebook, although a regular Python script works too.

$ jupyter notebook

Navigate to the analysis/ directory, and select one of the provided notebooks, or create your own!

For example:

from tracetools_analysis.loading import load_file
from tracetools_analysis.processor import Processor
from tracetools_analysis.processor.cpu_time import CpuTimeHandler
from tracetools_analysis.processor.ros2 import Ros2Handler
from tracetools_analysis.utils.cpu_time import CpuTimeDataModelUtil
from tracetools_analysis.utils.ros2 import Ros2DataModelUtil

# Load trace directory or converted trace file
events = load_file('/path/to/trace/or/converted/file')

# Process
ros2_handler = Ros2Handler()
cpu_handler = CpuTimeHandler()

Processor(ros2_handler, cpu_handler).process(events)

# Use data model utils to extract information
ros2_util = Ros2DataModelUtil(ros2_handler.data)
cpu_util = CpuTimeDataModelUtil(cpu_handler.data)

callback_symbols = ros2_util.get_callback_symbols()
callback_object, callback_symbol = list(callback_symbols.items())[0]
callback_durations = ros2_util.get_callback_durations(callback_object)
time_per_thread = cpu_util.get_time_per_thread()
# ...

# Display, e.g., with bokeh, matplotlib, print, etc.
print(callback_symbol)
print(callback_durations)

print(time_per_thread)
# ...

Note: bokeh has to be installed manually, e.g., with pip:

$ pip3 install bokeh

Design

See the ros2_tracing design document, especially the Goals and requirements and Analysis sections.

Packages

ros2trace_analysis

Package containing a ros2cli extension to perform trace analysis.

tracetools_analysis

Package containing tools for analyzing trace data.

See the API documentation.

Contributors

christophebedard, iluetkeb, nightduck

Issues

test_convert_time_columns fails due to a pandas bug

tracetools_analysis.test_data_model_util.TestDataModelUtil test_convert_time_columns currently fails on Noble due to a bug in pandas: pandas-dev/pandas#55730

FAILED test/tracetools_analysis/test_data_model_util.py::TestDataModelUtil::test_convert_time_columns - AttributeError: 'numpy.dtypes.DateTime64DType' object has no attribute 'unit'

It has been fixed upstream in pandas-dev/pandas#55812.

However, the fix is only available in pandas >= 2.2.0, and the apt package on Noble is stuck at version 2.1.4: https://packages.ubuntu.com/noble/python3-pandas.
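Until the distro package reaches pandas 2.2.0, one option is to skip the affected test on older versions. Below is a minimal sketch, assuming the test uses `unittest`. `pandas_has_unit_fix` is a hypothetical helper. Where available, `packaging.version.Version` would be a more robust way to compare versions.

```python
import re


def pandas_has_unit_fix(version: str) -> bool:
    """True if version >= 2.2.0, the first release containing pandas-dev/pandas#55812."""
    match = re.match(r'(\d+)\.(\d+)\.(\d+)', version)
    if match is None:
        # Unrecognized format: assume the fix is present rather than skip the test.
        return True
    return tuple(int(part) for part in match.groups()) >= (2, 2, 0)
```

In the test module, this could be applied as `@unittest.skipUnless(pandas_has_unit_fix(pandas.__version__), 'requires pandas >= 2.2.0')`.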

callback_analysis.ipynb fails to run with the latest version of Bokeh

Using Bokeh 3.2.0 and Python 3.10 on Ubuntu 22.04, the following error is raised in cell 6:

AttributeError: unexpected attribute 'plot_width' to figure, similar attributes are outer_width, width or min_width

A similar error is raised for plot_height. In addition, as of Bokeh 3.0, DatetimeTickFormatter no longer accepts lists of format strings.
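Porting the notebook to Bokeh 3 mostly involves renaming arguments. `plot_width` and `plot_height` become `width` and `height`, and each `DatetimeTickFormatter` field takes a single format string instead of a list. The following is a minimal sketch of that translation. `migrate_figure_kwargs` and `migrate_tick_formats` are hypothetical helpers. Keeping only the first string of each list is an assumption about which format the notebook intended.

```python
from typing import Any, Dict, List, Union

# Bokeh 2.x figure() keyword -> Bokeh 3.x equivalent.
RENAMED_FIGURE_KWARGS = {'plot_width': 'width', 'plot_height': 'height'}


def migrate_figure_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Rename legacy figure() keywords, leaving all others untouched."""
    return {RENAMED_FIGURE_KWARGS.get(key, key): value for key, value in kwargs.items()}


def migrate_tick_formats(formats: Dict[str, Union[str, List[str]]]) -> Dict[str, str]:
    """Collapse list-valued DatetimeTickFormatter fields to a single string."""
    return {
        field: value[0] if isinstance(value, list) else value
        for field, value in formats.items()
    }
```

The results could then be unpacked as `figure(**migrate_figure_kwargs(...))` and `DatetimeTickFormatter(**migrate_tick_formats(...))`.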

`callback_duration.ipynb` works but `lifecycle_states.ipynb` and `memory_usage.ipynb` fail

With the following settings, callback_duration.ipynb works, but lifecycle_states.ipynb and memory_usage.ipynb fail:

    # Set up the Trace action with a session name that uniquely includes a timestamp
    ros2_tracing = Trace(
        condition=IfCondition(enable_tracing),
        session_name=f'ros_tracing_navigation_session_{timestamp}',
        events_kernel=[], 
        events_ust=['ros2:*'],  # Tracks all user-space ROS 2 events
    )

The error from lifecycle_states.ipynb is as follows:

data_util = Ros2DataModelUtil(handler.data)

state_intervals = data_util.get_lifecycle_node_state_intervals()
for handle, states in state_intervals.items():
    print(handle)
    print(states.to_string())

output_notebook()
psize = 450
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File /usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py:3805, in Index.get_loc(self, key)
   3804 try:
-> 3805     return self._engine.get_loc(casted_key)
   3806 except KeyError as err:

File index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:2606, in pandas._libs.hashtable.Int64HashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:2630, in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 0

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[5], line 3
      1 data_util = Ros2DataModelUtil(handler.data)
----> 3 state_intervals = data_util.get_lifecycle_node_state_intervals()
      4 for handle, states in state_intervals.items():
      5     print(handle)

File ~/src/ag_navigation/amr_navigation/external/tracetools_analysis/tracetools_analysis/analysis/../tracetools_analysis/utils/ros2.py:540, in Ros2DataModelUtil.get_lifecycle_node_state_intervals(self)
    536 node_creation_timestamp = self.data.nodes.loc[lifecycle_node_handle, 'timestamp']
    538 # Add initial and final timestamps
    539 # Last states has an unknown end timestamp
--> 540 first_state_label = transitions.loc[0, 'start_label']
    541 last_state_label = transitions.loc[transitions.index[-1], 'goal_label']
    542 transitions.loc[-1] = ['', first_state_label, node_creation_timestamp]

File /usr/local/lib/python3.10/dist-packages/pandas/core/indexing.py:1183, in _LocationIndexer.__getitem__(self, key)
   1181     key = tuple(com.apply_if_callable(x, self.obj) for x in key)
   1182     if self._is_scalar_access(key):
-> 1183         return self.obj._get_value(*key, takeable=self._takeable)
   1184     return self._getitem_tuple(key)
   1185 else:
   1186     # we by definition only have the 0th axis

File /usr/local/lib/python3.10/dist-packages/pandas/core/frame.py:4221, in DataFrame._get_value(self, index, col, takeable)
   4215 engine = self.index._engine
   4217 if not isinstance(self.index, MultiIndex):
   4218     # CategoricalIndex: Trying to use the engine fastpath may give incorrect
   4219     #  results if our categories are integers that dont match our codes
   4220     # IntervalIndex: IntervalTree has no get_loc
-> 4221     row = self.index.get_loc(index)
   4222     return series._values[row]
   4224 # For MultiIndex going through engine effectively restricts us to
   4225 #  same-length tuples; see test_get_set_value_no_partial_indexing

File /usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
   3807     if isinstance(casted_key, slice) or (
   3808         isinstance(casted_key, abc.Iterable)
   3809         and any(isinstance(x, slice) for x in casted_key)
   3810     ):
   3811         raise InvalidIndexError(key)
-> 3812     raise KeyError(key) from err
   3813 except TypeError:
   3814     # If we have a listlike key, _check_indexing_error will raise
   3815     #  InvalidIndexError. Otherwise we fall through and re-raise
   3816     #  the TypeError.
   3817     self._check_indexing_error(key)

KeyError: 0
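The failing line, `transitions.loc[0, 'start_label']`, looks up the row *labelled* 0, not the first row. If `transitions` is a subset of a larger frame, its index may not contain 0, and the lookup raises exactly this `KeyError`. This could happen, for instance, when it holds the transitions of only one lifecycle node among several. The following is a minimal sketch of a position-based alternative. It assumes `transitions` has `start_label` and `goal_label` columns, as in the traceback. `first_and_last_labels` is a hypothetical helper, not a change taken from the package.

```python
from typing import Tuple

import pandas as pd


def first_and_last_labels(transitions: pd.DataFrame) -> Tuple[str, str]:
    """Return the first start label and the last goal label by position."""
    # .iloc indexes by position, so it works whatever the index labels are;
    # .loc[0, ...] only works if some row is labelled 0.
    first_state_label = transitions['start_label'].iloc[0]
    last_state_label = transitions['goal_label'].iloc[-1]
    return first_state_label, last_state_label
```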

'Series' object has no attribute 'startswith'

The example sample_data/converted_pingpong works as expected.
However, with my own trace data, `callback_symbols = data_util.get_callback_symbols()` fails with 'Series' object has no attribute 'startswith'.

One difference is that my trace data is version 4.1.1, whereas the example data is version 2.0.0.

 [100%] [Ros2Handler]
====================ROS 2 DATA MODEL===================
Contexts:
                           timestamp     pid version
context_handle                                      
140166526148480  1713864810218652498  446420   4.1.1
94917505998736   1713864810250558484  446466   4.1.1
94383685947248   1713864810258969767  446478   4.1.1
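One plausible cause assumes the symbols are stored in a DataFrame indexed by callback object. The data model above lists three contexts with different PIDs. If two of those processes use the same callback object address, that index label appears twice. `DataFrame.loc[label, column]` then returns a `Series` of all matching rows instead of a single value, and calling `.startswith()` on it fails with this error. The sketch below reproduces that behaviour and shows a defensive lookup. `symbol_for` is a hypothetical helper. Taking the first match discards the symbols recorded by the other processes.

```python
import pandas as pd


def symbol_for(callback_symbols: pd.DataFrame, callback_object: int) -> str:
    """Look up a callback symbol, tolerating duplicate index labels."""
    value = callback_symbols.loc[callback_object, 'symbol']
    if isinstance(value, pd.Series):
        # Duplicate label: .loc returned every matching row; keep the first.
        value = value.iloc[0]
    return value
```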

Launching ROS2 trace from a launch file results in large tracing files (over 30 GB)

Running ros2 trace produces about 200 MB of trace data, which tracetools_analysis can visualize.

However, launching tracing from a launch file results in much larger files (over 15 GB). When processing traces larger than 15 GB, my PC works on them for approximately 30 minutes, but the process is then terminated automatically, as shown in the terminal output below:

ros2 trace-analysis process /logs/navigation_session_20240507_0907
converting trace directory: /logs/navigation_session_20240507_0907
Killed

Can I enable only the essential events needed for callback_duration.ipynb to work? With all events enabled, the traces can easily exceed 50 GB after just a 10-minute run.

RequiredEventNotFoundError: missing events: {'CpuTimeHandler': {'sched_switch'}} on humble branch

When I try the Jupyter notebook example from README.md, I get the following error:

RequiredEventNotFoundError                Traceback (most recent call last)
Cell In[1], line 15
     12 ros2_handler = Ros2Handler()
     13 cpu_handler = CpuTimeHandler()
---> 15 Processor(ros2_handler, cpu_handler).process(events)
     17 # Use data model utils to extract information
     18 # ros2_util = Ros2DataModelUtil(ros2_handler.data)
     19 # cpu_util = CpuTimeDataModelUtil(cpu_handler.data)
   (...)
     30 
     31 # print(time_per_thread)

File ~/tracetools_analysis/tracetools_analysis/tracetools_analysis/processor/__init__.py:417, in Processor.process(self, events, erase_progress, no_required_events_check)
    409 """
    410 Process all events.
    411 
   (...)
    414 :param no_required_events_check: whether to skip the check for required events
    415 """
    416 if not no_required_events_check:
--> 417     self._check_required_events(events)
    419 if not self._processing_done:
    420     # Split into two versions so that performance is optimal
    421     if self._progress_display is None:

File ~/tracetools_analysis/tracetools_analysis/tracetools_analysis/processor/__init__.py:399, in Processor._check_required_events(self, events)
    397             missing_events[handler.__class__.__name__].add(name)
    398 if missing_events:
--> 399     raise self.RequiredEventNotFoundError(
    400         f'missing events: {dict(missing_events)}'
    401     )

RequiredEventNotFoundError: missing events: {'CpuTimeHandler': {'sched_switch'}}

I am on the humble branch and generated the trace with the example launch file in ros2_tracing (the converted file is attached as a .txt file due to GitHub limitations):

converted.txt
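The traceback shows that `Processor.process()` checks that every event each handler requires is present in the trace before processing. Here, `CpuTimeHandler` requires `sched_switch`, which is an LTTng kernel tracepoint, so it can only appear in traces recorded with kernel events enabled. The traceback also shows that `process()` accepts a `no_required_events_check` parameter. The following is a minimal sketch of a similar pre-check, which could help decide up front which handlers a given trace supports. `split_handlers` and its plain-dict inputs are hypothetical, and the `ros2:` event names in the test data are illustrative only.

```python
from typing import Dict, Set, Tuple


def split_handlers(
    required: Dict[str, Set[str]],
    available: Set[str],
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Split handlers into those whose required events are all present,
    and a mapping from the remaining handlers to their missing events."""
    runnable: Set[str] = set()
    missing: Dict[str, Set[str]] = {}
    for handler, events in required.items():
        absent = events - available
        if absent:
            missing[handler] = absent
        else:
            runnable.add(handler)
    return runnable, missing
```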
