Code Monkey home page Code Monkey logo

Comments (7)

GoesM avatar GoesM commented on May 26, 2024

Additional information

To make sure whether there's any possibility that a subscriber's ongoing work would completely shutdown and exit after **_sub_.reset(), nowadays, I do something following:

Part 1. for experiments

To make sure if subscriber's callback-function like handlerInitialPose() would keep going after initial_pose_sub_.reset(),

code insert

I insert some code for log as following:

1> in function handleInitialPose(), I insert a sleep function to extend function execution period, and log the working state.

void
AmclNode::handleInitialPose(geometry_msgs::msg::PoseWithCovarianceStamped & msg)
{
    //log insert
      RCLCPP_INFO(get_logger(),"[debug]:in handleInitialPose()");
      std::this_thread::sleep_for(std::chrono::seconds(1));// sleep for 1s
      RCLCPP_INFO(get_logger(),"[debug]:still in handlerInitialPose()");
    //insert end
...
...

2> In function on_cleanup(), I insert log to show the reset() has been done.

nav2_util::CallbackReturn
AmclNode::on_cleanup(const rclcpp_lifecycle::State & /*state*/)
{
  RCLCPP_INFO(get_logger(), "Cleaning up");
...
...
  initial_pose_sub_.reset();
    //log insert
      RCLCPP_INFO(get_logger(),"[debug]:initial_pose_sub_.reset() has been done");
    //insert end
...

Thus, if there's a log "still in handlerInitialPose()" occurs after initial_pose_sub_.reset() has been done, it could prove that **sub_.reset() could not make sure all of the subscriber's ongoing work finished.

how I use nav2 within inserted code.

  • step 1. launch navigation2 normally
  • step 2. sending msg onto topic /initialpose within 20HZ
  • step 3. Ctrl+C to navigation2 randomly

experimental result

try for many times, I find some log of amcl as following:

[INFO] [1709868236.040625871] [amcl]: initialPoseReceived
[INFO] [1709868236.040660754] [amcl]: [debug]:in handleInitialPose()
[INFO] [1709868236.221511090] [rclcpp]: signal_handler(signum=2)
[INFO] [1709868236.221737977] [amcl]: Running Nav2 LifecycleNode rcl preshutdown (amcl)
[INFO] [1709868236.221855760] [amcl]: Deactivating
[INFO] [1709868236.221876835] [amcl]: Destroying bond (amcl) to lifecycle manager.
[INFO] [1709868236.323455736] [amcl]: Cleaning up
[INFO] [1709868236.324634485] [amcl]: [debug]:initial_pose_sub_.reset() has been done
[INFO] [1709868237.040899303] [amcl]: [debug]:still in handlerInitialPose()
[WARN] [1709868237.041050883] [amcl]: Failed to transform initial pose in time (Lookup would require extrapolation into the future.  Requested time 1709868122.134358 but the latest data is at time 7.821000, when looking up transform from frame [odom] to frame [base_footprint])
[INFO] [1709868237.041177392] [amcl]: Setting pose (7.791000): -2.094 -0.466 0.081
[INFO] [1709868237.044379407] [amcl]: Destroying bond (amcl) to lifecycle manager.

it's obvious that here's a log still in handlerInitialPose() after initial_pose_sub_.reset() has been done,

which could prove that reset the sub_ pointer could not completely shutdown subscriber's ongoing work , at least, no block here.

Part 2. for logic of the code

reset() seems like a function provided by the class shared_ptr, which is just used for free(ptr) and ptr=null,

while it has no design to make sure that all of the subscriber's ongoing work being finished after reset()

No, a reset will need to wait for a callback complete, but it should block until then - I believe - such that those later lines still cannot be executed

I guess you mean reset() will execute the destruct function of the class subscription, in which it would shutdown callback ? would it wait/block until the callback finished ? if only shutdown and no wait/block here, it's still possible to keep ongoing work after reset(). ( there may be no block here, I think, because of the lack of my reference about how the destruct function is designed )

from navigation2.

SteveMacenski avatar SteveMacenski commented on May 26, 2024

I thought the subscription class in the destructor would block until complete, but it does not appear that it does after closer inspection of the source code and your logs. I think filing a ticket with rclcpp that the subscriber's destructor does not wait until currently processing callbacks are exited is a good thing to note.

I am, however and separately now that I'm thinking about this more, a little unclear how it is that the subscription callback is possibly still executing if we're executing on the on_cleanup callback. I don't think we construct any new Node or Executor in AMCL or in the nav2_util/LifecycleNode or rclcpp/LifecycleNode, so they should be sharing a single threaded callback group. It shouldn't be possible for those to both be executing at the same time unless we introduce a new thread somewhere that's processing the subscription vs the lifecycle transitions. @clalancette you have more familiarity with the internals of rclcpp, do you know if the lifecycle transitions are on another callback group or executor from the default provided by the node? Or is that almost certainly something in Nav2 that we're doing?

from navigation2.

GoesM avatar GoesM commented on May 26, 2024

I think filing a ticket with rclcpp that the subscriber's destructor does not wait until currently processing callbacks are exited is a good thing to note.

I've opened ISSUE filing this ticket with rclcpp : ros2/rclcpp#2447.

By the way , I have had a try to understand how the class subscription designed, however it's really strange that I didn't find any destructor in subscription.cpp of rclcpp stack.... ( Perhaps because I'm too unfamiliar with rclcpp?..... : (

from navigation2.

mjcarroll avatar mjcarroll commented on May 26, 2024

I took a look based on the ros2/rclcpp#2447 issue, and I don't think this is related to destructors.

Rather, it looks to me that the AmclNode::on_cleanup is cleaning up a particle filter while the subscription for initial_pose is still receiving messages and firing callbacks.

Based on the ASAN trace there, the on_cleanup callback was fired on one thread while the subscription callback was fired on another thread.

It appears that there is a second single threaded executor being spawned in the on_configure method here: https://github.com/ros-planning/navigation2/blob/47374622dee01a27e5f9b8ae08f3d19a15de9b3a/nav2_amcl/src/amcl_node.cpp#L249-L251

from navigation2.

GoesM avatar GoesM commented on May 26, 2024

Based on the ASAN trace there, the on_cleanup callback was fired on one thread while the subscription callback was fired on another thread.

yep, I believe so as well.

So I think it would be a solution if we could let the subscription callback thread joint into the on_cleanup thread after sub_.reset() ? so that we could make sure the subscription exit at all.

from navigation2.

mjcarroll avatar mjcarroll commented on May 26, 2024

Yes, there are several moving pieces here, but I think that once you have reset all of the subscriptions, you should wait to make sure that the executor thread has actually joined before proceeding with the rest of the cleanup.

from navigation2.

SteveMacenski avatar SteveMacenski commented on May 26, 2024

That is my bad then! I missed that second executor, AMCL isn't something I've spent alot of time looking at since the original port back in 2018. That's something I should have caught before looking you guys in, sorry about that.

If we executor_thread_.reset() that'll cancel and join the thread. Do that before the subscription reset and we should be good to go.

from navigation2.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.