Comments (7)
Additional information
To make sure whether there's any possibility that a subscriber's ongoing work would completely shutdown and exit after **_sub_.reset()
, nowadays, I do something following:
Part 1. for experiments
To make sure if subscriber's callback-function like handlerInitialPose()
would keep going after initial_pose_sub_.reset()
,
code insert
I insert some code for log as following:
1> in function handleInitialPose()
, I insert a sleep function to extend function execution period, and log the working state.
void
AmclNode::handleInitialPose(geometry_msgs::msg::PoseWithCovarianceStamped & msg)
{
//log insert
RCLCPP_INFO(get_logger(),"[debug]:in handleInitialPose()");
std::this_thread::sleep_for(std::chrono::seconds(1));// sleep for 1s
RCLCPP_INFO(get_logger(),"[debug]:still in handlerInitialPose()");
//insert end
...
...
2> In function on_cleanup()
, I insert log to show the reset()
has been done.
nav2_util::CallbackReturn
AmclNode::on_cleanup(const rclcpp_lifecycle::State & /*state*/)
{
RCLCPP_INFO(get_logger(), "Cleaning up");
...
...
initial_pose_sub_.reset();
//log insert
RCLCPP_INFO(get_logger(),"[debug]:initial_pose_sub_.reset() has been done");
//insert end
...
Thus, if there's a log "still in handlerInitialPose()" occurs after initial_pose_sub_.reset() has been done
, it could prove that **sub_.reset()
could not make sure all of the subscriber's ongoing work finished.
how I use nav2
within inserted code.
- step 1. launch navigation2 normally
- step 2. sending msg onto topic
/initialpose
within 20HZ - step 3. Ctrl+C to navigation2 randomly
experimental result
try for many times, I find some log of amcl
as following:
[INFO] [1709868236.040625871] [amcl]: initialPoseReceived
[INFO] [1709868236.040660754] [amcl]: [debug]:in handleInitialPose()
[INFO] [1709868236.221511090] [rclcpp]: signal_handler(signum=2)
[INFO] [1709868236.221737977] [amcl]: Running Nav2 LifecycleNode rcl preshutdown (amcl)
[INFO] [1709868236.221855760] [amcl]: Deactivating
[INFO] [1709868236.221876835] [amcl]: Destroying bond (amcl) to lifecycle manager.
[INFO] [1709868236.323455736] [amcl]: Cleaning up
[INFO] [1709868236.324634485] [amcl]: [debug]:initial_pose_sub_.reset() has been done
[INFO] [1709868237.040899303] [amcl]: [debug]:still in handlerInitialPose()
[WARN] [1709868237.041050883] [amcl]: Failed to transform initial pose in time (Lookup would require extrapolation into the future. Requested time 1709868122.134358 but the latest data is at time 7.821000, when looking up transform from frame [odom] to frame [base_footprint])
[INFO] [1709868237.041177392] [amcl]: Setting pose (7.791000): -2.094 -0.466 0.081
[INFO] [1709868237.044379407] [amcl]: Destroying bond (amcl) to lifecycle manager.
it's obvious that here's a log still in handlerInitialPose()
after initial_pose_sub_.reset() has been done
,
which could prove that reset the sub_ pointer could not completely shutdown subscriber's ongoing work , at least, no block here.
Part 2. for logic of the code
reset()
seems like a function provided by the class shared_ptr
, which is just used for free(ptr)
and ptr=null
,
while it has no design to make sure that all of the subscriber's ongoing work being finished after reset()
No, a reset will need to wait for a callback complete, but it should block until then - I believe - such that those later lines still cannot be executed
I guess you mean reset()
will execute the destruct function of the class subscription
, in which it would shutdown callback ? would it wait/block until the callback finished ? if only shutdown and no wait/block here, it's still possible to keep ongoing work after reset()
. ( there may be no block here, I think, because of the lack of my reference about how the destruct function is designed )
from navigation2.
I thought the subscription class in the destructor would block until complete, but it does not appear that it does after closer inspection of the source code and your logs. I think filing a ticket with rclcpp that the subscriber's destructor does not wait until currently processing callbacks are exited is a good thing to note.
I am, however and separately now that I'm thinking about this more, a little unclear how it is that the subscription callback is possibly still executing if we're executing on the on_cleanup
callback. I don't think we construct any new Node
or Executor
in AMCL or in the nav2_util/LifecycleNode
or rclcpp/LifecycleNode
, so they should be sharing a single threaded callback group. It shouldn't be possible for those to both be executing at the same time unless we introduce a new thread somewhere that's processing the subscription vs the lifecycle transitions. @clalancette you have more familiarity with the internals of rclcpp
, do you know if the lifecycle transitions are on another callback group or executor from the default provided by the node? Or is that almost certainly something in Nav2 that we're doing?
from navigation2.
I think filing a ticket with rclcpp that the subscriber's destructor does not wait until currently processing callbacks are exited is a good thing to note.
I've opened ISSUE filing this ticket with rclcpp : ros2/rclcpp#2447.
By the way , I have had a try to understand how the class subscription
designed, however it's really strange that I didn't find any destructor in subscription.cpp
of rclcpp stack.... ( Perhaps because I'm too unfamiliar with rclcpp?..... : (
from navigation2.
I took a look based on the ros2/rclcpp#2447 issue, and I don't think this is related to destructors.
Rather, it looks to me that the AmclNode::on_cleanup
is cleaning up a particle filter while the subscription for initial_pose
is still receiving messages and firing callbacks.
Based on the ASAN trace there, the on_cleanup
callback was fired on one thread while the subscription callback was fired on another thread.
It appears that there is a second single threaded executor being spawned in the on_configure method here: https://github.com/ros-planning/navigation2/blob/47374622dee01a27e5f9b8ae08f3d19a15de9b3a/nav2_amcl/src/amcl_node.cpp#L249-L251
from navigation2.
Based on the ASAN trace there, the
on_cleanup
callback was fired on one thread while the subscription callback was fired on another thread.
yep, I believe so as well.
So I think it would be a solution if we could let the subscription callback thread joint into the on_cleanup thread after sub_.reset()
? so that we could make sure the subscription exit at all.
from navigation2.
Yes, there are several moving pieces here, but I think that once you have reset
all of the subscriptions, you should wait to make sure that the executor thread has actually joined before proceeding with the rest of the cleanup.
from navigation2.
That is my bad then! I missed that second executor, AMCL isn't something I've spent alot of time looking at since the original port back in 2018. That's something I should have caught before looking you guys in, sorry about that.
If we executor_thread_.reset()
that'll cancel and join the thread. Do that before the subscription reset and we should be good to go.
from navigation2.
Related Issues (20)
- heap-buffer-overflow bug caused by user misconfiguration (amcl:min_particles=a negative value) HOT 2
- Removing AMCL from nav2_bringup launch HOT 2
- Error codes in NavigateToPose/NavigateThroughPoses
- Hi, I am building the package of nav2_waypoint_follower and facing that errors, HOT 2
- New "Configure" transition for the LifecycleManager HOT 1
- Accessing waypoints inside a controller plugin HOT 3
- Fix flaky spin test
- Nav2 Stalling on Multiple Robots? HOT 6
- Robot using nav2 begins pathfollowing but never finishes HOT 3
- MPPI cannot follow global path accurately HOT 1
- [Collision_monitor] Approach polygon time=0 step is not processed HOT 7
- The lidar point cloud of NAV2 shifted significantly after being stationary for 5 minutes. HOT 5
- Errors with controller_server using GPS for Navigation HOT 4
- Obstacle Position Shift in Map after Loading HOT 2
- Full footprint collsion distance in MPPI obstacle critic and use of collision_margin_distance. HOT 12
- local_costmap does not respect the use_sim_time parameter HOT 12
- Vector polygon - Collision Monitor is not available for ROS2 humble HOT 3
- [collision_monitor] Add temporal axis to min_points behavior HOT 2
- [ERROR] [1716697697.080277840] [rviz2]: Lookup would require extrapolation into the future. Requested time 1716697697.040514 but the latest data is at time 1716697697.039929, when looking up transform from frame [laser_frame] to frame [odom]
- Laser Scan rotates with robot ------- [ERROR] [1716697697.080277840] [rviz2]: Lookup would require extrapolation into the future. Requested time 1716697697.040514 but the latest data is at time 1716697697.039929, when looking up transform from frame [laser_frame] to frame [odom]
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from navigation2.