Comments (18)
Thanks for reporting this. Can you post a minimal working example that compiles and link to it here (perhaps via Gist)?
from rclcpp.
Hey, unfortunately I don't have the time to create a minimal example at the moment. I'll try to provide one at the end of the week.
For now I made a small fix in rclcpp/client.hpp, in the function handle_response:

    if (call_promise) {
      call_promise->set_value(typed_response);
      callback(future);
    }

With this fix everything works without any segfaults.
But I don't think this should be the permanent fix.
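
The idea behind this guard can be sketched outside of rclcpp. The snippet below is only an illustration under stated assumptions, not the actual client code: `PendingCalls`, `Response`, and the sequence-number map are hypothetical names, and a plain `std::function` callback stands in for the promise/future pair. The only thing taken from the fix is the check that something is actually waiting before it is used.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-in for a typed service response.
struct Response
{
  int value;
};

// Hypothetical bookkeeping for in-flight calls, keyed by sequence number.
class PendingCalls
{
public:
  using Callback = std::function<void(const Response &)>;

  void add(std::int64_t sequence_number, Callback callback)
  {
    pending_[sequence_number] = std::move(callback);
  }

  // Returns true if a waiting call was completed, false if nothing was waiting.
  bool handle_response(std::int64_t sequence_number, const Response & response)
  {
    auto it = pending_.find(sequence_number);
    if (it == pending_.end() || !it->second) {
      // No pending call (or an empty callback): using it here would crash,
      // so bail out instead.
      return false;
    }
    it->second(response);
    pending_.erase(it);
    return true;
  }

private:
  std::map<std::int64_t, Callback> pending_;
};
```

A response for an unknown or already completed sequence number is dropped rather than dereferenced; a caller could log a warning whenever `handle_response` returns false.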
from rclcpp.
Hey,
I created a minimal demo at:
https://github.com/firesurfer/Segfault_demo
It consists of two parts: a segfault demo server and a segfault demo client. The segfault occurs in the client (test.cpp).
from rclcpp.
Thanks. I'll take a look at this today.
from rclcpp.
The segfault is generalizable to services/clients, it's not specific to parameters (parameters are implemented on top of services/clients). I rewrote the test with a plain service and client and reproduced the segfault in ros2/system_tests#104.
My potential fix is in #193. It's not much more complicated than what you did, but I'm also not sure if that should be the permanent fix.
from rclcpp.
Based on @gerkey's investigation, I think we should first merge these two (ros2/rmw_connext#131 ros2/rmw_opensplice#112) once the Windows CI job turns over and is green. They have the potential to fix other issues (like 100% CPU load, etc.) too.
from rclcpp.
After some investigation, with consultation from @jacquelinekay and @dirk-thomas, I believe the following things:
- The proposed fix in #193 addresses the client-side segfault. With it applied, I haven't seen any client-side problems at all.
- With the client-side problem fixed, this test case (ros2/system_tests#104) now exposes a server-side problem with services.
- The timeout seen by Jenkins is due to the service server getting into a bad state in between the two service calls being made by the client.
- The effect of the bad state is that `wait()` continuously returns the service-associated read condition as something that should be checked, but `rmw_take_request()` continuously fails to take any request data; this loop proceeds forever, causing `spin_node_some()` to never return.
- The bad state is caused by something happening to the read condition associated with the service server when the first service client disconnects. I don't know where the bad state originates; it might be in OpenSplice, or in rmw_opensplice_cpp, or nowhere, in the sense that we should be handling an event in rmw_opensplice_cpp that we're currently not.
- A related issue is that we're doing unnecessary work because we're not nulling service (and other) handles when `wait()` times out; @dirk-thomas addressed that in ros2/rmw_opensplice#112.
- All of the above might also apply to other rmw implementations (e.g., ros2/rmw_connext#131).
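
To illustrate the handle-nulling point, here is a small sketch. It is not the rmw_opensplice code: `ServiceHandle`, `WaitResult`, and `wait_for_services` are hypothetical names, and the function only models the state of the handle array after a wait returns (it does not block). The assumed convention is that every entry that is not ready is set to null, so that after a timeout the caller has nothing left to check.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical handle; a real implementation would hold an opaque pointer.
struct ServiceHandle
{
  bool has_data;
};

// Hypothetical result of a wait.
enum class WaitResult { Ready, Timeout };

// Models what the handle array looks like once the wait returns:
// entries without data are nulled, including all of them on timeout.
WaitResult wait_for_services(std::vector<ServiceHandle *> & handles)
{
  bool any_ready = false;
  for (auto & handle : handles) {
    if (handle != nullptr && handle->has_data) {
      any_ready = true;
    } else {
      handle = nullptr;  // not ready: tell the caller to skip this entry
    }
  }
  return any_ready ? WaitResult::Ready : WaitResult::Timeout;
}

// Counts the entries the caller would still try to take from.
std::size_t count_services_to_check(const std::vector<ServiceHandle *> & handles)
{
  std::size_t count = 0;
  for (const auto * handle : handles) {
    if (handle != nullptr) {
      ++count;
    }
  }
  return count;
}
```

Without the nulling step, every handle would still look "ready" after a timeout and the caller would attempt a take on each one for nothing.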
from rclcpp.
I'm still seeing a segfault on master with the OpenSplice version of the test in ros2/system_tests#104, and an occasional deadlock with #193.
from rclcpp.
@jacquelinekay That's what I'd expect: running ros2/system_tests#104 without #193 gives you a segfault (essentially the original bug reported here) and running it with #193 gives you a deadlock (caused by the "bad state" that the service server gets into upon client disconnection). @dirk-thomas's fix in ros2/rmw_opensplice#112 addresses a similar issue but does not get rid of the deadlock.
from rclcpp.
Got it, I thought you were suggesting those PRs would fix the issue, but you're saying that the failure might be due to related issues.
from rclcpp.
Yeah, there's definitely something still lurking in the management of service server read conditions when a client disconnects. We were able to demonstrate the problem last night with a single client that calls the service, then sleeps for several seconds, then exits. The service server looks good at first, responds correctly to the service call, still looks good, then goes bad immediately when the client exits. I'm hoping to look further into it today.
from rclcpp.
While this line listens to any instances:
that line only takes the alive ones:
The result is that once a sample with a non-alive instance arrives, OpenSplice wakes up from `rmw_wait` immediately, forever, resulting in a busy loop. That loop happens within a single `spin_*` call within rclcpp.
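
The mismatch can be reproduced in a toy model. The sketch below does not use the OpenSplice API; `Reader`, `StateMask`, and `count_empty_wakeups` are hypothetical names, loosely modeled on DDS instance-state masks. It shows that when the wake-up condition accepts any instance state but the take only accepts alive instances, a single non-alive sample is never consumed and every wake-up comes back empty.

```cpp
#include <cstddef>
#include <deque>

enum class InstanceState { Alive, NotAlive };

struct Sample
{
  InstanceState state;
  int payload;
};

// Hypothetical filter, loosely modeled on DDS instance-state masks.
enum class StateMask { AliveOnly, Any };

bool matches(const Sample & sample, StateMask mask)
{
  return mask == StateMask::Any || sample.state == InstanceState::Alive;
}

class Reader
{
public:
  void push(const Sample & sample) { queue_.push_back(sample); }

  // Wake-up condition: is there a queued sample matching the mask?
  bool has_data(StateMask mask) const
  {
    for (const auto & sample : queue_) {
      if (matches(sample, mask)) {
        return true;
      }
    }
    return false;
  }

  // Removes and returns the first sample matching the mask, if any.
  bool take(StateMask mask, Sample & out)
  {
    for (auto it = queue_.begin(); it != queue_.end(); ++it) {
      if (matches(*it, mask)) {
        out = *it;
        queue_.erase(it);
        return true;
      }
    }
    return false;
  }

private:
  std::deque<Sample> queue_;
};

// Counts wake-ups that yielded no sample. The cap stands in for "forever".
std::size_t count_empty_wakeups(
  Reader & reader, StateMask wait_mask, StateMask take_mask,
  std::size_t max_iterations)
{
  std::size_t empty = 0;
  for (std::size_t i = 0; i < max_iterations && reader.has_data(wait_mask); ++i) {
    Sample sample{};
    if (!reader.take(take_mask, sample)) {
      ++empty;  // woke up, but nothing could be taken
    }
  }
  return empty;
}
```

With a non-alive sample queued, waiting on `StateMask::Any` and taking with `StateMask::AliveOnly` hits the iteration cap every time, while using the same mask on both sides drains the sample on the first wake-up.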
from rclcpp.
So changing to ANY_INSTANCE_STATE in the take() call will fix it?
from rclcpp.
Hopefully yes, running CI jobs for it right now...
from rclcpp.
New CI jobs:
- http://ci.ros2.org/job/ci_linux/976/
- http://ci.ros2.org/job/ci_osx/811/
- http://ci.ros2.org/job/ci_windows/1063/
from rclcpp.
Creating a second client after the first one was destroyed fails with Connext dynamic on all platforms. Looking into it...
from rclcpp.
New CI jobs (including ros2/rmw_connext#134):
- http://ci.ros2.org/job/ci_linux/984/
- http://ci.ros2.org/job/ci_osx/816/
- http://ci.ros2.org/job/ci_windows/1067/
  - the new scoped client tests pass: http://ci.ros2.org/job/ci_windows/1067/testReport/(root)/
  - same errors as master: http://ci.ros2.org/job/ci_windows/1070/
from rclcpp.
From the meeting: merge this as is, with the return after a warning. Add a TODO, and after the alpha re-escalate it to an exception so we can find it if it happens in the future.
from rclcpp.