Comments (5)
from device-rfid-llrp-go.
What you're seeing is technically "correct" behavior, but I agree it's worth modifying the code to account for this case.
In short, the connection management considers anything other than an explicit request to shut down the connection as a failed attempt. That explicit request only happens in three cases:
- If the device address changes
- If the device is removed from the device service (in EdgeX, that is)
- If the device service shuts down
The current code has a very broad definition of failure. Even though your devices eventually reestablish a healthy connection, they are still disconnecting in an "unhealthy" way. As a result, the failure count keeps incrementing, and the backoff delay increases to its maximum.
The reason for this logic is that a successful dial alone does not indicate a healthy state. It's possible we connect to something at the given address and port, but it's not actually an LLRP Reader. Even if it is an LLRP Reader, the connection may still not be healthy. For instance, your logs show several semi-successful connection attempts which ultimately failed because the Reader reported it was already connected to a different client.
You could implement the logic "reset the Slow retry loop if the connection was reestablished" by adding `return true, nil` in the same block that notifies EdgeX when a device moves from enabled -> disabled. The `enabled` flag is set back to `true` if the connection is successfully restored. Technically, you'd be telling the `Retry` code that this attempt was successful (when it was really the one immediately before it), but the behavior will be the same: it'll reset the failure count and attempt to reconnect right away. If that fails, `enabled` will still be `false`, and it'll return `true, err`, signaling that it should increment the failure count, delay using exponential backoff, then reattempt the connection.
Here's a more detailed explanation of the code, along with a few other suggestions for potential improvements, especially if you want to change this logic to handle other cases.
The relevant code starts here and continues to the end of the closure. The code uses three nested loops, each for a different purpose:
- The inner Quick retry loop handles the actual client connection to the RFID Reader. Originally, this was the only Retry loop. If the connection fails or closes unexpectedly, it'll back off and retry up to a couple of times, after which that loop exits with an error. If the connection closes normally (e.g., to change addresses), it exits the loop without an error.
- The Slow retry loop simply returns if the inner loop exited normally, including shutdown or device removal. Otherwise, it notifies EdgeX that the device is offline, then performs its backoff/retry.
- The outermost loop only stops when a device is removed or the service shuts down.
As an aside, nesting it like that isn't great, but the locking/state management going on there is a bit tricky, and this makes it harder to break (because most of the state is not accessible elsewhere). Looking back, I think we could clean this up by simply assigning the functions explicitly, e.g.
```go
connectToReader := func(ctx context.Context) (bool, error) {
	// lines 128-174 here...
}

notifyIfDisabled := func(ctx context.Context) (bool, error) {
	err := retry.Quick.RetryWithCtx(ctx, maxConnAttempts, connectToReader)
	// lines 177-203
}

// Until the service shuts down or the device is removed,
// attempt to maintain a connection to the reader;
// when not possible, notify EdgeX that the device is Disconnected.
for ctx.Err() == nil {
	_ = retry.Slow.RetryWithCtx(ctx, retry.Forever, notifyIfDisabled)
}
```
Hopefully this helps explain the issue and potential remedies.
@ajcasagrande, is this a critical issue that should be resolved for the Jakarta release?
@lenny-intel, it's not a critical issue. It really only affects the simulator (when disconnecting and reconnecting a lot) and devices that are constantly disconnecting. In the latter case, the reconnection timeout is probably the least of your concerns (why is it disconnecting in the first place? Network issues? Power failures?).
I wouldn't necessarily close the issue, but it's probably low priority.
@ajcasagrande, thanks!