Comments (5)
We saw a double-leader situation recently when a ZK server cycled, and we suspect it has something to do with https://issues.apache.org/jira/browse/CURATOR-696. That Curator Jira suggests a bug was introduced by https://issues.apache.org/jira/browse/CURATOR-644 (PR: apache/curator#430).
It seems possible that this did introduce a bug, since that changed the logic from doing reset() always on reconnection (which would recreate the ephemeral znode) to doing getChildren(), which would look for existing ones, and then only calling reset() if they could not be found.
We updated to Curator 5.4 some time ago, in #13302. So if this is indeed what’s going on, it has potentially been an issue since Druid 25.
What we saw specifically was this scenario:
- OL 1 was leader prior to ZK connection loss.
- OL 1 reconnected to ZK and got a session id that we believe is a new session id (although we were not able to confirm that).
- OL 1's LeaderLatch recipe checked the latch path and saw an ephemeral znode there that it believed was its own, so it started leadership.
- OL 2, 30s later, checked the latch path and saw no children at all (not even the one for OL 1). It created an ephemeral znode for itself, and started leadership.
We think what happened is that both OLs established new sessions, even though the old sessions hadn’t expired yet. Because the old sessions hadn’t expired yet, the old ephemeral znodes were still there upon reconnection. The old leader, OL 1, saw both old znodes there and assumed it was still leader. But because those znodes were associated with different sessions, they went away in 30s. When OL 2 noticed that, it assumed there was no active leader, so it became one and then we had two leaders.
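To make the suspected sequence easier to reason about, here is a minimal toy model of it in plain Java. It is not Curator's code and does not talk to ZooKeeper: the class and method names (LatchReconnectSketch, ToyLatchPath, Participant, leadersAfterScenario) are hypothetical, the session ids are made up, and it assumes that a participant which believes it is leader never re-checks the latch path, while non-leaders re-check after the old sessions expire. Under those assumptions it compares "always reset on reconnect" with "reset only if our znode is missing".

```java
import java.util.List;
import java.util.TreeMap;

final class LatchReconnectSketch {

    /** Toy stand-in for the latch path: znode sequence number -> owning session id. */
    static final class ToyLatchPath {
        private final TreeMap<Integer, Long> children = new TreeMap<>();
        private int nextSeq = 0;

        int createEphemeral(long sessionId) {
            int seq = nextSeq++;
            children.put(seq, sessionId);
            return seq;
        }

        void delete(int seq) {
            children.remove(seq);
        }

        boolean exists(int seq) {
            return children.containsKey(seq);
        }

        boolean isLowest(int seq) {
            return !children.isEmpty() && children.firstKey() == seq;
        }

        /** Session expiry removes every ephemeral znode owned by that session. */
        void expireSession(long sessionId) {
            children.values().removeIf(owner -> owner == sessionId);
        }
    }

    /** Toy latch participant (an "OL" in the scenario above). */
    static final class Participant {
        long sessionId;
        Integer ourSeq;   // null until a znode has been created
        boolean leader;

        Participant(long sessionId) {
            this.sessionId = sessionId;
        }

        /** Drop our znode (if any) and create a fresh one under the current session. */
        void reset(ToyLatchPath path) {
            if (ourSeq != null) {
                path.delete(ourSeq);
            }
            ourSeq = path.createEphemeral(sessionId);
        }

        /** Reset if forced or if our znode is gone, then re-evaluate leadership. */
        void check(ToyLatchPath path, boolean forceReset) {
            if (forceReset || ourSeq == null || !path.exists(ourSeq)) {
                reset(path);
            }
            leader = path.isLowest(ourSeq);
        }
    }

    /** Replays the scenario and returns how many participants believe they lead. */
    static int leadersAfterScenario(boolean alwaysResetOnReconnect) {
        ToyLatchPath path = new ToyLatchPath();
        Participant ol1 = new Participant(1L);
        Participant ol2 = new Participant(2L);
        ol1.check(path, false); // OL 1 creates the lowest znode and leads
        ol2.check(path, false);

        // Connection loss: both get new sessions before the old ones expire.
        ol1.sessionId = 11L;
        ol2.sessionId = 12L;
        ol1.check(path, alwaysResetOnReconnect);
        ol2.check(path, alwaysResetOnReconnect);

        // Later, the old sessions expire and take their znodes with them.
        path.expireSession(1L);
        path.expireSession(2L);

        // Assumption: only participants that are not leading re-check the path.
        for (Participant p : List.of(ol1, ol2)) {
            if (!p.leader) {
                p.check(path, false);
            }
        }
        return (ol1.leader ? 1 : 0) + (ol2.leader ? 1 : 0);
    }

    public static void main(String[] args) {
        System.out.println("check-then-reset leaders: " + leadersAfterScenario(false));
        System.out.println("always-reset leaders:     " + leadersAfterScenario(true));
    }
}
```

In this toy model the check-then-reset path ends with two participants believing they are leader, while the always-reset path ends with one. That only shows the hypothesis is self-consistent; it says nothing about what Curator or ZooKeeper actually did in our cluster.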
from druid.
I commented on CURATOR-696 linking back here.
@cryptoe can we re-open this issue since #16425 was reverted in #16445?
Another observation is that this condition occurred during a ZK leader election change.
@gianm Curator 5.7.0 includes the fix for https://issues.apache.org/jira/browse/CURATOR-696. I'm unsure when this version will be made available, but have asked here.