Comments (7)
@brian-pane 👋 Thanks for writing this issue! Yes, your understanding here is exactly correct. We discussed a bit of the design around this in our blog post announcing the director - we've historically found that multiple-machine failures were rare enough that having 2 servers per hash entry was enough (though obviously more would be better). Indeed, we always send packets to the "most healthy" of the 2 servers in the hope that one succeeds, but in the case where both are down the primary is selected and that may not work.
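The "most healthy of the 2" fallback described above could be sketched roughly like this (struct layout and names are illustrative, not the actual glb-director code):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-row view of the two servers a hash entry stores today.
 * Field names are illustrative only. */
struct row2 {
    uint32_t primary;
    uint32_t secondary;
    bool primary_healthy;
    bool secondary_healthy;
};

/* Pick the "most healthy" of the two: a healthy primary wins, then a
 * healthy secondary; if both are down, fall back to the primary and hope. */
uint32_t pick_server(const struct row2 *r)
{
    if (r->primary_healthy)
        return r->primary;
    if (r->secondary_healthy)
        return r->secondary;
    return r->primary; /* both down: last-resort fallback */
}
```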
We initially did all this with a pattern that was less extensible and didn't support more than 2 servers, but by the open source release it got to a point where everything except the forwarding table supports N servers rather than just 2. The only limitation to N is really the size allowable for the encapsulation GUE packet - on our networks we run with jumboframes but internet traffic is <1500, so there's plenty of room for more. Ideally in this case, the list of servers would be ordered, then all unhealthy servers would be moved to the end of the list, with the highest priority healthy server getting first dibs, followed by any other healthy servers, finally followed by unhealthy servers in order (in the hope they might actually still be working and complete connections).
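The ordering described above (healthy servers first in priority order, unhealthy servers last in order) is a stable partition. A minimal sketch, with illustrative names only:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: reorder a row's server list so every healthy server
 * (in its original priority order) precedes every unhealthy one (also in
 * original order). A stable partition preserves "first dibs" priority
 * among the healthy servers while keeping unhealthy ones as a last resort. */
void order_hops(const uint32_t *servers, const bool *healthy,
                size_t n, uint32_t *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)   /* healthy first, in priority order */
        if (healthy[i])
            out[k++] = servers[i];
    for (size_t i = 0; i < n; i++)   /* then unhealthy, still in order */
        if (!healthy[i])
            out[k++] = servers[i];
}
```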
We've also talked about adding failure domain tags to servers so we can avoid 2 machines in the same row sharing a failure domain (for example, same rack). This makes the probability less likely, but doesn't change the N. This adds some complexity around how fill/drain works and makes the weights per server a little less random, though, so we haven't implemented a fully complete version yet.
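The failure-domain constraint mentioned above amounts to a pairwise check when populating a row: no two servers in the same row may share a domain tag. A hypothetical sketch (tag representation and function name are assumptions, not glb-director code):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of a failure-domain constraint: reject a candidate row if any
 * two of its servers share a domain tag (e.g. a rack id). */
bool row_respects_domains(const uint32_t *domain_tags, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (domain_tags[i] == domain_tags[j])
                return false;
    return true;
}
```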
I'd be 💯 on board with patches supporting changing the size of the rows in the forwarding table, as long as it's configurable and there's generally a migration path (maybe versioning the forwarding table file or similar).
from glb-director.
@theojulienne wondering if it would be simpler if a row had a status, and in the case of an entire row being dead, we bump the connection to the next available row (defining 'next row' as a modulo-max-rows operation so that it is circular)? I mean, have a static rule of originally-indexed-row + 1 when a row has failed.
I see that reviving the dead nodes/dead row will be trickier with the above though.
> I see that reviving the dead nodes/dead row will be trickier with the above though.
Yea, that's the problem - if you send it to a wider range of nodes without always falling back to them or having a quiesce period (which adds more complexity), you end up causing all connections to disconnect when machines become healthy again. That's why increasing the table size to N entries and always sending packets with those N alternatives (ordered based on chance of success) is probably the simplest.
@theojulienne It would still be an availability question. It is up for debate whether to solve this problem at the cost of extra complexity, or to return an error (5XX) to the user. It depends on how often we face this. Also required is an evaluation of the extra cost.
The following is a possibility if we are already able to determine when we are starting new connections AND when we are terminating a connection at the director:
Let's say we had this:
struct glb_fwd_config_content_table_entry {
    struct {
        /* static */
        uint32_t primary;
        uint32_t secondary;
    };
    struct {
        /* dynamic */
        uint32_t state; /* valid states: AVAILABLE, DEGRADED, UNAVAILABLE, READY */
        uint32_t connection_count;
    };
} __attribute__((__packed__));
Then,
struct glb_fwd_config_content_table_entry *table;

row_index = hash(s_ip, s_port, d_ip, d_port) & GLB_FMT_TABLE_HASHMASK;
if (table[row_index].state == UNAVAILABLE) {
    row_index = (row_index + 1) % table_size;
}

/* On new connection */
table[row_index].connection_count++; /* atomic */

/* On connection termination */
table[row_index].connection_count--; /* atomic */
if (table[row_index].connection_count == 0) {
    /* Good time to promote the adjacent row from READY to AVAILABLE.
     * NOTE: Since the director is threaded, there is a little more here,
     * but not delving yet. We could dive deep if you like the idea. */
    promote_index = (row_index > 0 ? row_index - 1 : table_size - 1);
    if (table[promote_index].state == READY) {
        table[promote_index].state = AVAILABLE;
    }
}
The movement from UNAVAILABLE to READY will likely be handled by a thread within the director interfacing with the healthcheck. We can discuss the state machine if you think this is interesting. I am hoping this does not need locking and can be managed through atomics. There is still a window of failure when state changes happen before the corresponding state is updated in the table (a window bounded by the polling interval of the healthcheck thread).
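The lock-free counting hoped for above could use C11 atomics. A minimal sketch, with illustrative names (and noting that glb-director itself, being stateless, does not track connections this way):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative per-row counter; not an actual glb-director struct. */
struct row_state {
    atomic_uint connection_count;
};

void on_new_connection(struct row_state *r)
{
    atomic_fetch_add_explicit(&r->connection_count, 1,
                              memory_order_relaxed);
}

/* Returns true when this termination dropped the count to zero, i.e. the
 * moment the READY-to-AVAILABLE promotion check would run. Using the
 * value returned by fetch_sub avoids a separate racy re-read. */
bool on_connection_termination(struct row_state *r)
{
    /* fetch_sub returns the prior value, so "== 1" means "now zero". */
    return atomic_fetch_sub_explicit(&r->connection_count, 1,
                                     memory_order_acq_rel) == 1;
}
```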
> The following is a possibility if we are already able to determine when we are starting new connections AND when we are terminating a connection at the director:
The glb-director is stateless: it doesn't know when connections are created or destroyed and doesn't maintain any TCP connection state. That is part of the design that allows BGP route changes to direct connections to different glb-director nodes with zero impact. This is discussed a bit in "why not LVS" and "how no state is stored on director".
I think adding extra servers to the forwarding table rows is a significantly simpler change, in that it only adds extra potential hops and doesn't change any of the design constraints of the system, and it buys all the benefits you'd potentially get from un-isolating rows (essentially, adding more potential servers to work around failures).
Hi Theo
The statelessness is a pretty strong design aspect that ought to be preserved. So, totally agree.
> I think adding extra servers to the forwarding table rows is a significantly simpler change, in that it only adds extra potential hops and doesn't change any of …
I've already got this done and tested manually. I'm in the process of updating the unit tests; following that I'll create a PR for this.
Regards
Ravi
> The following is a possibility if we are already able to determine when we are starting new connections AND when we are terminating a connection at the director:

> The glb-director is stateless, it doesn't know when connections are created or destroyed and …
The rest is moot :)
My bad. I was excited, and blinded by what seemed to me like a simple solution: using the next row in the forwarding table. I wrongly assumed that the director knows when new connections are being made (with no further state preservation), so that they could be directed at the secondary (which becomes the new primary) when the primary is drained. I missed that the redirection is managed via iptables forwarding at the proxies themselves.
from glb-director.