Comments (7)
@brian-pane 👋 Thanks for writing this issue! Yes, your understanding here is exactly correct. We discussed a bit of the design around this in our blog post announcing the director - we've historically found that multiple-machine failures were rare enough that having 2 servers per hash entry was enough (though obviously more would be better). Indeed, we always send packets to the "most healthy" of the 2 servers in the hope that one succeeds, but in the case where both are down the primary is selected and that may not work.
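The "most healthy of the 2" fallback described above could be sketched roughly like this (struct layout and names are illustrative, not the actual glb-director code):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-row view of the two servers a hash entry stores today.
 * Field names are illustrative only. */
struct row2 {
    uint32_t primary;
    uint32_t secondary;
    bool primary_healthy;
    bool secondary_healthy;
};

/* Pick the "most healthy" of the two: a healthy primary wins, then a
 * healthy secondary; if both are down, fall back to the primary and hope. */
uint32_t pick_server(const struct row2 *r)
{
    if (r->primary_healthy)
        return r->primary;
    if (r->secondary_healthy)
        return r->secondary;
    return r->primary; /* both down: last-resort fallback */
}
```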
We initially did all this with a pattern that was less extensible and didn't support more than 2 servers, but by the open source release it got to a point where everything except the forwarding table supports N servers rather than just 2. The only limitation to N is really the size allowable for the encapsulation GUE packet - on our networks we run with jumboframes but internet traffic is <1500, so there's plenty of room for more. Ideally in this case, the list of servers would be ordered, then all unhealthy servers would be moved to the end of the list, with the highest priority healthy server getting first dibs, followed by any other healthy servers, finally followed by unhealthy servers in order (in the hope they might actually still be working and complete connections).
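The ordering described above (healthy servers first in priority order, unhealthy servers last in order) is a stable partition. A minimal sketch, with illustrative names only:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: reorder a row's server list so every healthy server
 * (in its original priority order) precedes every unhealthy one (also in
 * original order). A stable partition preserves "first dibs" priority
 * among the healthy servers while keeping unhealthy ones as a last resort. */
void order_hops(const uint32_t *servers, const bool *healthy,
                size_t n, uint32_t *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)   /* healthy first, in priority order */
        if (healthy[i])
            out[k++] = servers[i];
    for (size_t i = 0; i < n; i++)   /* then unhealthy, still in order */
        if (!healthy[i])
            out[k++] = servers[i];
}
```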
We've also talked about adding failure domain tags to servers so we can avoid 2 machines in the same row sharing a failure domain (for example, same rack). This makes the probability less likely, but doesn't change the N. This adds some complexity around how fill/drain works and makes the weights per server a little less random, though, so we haven't implemented a fully complete version yet.
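The failure-domain constraint mentioned above amounts to a pairwise check when populating a row: no two servers in the same row may share a domain tag. A hypothetical sketch (tag representation and function name are assumptions, not glb-director code):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of a failure-domain constraint: reject a candidate row if any
 * two of its servers share a domain tag (e.g. a rack id). */
bool row_respects_domains(const uint32_t *domain_tags, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (domain_tags[i] == domain_tags[j])
                return false;
    return true;
}
```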
I'd be 💯 on board with patches supporting changing the size of the rows in the forwarding table, as long as it's configurable and there's generally a migration path (maybe versioning the forwarding table file or similar).
from glb-director.
@theojulienne wondering if it would be simpler if a row had a status, and in the case of an entire row being dead, we bump the connection to the next available row (defining 'next row' as a modulo-max-rows operation so that it is circular)? I mean, have a static rule of originally-indexed-row + 1 when a row has failed.
I see that reviving the dead nodes/dead row will be trickier with the above though.
> I see that reviving the dead nodes/dead row will be trickier with the above though.
Yea, that's the problem - if you send it to a wider range of nodes without always falling back to them or having a quiesce period (which adds more complexity), you end up causing all connections to disconnect when machines become healthy again. That's why increasing the table size to N entries and always sending packets with those N alternatives (ordered based on chance of success) is probably the simplest.
@theojulienne It would still be an availability question. It is up for debate whether to solve this problem at the cost of extra complexity, or to return an error (5XX) to the user. It depends on how often we face this. Also required is an evaluation of the extra cost.
The following is a possibility if we are already able to determine when we are starting new connections AND when we are terminating a connection at the director:
Let's say we had this:
struct glb_fwd_config_content_table_entry {
    struct {
        /* static */
        uint32_t primary;
        uint32_t secondary;
    };
    struct {
        /* dynamic */
        uint32_t state; /* valid states: AVAILABLE, DEGRADED, UNAVAILABLE, READY */
        uint32_t connection_count;
    };
} __attribute__((__packed__));
Then,
struct glb_fwd_config_content_table_entry *table;

row_index = hash(s_ip, s_port, d_ip, d_port) & GLB_FMT_TABLE_HASHMASK;
if (table[row_index].state == UNAVAILABLE) {
    row_index = (row_index + 1) % table_size;
}

/* On new connection */
table[row_index].connection_count++; /* atomic */

/* On connection termination */
table[row_index].connection_count--; /* atomic */
if (table[row_index].connection_count == 0) {
    /* Good time to promote the adjacent row from READY to AVAILABLE.
     * NOTE: Since the director is threaded, there is a little more here,
     * but not delving yet. We could dive deep if you like the idea. */
    promote_index = (row_index > 0 ? row_index - 1 : table_size - 1);
    if (table[promote_index].state == READY) {
        table[promote_index].state = AVAILABLE;
    }
}
The movement from UNAVAILABLE to READY will likely be handled by a thread within the director interfacing with the healthcheck. We can discuss the state machine if you think this is interesting. I am hoping this does not need locking and can be managed through atomics. There is still a window of failure when state changes happen before the corresponding state is updated in the table (a window bounded by the polling interval of the healthcheck thread).
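The lock-free counting hoped for above could use C11 atomics. A minimal sketch, with illustrative names (and noting that glb-director itself, being stateless, does not track connections this way):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative per-row counter; not an actual glb-director struct. */
struct row_state {
    atomic_uint connection_count;
};

void on_new_connection(struct row_state *r)
{
    atomic_fetch_add_explicit(&r->connection_count, 1,
                              memory_order_relaxed);
}

/* Returns true when this termination dropped the count to zero, i.e. the
 * moment the READY-to-AVAILABLE promotion check would run. Using the
 * value returned by fetch_sub avoids a separate racy re-read. */
bool on_connection_termination(struct row_state *r)
{
    /* fetch_sub returns the prior value, so "== 1" means "now zero". */
    return atomic_fetch_sub_explicit(&r->connection_count, 1,
                                     memory_order_acq_rel) == 1;
}
```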
> The following is a possibility if we are already able to determine when we are starting new connections AND when we are terminating a connection at the director:
The glb-director is stateless: it doesn't know when connections are created or destroyed and doesn't maintain any TCP connection state. That is part of the design that allows BGP route changes to direct connections to different glb-director nodes with zero impact. This is discussed a bit in "why not LVS" and "how no state is stored on director".
I think adding extra servers to the forwarding table rows is a significantly simpler change, in that it only adds extra potential hops and doesn't change any of the design constraints of the system, and it buys all the benefits you'd potentially get from un-isolating rows (essentially, adding more potential servers to work around failures).
Hi Theo
The statelessness is a pretty strong design aspect that ought to be preserved. So, totally agree.
> I think adding extra servers to the forwarding table rows is a significantly simpler change, in that it only adds extra potential hops and doesn't change any of …
I've already got this done and tested manually. I'm in the process of updating the unit tests; following that I'll create a PR for this.
Regards
Ravi
> The following is a possibility if we are already able to determine when we are starting new connections AND when we are terminating a connection at the director:

> The glb-director is stateless, it doesn't know when connections are created or destroyed and …
The rest is moot :)
My bad. I was excited, and blinded by what seemed to me like a simple solution: using the next row in the forwarding table. I wrongly assumed that the director knows when new connections are being made (with no further state preservation), so that they could be directed at the secondary (which becomes the new primary) when the primary is drained. I missed that the redirection is managed via iptables forwarding at the proxies themselves.
from glb-director.