
Comments (3)

phlax commented on June 30, 2024

cc @wbpcode @alyssawilk @mattklein123

from envoy.

alyssawilk commented on June 30, 2024

I think this could and ideally would be implemented as an envoy extension.
There's a policy regarding extension addition here: https://github.com/envoyproxy/envoy/blob/main/EXTENSION_POLICY.md
If your work doesn't meet the guidelines, you could add this to contrib.


basundhara-c commented on June 30, 2024

Thanks for your reply @alyssawilk! I have described the current design in some detail below, as it requires some crucial changes in envoy core to make this work.

Detailed Steps

Reverse Connection Components

Reverse Connection Initiation and Acceptance

  1. Reverse connection initiation is triggered by the addition of a listener (let's call it "rc_listener") with extra metadata fields. This metadata contains a list of remote clusters to which reverse connections are required and the number of reverse connections required for each, like so:
          metadata:
            filter_metadata:
              envoy.reverse_conn:
                source_node_id: "initiator_node"
                clusters:
                  - cluster_name: "cluster 1"
                    reverse_connection_count: 5
                  - cluster_name: "cluster 2"
                    reverse_connection_count: 10

This metadata indicates that instead of binding to a port and listening (bind_to_port is set to false), rc_listener has to invoke the reverse connection workflow. In TcpListenerImpl, we check whether the above metadata is present, and if so, we set bind_to_port_ to false, collect the cluster -> reverse connection count information into a "remote_cluster_to_conns" hashmap, and register a request for reverse connection creation.
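The metadata-parsing step above can be sketched as follows. This is a minimal Python sketch of the logic, not Envoy C++; the function name and dictionary shapes are illustrative assumptions, not actual Envoy APIs.

```python
# Sketch (not Envoy code): extract reverse-connection metadata from a
# listener config into the "remote_cluster_to_conns" map, deciding
# whether the listener should bind to a port.

def parse_reverse_conn_metadata(listener_metadata):
    """Return (bind_to_port, remote_cluster_to_conns) for a listener."""
    rc = listener_metadata.get("filter_metadata", {}).get("envoy.reverse_conn")
    if rc is None:
        return True, {}  # no reverse-conn metadata: listen normally
    remote_cluster_to_conns = {
        c["cluster_name"]: c["reverse_connection_count"]
        for c in rc.get("clusters", [])
    }
    return False, remote_cluster_to_conns  # bind_to_port_ = false

metadata = {
    "filter_metadata": {
        "envoy.reverse_conn": {
            "source_node_id": "initiator_node",
            "clusters": [
                {"cluster_name": "cluster 1", "reverse_connection_count": 5},
                {"cluster_name": "cluster 2", "reverse_connection_count": 10},
            ],
        }
    }
}
bind_to_port, conns = parse_reverse_conn_metadata(metadata)
```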

The next few steps are performed by three new entities added within DispatcherImpl:

ReverseConnectionInitiator (RCInitiator)

A thread-local entity within the Dispatcher, created uniquely for each listener tag. On creation, the RCInitiator initiates "reverse_connection_count" connections to each "cluster_name" in rc_listener's metadata. Upon connection closure, it is invoked to re-initiate connections.

ReverseConnectionManager(RCManager)

A single thread-local resource that manages the lifecycle of several ReverseConnectionInitiators. The RCManager maintains a map "available_rc_initiators" of the RCInitiator created per listener tag, and a map "connection_to_rc_initiator_map" mapping each reverse connection's key to the RCInitiator that created and owns it.
The RCManager provides a few important APIs:

  • registerRCInitiators(listener, remote_cluster_to_conns...): Creates a new RCInitiator, if not already present, for the listener tag of listener, and stores it in the "available_rc_initiators" map.
  • unregisterRCInitiator(listener): Finds the RCInitiator that was created for the listener and calls its destructor. This empties the RCInitiator's internal maps and thereby closes the connections that it had initiated.
  • notifyConnectionClose(connectionKey..): Finds the RCInitiator that owns the connection with key connectionKey by looking up "connection_to_rc_initiator_map", and invokes it to close the connection.
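The RCManager bookkeeping described above can be sketched as a pair of classes. This is an illustrative Python sketch under the assumption that the maps behave as described; it is not Envoy code, and the method names simply mirror the APIs listed above.

```python
# Sketch (not Envoy code) of the RCManager's two maps and three APIs.

class RCInitiator:
    """Owns the reverse connections created for one listener tag."""
    def __init__(self, listener_tag, remote_cluster_to_conns):
        self.listener_tag = listener_tag
        self.remote_cluster_to_conns = dict(remote_cluster_to_conns)
        self.connections = {}  # connection key ("ip:port") -> connection state

    def close_connection(self, connection_key):
        self.connections.pop(connection_key, None)

class RCManager:
    def __init__(self):
        self.available_rc_initiators = {}         # listener tag -> RCInitiator
        self.connection_to_rc_initiator_map = {}  # connection key -> RCInitiator

    def register_rc_initiators(self, listener_tag, remote_cluster_to_conns):
        # Create an RCInitiator for this listener tag if not already present.
        if listener_tag not in self.available_rc_initiators:
            self.available_rc_initiators[listener_tag] = RCInitiator(
                listener_tag, remote_cluster_to_conns)
        return self.available_rc_initiators[listener_tag]

    def unregister_rc_initiator(self, listener_tag):
        # Dropping the RCInitiator empties its maps, closing its connections.
        self.available_rc_initiators.pop(listener_tag, None)

    def notify_connection_close(self, connection_key):
        # Route the closure to the RCInitiator that owns the connection.
        rci = self.connection_to_rc_initiator_map.pop(connection_key, None)
        if rci is not None:
            rci.close_connection(connection_key)
```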

ReverseConnectionHandler(RCHandler)

A thread-local socket manager that functions only on the responder envoy side. It stores a map "accepted_reverse_connections" of initiator_node -> list of ConnectionSocketPtr, one entry per accepted reverse connection.

  1. The registerRCInitiators API is called by TcpListenerImpl upon discovery of reverse connection metadata, thus creating an RCInitiator. The created RCInitiator is stored in the "available_rc_initiators" map.

  2. The RCI, upon initiation, runs a periodic function maintainConnCount() that iterates through the passed remote_cluster_to_conns map and initiates "reverse_connection_count" connections to each "cluster_name". For each cluster, the RCI obtains a thread-local cluster entry by calling the cluster manager's getThreadLocalCluster() and then obtains an existing TCP connection to that cluster. The ClientConnectionPtr is extracted and a ReverseConnectionHandshake HTTP POST request is written to it. This handshake contains information about the initiator envoy (node_id, cluster_id, etc.), and a protobuf is defined for the format. The connectionKey of this connection is defined as the local socket address (IP:port pair) and is obtained from the ClientConnectionPtr's ConnectionSocket. The RCI adds a read filter to the ClientConnection so that responses from the responder envoy can be intercepted. It also maintains an internal map of cluster -> connection count so that connections can be re-initiated in case of closure.
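The top-up behavior of maintainConnCount() can be sketched as below. This is a hypothetical Python sketch of the counting logic only; initiate_connection stands in for the real work of obtaining a TCP connection via the thread-local cluster and writing the handshake, and all names are assumptions rather than Envoy APIs.

```python
# Sketch (not Envoy code): periodically top up each remote cluster to its
# target reverse-connection count.

def maintain_conn_count(remote_cluster_to_conns, live_conns, initiate_connection):
    """live_conns: cluster_name -> set of connection keys currently open.
    initiate_connection(cluster) returns the new connection's key
    (the "ip:port" of the local socket address)."""
    for cluster, target in remote_cluster_to_conns.items():
        current = live_conns.setdefault(cluster, set())
        while len(current) < target:
            current.add(initiate_connection(cluster))
```

On connection closure the owning RCInitiator removes the key from live_conns, and the next periodic run re-initiates the missing connection.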

  3. Each envoy has a listener called the "Transport Service Listener" that accepts reverse connections and serves as an endpoint for reverse-connection-related queries, e.g., obtaining reverse connection stats. We have added a new "reverse_conn" filter that performs these operations.

  4. The reverse_conn filter intercepts HTTP requests, and if a handshake is received, extracts the source information and verifies that a certificate is present (verified, for example, by checking that the SANs match the cluster_id). The source node_id is a mandatory field in the reverse connection handshake; if it is not present, the handshake is rejected. A reverse connection handshake return HTTP message is sent to the initiator.

  5. If accepted, the reverse_conn filter extracts the raw downstream Connection from the stream filter callbacks and caches the raw ConnectionSocket. It resets file events on the socket's IoHandle and calls the thread-local Dispatcher's RCHandler.

  6. The RCHandler adds the node_id -> ConnectionSocketPtr mapping to the accepted_reverse_connections map, and then does a couple of things:

  • It triggers a periodic function to send RPING keepalives on all accepted connection sockets.
  • It obtains the underlying file descriptor from the connection socket and creates a file event to handle RPING replies from the initiator envoy upon file read. If a ping response is not received within a user-defined timeout, the socket is marked dead.
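The RCHandler's liveness bookkeeping can be sketched as below. This is a minimal Python sketch under the assumption that last-reply timestamps stand in for the real file events on the socket's IoHandle; class and method names are illustrative, not Envoy APIs.

```python
# Sketch (not Envoy code): record each accepted reverse-connection socket
# under its initiator node, and mark a socket dead when no RPING reply
# arrives within the user-defined timeout.

import time

class RCHandler:
    def __init__(self, ping_timeout_s):
        self.ping_timeout_s = ping_timeout_s
        self.accepted_reverse_connections = {}  # node_id -> list of sockets
        self.last_reply = {}                    # socket -> last RPING reply time

    def accept(self, node_id, sock, now=None):
        self.accepted_reverse_connections.setdefault(node_id, []).append(sock)
        self.last_reply[sock] = now if now is not None else time.monotonic()

    def on_ping_reply(self, sock, now=None):
        # Invoked by the file event when an RPING reply is read.
        self.last_reply[sock] = now if now is not None else time.monotonic()

    def reap_dead(self, now=None):
        # Called periodically; returns the sockets marked dead.
        now = now if now is not None else time.monotonic()
        dead = [s for s, t in self.last_reply.items()
                if now - t > self.ping_timeout_s]
        for s in dead:
            del self.last_reply[s]
            for socks in self.accepted_reverse_connections.values():
                if s in socks:
                    socks.remove(s)
        return dead
```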
  7. On the initiator envoy's side, the RCI's read filter intercepts the reverse connection handshake return message and checks whether it was accepted. If not, it closes the ClientConnection. If accepted, it resets file events on the connection socket and then sets a new boolean flag, connection_reused, to true for the connection, so that connection closure handling is skipped for a reverse connection. The connection key -> RCInitiator mapping is added to the RCManager's connection_to_rc_initiator_map, after which the connection socket is passed to the initiating listener (rc_listener in this example).

  8. On the initiator end, rc_listener has an attached filter called the "reverse_connection" filter. The sole purpose of this filter is to wait for the RPING keepalives described in step 6 and respond to them. From the time a socket is accepted by this reverse_connection filter, if RPING keepalives are not received within a user-defined timeout, the socket is marked dead.
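The initiator-side filter behavior can be sketched as a single function. This is a hypothetical Python sketch; the function, the b"RPING" payload shape, and the state dictionary are illustrative assumptions about the keepalive exchange described above, not Envoy code.

```python
# Sketch (not Envoy code): the initiator-side reverse_connection filter
# echoes RPING keepalives and reports the socket dead when none arrive
# within the timeout.

def handle_keepalive(data, respond, state, now, timeout):
    """state holds 'last_ping' (time of the last keepalive received).
    Returns True while the socket is live, False once it should be
    marked dead."""
    if data == b"RPING":
        state["last_ping"] = now
        respond(b"RPING")  # echo the keepalive back to the responder envoy
    return (now - state["last_ping"]) <= timeout
```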

Reverse Connection Re-Initiation in case of closure

  1. Upon connection closure, the RCManager is notified.

  2. The RCManager notifies the owning RCInitiator by looking it up in connection_to_rc_initiator_map.

  3. The owning RCInitiator updates the closure in its internal cluster -> connection map. The next iteration of maintainConnCount() initiates one more connection to the remote cluster.

Serving requests from upstream -> downstream envoy by using reverse connections

For requests to work from the upstream envoy to the downstream through the cached sockets, the clusters used by the upstream (responder) envoy to forward requests cannot determine the list of endpoints by traditional means, because the list is neither static nor a DNS call away. Instead, they have to rely on the current list of reverse connections accepted by that envoy. To resolve this, we have introduced a new cluster type called "reverse_connection" (and a corresponding load balancer type) that allows the upstream envoy to dynamically pick a reverse connection socket based on the downstream request context. The upstream envoy config, therefore, should have rules to route traffic destined for downstream services (which should go via a reverse connection) to a cluster of type "reverse_connection".

  1. The upstream envoy expects such requests to come with the "x-dst-node-uuid" header set. The value of "x-dst-node-uuid" is the downstream node which exposes the service.

  2. The reverse_connection cluster maintains a map of node_id -> Host. Upon receipt of a request, a HostImpl is created for the node_id and node_id is set as the "host_id" for that host. Subsequent requests re-use the host.

  3. The addition of the host_id ensures that a reverse connection is used to send requests to that host. When the Host calls createConnectionData, we check if the host_id is present, and if so, we invoke the Dispatcher to create a ReversedClientConnectionImpl. ReversedClientConnectionImpl extends ClientConnectionImpl and, instead of creating a client socket from the remote address, takes in the client and transport sockets directly. The client socket is obtained from the accepted_reverse_connections map by querying the RCHandler. In ReversedClientConnectionImpl we override the connect() method to do nothing, since we are already connected on the socket. Therefore, the request is sent over a reverse connection. The reverse_connection cluster also does periodic cleanup of stale hosts.
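The upstream-side flow in the three steps above can be sketched as follows. This is an illustrative Python sketch, not Envoy code: the classes model only the header -> host -> cached-socket lookup and the no-op connect(), and all names are assumptions.

```python
# Sketch (not Envoy code): the reverse_connection cluster maps the
# x-dst-node-uuid header to a cached host, and hands out a connection
# wrapping an already-connected reverse-connection socket.

class ReversedClientConnection:
    def __init__(self, sock):
        self.sock = sock  # already-connected reverse-connection socket

    def connect(self):
        pass  # overridden to do nothing: the socket is already connected

class ReverseConnectionCluster:
    def __init__(self, rc_handler_lookup):
        # rc_handler_lookup models querying the RCHandler's
        # accepted_reverse_connections map by node_id.
        self.hosts = {}  # node_id -> host record (host_id set to node_id)
        self.rc_handler_lookup = rc_handler_lookup

    def pick_connection(self, headers):
        node_id = headers["x-dst-node-uuid"]
        # Create the host on first use; subsequent requests reuse it.
        self.hosts.setdefault(node_id, {"host_id": node_id})
        sock = self.rc_handler_lookup(node_id)
        return ReversedClientConnection(sock)
```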

The process is illustrated in the diagram above. This involves a couple of crucial changes in envoy's core dispatcher during reverse connection initiation, and also in steps 14-15 to ensure that a reverse connection is picked by the Dispatcher, thus requiring envoy core changes. Do feel free to share any suggestions/clarifications on our current design as we work toward sharing it upstream!

