Comments (3)
cc @wbpcode @alyssawilk @mattklein123
from envoy.
I think this could and ideally would be implemented as an envoy extension.
There's a policy regarding extension addition here: https://github.com/envoyproxy/envoy/blob/main/EXTENSION_POLICY.md
If your work doesn't meet the guidelines, you could add this to contrib.
Thanks for your reply @alyssawilk! I have described the current design in some detail below, since it needs some crucial changes in envoy core in order to work.
Detailed Steps
Reverse Connection Initiation and Acceptance
- Reverse connection initiation is triggered by the addition of a listener (let's call it "rc_listener") with extra metadata fields. This metadata contains a list of remote clusters to which reverse connections are required and the number of reverse connections required for each, like so:
metadata:
  filter_metadata:
    envoy.reverse_conn:
      source_node_id: "initiator_node"
      clusters:
      - cluster_name: "cluster 1"
        reverse_connection_count: 5
      - cluster_name: "cluster 2"
        reverse_connection_count: 10
This metadata indicates that instead of binding to a port and listening, rc_listener has to invoke the reverse connection workflow. In TcpListenerImpl, we check whether the above metadata is present, and if so, we set bind_to_port_ to false, collect the cluster -> reverse connection count information into a "remote_cluster_to_conns" hashmap, and register a request for reverse connection creation.
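The collection step can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the structs ReverseConnMetadata and RemoteClusterEntry and the function buildRemoteClusterToConns are hypothetical stand-ins for the parsed metadata, and the rule that a repeated cluster name overwrites the earlier count is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical plain-struct mirror of one entry under "clusters".
struct RemoteClusterEntry {
  std::string cluster_name;
  uint32_t reverse_connection_count;
};

// Hypothetical mirror of the envoy.reverse_conn metadata block.
struct ReverseConnMetadata {
  std::string source_node_id;
  std::vector<RemoteClusterEntry> clusters;
};

using RemoteClusterToConns = std::unordered_map<std::string, uint32_t>;

// Collects cluster -> requested reverse connection count.
// Assumption: if a cluster is listed twice, the later entry wins.
RemoteClusterToConns buildRemoteClusterToConns(const ReverseConnMetadata& metadata) {
  RemoteClusterToConns result;
  for (const auto& entry : metadata.clusters) {
    result[entry.cluster_name] = entry.reverse_connection_count;
  }
  // Sanity check: there can never be more map entries than listed clusters.
  assert(result.size() <= metadata.clusters.size());
  return result;
}
```

For the metadata above, the resulting map would hold "cluster 1" -> 5 and "cluster 2" -> 10.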
The next few steps are performed by three new entities added within DispatcherImpl:
ReverseConnectionInitiator (RCInitiator)
A thread local entity within the Dispatcher, created once per listener tag. On creation, the RCInitiator initiates "reverse_connection_count" connections to each "cluster_name" in rc_listener's metadata. Upon connection closure, it is invoked to re-initiate connections.
ReverseConnectionManager (RCManager)
A single thread local resource that manages the lifecycle of several ReverseConnectionInitiators. The RCManager maintains a map "available_rc_initiators" of the RCInitiator created per listener tag, and a map "connection_to_rc_initiator_map" from each reverse connection's key to the RCInitiator that created and owns it.
The RCManager provides a couple of important APIs:
- registerRCInitiators(listener, remote_cluster_to_conns...): Creates a new RCInitiator for the listener's tag if one is not already present, and stores it in the "available_rc_initiators" map.
- unregisterRCInitiator(listener): Finds the RCInitiator that was created for the listener and destroys it. This empties the RCInitiator's internal maps and thereby closes the connections that it had initiated.
- notifyConnectionClose(connectionKey..): Finds the RCInitiator that owns the connection with key connectionKey by looking it up in "connection_to_rc_initiator_map", and invokes it to handle the closure.
ReverseConnectionHandler (RCHandler)
A thread local socket manager that functions only on the responder envoy side. It stores a map "accepted_reverse_connections" of initiator_node -> list of ConnectionSocketPtr, one per accepted reverse connection.
- The registerRCInitiators API is called by TcpListenerImpl upon discovery of reverse connection metadata, thus creating an RCInitiator. The created RCInitiator is stored in the "available_rc_initiators" map.
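To make the RCManager bookkeeping concrete, here is a minimal sketch of the two maps and the three APIs. It is an illustration under stated assumptions, not the real implementation: the RCInitiator here is a stub, the listener is reduced to a uint64_t tag, the helpers trackConnection and initiatorCount are hypothetical, and the connection map stores the owning listener tag rather than a pointer so that a destroyed RCInitiator can never be dereferenced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using RemoteClusterToConns = std::unordered_map<std::string, uint32_t>;

// Stub standing in for the real RCInitiator; it only records closures.
class RCInitiator {
public:
  explicit RCInitiator(RemoteClusterToConns wanted) : wanted_(std::move(wanted)) {}
  void onConnectionClose(const std::string& connection_key) {
    closed_keys_.push_back(connection_key);
  }
  std::size_t closedCount() const { return closed_keys_.size(); }
  const RemoteClusterToConns& wanted() const { return wanted_; }

private:
  RemoteClusterToConns wanted_;
  std::vector<std::string> closed_keys_;
};

class RCManager {
public:
  // Creates an RCInitiator for the listener tag if none exists yet.
  // Returns true when a new one was created.
  bool registerRCInitiators(uint64_t listener_tag, const RemoteClusterToConns& conns) {
    if (available_rc_initiators_.count(listener_tag) != 0) {
      return false;
    }
    available_rc_initiators_.emplace(listener_tag, std::make_unique<RCInitiator>(conns));
    return true;
  }

  // Hypothetical helper: records which listener's RCInitiator owns a connection.
  void trackConnection(const std::string& connection_key, uint64_t listener_tag) {
    connection_to_rc_initiator_map_[connection_key] = listener_tag;
  }

  // Drops all connection entries owned by the listener, then destroys its RCInitiator.
  void unregisterRCInitiator(uint64_t listener_tag) {
    auto it = connection_to_rc_initiator_map_.begin();
    while (it != connection_to_rc_initiator_map_.end()) {
      if (it->second == listener_tag) {
        it = connection_to_rc_initiator_map_.erase(it);
      } else {
        ++it;
      }
    }
    available_rc_initiators_.erase(listener_tag);
  }

  // Routes a closure to the owning RCInitiator. Returns false for unknown keys.
  bool notifyConnectionClose(const std::string& connection_key) {
    auto conn = connection_to_rc_initiator_map_.find(connection_key);
    if (conn == connection_to_rc_initiator_map_.end()) {
      return false;
    }
    auto owner = available_rc_initiators_.find(conn->second);
    // Invariant: every tracked connection belongs to a live RCInitiator.
    assert(owner != available_rc_initiators_.end());
    owner->second->onConnectionClose(connection_key);
    connection_to_rc_initiator_map_.erase(conn);
    return true;
  }

  std::size_t initiatorCount() const { return available_rc_initiators_.size(); }

private:
  std::unordered_map<uint64_t, std::unique_ptr<RCInitiator>> available_rc_initiators_;
  std::unordered_map<std::string, uint64_t> connection_to_rc_initiator_map_;
};
```

Since the RCManager is described as thread local, the sketch uses no locking.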
- Upon initiation, the RCInitiator runs a periodic function, maintainConnCount(), that iterates through the passed remote_cluster_to_conns map and initiates "reverse_connection_count" connections to each "cluster_name". For each cluster, the RCInitiator obtains a thread local cluster entry by calling the cluster manager's getThreadLocalCluster() and then obtains a TCP connection to that cluster. The ClientConnectionPtr is extracted and a ReverseConnectionHandshake HTTP POST request is written to it. This handshake contains information about the initiator envoy (node_id, cluster_id, etc.), and its format is defined by a protobuf. The connectionKey of this connection is defined as the local socket address (IP:port pair) and is obtained from the ClientConnectionPtr's ConnectionSocket. The RCInitiator adds a read filter to the ClientConnection so that responses from the responder envoy can be intercepted. It also maintains an internal map of cluster -> connection count so that it can re-initiate connections in case of closure.
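One pass of maintainConnCount() boils down to comparing the requested count with the currently established count per cluster. A minimal sketch, assuming both are kept as plain maps; the function names connectionsToInitiate and makeConnectionKey are hypothetical, and the key format simply joins the local IP and port as described above:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

using ClusterCounts = std::unordered_map<std::string, uint32_t>;

// For each cluster, computes how many new reverse connections one pass
// would have to initiate. Clusters that already have enough are omitted.
ClusterCounts connectionsToInitiate(const ClusterCounts& wanted,
                                    const ClusterCounts& established) {
  ClusterCounts deficit;
  for (const auto& kv : wanted) {
    const auto it = established.find(kv.first);
    const uint32_t have = (it == established.end()) ? 0 : it->second;
    if (have < kv.second) {
      deficit[kv.first] = kv.second - have;
    }
  }
  return deficit;
}

// Builds the connection key from the local socket address (IP:port pair).
std::string makeConnectionKey(const std::string& local_ip, uint16_t local_port) {
  assert(!local_ip.empty());
  return local_ip + ":" + std::to_string(local_port);
}
```

With this shape, a closed connection only needs to decrement the established count; the next periodic pass picks up the difference.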
- Each envoy has a listener called "Transport Service Listener" that accepts reverse connections and serves as an endpoint for reverse_connection related queries, e.g., obtaining reverse connection stats. We have added a new "reverse_conn" filter that performs these operations.
- The reverse_conn filter intercepts HTTP requests, and if a handshake is received, extracts the source information and verifies that a certificate is present and valid (e.g., that its SANs match the cluster_id). The source node_id is a mandatory field in the reverse connection handshake; if it is not present, the handshake is rejected. A reverse connection handshake response is sent back to the initiator as an HTTP message.
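The acceptance check can be sketched as below. This is only an approximation of the filter's logic: ReverseConnHandshake is a hypothetical plain-struct mirror of the protobuf, HandshakeResult and validateHandshake are illustrative names, and the exact SAN matching rule (here, the cluster_id must appear verbatim among the peer certificate's SANs) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of the handshake protobuf's relevant fields.
struct ReverseConnHandshake {
  std::string node_id;
  std::string cluster_id;
};

enum class HandshakeResult { Accepted, RejectedMissingNodeId, RejectedCertMismatch };

// Rejects handshakes without a node_id, then checks the certificate SANs.
// Assumption: an empty SAN list means no usable certificate was presented.
HandshakeResult validateHandshake(const ReverseConnHandshake& handshake,
                                  const std::vector<std::string>& peer_cert_sans) {
  if (handshake.node_id.empty()) {
    return HandshakeResult::RejectedMissingNodeId;
  }
  const bool san_matches =
      std::find(peer_cert_sans.begin(), peer_cert_sans.end(), handshake.cluster_id) !=
      peer_cert_sans.end();
  if (!san_matches) {
    return HandshakeResult::RejectedCertMismatch;
  }
  // Postcondition: an accepted handshake always carries a node_id.
  assert(!handshake.node_id.empty());
  return HandshakeResult::Accepted;
}
```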
- If accepted, the reverse_conn filter extracts the raw downstream Connection from the Stream Filter Callback and caches the raw Connection Socket. It resets file events on the socket's IOHandle, and calls the thread-local Dispatcher's RCHandler.
- The RCHandler adds the node_id -> ConnectionSocketPtr mapping to the accepted_reverse_connections map, and then does a couple of things:
- It triggers a periodic function to send RPING keepalives on all accepted connection sockets.
- It obtains the underlying file descriptor from the connection socket and creates a File Event to handle RPING replies from the initiator envoy upon file read. If a ping response is not received within a user defined timeout, the socket is marked dead.
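The liveness tracking described above can be sketched like this. It is a simplified model, not the real RCHandler: sockets are reduced to an int file descriptor in place of ConnectionSocketPtr, time is passed in explicitly instead of coming from Dispatcher timers and file events, and the method names addSocket, onPingResponse, markDeadSockets and liveCount are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

using Clock = std::chrono::steady_clock;

// Stand-in for one accepted reverse connection socket.
struct AcceptedSocket {
  int fd;
  Clock::time_point last_ping_response;
  bool dead;
};

class RCHandler {
public:
  explicit RCHandler(std::chrono::milliseconds ping_timeout) : ping_timeout_(ping_timeout) {
    assert(ping_timeout.count() > 0);
  }

  // Caches a newly accepted reverse connection under the initiator's node_id.
  void addSocket(const std::string& node_id, int fd, Clock::time_point now) {
    accepted_reverse_connections_[node_id].push_back(AcceptedSocket{fd, now, false});
  }

  // Called when an RPING reply is read from the socket.
  void onPingResponse(const std::string& node_id, int fd, Clock::time_point now) {
    auto it = accepted_reverse_connections_.find(node_id);
    if (it == accepted_reverse_connections_.end()) {
      return;
    }
    for (auto& socket : it->second) {
      if (socket.fd == fd && !socket.dead) {
        socket.last_ping_response = now;
      }
    }
  }

  // Periodic sweep: marks sockets dead when no reply arrived within the timeout.
  // Returns the number of sockets newly marked dead.
  std::size_t markDeadSockets(Clock::time_point now) {
    std::size_t newly_dead = 0;
    for (auto& entry : accepted_reverse_connections_) {
      for (auto& socket : entry.second) {
        if (!socket.dead && now - socket.last_ping_response > ping_timeout_) {
          socket.dead = true;
          ++newly_dead;
        }
      }
    }
    return newly_dead;
  }

  std::size_t liveCount(const std::string& node_id) const {
    const auto it = accepted_reverse_connections_.find(node_id);
    if (it == accepted_reverse_connections_.end()) {
      return 0;
    }
    std::size_t live = 0;
    for (const auto& socket : it->second) {
      live += socket.dead ? 0 : 1;
    }
    return live;
  }

private:
  std::chrono::milliseconds ping_timeout_;
  std::unordered_map<std::string, std::list<AcceptedSocket>> accepted_reverse_connections_;
};
```

Passing the current time as a parameter keeps the sketch deterministic; in envoy the same decisions would be driven by timers on the Dispatcher.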
- On the initiator envoy's side, the RCInitiator's read filter intercepts the reverse connection handshake response and checks whether it was accepted. If not, it closes the ClientConnection. If it was accepted, it resets file events on the connection socket, and then sets a new boolean flag, connection_reused, to true for the connection. This is so that connection closure is skipped for a reverse connection. The RCInitiator -> connection info is added to the RCManager's connection_to_rc_initiator_map, after which the connection socket is passed to the initiating listener (rc_listener in this example).
- On the initiator end, rc_listener has an attached filter called the "reverse_connection" filter. The sole purpose of this filter is to wait for the RPING keepalives described in step 7, and respond to them. From the time a socket is accepted by this reverse_connection filter, if RPING keepalives are not received within a user defined timeout, the socket is marked dead.
Reverse Connection Re-Initiation in case of closure
- Upon connection closure, the RCManager is notified.
- The RCManager notifies the owning RCInitiator by looking it up in connection_to_rc_initiator_map.
- The owning RCInitiator records the closure in its internal cluster -> connection count map. The next iteration of maintainConnCount() initiates one more connection to the remote cluster.
Serving requests from upstream -> downstream envoy by using reverse connections
For requests to flow from the upstream envoy to the downstream envoy through the cached sockets, the clusters used by the upstream (responder) envoy to forward requests cannot determine their list of endpoints by traditional means. This is because the list is neither static nor a DNS call away. Instead, the cluster has to rely on the current list of reverse connections accepted by that envoy. To resolve this, we have introduced a new cluster type called "reverse_connection" (and load balancer type) that allows the upstream envoy to dynamically pick a reverse connection socket based on the downstream request context. The upstream envoy config, therefore, should have rules that route traffic destined for downstream services (which should go via a reverse connection) to a cluster of type "reverse_connection".
- The upstream envoy expects such requests to arrive with the "x-dst-node-uuid" header set. The value of "x-dst-node-uuid" identifies the downstream node that exposes the service.
- The reverse_connection cluster maintains a map of node_id -> Host. Upon receipt of the first request for a node_id, a HostImpl is created for it and the node_id is set as the "host_id" of that host. Subsequent requests re-use the host.
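The host selection in the two steps above can be sketched as follows. This is a simplified model under stated assumptions: Host is a stub standing in for HostImpl, request headers are modeled as a plain std::map, and the names ReverseConnectionCluster, chooseHost and hostCount are hypothetical. Returning nullptr when the header is missing is also an assumption about the failure behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <unordered_map>

// Stub standing in for HostImpl; only the host_id matters here.
struct Host {
  std::string host_id;
};

class ReverseConnectionCluster {
public:
  // Picks the host for the node named by "x-dst-node-uuid", creating it on
  // first use. Returns nullptr if the header is missing or empty (assumption).
  std::shared_ptr<Host> chooseHost(const std::map<std::string, std::string>& headers) {
    const auto header = headers.find("x-dst-node-uuid");
    if (header == headers.end() || header->second.empty()) {
      return nullptr;
    }
    const std::string& node_id = header->second;
    const auto existing = hosts_.find(node_id);
    if (existing != hosts_.end()) {
      return existing->second; // Subsequent requests re-use the host.
    }
    auto host = std::make_shared<Host>(Host{node_id});
    assert(host->host_id == node_id);
    hosts_.emplace(node_id, host);
    return host;
  }

  std::size_t hostCount() const { return hosts_.size(); }

private:
  std::unordered_map<std::string, std::shared_ptr<Host>> hosts_;
};
```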
- The addition of the host_id ensures that a reverse connection is used to send requests to that host. When the Host calls createConnectionData, we check if the host_id is present, and if so, we invoke the Dispatcher to create a ReversedClientConnectionImpl. The ReversedClientConnectionImpl extends ClientConnectionImpl and, instead of creating a client socket from the remote address, takes in the client and transport sockets directly. The client socket is obtained from the accepted_reverse_connections map by querying the RCHandler. In ReversedClientConnectionImpl we override the connect() method to do nothing since we are already connected on the socket. Therefore, the request is sent over a reverse connection. The reverse_connection cluster also does periodic cleanup of stale hosts.
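The connect() override is the core of this step. A minimal sketch of the idea, with stub classes named ClientConnection and ReversedClientConnection standing in for ClientConnectionImpl and ReversedClientConnectionImpl; the int file descriptor and the connectAttempts counter are illustrative assumptions, not envoy's actual interfaces:

```cpp
#include <cassert>

// Stub for a regular client connection that must connect before use.
class ClientConnection {
public:
  explicit ClientConnection(int fd) : fd_(fd), connected_(false), connect_attempts_(0) {}
  virtual ~ClientConnection() = default;

  // A real implementation would start a connect to the remote address here.
  virtual void connect() {
    ++connect_attempts_;
    connected_ = true;
  }

  bool connected() const { return connected_; }
  int connectAttempts() const { return connect_attempts_; }
  int fd() const { return fd_; }

protected:
  int fd_;
  bool connected_;
  int connect_attempts_;
};

// Wraps a socket that was already accepted as a reverse connection.
class ReversedClientConnection : public ClientConnection {
public:
  explicit ReversedClientConnection(int accepted_fd) : ClientConnection(accepted_fd) {
    assert(accepted_fd >= 0);
    connected_ = true; // The socket is connected from the moment it is handed over.
  }

  // Nothing to do: the underlying socket is already connected.
  void connect() override {}
};
```

Callers that hold a ClientConnection pointer can keep calling connect() unconditionally; for a reversed connection the call is simply a no-op.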
The process is illustrated in the diagram above. It requires a couple of crucial changes to envoy's core Dispatcher: during reverse connection initiation, and in steps 14-15 to ensure that the Dispatcher picks a reverse connection. Do feel free to share any suggestions or ask for clarifications on our current design as we work towards sharing it upstream!