Comments (21)
cc @ggreenway who may know more about the different TLS protocols that are supported.
from envoy.
@vparla thanks for that. I used the linked python script to test against the latest Envoy and it worked correctly (Envoy sent back a ServerHello).
@cd-fernando can you try the script at https://github.com/dadrian/tldr.fail/blob/main/tldr_fail_test.py and see what it returns?
Can anyone else test this and report success or failure?
from envoy.
Just so I can document this on tldr.fail: where is the socket read buffer configured in this context? Is that an Envoy, TLS Inspector, or kernel setting (or somewhere else)? I would have expected that to be internal to Envoy's implementation, but it doesn't seem like this needed a code change?
from envoy.
I would find it shocking if Chrome weren't willing to negotiate one of the older TLS 1.3 ciphers in this case; I'd guess that most TLS endpoints on the internet don't yet support the post-quantum cipher suites. Regardless, I think if there's an issue here, it's a bug in Chrome.
from envoy.
> I would find it shocking if Chrome weren't willing to negotiate one of the older TLS 1.3 ciphers in this case; I'd guess that most TLS endpoints on the internet don't yet support the post-quantum cipher suites. Regardless, I think if there's an issue here, it's a bug in Chrome.
The problem has made it onto some news websites such as MSN. I'm not sure what the policy is on posting links, so I'll post an excerpt:
> Despite months of testing, the problem seems to have risen from web servers failing to adequately implement TLS, rather than an issue with Chrome. The error results in the rejection of connections that use the Kyber768 quantum-resistant key agreement algorithm, including connections with Chrome’s hybrid key.
> Clearly, this is not a simple fix that can be implemented by Chrome, but it requires a larger and more orchestrated effort to transform the Internet into one that can handle sophisticated quantum-safe cryptography.
> For now, affected users are being advised to disable the TLS 1.3 hybridized Kyber support in Chrome. However, long-term post-quantum secure ciphers will be essential in TLS, and the ability to disable the feature will likely be removed in the future, highlighting the importance of addressing the issue’s root cause earlier on so that websites can be prepared for quantum-based attacks in the future.
from envoy.
I might have found a helpful comment on DDG:
> These errors are not caused by a bug in Google Chrome but instead caused by web servers failing to properly implement Transport Layer Security (TLS) and not being able to handle larger ClientHello messages for post-quantum cryptography.
I think I saw a setting that affects this size in Envoy
from envoy.
I can't find such an option. Any suggestions?
from envoy.
Can you capture a full tcpdump of the failed handshake and post it?
from envoy.
@ggreenway it doesn't seem to accept pcap files, any suggestions?
from envoy.
Ignore that, here's the gzipped pcap
from envoy.
Huh, it looks like the TCP window is closed after only 1400 bytes. Can you post the full Envoy configuration you used for this test?
from envoy.
> Huh, it looks like the TCP window is closed after only 1400 bytes. Can you post the full Envoy configuration you used for this test?
1400 bytes looks like a typical MTU-sized payload. Maybe we're limiting the handshake to a single packet?
from envoy.
Here's a cut-down version of our config, I hope I didn't axe too much
---
admin:
  # access_log_path: /tmp/admin_access.log
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }
static_resources:
  listeners:
  ### BEGIN http frontends ###
  - name: apis
    address:
      socket_address: { address: 0.0.0.0, port_value: 443 }
    listener_filters:
    - name: "envoy.filters.listener.tls_inspector"
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector
    filter_chains:
    - filter_chain_match:
        server_names: ["*.testdomain.dev"]
        transport_protocol: "tls"
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
          common_tls_context:
            tls_params:
              tls_minimum_protocol_version: TLSv1_2
              cipher_suites: "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305"
            tls_certificates:
            - certificate_chain:
                filename: /etc/envoy/STAR.testdomain.dev.crt
              private_key:
                filename: /etc/envoy/STAR.testdomain.dev.key
      filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          upgrade_configs:
          - upgrade_type: connect
          codec_type: AUTO
          use_remote_address: true
          xff_num_trusted_hops: 0
          access_log:
          - name: envoy.access_loggers.file
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
              path: "/dev/stdout"
          route_config:
            name: local_route
            virtual_hosts:
            - name: system_api
              domains: ["api.testdomain.dev"]
              routes:
              - match: { prefix: "/api/interact/" }
                route: { cluster: windfarm }
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: windfarm
    connect_timeout: 0.25s
    type: STATIC
    dns_lookup_family: V4_ONLY
    lb_policy: ROUND_ROBIN
    health_checks:
      timeout: 1s
      interval: 1s
      unhealthy_threshold: 1
      healthy_threshold: 2
      http_health_check:
        path: /ping
    load_assignment:
      cluster_name: windfarm
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 29091
from envoy.
https://tldr.fail/ describes the issue. It is likely that Envoy is not reading the entire ClientHello when it spans multiple packets.
A Python test script can be found here:
https://github.com/dadrian/tldr.fail/blob/main/tldr_fail_test.py
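For anyone who wants to see the shape of such a test without reading the script, here is a rough sketch of the general idea: write a large handshake message in two parts with a pause in between, so the receiver has to cope with a partial message. This is not the tldr.fail script; `send_in_two_parts` and `recv_exactly` are hypothetical helper names, and the payload is whatever bytes you supply. On a real TCP connection two writes make separate segments likely, not guaranteed.

```python
import socket
import time


def send_in_two_parts(sock: socket.socket, payload: bytes,
                      split_at: int, delay: float = 0.0) -> None:
    """Write payload as two separate sends, optionally pausing between them."""
    sock.sendall(payload[:split_at])
    time.sleep(delay)  # gives the first part time to reach the peer on its own
    sock.sendall(payload[split_at:])


def recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Read until n bytes have arrived or the peer closes the connection."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:  # peer closed early
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

A server that handles fragmentation correctly should behave the same regardless of `split_at` and `delay`.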
from envoy.
I got one more report of it working correctly with Chrome.
I think I understand what's going on: the TlsInspector filter doesn't read from the socket; it peeks. This means the entire ClientHello needs to fit into the configured socket read buffer size.
I saw in the tcpdump that the server had a completely full TCP window (of only about 1500 bytes), and I didn't realize until now that it was the TlsInspector, not the TLS transport socket, that was getting stuck.
I've never seen a socket read buffer configured that small. What OS are you using?
If you don't need to select a filter chain based on SNI, you can remove the TlsInspector from your config and that should fix this.
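To illustrate the difference between peeking and reading, here is a minimal sketch using a local socket pair. It is not Envoy code; `peek_then_read` is a hypothetical helper. It shows that `MSG_PEEK` returns data without removing it from the receive buffer, which is why peeked bytes keep occupying buffer space until something actually reads them.

```python
import socket


def peek_then_read(payload: bytes):
    """Send payload over a local socket pair; peek twice, then read once."""
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(payload)
        # MSG_PEEK copies the data out but leaves it queued in the buffer.
        first_peek = receiver.recv(len(payload), socket.MSG_PEEK)
        second_peek = receiver.recv(len(payload), socket.MSG_PEEK)
        # A plain recv() is what finally consumes the bytes.
        consumed = receiver.recv(len(payload))
        return first_peek, second_peek, consumed
    finally:
        sender.close()
        receiver.close()
```

Both peeks return the same bytes, and only the final `recv()` frees the space.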
from envoy.
Thanks everyone for looking into this.
@ggreenway you were right: our TCP max window size was too small. It was set to 4096, and that wasn't enough.
For a bit more background, in case you're curious: we had set that value purely to optimise HAProxy. Unfortunately, because our QA boxes are self-contained, the setting affected Envoy too. It had never been a problem until these quantum-resistant protocols were enabled.
Thank you very much again!
FYI that script doesn't seem to work on versions of Python older than 3.11.
from envoy.
Thank you very much for your assistance, I'm closing this now.
from envoy.
@dadrian it's the kernel socket receive buffer. On Linux it's normally set with sysctl. This is a shortcoming in how Envoy implements this, but it's extremely uncommon to have such a small socket receive buffer, and changing Envoy to handle this condition is not simple, so until someone decides to put in the effort to fix it, I think this will remain as a known issue.
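If you want to check what receive buffer a socket actually gets on a given box, a small sketch like the following may help. It uses the standard `SO_RCVBUF` socket option; `receive_buffer_size` is a hypothetical helper, and it inspects a fresh socket in the current process rather than Envoy's listener. Note that the reported number is not the same as usable payload space (on Linux the kernel doubles a requested value to leave room for bookkeeping), so leave generous headroom over the ClientHello size.

```python
import socket
from typing import Optional


def receive_buffer_size(requested: Optional[int] = None) -> int:
    """Return SO_RCVBUF for a fresh TCP socket, optionally after requesting a size."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if requested is not None:
            # The kernel may clamp or adjust this value to its own limits.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        sock.close()


if __name__ == "__main__":
    print("default SO_RCVBUF:", receive_buffer_size())
    print("after requesting 4096:", receive_buffer_size(4096))
```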
from envoy.
@ggreenway if I'm reading #33850 (comment) correctly, it sounds like the kernel fix is not sufficient. If the second half of the ClientHello happens to arrive nontrivially later, e.g. due to packet loss, the kernel will release the first half of the ClientHello to the application and only return the second half later. That would mean that Envoy servers will be unreliable when connecting to post-quantum-capable clients, including 100% of desktop Chrome.
Is that correct? If so, is there a bug somewhere that tracks making Envoy post-quantum-ready?
from envoy.
Talking to Google Envoy folks, it sounds like I misunderstood the bug. Would be good to confirm that you all indeed retry correctly when the second packet comes in late, but it sounds like it's probably fine? Sorry for the (probably) false alarm!
from envoy.
The tls_inspector waits for new data on the socket; every time new data arrives, it reads it and feeds it into SSL_do_handshake(). If it either gets an error condition from this call, or it receives the callback set with SSL_CTX_set_tlsext_servername_callback, it marks itself as complete, passes the appropriate data to the filter chain matching in Envoy, and the connection proceeds.
It does not matter how many packets the ClientHello arrives in, as long as the ClientHello is less than the tls_inspector configured limit (configurable; defaults to 64KB) and the ClientHello fits in the kernel socket receive buffer (because this part of the code is using MSG_PEEK and doesn't remove the ClientHello from the socket read buffer).
There's a unit-test here that delivers the ClientHello 1 byte at a time.
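The two size conditions above can be sketched as a small check. This is only an illustration and not how Envoy does it (Envoy feeds the bytes to SSL_do_handshake() as described); it assumes the ClientHello fits in a single TLS record and reads the record length from the 5-byte record header. `first_record_size`, `client_hello_can_complete`, and `usable_recv_bytes` are hypothetical names, and the 64KB default is the figure quoted in this comment.

```python
import struct
from typing import Optional

TLS_HANDSHAKE = 0x16      # TLS record content type for handshake messages
RECORD_HEADER_LEN = 5     # content type (1) + legacy version (2) + length (2)


def first_record_size(peeked: bytes) -> Optional[int]:
    """Total size of the first TLS record, or None if the header is incomplete."""
    if len(peeked) < RECORD_HEADER_LEN:
        return None
    content_type, _version, length = struct.unpack(
        "!BHH", peeked[:RECORD_HEADER_LEN])
    if content_type != TLS_HANDSHAKE:
        raise ValueError("not a TLS handshake record")
    return RECORD_HEADER_LEN + length


def client_hello_can_complete(peeked: bytes, usable_recv_bytes: int,
                              inspector_limit: int = 64 * 1024) -> bool:
    """True if the record can ever be peeked in full under both limits."""
    needed = first_record_size(peeked)
    if needed is None:
        return True  # too early to tell; keep waiting for more bytes
    return needed <= inspector_limit and needed <= usable_recv_bytes
```

The number of packets never appears in the check; only the total size matters, which matches the behaviour described above.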
from envoy.