Comments (21)

adisuissa commented on September 24, 2024

cc @ggreenway who may know more about the different TLS protocols that are supported.

from envoy.

ggreenway commented on September 24, 2024

@vparla thanks for that. I used the linked python script to test against the latest Envoy and it worked correctly (Envoy sent back a ServerHello).

@cd-fernando can you try the script at https://github.com/dadrian/tldr.fail/blob/main/tldr_fail_test.py and see what it returns?

Can anyone else test this and report success or failure?

dadrian commented on September 24, 2024

Just so I can document this on tldr.fail---where is the socket read buffer configured in this context? Is that an Envoy, TLS Inspector, or kernel setting (or somewhere else)? I would have expected that to be internal to Envoy's implementation, but it didn't seem like this needed a code change?

ggreenway commented on September 24, 2024

I would find it shocking if chrome wouldn't be willing to negotiate one of the older TLS 1.3 ciphers in this case; I'd guess that most TLS endpoints on the internet don't yet support the post-quantum cipher suites. Regardless, I think if there's an issue here, it's a bug in chrome.

cd-fernando commented on September 24, 2024

> I would find it shocking if chrome wouldn't be willing to negotiate one of the older TLS 1.3 ciphers in this case; I'd guess that most TLS endpoints on the internet don't yet support the post-quantum cipher suites. Regardless, I think if there's an issue here, it's a bug in chrome.

The problem has made it onto some news websites, such as MSN. I'm not sure what the policy is on posting links, so I'll post an excerpt:

> Despite months of testing, the problem seems to have risen from web servers failing to adequately implement TLS, rather than an issue with Chrome. The error results in the rejection of connections that use the Kyber768 quantum-resistant key agreement algorithm, including connections with Chrome’s hybrid key.
> Clearly, this is not a simple fix that can be implemented by Chrome, but it requires a larger and more orchestrated effort to transform the Internet into one that can handle sophisticated quantum-safe cryptography.
> For now, affected users are being advised to disable the TLS 1.3 hybridized Kyber support in Chrome. However, long-term post-quantum secure ciphers will be essential in TLS, and the ability to disable the feature will likely be removed in the future, highlighting the importance of addressing the issue’s root cause earlier on so that websites can be prepared for quantum-based attacks in the future.

cd-fernando commented on September 24, 2024

I might have found a helpful comment via DDG (DuckDuckGo):

> These errors are not caused by a bug in Google Chrome but instead caused by web servers failing to properly implement Transport Layer Security (TLS) and not being able to handle larger ClientHello messages for post-quantum cryptography.

I think I saw a setting in Envoy that affects this size.

cd-fernando commented on September 24, 2024

I can't find such an option. Any suggestions?

ggreenway commented on September 24, 2024

Can you capture a full tcpdump of the failed handshake and post it?

cd-fernando commented on September 24, 2024

@ggreenway it doesn't seem to accept pcap files, any suggestions?

cd-fernando commented on September 24, 2024

kyber768.pcap.gz

Ignore that, here's the gzipped pcap

ggreenway commented on September 24, 2024

Huh, it looks like the tcp window is closed after only 1400 bytes. Can you post the full envoy configuration you used for this test?

sschepens commented on September 24, 2024

> Huh, it looks like the tcp window is closed after only 1400 bytes. Can you post the full envoy configuration you used for this test?

1400 bytes is close to a typical MTU. Maybe we're limiting the handshake to a single packet?

cd-fernando commented on September 24, 2024

Here's a cut-down version of our config. I hope I didn't axe too much:

---
admin:
  # access_log_path: /tmp/admin_access.log
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }
static_resources:
  listeners:
### BEGIN http frontends ###
  - name: apis
    address:
      socket_address: { address: 0.0.0.0, port_value: 443 }
    listener_filters:
    - name: "envoy.filters.listener.tls_inspector"
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector
    filter_chains:
    - filter_chain_match:
        server_names: ["*.testdomain.dev"]
        transport_protocol: "tls"
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
          common_tls_context:
            tls_params:
              tls_minimum_protocol_version: TLSv1_2
              cipher_suites: "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305"
            tls_certificates:
            - certificate_chain:
                filename: /etc/envoy/STAR.testdomain.dev.crt
              private_key:
                filename: /etc/envoy/STAR.testdomain.dev.key
      filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          upgrade_configs:
            - upgrade_type: connect
          codec_type: AUTO
          use_remote_address: true
          xff_num_trusted_hops: 0
          access_log:
            - name: envoy.access_loggers.file
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
                path: "/dev/stdout"
          route_config:
            name: local_route
            virtual_hosts:
            - name: system_api
              domains: ["api.testdomain.dev"]
              routes:
              - match: { prefix: "/api/interact/" }
                route: { cluster: windfarm }
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: windfarm
    connect_timeout: 0.25s
    type: STATIC
    dns_lookup_family: V4_ONLY
    lb_policy: ROUND_ROBIN
    health_checks:
      timeout: 1s
      interval: 1s
      unhealthy_threshold: 1
      healthy_threshold: 2
      http_health_check:
        path: /ping
    load_assignment:
      cluster_name: windfarm
      endpoints:
        - lb_endpoints:
           - endpoint:
              address:
                socket_address:
                  address: 127.0.0.1
                  port_value: 29091

vparla commented on September 24, 2024

https://tldr.fail/ describes the issue. It is likely that Envoy is not reading the entire ClientHello when it spans multiple packets.
A Python test script can be found here:
https://github.com/dadrian/tldr.fail/blob/main/tldr_fail_test.py

ggreenway commented on September 24, 2024

I got one more report of it working correctly with Chrome.

I think I understand what's going on: the TlsInspector filter doesn't read from the socket; it peeks. This means the entire ClientHello needs to fit into the configured socket read buffer size.

I saw in the tcpdump that the server had a fully filled up tcp window (of only about 1500 bytes), and I didn't realize until now that it was the TlsInspector, not the TLS transport socket, that was getting stuck.

I've never seen a socket read buffer configured that small. What OS are you using?

If you don't need to select a filter chain based on SNI, you can remove the TlsInspector from your config and that should fix this.
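
As a minimal sketch of the peek behaviour described above (not Envoy's code), the following Python snippet shows that a `MSG_PEEK` receive leaves the bytes in the socket receive buffer, so a later ordinary read sees them again. The helper name `peek_then_read` and the fake payload are hypothetical, and a local socket pair stands in for a real TLS connection.

```python
import socket


def peek_then_read(payload: bytes, size: int):
    """Send payload over a local socket pair, then peek and read it back."""
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(payload)
        # MSG_PEEK copies bytes out of the receive buffer without consuming them.
        peeked = receiver.recv(size, socket.MSG_PEEK)
        # An ordinary recv afterwards still starts at the first byte.
        consumed = receiver.recv(size)
        return peeked, consumed
    finally:
        sender.close()
        receiver.close()


peeked, consumed = peek_then_read(b"fake ClientHello bytes", 64)
print(peeked == consumed)
```

Because peeked data is never drained, everything a peeking reader wants to inspect has to sit in the receive buffer at the same time.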

cd-fernando commented on September 24, 2024

Thanks everyone for looking into this.
@ggreenway you were right: our TCP maximum window size was too small. It was set to 4096, and that wasn't enough.

For a bit more background, in case you're curious: it's a value we had set to optimise HAProxy only. Unfortunately, because our QA boxes are self-contained, that setting affected Envoy too. It had never been a problem until these quantum-resistant protocols were enabled.

Thank you very much again!

FYI that script doesn't seem to work on versions of Python older than 3.11.

cd-fernando commented on September 24, 2024

Thank you very much for your assistance, I'm closing this now.

ggreenway commented on September 24, 2024

@dadrian it's the kernel socket receive buffer. On Linux it's normally set with sysctl. This is a shortcoming in how Envoy implements this, but it's extremely uncommon to have such a small socket receive buffer, and changing Envoy to handle this condition is not simple. Until someone decides to put in the effort to fix it, I think this will remain a known issue.
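
One way to see what receive buffer the kernel hands to a new TCP socket is to query `SO_RCVBUF`. This is a rough sketch: the helper name `receive_buffer_size` is hypothetical. On Linux the default typically comes from sysctl settings such as `net.ipv4.tcp_rmem` or `net.core.rmem_default`, but which setting was changed in this report is not stated in the thread.

```python
import socket


def receive_buffer_size(sock: socket.socket) -> int:
    """Return the kernel receive buffer size (in bytes) reported for sock."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


# A fresh, unconnected TCP socket reports the system default.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    size = receive_buffer_size(s)
    print(f"SO_RCVBUF = {size} bytes")
```

A value in the low thousands of bytes on a host running Envoy with the TLS inspector would be worth a second look, given the ClientHello sizes discussed here.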

davidben commented on September 24, 2024

@ggreenway if I'm reading #33850 (comment) correctly, it sounds like the kernel fix is not sufficient. If the second half of the ClientHello happens to arrive nontrivially later, e.g. due to packet loss, the kernel will release the first half of the ClientHello to the application and only return the second half later. That would mean that Envoy servers will be unreliable when connecting to post-quantum-capable clients, including 100% of desktop Chrome.

Is that correct? If so, is there a bug somewhere that tracks making Envoy post-quantum-ready?

davidben commented on September 24, 2024

Talking to Google Envoy folks, it sounds like I misunderstood the bug. Would be good to confirm that you all indeed retry correctly when the second packet comes in late, but it sounds like it's probably fine? Sorry for the (probably) false alarm!

ggreenway commented on September 24, 2024

The tls_inspector waits for new data on the socket. Every time new data arrives, it reads it and feeds it into SSL_do_handshake(). If it either gets an error condition from this call, or it receives the callback set with SSL_CTX_set_tlsext_servername_callback, it marks itself as complete, passes the appropriate data to the filter chain matching in Envoy, and the connection proceeds.

It does not matter how many packets the ClientHello arrives in, as long as the ClientHello is less than the tls_inspector configured limit (configurable; defaults to 64KB) and the ClientHello fits in the kernel socket receive buffer (because this part of the code is using MSG_PEEK and doesn't remove the ClientHello from the socket read buffer).

There's a unit-test here that delivers the ClientHello 1 byte at a time.
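
The two conditions above can be sketched in Python. This is an illustrative model only: Envoy hands the bytes to `SSL_do_handshake()` rather than parsing headers itself. The function name `client_hello_status`, its return strings, and the `rcvbuf_bytes` parameter are hypothetical. The model also assumes the ClientHello fits in a single TLS record and treats the receive buffer as a plain byte count, which ignores kernel bookkeeping overhead.

```python
import struct

TLS_HANDSHAKE_RECORD = 0x16  # TLS record content type for handshake messages


def client_hello_status(peeked, max_bytes=64 * 1024, rcvbuf_bytes=None):
    """Classify peeked bytes from the start of a connection.

    Returns 'need_more', 'complete', 'too_large', 'stuck' or 'not_tls'.
    """
    if len(peeked) < 5:
        return "need_more"  # the 5-byte TLS record header is not complete yet
    content_type, _version, record_len = struct.unpack("!BHH", peeked[:5])
    if content_type != TLS_HANDSHAKE_RECORD:
        return "not_tls"
    needed = 5 + record_len
    if needed > max_bytes:
        return "too_large"  # exceeds the inspector's configured limit
    if len(peeked) >= needed:
        return "complete"
    if rcvbuf_bytes is not None and needed > rcvbuf_bytes:
        # Peeking never drains the buffer, so the rest can never arrive.
        return "stuck"
    return "need_more"


# A fake 1800-byte handshake record, roughly the size discussed in this thread.
record = struct.pack("!BHH", TLS_HANDSHAKE_RECORD, 0x0301, 1800) + bytes(1800)
print(client_hello_status(record[:1400]))                     # first packet only
print(client_hello_status(record[:1400], rcvbuf_bytes=1500))  # tiny receive buffer
print(client_hello_status(record))                            # whole record peeked
```

The `stuck` case models the failure reported in this issue: the record needs more room than the receive buffer can ever hold while the data is only being peeked.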
