Comments (8)

daipom commented on May 23, 2024

> After some digging in the code, I think I might have found the reason for this behaviour.
> On every heartbeat/write, a new socket is created and connect(dest_addr) is called, which performs the TLS handshake.
> After that, establish_connection is called to run the hello, ping, pong protocol.
> It seems that if the connection between fluentd and the destination server drops after the TLS handshake, at the point where we reach establish_connection, fluentd neither detects the drop nor times out the socket. As far as I can tell, it gets stuck in this loop, where the socket is still considered up but will never contain any data.
>
> I expect the socket to time out if it fails to establish a connection within some time.
> Perhaps using IO.select.
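
The IO.select suggestion could look roughly like the sketch below. This is not fluentd's code: `read_with_deadline` and `HandshakeTimeout` are hypothetical names, and a pipe stands in for the TLS socket. The idea is to wait for readability with a deadline instead of retrying read_nonblock unconditionally.

```ruby
# Hypothetical error class, not part of fluentd.
class HandshakeTimeout < StandardError; end

# Hypothetical helper: read from a non-blocking IO, but give up once
# `timeout` seconds have passed without any data arriving.
def read_with_deadline(io, maxlen, timeout)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  begin
    io.read_nonblock(maxlen)
  rescue IO::WaitReadable
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    # IO.select returns nil when the timeout expires with nothing readable.
    if remaining <= 0 || IO.select([io], nil, nil, remaining).nil?
      raise HandshakeTimeout, "no data within #{timeout}s"
    end
    retry
  end
end

# Demo with a pipe standing in for the connection.
reader, writer = IO.pipe
writer.write("PONG")
puts read_with_deadline(reader, 16, 0.2) # data is already available

begin
  read_with_deadline(reader, 16, 0.2)    # peer stays silent
rescue HandshakeTimeout => e
  puts "gave up: #{e.message}"
end
```

With an OpenSSL::SSL::SSLSocket, read_nonblock can also raise IO::WaitWritable, so a real fix would likely need to handle that case as well.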

Thanks for your report. We may need to do some digging.
We need a simple way to reproduce the problem.

from fluentd.

Gtharan10 commented on May 23, 2024

We're encountering a similar issue while distributing 80,000 events per second to two separate systems. Have there been any updates or workarounds identified for this?

daipom commented on May 23, 2024

> We're encountering a similar issue while distributing 80,000 events per second to two separate systems. Have there been any updates or workarounds identified for this?

No.
I have not made time to look into this issue in detail.

I'll see if I can reproduce it.
I'd be glad to receive any information that could help us reproduce the issue.

Gtharan10 commented on May 23, 2024

Yes...

With the below configurations for forwarder and aggregator:

Both of these are in a Multi Process Workers environment, with 4 workers on each node.

<match udp.input.**>
    @type forward
    require_ack_response true
    heartbeat_type udp
    <buffer>
        @type memory
        flush_interval 1s
        flush_thread_count 10
        chunk_limit_size 50m
        queue_limit_length 500
        chunk_limit_size 100m
        overflow_action drop_oldest_chunk
        retry_max_interval 10m
        retry_forever true
        delayed_commit_timeout 100
    </buffer>
   <server>
     host 10.10.1.2
     port 24224
     weight 60
   </server>
   <server>
     host 10.10.1.3
     port 24224
     weight 60
   </server>
</match>

<source>
    @type forward
    port 24224
    bind 0.0.0.0
    tag udp.forward
</source>

It appears that when distributing events, Node 1 receives events via UDP and then shares them with the other nodes using the forwarder plugin. However, after approximately 10 seconds, fluentd enters a stalled state in which it no longer accepts new incoming events and only sends heartbeats. I haven't seen any error logs that explain this behaviour.

After checking the network calls with strace, we suspect the following:

[pid 54679] recvfrom(13, 0x7f70df800000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70df800000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70df800000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70df800000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70df800000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70df800000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70df800000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70df800000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70df800000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70df800000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70df800000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, "{\"reportType\": \"sslsession\", \"so"..., 10485760, 0, {sa_family=AF_INET, sin_port=htons(38897), sin_addr=inet_addr("10.10.1.5")}, [2048 => 16]) = 59168
[pid 54679] recvfrom(13, 0x7f70dec00000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70dec00000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70dec00000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70dec00000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70dec00000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70dec00000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70dec00000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70dec00000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70dec00000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70dec00000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70dec00000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70dec00000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70dec00000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54676] recvfrom(10, "\201\243ack\271YUgPCpFf8KyAOu0reJa0wg==\n", 512, 0, 0x7f70e6efd790, [2048 => 0]) = 31
[pid 54676] shutdown(10, SHUT_WR)       = 0
[pid 54669] socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_TCP) = 10
[pid 54669] connect(10, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.3")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 54669] getsockopt(10, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
[pid 54669] getsockopt(10, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
[pid 54669] getsockname(10, {sa_family=AF_INET, sin_port=htons(51956), sin_addr=inet_addr("10.10.1.1")}, [2048 => 16]) = 0
[pid 54669] setsockopt(10, SOL_SOCKET, SO_LINGER, {l_onoff=1, l_linger=60}, 8) = 0
[pid 54669] setsockopt(10, SOL_SOCKET, SO_RCVTIMEO_OLD, "\276\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
[pid 54669] setsockopt(10, SOL_SOCKET, SO_SNDTIMEO_OLD, "<\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0

<PROCESS STALLS HERE FOR A WHILE, DURING WHICH ONLY PERIODIC HEARTBEAT SIGNALS ARE SENT TO THE DEVICE>

[pid 54675] sendto(9, "\0", 1, 0, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.2")}, 16) = 1
[pid 54675] sendto(9, "\0", 1, 0, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.3")}, 16) = 1
[pid 54675] recvfrom(9, "\0", 512, 0, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.3")}, [2048 => 16]) = 1
[pid 54675] recvfrom(9, "\0", 512, 0, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.2")}, [2048 => 16]) = 1
[pid 54676] recvfrom(10, "\201\243ack\271YUgPDAI3vNb9cuJQbyVasg==\n", 512, 0, 0x7f70e6efd790, [2048 => 0]) = 31
[pid 54676] shutdown(10, SHUT_WR)       = 0
[pid 54675] sendto(9, "\0", 1, 0, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.2")}, 16) = 1
[pid 54675] sendto(9, "\0", 1, 0, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.3")}, 16) = 1
[pid 54675] recvfrom(9, "\0", 512, 0, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.2")}, [2048 => 16]) = 1
[pid 54675] recvfrom(9, "\0", 512, 0, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.3")}, [2048 => 16]) = 1
[pid 54675] sendto(9, "\0", 1, 0, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.2")}, 16) = 1
[pid 54675] sendto(9, "\0", 1, 0, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.3")}, 16) = 1
[pid 54675] recvfrom(9, "\0", 512, 0, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.2")}, [2048 => 16]) = 1
[pid 54675] recvfrom(9, "\0", 512, 0, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.3")}, [2048 => 16]) = 1

<SKIPPING A FEW BEATS TO SHORTEN THE LOG>

[pid 54675] sendto(9, "\0", 1, 0, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.2")}, 16) = 1
[pid 54675] sendto(9, "\0", 1, 0, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.3")}, 16) = 1
[pid 54675] recvfrom(9, "\0", 512, 0, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.2")}, [2048 => 16]) = 1
[pid 54675] recvfrom(9, "\0", 512, 0, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.3")}, [2048 => 16]) = 1
[pid 54675] sendto(9, "\0", 1, 0, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.2")}, 16) = 1
[pid 54675] sendto(9, "\0", 1, 0, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.3")}, 16) = 1
[pid 54675] recvfrom(9, "\0", 512, 0, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.2")}, [2048 => 16]) = 1
[pid 54675] recvfrom(9, "\0", 512, 0, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.3")}, [2048 => 16]) = 1
[pid 54675] sendto(9, "\0", 1, 0, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.2")}, 16) = 1
[pid 54675] sendto(9, "\0", 1, 0, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.3")}, 16) = 1
[pid 54675] recvfrom(9, "\0", 512, 0, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.2")}, [2048 => 16]) = 1
[pid 54675] recvfrom(9, "\0", 512, 0, {sa_family=AF_INET, sin_port=htons(24224), sin_addr=inet_addr("10.10.1.3")}, [2048 => 16]) = 1

<AGAIN PROCESS RESUMES HERE>

[pid 54679] recvfrom(13, 0x7f70dec00000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, "{\"reportType\": \"sslsession\", \"so"..., 10485760, 0, {sa_family=AF_INET, sin_port=htons(42461), sin_addr=inet_addr("10.10.1.5")}, [2048 => 16]) = 58968
[pid 54679] recvfrom(13, 0x7f70dd000000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, "{\"reportType\": \"sslsession\", \"so"..., 10485760, 0, {sa_family=AF_INET, sin_port=htons(60693), sin_addr=inet_addr("10.10.1.5")}, [2048 => 16]) = 59165
[pid 54679] recvfrom(13, 0x7f70e0400000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e0400000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e0400000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e0400000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e0400000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e0400000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e0400000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e0400000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e0400000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e0400000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e0400000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e0400000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e0400000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e0400000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e0400000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e0400000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e0400000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e0400000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e0400000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e0400000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, "{\"reportType\": \"sslsession\", \"so"..., 10485760, 0, {sa_family=AF_INET, sin_port=htons(60693), sin_addr=inet_addr("10.10.1.5")}, [2048 => 16]) = 59165
[pid 54679] recvfrom(13, 0x7f70e1000000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e1000000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e1000000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e1000000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e1000000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e1000000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e1000000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e1000000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e1000000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e1000000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, 0x7f70e1000000, 10485760, 0, 0x7f70e5afdaf0, [2048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 54679] recvfrom(13, "{\"reportType\": \"sslsession\", \"so"..., 10485760, 0, {sa_family=AF_INET, sin_port=htons(46832), sin_addr=inet_addr("10.10.1.5")}, [2048 => 16]) = 59071
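
The two setsockopt lines in the trace above carry their timeouts as raw bytes. A minimal sketch for decoding them, assuming a 64-bit little-endian Linux host where struct timeval is two signed 64-bit integers (the helper name `decode_timeval` is made up for illustration):

```ruby
# Raw option values copied from the SO_RCVTIMEO_OLD and SO_SNDTIMEO_OLD
# lines. strace prints non-printable bytes as octal escapes, which Ruby's
# double-quoted strings also understand.
RCVTIMEO = "\276\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0".b
SNDTIMEO = "<\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0".b

# Assumption: struct timeval is laid out as (tv_sec, tv_usec), each a
# little-endian signed 64-bit integer.
def decode_timeval(raw)
  sec, usec = raw.unpack("q<q<")
  sec + usec / 1_000_000.0
end

puts decode_timeval(RCVTIMEO) # 190.0
puts decode_timeval(SNDTIMEO) # 60.0
```

Under that assumption the socket has a 190-second receive timeout and a 60-second send timeout. Whether these map to the out_forward ack_response_timeout and send_timeout settings is a guess that would need checking against the plugin configuration in use.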

If there's any other information needed on this issue, please let me know.

Gtharan10 commented on May 23, 2024

Did you manage to replicate the issue, @daipom?

daipom commented on May 23, 2024

Sorry, I haven't made time for this.
Thanks for your information!

> With the below configurations for forwarder and aggregator:

Does this mean that this issue reproduces within the same machine (setting in_forward and out_forward in the same config)?
If so, I think I could try to reproduce it.

Or, does this reproduce only under the following infrastructure (where in_forward and out_forward don't connect directly)?

> It's important to note that in our infrastructure, the out_forward and in_forward servers don't connect directly to each other; they have a few components in between them. So if the overall connection drops, it doesn't necessarily mean that the socket will drop, and we have to rely on options like timeouts.

Gtharan10 commented on May 23, 2024

This setup is currently deployed on an ESXi host. We have three nodes deployed independently, and we have been able to reproduce the issue by continuously streaming across the nodes.

The actual flow looks like this:
UDP -> Log Forwarder -> Forward aggregator -> OpenSearch

I have three questions:

  1. If the forwarder gets stuck in the middle, why does the UDP plugin also stop receiving data? Only heartbeats are sent to the other nodes.
  2. Are there any other steps to produce dumps for the Ruby code to check what is happening in real-time?
  3. Are there any limitations regarding the amount of data that can be continuously transmitted by the forwarder?
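
On the second question, one generic Ruby approach is to print a backtrace for every live thread, which shows where each one is currently waiting. The sketch below is plain Ruby, not a fluentd feature; `dump_threads` is a hypothetical helper and the sleeping worker only simulates a stuck thread.

```ruby
# Hypothetical helper: write the status and backtrace of every live thread.
def dump_threads(out = $stderr)
  Thread.list.each do |t|
    out.puts "Thread #{t.object_id} name=#{t.name.inspect} status=#{t.status.inspect}"
    # backtrace can be nil for a thread that has already finished.
    (t.backtrace || []).each { |line| out.puts "    #{line}" }
  end
end

# Demo: a worker blocked in sleep shows up with its current call site.
worker = Thread.new { sleep }
worker.name = "stuck-worker"
sleep 0.05 until worker.status == "sleep"
dump_threads($stdout)
worker.kill
```

Fluentd's troubleshooting documentation also describes a built-in dump triggered by sending SIGCONT to the process (via the sigdump gem); it is worth checking whether that applies to the version in use before patching anything in.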

lkwiatek-sc commented on May 23, 2024

I'm observing the same issue in our setup on macOS; it usually occurs when the MacBook lid is closed. Based on the logs, I noticed that the MacBook wakes up from time to time to do certain things, and during these wake-ups fluentd is more likely to enter the infinite loop in establish_connection.

I'm using TLS, with heartbeat disabled.

I've added some extra error logging to the rescue IO::WaitReadable => e clause. During normal operation, the rescue clause triggers from time to time (usually with no more than 10 retries):

2024-04-22 16:31:31 +0200 [warn]: #0 IO::WaitReadable host="..." port=12345 retry_count=2 error_class=OpenSSL::SSL::SSLErrorWaitReadable error="read would block"
2024-04-22 16:31:31 +0200 [warn]: #0 IO::WaitReadable host="..." port=12345 retry_count=4 error_class=OpenSSL::SSL::SSLErrorWaitReadable error="read would block"
2024-04-22 16:31:31 +0200 [warn]: #0 IO::WaitReadable host="..." port=12345 retry_count=5 error_class=OpenSSL::SSL::SSLErrorWaitReadable error="read would block"
2024-04-22 16:31:33 +0200 [warn]: #0 IO::WaitReadable host="..." port=12345 retry_count=1 error_class=OpenSSL::SSL::SSLErrorWaitReadable error="read would block"
2024-04-22 16:31:33 +0200 [warn]: #0 IO::WaitReadable host="..." port=12345 retry_count=2 error_class=OpenSSL::SSL::SSLErrorWaitReadable error="read would block"
2024-04-22 16:31:33 +0200 [warn]: #0 IO::WaitReadable host="..." port=12345 retry_count=4 error_class=OpenSSL::SSL::SSLErrorWaitReadable error="read would block"
2024-04-22 16:31:33 +0200 [warn]: #0 IO::WaitReadable host="..." port=12345 retry_count=5 error_class=OpenSSL::SSL::SSLErrorWaitReadable error="read would block"
2024-04-22 16:31:33 +0200 [warn]: #0 IO::WaitReadable host="..." port=12345 retry_count=6 error_class=OpenSSL::SSL::SSLErrorWaitReadable error="read would block"

When it enters the infinite loop, the error stays the same, but the retry count increases forever:

2024-04-23 09:47:03 +0200 [warn]: #0 IO::WaitReadable host="..." port=12345 retry_count=30437 error_class=OpenSSL::SSL::SSLErrorWaitReadable error="read would block"
2024-04-23 09:47:03 +0200 [warn]: #0 IO::WaitReadable host="..." port=12345 retry_count=30438 error_class=OpenSSL::SSL::SSLErrorWaitReadable error="read would block"
2024-04-23 09:47:03 +0200 [warn]: #0 IO::WaitReadable host="..." port=12345 retry_count=30439 error_class=OpenSSL::SSL::SSLErrorWaitReadable error="read would block"
2024-04-23 09:47:03 +0200 [warn]: #0 IO::WaitReadable host="..." port=12345 retry_count=30440 error_class=OpenSSL::SSL::SSLErrorWaitReadable error="read would block"
2024-04-23 09:47:03 +0200 [warn]: #0 IO::WaitReadable host="..." port=12345 retry_count=30441 error_class=OpenSSL::SSL::SSLErrorWaitReadable error="read would block"
2024-04-23 09:47:03 +0200 [warn]: #0 IO::WaitReadable host="..." port=12345 retry_count=30442 error_class=OpenSSL::SSL::SSLErrorWaitReadable error="read would block"
2024-04-23 09:47:03 +0200 [warn]: #0 IO::WaitReadable host="..." port=12345 retry_count=30443 error_class=OpenSSL::SSL::SSLErrorWaitReadable error="read would block"
2024-04-23 09:47:03 +0200 [warn]: #0 IO::WaitReadable host="..." port=12345 retry_count=30444 error_class=OpenSSL::SSL::SSLErrorWaitReadable error="read would block"

My current workaround probably doesn't address the actual issue (why read_nonblock keeps raising a retriable error indefinitely), but it is good enough to prevent the infinite loop:

612a613
>         retry_count = 0
627c628,636
<           rescue IO::WaitReadable
---
>           rescue IO::WaitReadable => e
>             # On MacOS under certain circumstances read_nonblock will infinitely raise OpenSSL::SSL::SSLErrorWaitReadable
>             # (error="read would block"). During normal operation retry_count usually does not exceed 10, thus we set the
>             # limit to 25 to be on the safe side.
>             if retry_count > 25
>               @log.warn "retry count over 25 times", host: @host, port: @port, "last error": e
>               disable!
>               break
>             end
630a640
>             retry_count += 1
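
The same idea as the patch above, reduced to a self-contained sketch. It is loosely modelled on the patched rescue clause rather than copied from fluentd: `SilentSocket`, `read_or_give_up` and `MAX_RETRIES` are hypothetical names, and the fake socket simulates a peer that never sends data.

```ruby
# Stand-in for a socket whose read_nonblock never yields data.
# IO::EAGAINWaitReadable includes IO::WaitReadable, as does
# OpenSSL::SSL::SSLErrorWaitReadable from the logs above.
class SilentSocket
  def read_nonblock(_maxlen)
    raise IO::EAGAINWaitReadable, "read would block"
  end
end

MAX_RETRIES = 25 # same limit as in the patch

# Retry a non-blocking read, but stop after MAX_RETRIES failed attempts
# instead of looping forever. Returns nil when it gives up.
def read_or_give_up(sock)
  retry_count = 0
  begin
    sock.read_nonblock(512)
  rescue IO::WaitReadable => e
    retry_count += 1
    if retry_count > MAX_RETRIES
      warn "giving up after #{retry_count} attempts: #{e.message}"
      return nil
    end
    sleep 0.01 # brief pause instead of spinning on the CPU
    retry
  end
end

p read_or_give_up(SilentSocket.new) # => nil
```

A retry cap bounds the loop but depends on how fast the loop spins; a time-based deadline may be easier to reason about when the pause between attempts varies.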
