Code Monkey home page Code Monkey logo

Comments (15)

arnaudgeiser avatar arnaudgeiser commented on June 7, 2024 2

I have a beginning of something there with this approach. [1]
According to the first tests, it behaves as expected.

https://github.com/clj-commons/aleph/pull/641/files?diff=unified&w=1

from aleph.

arnaudgeiser avatar arnaudgeiser commented on June 7, 2024 1

(I will come with more examples and some pinpoints on Netty later today/in the evening)

from aleph.

pyr avatar pyr commented on June 7, 2024 1

GRPC is also a likely good inspiration, see https://github.com/grpc/grpc-java/blob/master/netty/src/main/java/io/grpc/netty/NettyServer.java#L344-L371

from aleph.

KingMob avatar KingMob commented on June 7, 2024 1

looks like you guys are well underway

Don't look at me, I stop my servers with Ctrl+C, the way god intended 😁

from aleph.

DerGuteMoritz avatar DerGuteMoritz commented on June 7, 2024

Oh this is great! The default timeouts are particularly egregious when running integration tests. This actually made me remove the netty/wait-for-close call at some point. Glad to hear that there is a way to parameterize this.

from aleph.

arnaudgeiser avatar arnaudgeiser commented on June 7, 2024

While those parameters are now configurable, they will work as expected when you are trying to decrease them, not increase them.

Here is an example:

(defn handler [{:keys [body]}]
  (let [res (s/reduce (fn [acc _]
                        (prn acc)
                        (Thread/sleep 1000)
                        (inc acc))
                      0 body)]
    {:body (str @res)}))

(def server (http/start-server
              #'handler
              {:port 9999
               :raw-stream? true
               :shutdown-timeout 200
               :shutdown-quiet-period 100}))

Then upload a huge file

curl localhost:9999 --data-binary @/tmp/100GiB

Wait five seconds

1
2
3
4
5

Stop the server

(.close server)

It's stopped after 5 seconds (instead of 200 you could expect)

6
7
8
9
10

Connection is closed client side

curl: (55) Send failure: Broken pipe

But this is the expected behaviour of Netty according to this comment : netty/netty#3706 (comment)

cc @pyr

Closing the issue as it has been addressed by #639

from aleph.

DerGuteMoritz avatar DerGuteMoritz commented on June 7, 2024

While those parameters are now configurable, they will work as expected when you are trying to decrease them, not increase them.

Hm when reading your example scenario, it appears that it's rather the other way around, i.e. decreasing isn't possible? 🤔

from aleph.

arnaudgeiser avatar arnaudgeiser commented on June 7, 2024

Oh this is great! The default timeouts are particularly egregious when running integration tests. This actually made me remove the netty/wait-for-close call at some point. Glad to hear that there is a way to parameterize this.

With the PR that just got merged, you will be able to solve your problem by specifying a shutdown-timeout (which defaults to 15 seconds). This parameter will guarantee your EventLoopGroup will be stopped (netty/wait-for-close) after at most X seconds.

This totally works the way you expect.

By default

(def server (http/start-server
              #'handler
              {:port 9999
               :raw-stream? true}))

(do (.close server)
    (netty/wait-for-close server))

Even without any traffic, it might take a few seconds for netty/wait-for-close to return.

With 0 second

(def server (http/start-server
              #'handler
              {:port 9999
               :raw-stream? true
               :shutdown-quiet-period 0 ;; shutdown-quiet-period should be less than shutdown-timeout
               :shutdown-timeout 0}))

(do (.close server)
    (netty/wait-for-close server))

It will be closed instantly.

With 200 second

(def server (http/start-server
              #'handler
              {:port 9999
               :raw-stream? true
               :shutdown-quiet-period 200 ;; shutdown-quiet-period should be less than shutdown-timeout
               :shutdown-timeout 200}))

(do (.close server)
    (netty/wait-for-close server))

This will guarantee the netty/wait-for-close will return after 200 seconds.
All good!


Now, the issue is with the shutdown-quiet-period which, according the documentation :

Unlike shutdown(), graceful shutdown ensures that no tasks are submitted for 'the quiet period' (usually a couple seconds) before it shuts itself down. If a task is submitted during the quiet period, it is guaranteed to be accepted and the quiet period will start over.

This is actually confusing... and can be translated in different ways.
One way is what just got merged :

optional period in seconds for which new connections will still be serviced after a scheduled shutdown via java.io.Closeable#close. Defaults to 2 seconds.

But according to netty/netty#3706 (comment), this understanding is totally inaccurate.

The EventLoopGroup will continue to accept tasks! But all the EventExecutor will be closed as soon as shutdownGracefully has been called. [1]

@pyr shout on the issue to understand what should be done to support what we could call a graceful shutdown (continue serving current requests) [2]
If we have no answer from Norman, we should probably have a look at the other Netty based HTTP servers on the Java ecosystem (Vertx, Micronaut, Undertow, Akka?)

I will reopen the issue, we should at least revisit the docstring associated to the shutdown-quiet-period.

[1] : https://github.com/netty/netty/blob/4.1/transport-classes-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java#L420
[2] netty/netty#3706 (comment)

from aleph.

arnaudgeiser avatar arnaudgeiser commented on June 7, 2024

Quickly, here are the pointers for VertX [1] and Micronaut [2].

[1] : https://github.com/eclipse-vertx/vert.x/blob/11ab144725e551be591118831377101015e1392c/src/main/java/io/vertx/core/net/impl/VertxEventLoopGroup.java#L100-L107
[2] : https://github.com/micronaut-projects/micronaut-core/blob/231cd469962c88f7b457b5b09e539b5655e568e5/http-server-netty/src/main/java/io/micronaut/http/server/netty/NettyHttpServer.java#L589-L619

from aleph.

pyr avatar pyr commented on June 7, 2024

The call to ServerListener::shutdown in GRPC seems indicate it's the right inspiration: https://github.com/grpc/grpc-java/blob/master/core/src/main/java/io/grpc/internal/ServerListener.java#L33-L37

from aleph.

arnaudgeiser avatar arnaudgeiser commented on June 7, 2024

Anyway, this issue can be closed either whether:

  1. We change the docstring of shutdown-quiet-period to match Netty behaviour
  2. We implement exactly what the docstring mentions

I can't see any reason why aleph users would expect 1. I would really strive for 2.

from aleph.

arnaudgeiser avatar arnaudgeiser commented on June 7, 2024

Just for completeness, there is one situation where it's working the way you would expect it => if you don't leave the EventLoop.

One obvious way is to pass executor :none :

(defn handler [{:keys [body]}]
  (let [res (s/reduce (fn [acc v]
                        (prn (.getName (Thread/currentThread)))
                        (Thread/sleep 1000) ;; blocking the EventLoop !!!
                        (inc acc)) 0 body)]
    (d/chain' res (fn [n]
                    {:body (str n)}))))

(def server (http/start-server #'handler
                               {:raw-stream? true
                                :port 9999
                                :executor :none}))

(do (.close server)
    (netty/wait-for-close server))

Output

"aleph-netty-server-event-pool-2"
"aleph-netty-server-event-pool-2"
"aleph-netty-server-event-pool-2"
"aleph-netty-server-event-pool-2"
"aleph-netty-server-event-pool-2"
"aleph-netty-server-event-pool-2"
"aleph-netty-server-event-pool-2"
"aleph-netty-server-event-pool-2"
"aleph-netty-server-event-pool-2"
"aleph-netty-server-event-pool-2"
"aleph-netty-server-event-pool-2"
"aleph-netty-server-event-pool-2"

The other way is to pass a custom executor which defines :onto? false

(defn handler [{:keys [body]}]
  (let [res (s/reduce (fn [acc v]
                        (prn (.getName (Thread/currentThread)))
                        (Thread/sleep 1000) ;; blocking the EventLoop !!!
                        (inc acc)) 0 body)]
    (d/chain' res (fn [n]
                    {:body (str n)}))))

(def executor (flow/utilization-executor 0.9 512
                     {:onto? false}))

(def server (http/start-server #'handler
                               {:raw-stream? true
                                :port 9999
                                :executor executor}))

(do (.close server)
    (netty/wait-for-close server))

Output

"aleph-netty-server-event-pool-2"
"aleph-netty-server-event-pool-2"
"aleph-netty-server-event-pool-2"
"aleph-netty-server-event-pool-2"
"aleph-netty-server-event-pool-2"
"aleph-netty-server-event-pool-2"
"aleph-netty-server-event-pool-2"
"aleph-netty-server-event-pool-2"
"aleph-netty-server-event-pool-2"
"aleph-netty-server-event-pool-2"
"aleph-netty-server-event-pool-2"
"aleph-netty-server-event-pool-2"

from aleph.

arnaudgeiser avatar arnaudgeiser commented on June 7, 2024

Thinking loudly...

Maybe we should track the active Channels by using a ChannelGroup [1].
Before calling .shutdownGracefully, we wait for the ChannelGroup to be empty as it removes
closed channels automatically.

Simplify shutdown process with ChannelGroup
If both ServerChannels and non-ServerChannels exist in the same ChannelGroup, any requested I/O operations on the group are performed for the ServerChannels first and then for the others.
This rule is very useful when you shut down a server in one shot:

This seems to be exactly what we need.

Someone wrote a guide about how to write a chat with Netty which broadly describes the idea. [2]

WDYT?

[1] : https://netty.io/4.0/api/io/netty/channel/group/ChannelGroup.html
[2] : https://github.com/pyr/net/blob/a2120bf61b6fd480707ea6b217545b3d640288d0/examples/guides/chat.md?plain=1#L32-L34

from aleph.

DerGuteMoritz avatar DerGuteMoritz commented on June 7, 2024

With the PR that just got merged, you will be able to solve your problem by specifying a shutdown-timeout (which defaults to 15 seconds). This parameter will guarantee your EventLoopGroup will be stopped (netty/wait-for-close) after at most X seconds.

Just for the record: I can confirm that setting both new options to 0 works as adverstised 👍 Thanks!

I haven't fully wrapped my head around the remainig issue but looks like you guys are well underway 😄

from aleph.

arnaudgeiser avatar arnaudgeiser commented on June 7, 2024

Addressed by : #642

from aleph.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.