Code Monkey home page Code Monkey logo

Comments (16)

chrismccord avatar chrismccord commented on May 25, 2024

It's not clear in your description why the second channel is erroneously closing. If it closes for a non-normal reason, you will necessarily receive error logs about the reason. Do you have any relevant logs for BChannel? We link to the tracker because our presence is necessarily tied to the tracker. If it goes down, we should also fails and when we recover and the client reconnects, the presence will be re-tracked.

from phoenix_pubsub.

hanrelan avatar hanrelan commented on May 25, 2024

The channels are closing because the Application is being stopped. The issue is that they're closing with different reasons - AChannel is exiting with reason :shutdown, whereas BChannel exits with reason :killed.
I believe this is because (please correct me if I'm wrong), when you create a channel using this guide, by default it isn't part of a supervision tree. So when you stop the application, the processes are just killed. However, when you link the channel process to a presence process, the presence process is part of a supervision tree. So when the supervisor stops the presence process with reason :shutdown, that causes the channel process to also get stopped with reason :shutdown.
By itself this isn't the worst thing, but it causes unintuitive behavior where tracking presence in a channel will cause the client not to reconnect if the Application is restarted.

The way I repro-ed this was:

  • Create a client that connects to both AChannel and BChannel
  • Run the server with iex -S mix
  • Set up traces on the AChannel and BChannel processes (I just used the observer)
  • In iex, run Application.stop(:myapp)
  • The trace logs will show
(This is AChannel's process - ie. linked to a presence tracker)
13:45:07:849561 (<0.747.0>) in_exiting 0
13:45:07:849589 (<0.747.0>) exit shutdown
13:45:07:849603 (<0.747.0>) out_exited 0
(This is BChannel's process - no presence)
13:45:12:854318 (<0.744.0>) in_exiting 0
13:45:12:854329 (<0.744.0>) exit killed
13:45:12:854332 (<0.744.0>) out_exited 0

from phoenix_pubsub.

josevalim avatar josevalim commented on May 25, 2024

Thanks for following up, that has been very well detailed.

If we are going to fix this, we will likely do it by making BChannel's exist with shutdown reason too. Therefore, could you let us know why is your application being restarted?

from phoenix_pubsub.

hanrelan avatar hanrelan commented on May 25, 2024

For deployment I build releases using distillery. We're not doing hot upgrades, so to deploy I just untar the new release and run:
bin/myapp restart

This cleanly stops the application and then starts it with the new code. My interim solution to this problem is to just kill the beam process and then run bin/myapp start

Incidentally - the behavior I'd prefer is for AChannel to exit with :killed :) That way after the restart the client will reconnect to the channel. I can see why that doesn't make sense in a clean shutdown, I just wonder how many people already rely on that reconnect behavior and would get thrown off by it not reconnecting.

Perhaps an alternate solution is to have the Presence process monitor the channel process instead of linking. Then if the channel process dies, the presence process can also kill itself, but if the presence process does the supervisor can restart it and it can monitor the channel process again?

from phoenix_pubsub.

josevalim avatar josevalim commented on May 25, 2024

Maybe the answer is to force a client reconnect unless the channel exits with normal reason.

from phoenix_pubsub.

hanrelan avatar hanrelan commented on May 25, 2024

I put together what I think is the least invasive approach that makes sense. You can see the change here: hanrelan/phoenix@2a3b38e

Essentially I treat {:shutdown, :closed} and {:shutdown, :left} as special cases which don't send phx_error, whereas all other :shutdown types do. I think this makes sense because you want those shutdowns to propagate over links (eg. to end the presence process), so changing those to :normal would break in other places. All tests still pass with this change.

Happy to submit this as a PR in a topic branch if you think this makes sense.

from phoenix_pubsub.

chrismccord avatar chrismccord commented on May 25, 2024

Maybe the answer is to force a client reconnect unless the channel exits with normal reason.

This should already happen today. If the channel exits ungracefully, the client will auto rejoin with expo back off. @hanrelan are you using the phoenix.js channels client?

from phoenix_pubsub.

hanrelan avatar hanrelan commented on May 25, 2024

I am using the phoenix.js channels client

I think Jose meant :normal in the sense of the signal. What's happening is that this code:
https://github.com/phoenixframework/phoenix/blob/master/lib/phoenix/socket/transport.ex#L288
causes the server to send a phx_close event to the client when the process receives the :shutdown command. Once the client receive phx_close, it doesn't try to reconnect.

However, :shutdown is used by supervisors to stop processes, so in this case during an application restart it's stopping the Presence process which is propagating the :shutdown to the channel process. This results in the client not reconnecting if the server is restarted (in the sense of a release restart).

The diff I sent before changes it so only explicitly closing the channel (via the client's leave() function or the server's close) is considered a clean shutdown. All other shutdowns - including one enforced by a supervisor or via a link - is considered unclean and the client will try to reconnect.

This keeps the behavior between processes that are linked to a presence and processes that aren't linked to a presence the same with respect to shutdowns and reconnect attemps.

Happy to discuss on Slack as well if that's easier, I'm @hanrelan in the phoenix channel

from phoenix_pubsub.

chrismccord avatar chrismccord commented on May 25, 2024

Ah I see. We could change the behavior of the phx_close event to differentiate :normal and :shutdown, but I'm unsure what makes the most sense. For example, in your channel code today, you could do:

{:stop, {:shutdown, :normal}, socket}

What should happen in that case? We won't be able to determine the difference between this graceful exit and a "graceful restart" where the client should re-establish a connection.

from phoenix_pubsub.

josevalim avatar josevalim commented on May 25, 2024

@chrismccord I think we should consider it the same as a link. Which means that we will break on everything that is not :normal. I will push the changes to Phoenix master today. :)

from phoenix_pubsub.

josevalim avatar josevalim commented on May 25, 2024

So I attempted a fix at this and we use {:shutdown, ...} for clean shutdown in many cases. I am not sure what is the alternative. We could special case :shutdown but that may be confusing.

from phoenix_pubsub.

chrismccord avatar chrismccord commented on May 25, 2024

yes, that is my concern that {:shutdown, :normal} is a clean shutdown, which we would need to propagate to the client as a phx_error, which is wrong. A pre app shutdown task might be the way to go, that forcefully kills transport pids, shuts down the transport, then issues the app shutdown. That way restarts would leave clients in rejoin mode? I see no other way around it.

from phoenix_pubsub.

hanrelan avatar hanrelan commented on May 25, 2024

Another thought, but obviously you guys are much more familiar with this than I am so feel free to ignore if this makes no sense.
What if instead of linking to the channel process, Phoenix.Tracker just set up a monitor of the channel process instead. Then if the channel process dies, the Tracker can just stop itself.

The other case to deal with is if the Tracker process errors out for some reason, you don't want presence to silently stop working for a channel process. My understanding is a fuzzier here, but maybe this could be resolved by using a transient restart strategy in the presence process' supervisor so it gets restarted if it errors?

from phoenix_pubsub.

josevalim avatar josevalim commented on May 25, 2024

@hanrelan the tracker trap exits exactly because we don't want the tracker to crash if a "client" crashes. But we need to link to solve your last paragraph otherwise there is no way for the "presence client" to be brought down if the presence server crashes.

The client could monitor and raise but links are more appropriate here. I believe @chrismccord is working on a solution though. :)

from phoenix_pubsub.

hanrelan avatar hanrelan commented on May 25, 2024

Ah got it I misunderstood the purpose of the link. Thanks for the explanation and the quick response to this issue @chrismccord and @josevalim!

from phoenix_pubsub.

chrismccord avatar chrismccord commented on May 25, 2024

This has been fixed on phoenix master and will ship with the 1.3 release. Check the PR commit details for details but tldr; is your app restarts will be seen as errors by the client and they will reconnect. Cheers!

from phoenix_pubsub.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.