
Comments (5)

tgross commented on August 20, 2024

Clarification on this: we are stopping replication when we're supposed to and successfully failing over replication. It's just that we retry failover over and over again even though we were successful the first time.

Edit: it turns out we're never exiting the on_change handler's while True loop.

tgross commented on August 20, 2024

Ok, I've figured out why this happens. The on_change function sees that the primary has changed in Consul and stops replication immediately. At this point we know that, whether or not we're the new primary, we have to stop replication. Then we enter a loop in which we poll Consul to either obtain the lock and become the primary or find out who the new primary is.
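
A minimal sketch of the on_change flow just described, assuming simplified in-memory stand-ins: FakeConsul, FakeReplica, try_lock, get_primary, and stop_replication are hypothetical names, not the project's code or the Consul client API. The real handler polls with a while True loop; the sketch bounds it with a hypothetical max_polls so it always terminates.

```python
import time


class FakeConsul:
    """Hypothetical stand-in for Consul: one primary lock plus a primary record."""

    def __init__(self, lock_holder=None, primary=None):
        self.lock_holder = lock_holder
        self.primary = primary

    def try_lock(self, node):
        # Acquire the primary lock only if nobody holds it yet.
        if self.lock_holder is None:
            self.lock_holder = node
            self.primary = node
        return self.lock_holder == node

    def get_primary(self):
        return self.primary


class FakeReplica:
    """Hypothetical stand-in for the local MySQL replication state."""

    def __init__(self):
        self.running = True

    def stop_replication(self):
        self.running = False


def on_change(node, consul, replica, poll_interval=0.0, max_polls=10):
    # Whether or not we end up as the new primary, the old replication
    # stream is stale, so stop it right away.
    replica.stop_replication()
    for _ in range(max_polls):
        # Either win the lock and become the primary...
        if consul.try_lock(node):
            return ("primary", node)
        # ...or find out which node won it.
        primary = consul.get_primary()
        if primary is not None and primary != node:
            return ("replica", primary)
        time.sleep(poll_interval)
    return ("undecided", None)
```

For example, on_change("db-2", FakeConsul(), FakeReplica()) returns ("primary", "db-2"), while a Consul whose lock is already held by "db-3" yields ("replica", "db-3").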

But meanwhile, a health check comes along and runs assert_initialized_for_state, which discovers "hey, we're not set up for replication!" and correctly sets up replication from the new primary. Then, when the on_change handler reaches that same point, replication is running again (the health check restarted it), so changing the primary fails and we throw the error. Because we catch a very general exception in the on_change function, the handler loops forever.
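
The race can be reproduced with a small simulation, assuming a fake replica that, like MySQL, refuses to change its primary while replication is running. ReplicationError, FakeReplica, health_check, and buggy_on_change are hypothetical; the health check is called inline here to stand in for the separate process that fires while on_change is polling Consul, and max_attempts replaces the real while True so the demo terminates.

```python
class ReplicationError(Exception):
    """Hypothetical stand-in for the MySQL driver's exception type."""


class FakeReplica:
    def __init__(self):
        self.running = True
        self.primary = "db-1"

    def stop_replication(self):
        self.running = False

    def change_primary(self, new_primary):
        # Like MySQL, refuse to repoint a replica whose replication is running.
        if self.running:
            raise ReplicationError(
                "This operation cannot be performed with a running slave")
        self.primary = new_primary

    def start_replication(self):
        self.running = True


def health_check(replica, primary):
    # Stand-in for assert_initialized_for_state: if replication is not
    # running, point it at the current primary and start it.
    if not replica.running:
        replica.change_primary(primary)
        replica.start_replication()


def buggy_on_change(replica, new_primary, max_attempts=5):
    replica.stop_replication()
    # Simulate the health check winning the race while we poll Consul.
    health_check(replica, new_primary)
    for attempt in range(1, max_attempts + 1):  # `while True` in the real handler
        try:
            replica.change_primary(new_primary)
            replica.start_replication()
            return attempt
        except Exception:
            # Overly broad catch: the "running slave" error is swallowed
            # and the same doomed operation is retried.
            continue
    return None
```

Every retry fails even though the replica already points at the new primary, which matches the "retry failover over and over again" symptom.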

So we have a race between the two processes. MySQL doesn't support an atomic "stop replication and then start it over here." If we stop replication later in the on_change function, we narrow the window for the race but don't eliminate it. If we lock replication setup so that the health check can't proceed, we risk availability issues during failover.
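
For comparison, the locking alternative might look roughly like this, assuming both handlers share a lock file and use a non-blocking flock (Unix only). LOCK_PATH, try_acquire, release, and health_check_with_lock are hypothetical names; the point is that while on_change holds the lock, the health check has to skip its pass, which is the availability cost described above.

```python
import fcntl
import os
import tempfile

# Hypothetical lock file shared by on_change and the health check.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "replication-setup.lock")


def try_acquire(path=LOCK_PATH):
    """Return an open file holding an exclusive flock, or None if it is taken."""
    f = open(path, "w")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f


def release(f):
    fcntl.flock(f, fcntl.LOCK_UN)
    f.close()


def health_check_with_lock(setup_replication):
    lock = try_acquire()
    if lock is None:
        # on_change is mid-failover: skip this pass. The replica stays
        # unconfigured until a later health check gets the lock.
        return False
    try:
        setup_replication()
        return True
    finally:
        release(lock)
```

Because flock locks belong to an open file description, two separate open() calls conflict even within one process, so the behavior can be exercised locally.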

My proposal to fix this is to catch the "This operation cannot be performed with a running slave" exception and simply exit gracefully. We'll hit this in two cases:

  • The scenario described above, in which case the health handler has already set up replication for us.
  • Stopping replication has not completed or has timed out (perhaps because we're under heavy replication load). In this case, the next pass of the health handler will set up replication for us.
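
A minimal sketch of the proposed fix, assuming the driver surfaces the MySQL message in the exception text: only the "running slave" error is treated as a signal to exit gracefully, and anything else is re-raised. ReplicationError, FakeReplica, and set_up_replication are hypothetical names, not the project's actual code.

```python
class ReplicationError(Exception):
    """Hypothetical stand-in for the MySQL driver's exception type."""


RUNNING_REPLICA_MSG = "This operation cannot be performed with a running slave"


class FakeReplica:
    def __init__(self, running):
        self.running = running
        self.primary = None

    def change_primary(self, new_primary):
        if self.running:
            raise ReplicationError(RUNNING_REPLICA_MSG)
        self.primary = new_primary
        self.running = True


def set_up_replication(replica, new_primary):
    """Return True if we configured replication, False if we deferred."""
    try:
        replica.change_primary(new_primary)
    except ReplicationError as ex:
        if RUNNING_REPLICA_MSG in str(ex):
            # Either the health check already set up replication for us,
            # or stopping replication hasn't finished; in both cases the
            # health handler takes care of it, so exit cleanly.
            return False
        raise  # any other replication error is a real failure
    return True
```

Catching the one known message, rather than a bare Exception, is what lets the handler leave its loop without hiding unrelated failures.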

tgross commented on August 20, 2024

It occurs to me that another approach that might work is to have the onChange handler simply stop replication and try to get the lock, without setting up new replication. This way, as soon as the primary is decided, the next health check can point replication at it. I'll try out both methods and see which seems more robust.
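
The second approach can be sketched as a split of responsibilities, assuming the same kind of in-memory stand-ins: on_change only stops replication and contends for the lock, and the health check is the sole place replication gets configured. FakeConsul, FakeReplica, on_change, and health_check here are hypothetical simplifications, not the project's handlers.

```python
class FakeConsul:
    """Hypothetical stand-in for Consul's primary lock."""

    def __init__(self, lock_holder=None):
        self.lock_holder = lock_holder

    def try_lock(self, node):
        if self.lock_holder is None:
            self.lock_holder = node
        return self.lock_holder == node


class FakeReplica:
    def __init__(self):
        self.running = True
        self.primary = "db-1"

    def stop_replication(self):
        self.running = False


def on_change(node, consul, replica):
    # Only stop replication and contend for the lock; never repoint
    # replication here, so there is nothing to race with the health check.
    replica.stop_replication()
    return consul.try_lock(node)


def health_check(node, consul, replica):
    # Sole owner of replication setup: once another node holds the lock,
    # point this replica at it and start replication.
    primary = consul.lock_holder
    if primary is None or primary == node or replica.running:
        return
    replica.primary = primary
    replica.running = True
```

A node that loses the lock is left with replication stopped until its next health check repoints it; a node that wins the lock simply stays a primary with replication off.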

tgross commented on August 20, 2024

Opened #27. I think there may be something to the second approach I outlined above, but at the moment it seems brittle, so I went with the original idea of simply exiting cleanly.

tgross commented on August 20, 2024

#27 was merged in and I've pushed new tags autopilotpattern/mysql:5.6r2.1.1 and autopilotpattern/mysql:latest to the Docker Hub.
