Comments (6)

paulstuart commented on May 24, 2024

@freeekanayaka, this issue is still a problem, and I would like to help resolve it if I can. It's fairly easy to reproduce:

  1. Create the test table: `create table simple (id integer primary key, other integer)`.
  2. Start a long-running loop that inserts the last known id value, e.g. `insert into simple (other) values(?)`, where `other` is derived from `results.LastInsertId()`. The key point is that `id` and `other` should always be equal.
  3. Start a second loop that iterates over the node numbers and transfers leadership to the next node.
  4. Errors will occur in the primary loop, and occasionally `id` and `other` will not match. That mismatch is the bug in question.
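
The steps above can be sketched as a small, self-contained Go program. This is not a real dqlite harness. `fakeCluster`, `insert`, `runLoop`, and `mismatches` are hypothetical names, and the cluster is an in-memory stand-in that can be told to commit an insert but drop the reply. That models the lost-acknowledgement explanation raised later in this thread, which is a hypothesis rather than a confirmed cause. The sketch also assumes `other` is set to `lastID + 1`, the id the next row is expected to receive. A real reproduction would replace `fakeCluster` with a `database/sql` connection to a dqlite cluster and drive leadership transfers from a second goroutine.

```go
package main

import (
	"errors"
	"fmt"
)

// row mirrors the test table: simple (id integer primary key, other integer).
type row struct{ id, other int64 }

// Hypothetical error standing in for whatever the driver returns on leadership loss.
var errLeadershipLost = errors.New("leadership lost")

// fakeCluster is a hypothetical in-memory stand-in for a dqlite cluster.
// loseAckOn lists insert attempts (1-based) whose commit succeeds but whose
// reply is lost, as if leadership moved before the client heard back.
type fakeCluster struct {
	rows      []row
	attempts  int
	loseAckOn map[int]bool
}

// insert commits a row and returns its id, or an error if the reply is "lost".
func (c *fakeCluster) insert(other int64) (int64, error) {
	c.attempts++
	id := int64(len(c.rows)) + 1 // sequential ids, like a fresh table
	c.rows = append(c.rows, row{id, other})
	if c.loseAckOn[c.attempts] {
		return 0, errLeadershipLost // committed, but the client sees an error
	}
	return id, nil
}

// runLoop performs n logical inserts. Each sets other to lastID+1 (an
// assumption about how "other" is derived) and naively retries on error.
func runLoop(c *fakeCluster, n int) {
	var lastID int64
	for i := 0; i < n; i++ {
		other := lastID + 1
		for {
			id, err := c.insert(other)
			if err == nil {
				lastID = id
				break
			}
			// Naive retry: the failed attempt may already have been committed.
		}
	}
}

// mismatches returns every row whose id and other differ (the bug).
func mismatches(rows []row) []row {
	var bad []row
	for _, r := range rows {
		if r.id != r.other {
			bad = append(bad, r)
		}
	}
	return bad
}

func main() {
	c := &fakeCluster{loseAckOn: map[int]bool{3: true}}
	runLoop(c, 5)
	fmt.Println(mismatches(c.rows)) // prints [{4 3}]
}
```

The reply to the third attempt is dropped, so the naive retry inserts a second row with the same `other` value. The table ends up with six rows for five logical inserts, and one of them has mismatched `id` and `other`.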

from dqlite.

paulstuart commented on May 24, 2024

In framesAbortBecauseLeadershipLost (replication.c), there's an if/else statement based on is_commit, but the handling is exactly the same for both cases.

freeekanayaka commented on May 24, 2024

Interesting breakdown. As a first step, I'd suggest putting in place a unit test, or at least a program, that implements the procedure you outline and fails as you describe. With that in hand it should be easier to investigate the issue further, come up with a design for the solution, implement it, and prove that it works (the unit test no longer fails).

The term "two-phase commit" is probably inappropriate, as Raft is already two-phase by itself (a quorum is needed).
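
To make the quorum point concrete, here is a minimal Go sketch of how a Raft leader can compute the highest log index stored on a majority of nodes. The function name `commitIndex` and the `matchIndex` slice are illustrative, not dqlite or raft library APIs. The sketch also leaves out Raft's extra rule that a leader only commits an entry by counting replicas if that entry is from its current term.

```go
package main

import (
	"fmt"
	"sort"
)

// commitIndex returns the highest log index replicated on a majority of the
// cluster, given each node's match index (including the leader's own last
// index). matchIndex must be non-empty. Raft's current-term check is omitted.
func commitIndex(matchIndex []uint64) uint64 {
	sorted := append([]uint64(nil), matchIndex...) // don't mutate the caller's slice
	// Sort descending so that sorted[k] is held by at least k+1 nodes.
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })
	majority := len(sorted)/2 + 1
	return sorted[majority-1]
}

func main() {
	// Three nodes: leader at index 5, followers at 4 and 2.
	fmt.Println(commitIndex([]uint64{5, 4, 2})) // prints 4
}
```

An entry counts as committed only once a majority holds it, so a leader that dies after the first phase may leave an entry committed without having told the client.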

I suspect the issue here has more to do with client and server behavior when leadership is lost. The Raft paper describes roughly what should happen: an operation ID should be maintained for each client request. If a request (such as committing the transaction performing the INSERT) fails, the client should retry it, presenting the same operation ID to the new leader. The new leader should then either perform the request, or treat it as a no-op if it turns out the request was actually performed but the client never received the confirmation because the leader it originally submitted to died before it could reply.
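
The operation-ID scheme can be sketched in Go as follows. This is a hypothetical illustration, not dqlite's API: `stateMachine`, `apply`, `submit`, and the string operation IDs are invented for the example. The record of applied operation IDs has to be part of the replicated state, so that a new leader knows which requests were already performed. The Raft paper suggests tracking only the latest serial number (and its response) per client, which bounds memory. This sketch keeps every ID for simplicity.

```go
package main

import (
	"errors"
	"fmt"
)

type row struct{ id, other int64 }

// Hypothetical error standing in for a lost reply after leadership loss.
var errLeadershipLost = errors.New("leadership lost")

// stateMachine is a hypothetical replicated state: the table plus the result
// of every client operation ID already applied.
type stateMachine struct {
	rows    []row
	applied map[string]int64 // operation ID -> id returned by that insert
}

// apply performs the insert for opID, or no-ops and returns the earlier
// result if that opID was already applied.
func (s *stateMachine) apply(opID string, other int64) int64 {
	if id, ok := s.applied[opID]; ok {
		return id // duplicate retry: do not insert again
	}
	id := int64(len(s.rows)) + 1
	s.rows = append(s.rows, row{id, other})
	s.applied[opID] = id
	return id
}

// submit simulates a leader that commits the request but may lose the reply.
func submit(s *stateMachine, opID string, other int64, loseAck bool) (int64, error) {
	id := s.apply(opID, other)
	if loseAck {
		return 0, errLeadershipLost
	}
	return id, nil
}

func main() {
	s := &stateMachine{applied: map[string]int64{}}
	// The first attempt commits, but the client never sees the reply.
	if _, err := submit(s, "client1-op1", 1, true); err != nil {
		// Retry against the new leader with the same operation ID.
		id, _ := submit(s, "client1-op1", 1, false)
		fmt.Println(id, len(s.rows)) // prints 1 1
	}
}
```

The retry presents the same operation ID, so the second submission returns the original id instead of inserting a duplicate row. A lost reply therefore cannot produce the `id`/`other` mismatch from the reproduction steps.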

freeekanayaka commented on May 24, 2024

So far I've deferred addressing this issue since I suspect it requires a fair amount of thinking and work, but I still intend to nail it down when I have some time.

paulstuart commented on May 24, 2024

I'd like to do anything I can to lighten the load for you, as this is important to my project. Your original notes appear to be out of date, so any further brain dumps would be welcome.

Testing this issue is a pain because it requires running a cluster under load and simultaneously hammering it with repeated transfers (or server restarts) until the magic moment occurs.

One idea is to add a "sleep" function callable from SQLite statements, creating a long-running transaction that makes it easier to test such actions mid-transaction. If you think that would be valuable, I'd be happy to get it going.

freeekanayaka commented on May 24, 2024

Yes, those notes are out of date. The brain dump is basically what I wrote (assuming the issue is what I think it is), although that's admittedly hand-waving.

As I said, a program that, if run long enough, eventually reproduces the error would probably be a very good start.
