
Comments (7)

freeekanayaka commented on May 23, 2024

OK, the last assertion failure means there is a bug in dqlite: presumably as a consequence of some unexpected scenario, we close the SQLite database without cleaning up properly (e.g. without closing prepared statements or ongoing queries).

Running the dqlite server and client in the same process should be perfectly fine, and indeed it's what I'd expect people to normally do. Why do you think that's problematic? Of course, the client must be able to retry when a query fails, rather than give up and kill the entire process, if that's what you meant.

To detect that the cluster is stable, calling db.Ping() should be a good strategy for setting the initial process state. Why does that end up being a no-op for you? It should succeed only once the cluster has a leader.

All that being said, I think there is a bug where, if you don't run queries like "CREATE TABLE IF NOT EXISTS my_table ..." before queries like "SELECT * FROM my_table", the SELECT might fail with a "table does not exist" error. That probably doesn't apply to your situation, so I'm just mentioning it as a heads-up.


ibuildthecloud commented on May 23, 2024

db.Ping() seemed to succeed even when things were not healthy, and that would result in the assertion failure above.

I currently have a cluster that is stuck with three nodes rebooting because they keep hitting Assertion failed: rc == 0 (src/leader.c: leader__close: 257) on start. Is there anything you would want to know to debug this?


ibuildthecloud commented on May 23, 2024

Also, before this cluster got into this situation, two of the nodes were failing non-stop with a bad connection error while running the sanity test below:

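import (
	"context"
	"database/sql"
)

// ping runs a trivial query to check that the cluster answers.
func ping(ctx context.Context, db *sql.DB) error {
	rows, err := db.QueryContext(ctx, "SELECT 1")
	if err != nil {
		return err
	}
	defer rows.Close()

	// Advance to the single result row; any iteration error is reported by rows.Err().
	rows.Next()
	return rows.Err()
}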


freeekanayaka commented on May 23, 2024

I've added a test in canonical/go-dqlite#66 to confirm that db.Ping() returns an error if there's no leader, and returns no error if there's one (see the new TestIntegration_PingOnlyWorksOnceLeaderElected function). So either that's a red herring, or your particular scenario somehow differs from the test.

Note that db.Ping() uses connections cached internally by Go's database/sql package, so if you call db.Ping() twice, the second call will return no error even if the cluster lost its leader between the two calls. The unit test confirms that as well; I'm not sure whether it applies to you.

As for the bad connection error, that is a driver.ErrBadConn, which the dqlite driver returns mostly when no leader is available. In that case the client code should retry the transaction, with some backoff.

Finally, the assertion failure seems to be the actual bug here. If you can reproduce it reliably, could you attach or send me a tarball with the data and code needed to reproduce it, or equivalent info?


ibuildthecloud commented on May 23, 2024

@freeekanayaka I'll try to send the data from a failing cluster. How can I send it to you? My email is [email protected] if you'd like to share your email.

Regarding db.Ping(), I want to confirm that this is a proper way to ensure a stable setup. Typically in Go you open the DB once (sql.Open()) and never call db.Close(). So on startup the idiomatic approach in Go would be the following:

db, err := sql.Open(driver, datasource)
if err != nil {
	return err
}
for db.Ping() != nil {
	time.Sleep(time.Second)
}

Does this seem like a sane approach?

Also, regarding the bad connection error: if I get it, do I have to call db.Close() and then sql.Open() to recover? Again, that is not an expected flow in Go.


freeekanayaka commented on May 23, 2024

> @freeekanayaka I'll try to send the data from a failing cluster. How can I send it to you? My email is [email protected] if you'd like to share your email.

[email protected]

> Regarding db.Ping(), I want to confirm that this is a proper way to ensure a stable setup. Typically in Go you open the DB once (sql.Open()) and never call db.Close(). So on startup the idiomatic approach in Go would be the following:
>
> db, err := sql.Open(driver, datasource)
> if err != nil {
> 	return err
> }
> for db.Ping() != nil {
> 	time.Sleep(time.Second)
> }
>
> Does this seem like a sane approach?

It does seem sane to me, and it's virtually what we do in LXD.

> Also, regarding the bad connection error: if I get it, do I have to call db.Close() and then sql.Open() to recover? Again, that is not an expected flow in Go.

No, you don't have to. If a driver returns ErrBadConn, the sql package internally invalidates that connection. All you need to do is retry your query (e.g. db.Exec, tx.Exec or whatever). Think of it as the equivalent of the network link between your app and your PostgreSQL database going down (the pg Go driver returns ErrBadConn too in that case).


ibuildthecloud commented on May 23, 2024

I've changed my startup logic to rely on db.Ping() alone to test that things are stable. I also upgraded from v1.1.0 to v1.2.0, and to the latest libco, raft and SQLite as of now. Upgrading seems to have fixed something, because my broken cluster has now come up. I'll see how it goes; typically the cluster crashes within 1-3 days.
