Comments (7)
Ok, the last assertion failure means there is a bug in dqlite, where, presumably as a consequence of some unexpected scenario, we close the SQLite db without cleaning up properly (e.g. without finalizing prepared statements or aborting ongoing queries).
Running the dqlite server and client in the same process should be perfectly fine, and indeed it's what I'd expect people to normally do. Why do you think that's problematic? Of course the client must be able to retry, and not give up by killing the entire process if a query fails, if that's what you meant.
To detect that the cluster is stable, calling db.Ping() should be a good strategy for setting the initial process state. Why does that end up being a no-op for you? It should succeed only once the cluster has a leader.
All that being said, I think there is a bug where if you don't run queries like `CREATE TABLE IF NOT EXISTS my_table ...` before queries like `SELECT * FROM my_table`, then the SELECT might fail with a "table does not exist" error. But that probably doesn't apply to your situation, so I'm just mentioning it as a heads-up.
from dqlite.
db.Ping() just seemed to work even when things were not healthy, and would then result in the assertion failure above.
I currently have a cluster that is stuck with three nodes rebooting because they keep hitting `Assertion failed: rc == 0 (src/leader.c: leader__close: 257)` on start. Is there anything you would want to know to debug this?
Also, prior to this cluster getting into this situation, two of the nodes were failing non-stop with a `bad connection` error while doing the sanity test below:
```go
func ping(ctx context.Context, db *sql.DB) error {
	row, err := db.QueryContext(ctx, "SELECT 1")
	if err != nil {
		return err
	}
	defer row.Close()
	row.Next()
	return row.Err()
}
```
I've added a test in canonical/go-dqlite#66 to confirm that db.Ping() returns an error if there's no leader, and returns no error if there is one (see the new TestIntegration_PingOnlyWorksOnceLeaderElected function). So that should be either a red herring, or your particular scenario is somehow different from the test.
Note that db.Ping() makes use of connections internally cached by the database/sql package, so if you call db.Ping() twice, the second call will return no error even if the cluster lost the leader between those calls. That's confirmed by the unit test as well; not sure if that applies to you.
As for the `bad connection` error: that is a driver.ErrBadConn returned by the dqlite driver implementation, mostly when there is no leader available. In that case the transaction should be retried by the client code, using some backoff.
Finally, the assertion failure seems to be the bug here. If you can reproduce it reliably, would it be possible to attach or send me a tarball with reproducing data and code or equivalent info?
@freeekanayaka I'll try to send the data of a failing cluster. How can I send it to you? My email is [email protected] if you'd like to share your email.
Regarding db.Ping(): I want to confirm that this is a proper way to ensure a stable setup. Typically in Go you open the DB once (sql.Open()) and then never call db.Close(). So on startup the idiomatic approach in Go would be the following:
```go
db, err := sql.Open(driver, datasource)
if err != nil {
	return err
}
for db.Ping() != nil {
	time.Sleep(time.Second)
}
```
Does this seem like a sane approach?
Also, regarding `bad connection`: if I get that error, do I have to call db.Close() and then sql.Open() to recover? Again, that is not an expected flow in Go.
> @freeekanayaka I'll try to send the data of a failing cluster. How can I send it to you? My email is [email protected] if you'd like to share your email.
>
> Regarding db.Ping(): I want to confirm that this is a proper way to ensure a stable setup. Typically in Go you open the DB once (sql.Open()) and then never call db.Close(). So on startup the idiomatic approach in Go would be the following:
>
> ```go
> db, err := sql.Open(driver, datasource)
> if err != nil {
> 	return err
> }
> for db.Ping() != nil {
> 	time.Sleep(time.Second)
> }
> ```
>
> Does this seem like a sane approach?
It does seem sane to me, and it's virtually what we do in LXD.
> Also, regarding `bad connection`: if I get that error, do I have to call db.Close() and then sql.Open() to recover? Again, that is not an expected flow in Go.
No, you don't have to. If a driver returns ErrBadConn, the database/sql package will internally invalidate that connection. All you need to do is retry your query (e.g. db.Exec, tx.Exec, or whatever). Think of this as the equivalent of the network link between your app and your PostgreSQL database going down (the pq Go driver returns ErrBadConn too in that case).
I've changed my startup logic to rely just on db.Ping() to test that things are stable. I also upgraded from v1.1.0 to v1.2.0, along with the latest libco, raft, and sqlite as of now. Upgrading seems to have fixed something, because things have now come up on my broken cluster. I'll see how it goes; typically the cluster crashes within 1-3 days.