Comments (7)
Ok, the last assertion failure means there is a bug in dqlite, where, presumably as a consequence of some unexpected scenario, we close the SQLite db without cleaning up properly (e.g. without finalizing prepared statements or aborting ongoing queries).
Running the dqlite server and client in the same process should be perfectly fine, and indeed it's what I'd expect people to normally do. Why do you think that's problematic? Of course the client must be able to retry, and not give up by killing the entire process if a query fails, if that's what you meant.
To detect that the cluster is stable, calling db.Ping() should be a good strategy for setting the initial process state. Why does that end up being a no-op for you? It should succeed only once the cluster has a leader.
All that being said, I think there is a bug where if you don't run queries like `CREATE TABLE IF NOT EXISTS my_table ...` before queries like `SELECT * FROM my_table`, then the SELECT might fail with a "table does not exist" error. But that probably doesn't apply to your situation, so I'm just mentioning it as a heads-up.
from dqlite.
db.Ping() just seemed to work even when things were not healthy, and would then result in the assertion failure above.
I currently have a cluster that is stuck with three nodes rebooting because they keep hitting `Assertion failed: rc == 0 (src/leader.c: leader__close: 257)` on start. Is there anything you would want to know to debug this?
Also, prior to this cluster getting into this situation, two of the nodes were failing non-stop with a `bad connection` error while doing the sanity test below:
```go
func ping(ctx context.Context, db *sql.DB) error {
	row, err := db.QueryContext(ctx, "SELECT 1")
	if err != nil {
		return err
	}
	defer row.Close()
	row.Next()
	return row.Err()
}
```
I've added a test in canonical/go-dqlite#66 to confirm that db.Ping() returns an error if there's no leader, and returns no error if there is one (see the new TestIntegration_PingOnlyWorksOnceLeaderElected function). So that should be either a red herring, or your particular scenario is somehow different from the test.
Note that db.Ping() makes use of connections internally cached by the database/sql package, so if you call db.Ping() twice, the second call will return no error even if the cluster lost the leader between those calls. That's confirmed by the unit test as well; not sure if that applies to you.
As for the `bad connection` error: that is a driver.ErrBadConn returned by the dqlite driver implementation, mostly when there is no leader available. In that case the transaction should be retried by the client code, using some backoff.
Finally, the assertion failure seems to be the bug here. If you can reproduce it reliably, would it be possible to attach or send me a tarball with reproducing data and code or equivalent info?
@freeekanayaka I'll try to send the data of a failing cluster. How can I send it to you? My email is [email protected] if you'd like to share your email.
Regarding db.Ping(): I want to confirm that this is a proper way to ensure a stable setup. Typically in Go you open the DB once (sql.Open()) and then never call db.Close(). So on startup the idiomatic approach in Go would be the following:
```go
db, err := sql.Open(driver, datasource)
if err != nil {
	return err
}
for db.Ping() != nil {
	time.Sleep(time.Second)
}
```
Does this seem like a sane approach?
Also, regarding `bad connection`: if I get that error, do I have to call db.Close() and then sql.Open() to recover? Again, that is not an expected flow in Go.
> @freeekanayaka I'll try to send the data of a failing cluster. How can I send it to you? My email is [email protected] if you'd like to share your email.
>
> Regarding db.Ping(): I want to confirm that this is a proper way to ensure a stable setup. Typically in Go you open the DB once (sql.Open()) and then never call db.Close(). So on startup the idiomatic approach in Go would be the following:
>
> ```go
> db, err := sql.Open(driver, datasource)
> if err != nil {
> 	return err
> }
> for db.Ping() != nil {
> 	time.Sleep(time.Second)
> }
> ```
>
> Does this seem like a sane approach?
It does seem sane to me, and it's virtually what we do in LXD.
> Also, regarding `bad connection`: if I get that error, do I have to call db.Close() and then sql.Open() to recover? Again, that is not an expected flow in Go.
No, you don't have to. If a driver returns ErrBadConn, the database/sql package will internally invalidate that connection. All you need to do is retry your query (e.g. db.Exec, tx.Exec, or whatever). Think of this as the equivalent of the network link between your app and your PostgreSQL database going down (the pq Go driver returns ErrBadConn too in that case).
I've changed my startup logic to rely just on db.Ping() to test that things are stable. I also upgraded from v1.1.0 to v1.2.0, along with the latest libco, raft, and sqlite as of now. Upgrading seems to have fixed something, because things have now come up on my broken cluster. I'll see how it goes; typically the cluster crashes within 1-3 days.