dqlite's Introduction

dqlite is a C library that implements an embeddable and replicated SQL database engine with high availability and automatic failover.

The acronym "dqlite" stands for "distributed SQLite", meaning that dqlite extends SQLite with a network protocol that can connect together various instances of your application and have them act as a highly-available cluster, with no dependency on external databases.

Design highlights

  • Asynchronous single-threaded implementation using libuv as event loop.
  • Custom wire protocol optimized for SQLite primitives and data types.
  • Data replication based on the Raft algorithm.

License

The dqlite library is released under a slightly modified version of the LGPLv3, which includes a copyright exception allowing users to statically link the library code into their projects and release the final work under their own terms. See the full license text.

Compatibility

dqlite runs on Linux and requires a kernel with support for native async I/O (not to be confused with POSIX AIO).

Try it

The simplest way to see dqlite in action is to use the demo program that comes with the Go dqlite bindings. Please see the relevant documentation in that project.

Media

A talk about dqlite was given at FOSDEM 2020; you can watch it here.

Here is a blog post from 2022 comparing dqlite with rqlite and Litestream, other replication software for SQLite.

Wire protocol

If you wish to write a client, please refer to the wire protocol documentation.
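
The sketch below illustrates, in Go, the kind of framing described there: an 8-byte little-endian protocol version sent as a handshake, then messages made of an 8-byte header (body size in 8-byte words, a request type byte, a schema byte, two unused bytes) followed by the body. Treat it as a sketch under those assumptions and refer to the documentation for the actual request encodings.

package main

import (
	"encoding/binary"
	"log"
	"net"
)

// writeHandshake sends the 8-byte little-endian protocol version that a
// client transmits right after connecting.
func writeHandshake(conn net.Conn, version uint64) error {
	buf := make([]byte, 8)
	binary.LittleEndian.PutUint64(buf, version)
	_, err := conn.Write(buf)
	return err
}

// writeMessage frames a request: an 8-byte header carrying the body length
// in 8-byte words, the request type and the schema version, followed by the
// body itself, padded to a word boundary.
func writeMessage(conn net.Conn, reqType, schema uint8, body []byte) error {
	if pad := len(body) % 8; pad != 0 {
		body = append(body, make([]byte, 8-pad)...)
	}
	header := make([]byte, 8)
	binary.LittleEndian.PutUint32(header[0:4], uint32(len(body)/8))
	header[4] = reqType
	header[5] = schema
	// header[6:8] are unused
	if _, err := conn.Write(header); err != nil {
		return err
	}
	_, err := conn.Write(body)
	return err
}

func main() {
	// 127.0.0.1:9001 is just a placeholder for the address of a running node.
	conn, err := net.Dial("tcp", "127.0.0.1:9001")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	if err := writeHandshake(conn, 1); err != nil {
		log.Fatal(err)
	}
	_ = writeMessage // subsequent requests would be framed with writeMessage
}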

Install

If you are on a Debian-based system, you can get the latest development release from dqlite's dev PPA:

sudo add-apt-repository ppa:dqlite/dev
sudo apt update
sudo apt install libdqlite-dev

Contributing

See CONTRIBUTING.md.

Build

To build libdqlite from source you'll need:

  • Build dependencies: pkg-config, GNU Autoconf, Automake, libtool, and make
  • A reasonably recent version of libuv (v1.8.0 or later), with headers.
  • A reasonably recent version of SQLite (v3.22.0 or later), with headers.
  • Optionally, a reasonably recent version of LZ4 (v1.7.1 or later), with headers.

Your distribution should already provide you with these dependencies. For example, on Debian-based distros:

sudo apt install pkg-config autoconf automake libtool make libuv1-dev libsqlite3-dev liblz4-dev

With these dependencies installed, you can build and install the dqlite shared library and headers as follows:

$ autoreconf -i
$ ./configure --enable-build-raft
$ make
$ sudo make install

The default installation prefix is /usr/local; you may need to run

$ sudo ldconfig

to enable the linker to find libdqlite.so. To install to a different prefix, replace the configure step with something like

$ ./configure --enable-build-raft --prefix=/usr

The --enable-build-raft option causes dqlite to use its bundled Raft implementation instead of linking to an external libraft; the latter is a legacy configuration that should not be used for new development.

Usage Notes

Detailed tracing is enabled when the environment variable LIBDQLITE_TRACE is set before startup. Its value can be in the range [0..5] and represents the tracing level, where 0 means no traces are emitted, 5 enables the minimum verbosity (FATAL records only), and 1 enables the maximum verbosity (all records: DEBUG, INFO, WARN, ERROR, FATAL).
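
For example, to run an application linked against libdqlite with maximum verbosity (my-app is just a placeholder for your own binary):

$ LIBDQLITE_TRACE=1 ./my-app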

dqlite's Issues

demo: sqlite3.c:58213: sqlite3WalExclusiveMode: Assertion `pWal->writeLock==0' failed.

When the demo program is run it occasionally crashes with the error above.

This is a variation of #2, but harder to reproduce: it gets triggered by the same condition (leader stepping back), but it happens only if the leader has not yet tried to apply the WalFrames command at all.

To make it reproducible one needs to slow down the transaction and increase the sleep between inserts, to have a chance to kill the last follower before COMMIT is issued. For instance, apply this patch to demo.go:

modified   testdata/demo.go
@@ -114,7 +114,8 @@ func insertForever(db *sql.DB) {
 		// Insert a batch of rows.
 		offset := insertedCount(tx)
 		failed := false
-		for i := 0; i < 50; i++ {
+		for i := 0; i < 5; i++ {
+			log.Printf("[INFO] demo: inserting %d", i)
 			if _, err := tx.Exec("INSERT INTO test (n) VALUES(?)", i+offset); err != nil {
 				handleTxError(tx, err)
 				failed = true
@@ -122,7 +123,7 @@ func insertForever(db *sql.DB) {
 			}
 			// Sleep a tiny bit between each insert to make it more likely that we get
 			// terminated in the middle of a transaction.
-			randomSleep(0.010, 0.025)
+			randomSleep(1, 2)
 		}
 		if failed {
 			continue

and then killing the last follower while the leader transaction is in progress, i.e. the leader is logging "[INFO] demo: inserting N" with 0 <= N <= 3. This triggers the bug.

The fix is the same as for #2.

Sample output of the crash:

10.204.119.128:9980: 2017/07/12 21:10:18 [INFO] demo: 8400 rows inserted, 0 values missing
10.204.119.128:9980: 2017/07/12 21:10:19 [INFO] demo: 8450 rows inserted, 0 values missing
10.204.119.128:9980: 2017/07/12 21:10:21 [INFO] demo: 8500 rows inserted, 0 values missing
10.204.119.128:9980: 2017/07/12 21:10:21 [ERR] raft: peer 10.204.119.99:9980 has newer term,
stopping replication
10.204.119.128:9980: 2017/07/12 21:10:21 [INFO] raft: Node at 10.204.119.128:9980 [Follower]
entering Follower state (Leader: "")
10.204.119.128:9980: 2017/07/12 21:10:21 [INFO] raft: aborting pipeline replication to peer
10.204.119.242:9980
10.204.119.128:9980: 2017/07/12 21:10:22 [INFO] dqlite: methods: attempted wal frames method
on deposed leader
10.204.119.128:9980: 2017/07/12 21:10:22 [INFO] dqlite: methods: attempted undo method on
deposed leader
10.204.119.128:9980: 2017/07/12 21:10:22 [INFO] dqlite: methods: attempted end method on
deposed leader
demo: sqlite3.c:58213: sqlite3WalExclusiveMode: Assertion `pWal->writeLock==0' failed.
SIGABRT: abort
PC=0x7f6572c4377f m=5 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 35 [syscall, locked to thread]:
runtime.cgocall(0x7b2592, 0xc42038bc60, 0xc42038bc88)
    /usr/lib/go-1.8/src/runtime/cgocall.go:131 +0xe2 fp=0xc42038bc30 sp=0xc42038bbf0
github.com/mattn/go-sqlite3._Cfunc__sqlite3_step(0x7f656004d360, 0xc4201e5710, 0xc4201e5718,
0x0)
    github.com/mattn/go-sqlite3/_obj/_cgo_gotypes.go:240 +0x4d fp=0xc42038bc60
sp=0xc42038bc30
github.com/mattn/go-sqlite3.(*SQLiteStmt).exec.func4(0x7f656004d360, 0xc4201e5710,
0xc4201e5718, 0xc42006e190)
    /root/go/src/github.com/mattn/go-sqlite3/sqlite3.go:966 +0x74 fp=0xc42038bc98
sp=0xc42038bc60
github.com/mattn/go-sqlite3.(*SQLiteStmt).exec(0xc4203652f0, 0x7f65736ab4e0, 0xc42006e190,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /root/go/src/github.com/mattn/go-sqlite3/sqlite3.go:966 +0x19b fp=0xc42038bd10
sp=0xc42038bc98
github.com/mattn/go-sqlite3.(*SQLiteConn).exec(0xc420240e40, 0x7f65736ab4e0, 0xc42006e190,
0x8a259b, 0x6, 0x0, 0x0, 0x0, 0xb3a304b73d3, 0xc42038be68, ...)
    /root/go/src/github.com/mattn/go-sqlite3/sqlite3.go:503 +0x2f9 fp=0xc42038bdf8
sp=0xc42038bd10
github.com/mattn/go-sqlite3.(*SQLiteTx).Commit(0xc4201ce110, 0xc42038beb0, 0xc42038bea8)
    /root/go/src/github.com/mattn/go-sqlite3/sqlite3.go:311 +0xa1 fp=0xc42038be78
sp=0xc42038bdf8
database/sql.(*Tx).Commit.func1()
    /usr/lib/go-1.8/src/database/sql/sql.go:1509 +0x3c fp=0xc42038bea8 sp=0xc42038be78
database/sql.withLock(0xc9bf60, 0xc42021e1c0, 0xc42038bf08)
    /usr/lib/go-1.8/src/database/sql/sql.go:2545 +0x65 fp=0xc42038bed0 sp=0xc42038bea8
database/sql.(*Tx).Commit(0xc420228000, 0x3f9999999999999a, 0x1e)
    /usr/lib/go-1.8/src/database/sql/sql.go:1510 +0xdd fp=0xc42038bf30 sp=0xc42038bed0
main.insertForever(0xc42009ed20, 0xc42006d1a0)
    /root/go/src/github.com/dqlite/dqlite/testdata/demo.go:135 +0x251 fp=0xc42038bfd0
sp=0xc42038bf30
runtime.goexit()
    /usr/lib/go-1.8/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc42038bfd8 sp=0xc42038bfd0
created by main.main
    /root/go/src/github.com/dqlite/dqlite/testdata/demo.go:93 +0x7cf

goroutine 1 [chan receive, 1 minutes]:
main.main()
    /root/go/src/github.com/dqlite/dqlite/testdata/demo.go:94 +0x7f5

goroutine 17 [syscall, 1 minutes, locked to thread]:
runtime.goexit()
    /usr/lib/go-1.8/src/runtime/asm_amd64.s:2197 +0x1

goroutine 20 [syscall, 1 minutes]:
os/signal.signal_recv(0x6dfacb)
    /usr/lib/go-1.8/src/runtime/sigqueue.go:116 +0x104
os/signal.loop()
    /usr/lib/go-1.8/src/os/signal/signal_unix.go:22 +0x22
created by os/signal.init.1
    /usr/lib/go-1.8/src/os/signal/signal_unix.go:28 +0x41

goroutine 21 [IO wait]:
net.runtime_pollWait(0x7f65736ab1e8, 0x72, 0x0)
    /usr/lib/go-1.8/src/runtime/netpoll.go:164 +0x59
net.(*pollDesc).wait(0xc4200f61b8, 0x72, 0x0, 0xc42000d4a0)
    /usr/lib/go-1.8/src/net/fd_poll_runtime.go:75 +0x38
net.(*pollDesc).waitRead(0xc4200f61b8, 0xffffffffffffffff, 0x0)
    /usr/lib/go-1.8/src/net/fd_poll_runtime.go:80 +0x34
net.(*netFD).accept(0xc4200f6150, 0x0, 0xc9a1a0, 0xc42000d4a0)
    /usr/lib/go-1.8/src/net/fd_unix.go:430 +0x1e5
net.(*TCPListener).accept(0xc42007e0e8, 0xc4200236d8, 0x67d8ae, 0x456790)
    /usr/lib/go-1.8/src/net/tcpsock_posix.go:136 +0x2e
net.(*TCPListener).Accept(0xc42007e0e8, 0x8bfd00, 0xc42009ebe0, 0xc9f020, 0xc42012a0c0)
    /usr/lib/go-1.8/src/net/tcpsock.go:228 +0x49
net/http.(*Server).Serve(0xc4200acd10, 0xc9e520, 0xc42007e0e8, 0x0, 0x0)
    /usr/lib/go-1.8/src/net/http/server.go:2643 +0x228
created by main.main
    /root/go/src/github.com/dqlite/dqlite/testdata/demo.go:62 +0x447

goroutine 22 [chan receive]:
github.com/dqlite/raft-http.(*Layer).Accept(0xc420124180, 0x8bf868, 0xc4200f61c0, 0xca1840,
0xc4201bc198)
    /root/go/src/github.com/dqlite/raft-http/layer.go:39 +0x5e
github.com/hashicorp/raft.(*NetworkTransport).listen(0xc4200f61c0)
    /root/go/src/github.com/hashicorp/raft/net_transport.go:362 +0x49
created by github.com/hashicorp/raft.NewNetworkTransportWithLogger
    /root/go/src/github.com/hashicorp/raft/net_transport.go:154 +0x192

goroutine 30 [select]:
github.com/hashicorp/raft.(*Raft).runFollower(0xc420146600)
    /root/go/src/github.com/hashicorp/raft/raft.go:646 +0xa1a
github.com/hashicorp/raft.(*Raft).run(0xc420146600)
    /root/go/src/github.com/hashicorp/raft/raft.go:630 +0xa4
github.com/hashicorp/raft.(*Raft).(github.com/hashicorp/raft.run)-fm()
    /root/go/src/github.com/hashicorp/raft/raft.go:258 +0x2a
github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc420146600, 0xc42018a050)
    /root/go/src/github.com/hashicorp/raft/state.go:142 +0x53
created by github.com/hashicorp/raft.(*raftState).goFunc
    /root/go/src/github.com/hashicorp/raft/state.go:143 +0x66

goroutine 31 [select]:
github.com/hashicorp/raft.(*Raft).runFSM(0xc420146600)
    /root/go/src/github.com/hashicorp/raft/raft.go:541 +0xdb8
github.com/hashicorp/raft.(*Raft).(github.com/hashicorp/raft.runFSM)-fm()
    /root/go/src/github.com/hashicorp/raft/raft.go:259 +0x2a
github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc420146600, 0xc42018a060)
    /root/go/src/github.com/hashicorp/raft/state.go:142 +0x53
created by github.com/hashicorp/raft.(*raftState).goFunc
    /root/go/src/github.com/hashicorp/raft/state.go:143 +0x66

goroutine 32 [select]:
github.com/hashicorp/raft.(*Raft).runSnapshots(0xc420146600)
    /root/go/src/github.com/hashicorp/raft/raft.go:1744 +0x367
github.com/hashicorp/raft.(*Raft).(github.com/hashicorp/raft.runSnapshots)-fm()
    /root/go/src/github.com/hashicorp/raft/raft.go:260 +0x2a
github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc420146600, 0xc42018a070)
    /root/go/src/github.com/hashicorp/raft/state.go:142 +0x53
created by github.com/hashicorp/raft.(*raftState).goFunc
    /root/go/src/github.com/hashicorp/raft/state.go:143 +0x66

goroutine 33 [chan receive, 1 minutes]:
github.com/dqlite/raft-membership.HandleChangeRequests(0xc420146600, 0xc42006c300)
    /root/go/src/github.com/dqlite/raft-membership/handle.go:27 +0x51
created by github.com/dqlite/dqlite.NewDriver
    /root/go/src/github.com/dqlite/dqlite/driver.go:81 +0x573

goroutine 34 [chan receive, 1 minutes]:
database/sql.(*DB).connectionOpener(0xc42009ed20)
    /usr/lib/go-1.8/src/database/sql/sql.go:837 +0x4a
created by database/sql.Open
    /usr/lib/go-1.8/src/database/sql/sql.go:582 +0x212

goroutine 810 [select]:
github.com/mattn/go-sqlite3.(*SQLiteStmt).exec.func3(0x7f65736ab4e0, 0xc42006e190,
0xc4203ff6e0, 0x7f6560058d40)
    /root/go/src/github.com/mattn/go-sqlite3/sqlite3.go:958 +0x115
created by github.com/mattn/go-sqlite3.(*SQLiteStmt).exec
    /root/go/src/github.com/mattn/go-sqlite3/sqlite3.go:963 +0x138

goroutine 755 [chan receive]:
database/sql.(*Tx).awaitDone(0xc420228000)
    /usr/lib/go-1.8/src/database/sql/sql.go:1440 +0x57
created by database/sql.(*DB).begin
    /usr/lib/go-1.8/src/database/sql/sql.go:1383 +0x274

goroutine 672 [IO wait]:
net.runtime_pollWait(0x7f65736aafa8, 0x72, 0x10)
    /usr/lib/go-1.8/src/runtime/netpoll.go:164 +0x59
net.(*pollDesc).wait(0xc4203aeae8, 0x72, 0xc9b520, 0xc97628)
    /usr/lib/go-1.8/src/net/fd_poll_runtime.go:75 +0x38
net.(*pollDesc).waitRead(0xc4203aeae8, 0xc4201e1000, 0x1000)
    /usr/lib/go-1.8/src/net/fd_poll_runtime.go:80 +0x34
net.(*netFD).Read(0xc4203aea80, 0xc4201e1000, 0x1000, 0x1000, 0x0, 0xc9b520, 0xc97628)
    /usr/lib/go-1.8/src/net/fd_unix.go:250 +0x1b7
net.(*conn).Read(0xc4201bc198, 0xc4201e1000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/lib/go-1.8/src/net/net.go:181 +0x70
bufio.(*Reader).fill(0xc420284900)
    /usr/lib/go-1.8/src/bufio/bufio.go:97 +0x117
bufio.(*Reader).ReadByte(0xc420284900, 0x300000002, 0xc4201a7a00, 0xc42003bca0)
    /usr/lib/go-1.8/src/bufio/bufio.go:239 +0x5b
github.com/hashicorp/raft.(*NetworkTransport).handleCommand(0xc4200f61c0, 0xc420284900,
0xc420284960, 0xc4200f7420, 0x0, 0x0)
    /root/go/src/github.com/hashicorp/raft/net_transport.go:402 +0x43
github.com/hashicorp/raft.(*NetworkTransport).handleConn(0xc4200f61c0, 0xca1840,
0xc4201bc198)
    /root/go/src/github.com/hashicorp/raft/net_transport.go:386 +0x221
created by github.com/hashicorp/raft.(*NetworkTransport).listen
    /root/go/src/github.com/hashicorp/raft/net_transport.go:373 +0x1d0

rax    0x0
rbx    0x7f657374a000
rcx    0x7f6572c4377f
rdx    0x0
rdi    0x2
rsi    0x7f6571205210
rbp    0x7f65732eb8ea
rsp    0x7f6571205288
r8     0x0
r9     0x7f6571205210
r10    0x8
r11    0x246
r12    0xe365
r13    0x7f65733088d0
r14    0x200
r15    0x55
rip    0x7f6572c4377f
rflags 0x246
cs     0x33
fs     0x0
gs     0x0

question: sqlite errors

When running go-dqlite database/sql driver, if an error is returned from the driver, will it be an instance of sqlite3.Error with all the same codes as the mattn/go-sqlite3 driver?

Is dqlite really a single-writer DB?

From the FAQ "How does dqlite behave during conflict situations?", it seems that all write operations have to be performed on the master node, and therefore the non-master nodes can only perform read queries. This would require that the "user code" on non-master nodes somehow record all desired changes and use some other networking stack to send those changes to the master to be executed. This seems very complicated and impractical. Is this correct? Could this be explained more precisely in the FAQ?

panic in sqlite

I keep getting this panic. In this situation the code path does not go directly through dqlite: in k3s I'm supporting both sqlite and dqlite, so when I run the old sqlite code, but with the patched sqlite library, it will randomly fail in this same place.

goroutine 77265 [syscall]:
runtime.cgocall(0x3416b20, 0xc00d447388, 0x0)
        /usr/local/go/src/runtime/cgocall.go:128 +0x5b fp=0xc00d447358 sp=0xc00d447320 pc=0x40472b
github.com/rancher/k3s/vendor/github.com/mattn/go-sqlite3._Cfunc_sqlite3_close_v2(0x9144e00, 0x0)
        _cgo_gotypes.go:607 +0x49 fp=0xc00d447388 sp=0xc00d447358 pc=0xf282b9
github.com/rancher/k3s/vendor/github.com/mattn/go-sqlite3.(*SQLiteConn).Close.func1(0xc01717a5a0, 0x0)
        /go/src/github.com/rancher/k3s/vendor/github.com/mattn/go-sqlite3/sqlite3.go:1646 +0x5f fp=0xc00d4473c8 sp=0xc00d447388 pc=0xf3edff
github.com/rancher/k3s/vendor/github.com/mattn/go-sqlite3.(*SQLiteConn).Close(0xc01717a5a0, 0x8, 0xc0137dd6c0)
        /go/src/github.com/rancher/k3s/vendor/github.com/mattn/go-sqlite3/sqlite3.go:1646 +0x2f fp=0xc00d4473f8 sp=0xc00d4473c8 pc=0xf37f9f
database/sql.(*driverConn).finalClose.func2()
        /usr/local/go/src/database/sql/sql.go:521 +0x49 fp=0xc00d447430 sp=0xc00d4473f8 pc=0xf194b9
database/sql.withLock(0x486c400, 0xc016a67a80, 0xc00d4474c8)
        /usr/local/go/src/database/sql/sql.go:3097 +0x63 fp=0xc00d447458 sp=0xc00d447430 pc=0xf19133
database/sql.(*driverConn).finalClose(0xc016a67a80, 0x3e18580, 0x7fb7fc6d4908)
        /usr/local/go/src/database/sql/sql.go:519 +0x130 fp=0xc00d4474f0 sp=0xc00d447458 pc=0xf0c9d0
database/sql.finalCloser.finalClose-fm(0xc0007d8560, 0x4823b60)
        /usr/local/go/src/database/sql/sql.go:565 +0x2f fp=0xc00d447518 sp=0xc00d4474f0 pc=0xf1bb9f
database/sql.(*driverConn).Close(0xc016a67a80, 0xc016a67a80, 0x0)
        /usr/local/go/src/database/sql/sql.go:500 +0x138 fp=0xc00d447568 sp=0xc00d447518 pc=0xf0c878
database/sql.(*DB).putConn(0xc0007d8540, 0xc016a67a80, 0x0, 0x0, 0xc0000a0200)
        /usr/local/go/src/database/sql/sql.go:1277 +0x1c8 fp=0xc00d4475d8 sp=0xc00d447568 pc=0xf101c8
database/sql.(*driverConn).releaseConn(...)
        /usr/local/go/src/database/sql/sql.go:421
database/sql.(*driverConn).releaseConn-fm(0x0, 0x0)
        /usr/local/go/src/database/sql/sql.go:420 +0x4c fp=0xc00d447610 sp=0xc00d4475d8 pc=0xf1bc1c
database/sql.(*Rows).close(0xc01464ca80, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/database/sql/sql.go:3001 +0x15a fp=0xc00d447660 sp=0xc00d447610 pc=0xf18c0a
database/sql.(*Rows).Close(...)
        /usr/local/go/src/database/sql/sql.go:2972
database/sql.(*Rows).Next(0xc01464ca80, 0x0)
        /usr/local/go/src/database/sql/sql.go:2661 +0xb9 fp=0xc00d4476c0 sp=0xc00d447660 pc=0xf17389
github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/sqllog.RowsToEvents(0xc01464ca80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/sqllog/sql.go:221 +0xd9 fp=0xc00d447740 sp=0xc00d4476c0 pc=0x10143b9

Assertion failed: rc == 0 (src/leader.c: leader__close: 257)

On startup/restart of a failed node I randomly get this assertion, and then the node goes into a restart loop, always failing with this assertion. I'm going to see if I can put together something reproducible for this issue, but I was first curious whether there are any theories as to when/how this could happen?

rolloback failed sql: Transaction has already been committed or rolled back

When the demo program is run it occasionally crashes with the error above.

To reproduce:

  • Start three demo nodes
  • Kill a follower
  • Kill the second follower while the leader is busy in the insert transaction

Debugging shows that it happens because the leader steps back (as it has lost quorum), and its WalFrames command to commit the transaction fails, returning an error to SQLite's COMMIT command, which internally and automatically tries to roll back the WAL. In turn, the error handling code in the demo program does not differentiate between begin/insert errors and commit ones, and unconditionally tries to run tx.Rollback after the COMMIT error, resulting in the SQLite error of the title.
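
A minimal sketch (not the actual demo code) of the distinction the error handling would need: roll back only when the failure happened before COMMIT, since a failed COMMIT has already aborted the transaction.

package demo

import "database/sql"

// runTx executes fn inside a transaction. It only calls Rollback when fn
// itself fails; if Commit fails, the transaction is already gone and calling
// Rollback again would produce "sql: Transaction has already been committed
// or rolled back".
func runTx(db *sql.DB, fn func(*sql.Tx) error) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	if err := fn(tx); err != nil {
		tx.Rollback() // safe: COMMIT has not been attempted yet
		return err
	}
	return tx.Commit() // on error, do not call Rollback as well
}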

Sample output:

10.204.119.99:9980: 2017/07/12 21:06:58 [INFO] demo: 1950 rows inserted, 0 values missing
10.204.119.99:9980: 2017/07/12 21:06:59 [INFO] demo: 2000 rows inserted, 0 values missing
10.204.119.99:9980: 2017/07/12 21:07:00 [INFO] demo: 2050 rows inserted, 0 values missing
10.204.119.99:9980: 2017/07/12 21:07:01 [INFO] demo: 2100 rows inserted, 0 values missing
10.204.119.99:9980: 2017/07/12 21:07:03 [INFO] demo: 2150 rows inserted, 0 values missing
10.204.119.99:9980: 2017/07/12 21:07:04 [ERR] raft: peer 10.204.119.128:9980 has newer term,
stopping replication
10.204.119.99:9980: 2017/07/12 21:07:04 [INFO] raft: Node at 10.204.119.99:9980 [Follower]
entering Follower state (Leader: "")
10.204.119.99:9980: 2017/07/12 21:07:04 [ERR] dqlite: methods: failed to apply WAL frames
{txid=59e7f215-326b-479b-ba27-3609937ecd9c page-size=4096 pages=1 truncate=7 is-end=1
sync-flags=34} command: leadership lost while committing log
10.204.119.99:9980: 2017/07/12 21:07:04 [INFO] raft: aborting pipeline replication to peer
10.204.119.242:9980
10.204.119.99:9980: 2017/07/12 21:07:04 [ERR] demo: rolloback failed sql: Transaction has
already been committed or rolled back

packaging dqlite in linux distribution / static linking to sqlite3 fork

The current way of building dqlite is highly problematic because it needs a sqlite3 fork.

Installing/packaging the sqlite3 fork is a problem for distributions because it conflicts with regular sqlite3. Packaging the fork would mean patching it to rename the libraries to some other name (libsqlite3dqlite.so maybe).

Other solutions are:
a) merge the changes into upstream sqlite3 (I assume it wasn't done because upstream doesn't want them)
b) change the configure system in dqlite to be able to use a bundled sqlite3 fork and statically link it (and only it, leaving the rest shared). Static linking is not a great solution (security issues in sqlite etc.) but well...

Any other ideas on how to make dqlite "packageable" for any Linux distro?

Suspected int overflow or encoding issue

Running dqlite on a Raspberry Pi (armhf, 32-bit) crashes almost immediately with the following error (sorry, it's a screenshot).

The line it is failing on in go-dqlite is:

	size := m.getUint64()
	data := make([]byte, size)

Just a wild guess: a uint32 got encoded on the stream and is now being decoded as a uint64 in Go.

This was reported in k3s as k3s-io/k3s#1155.

The way that dqlite is compiled is here: https://github.com/rancher/dqlite-build/blob/1e83a1fa426219abbba572b406e8acd2749e8b5a/Dockerfile#L57. One thing that could be causing this issue is that we compile armhf on an arm64 host. The user space and compiler are all compiled for armhf, but it is possible some platform detection logic is querying the CPU/kernel and not the build environment.
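
To illustrate that guess (a standalone Go sketch, not dqlite code): if the writer emits a 4-byte length but the reader consumes 8 bytes, the decoded size also swallows the first bytes of the following data and becomes enormous, and the subsequent make([]byte, size) blows up.

package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

func main() {
	// Hypothetical stream: a length encoded as a uint32, followed by payload.
	var buf bytes.Buffer
	binary.Write(&buf, binary.LittleEndian, uint32(16)) // 4-byte length
	buf.Write(bytes.Repeat([]byte{0xff}, 16))           // 16 bytes of payload

	// Decoding the length as a uint64 (as getUint64 would) also consumes the
	// first 4 bytes of the payload, yielding a bogus, huge size.
	var size uint64
	binary.Read(bytes.NewReader(buf.Bytes()), binary.LittleEndian, &size)
	fmt.Printf("decoded size: %d (0x%x)\n", size, size) // 0xffffffff00000010, not 16
}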

Lost node due to raft issue

I've been testing failure scenarios and I have a node that can't start because it gets the following error on startup:

raft_start(): io: load closed segment 0000000010000470-0000000010000655: 

All I've been doing is randomly killing the server. Let me know if you want the data files. I emailed the data files for this node.

Implement two phase commit

When leadership is lost while applying a WalFrames command with commit=1, the Methods object has no way to tell whether the log entry actually got committed or not. We should investigate implementing an additional command for two-phase commit, so we can know for sure whether the commit went through. See the TestIntegration_MethodHookLeadershipLost#wal frames sub-test.

Works on Xenial, fails on Bionic

I'm running into issues with dqlite (actually dqlite-demo but I believe it's a dqlite issue at the core).

This is using the packages from ppa:dqlite/stable, as well as building from source (latest versions).

It appears to fail during the capabilities probe:

openat(AT_FDCWD, "/tmp/dqlite-demo/1/.probe-2ZN7v3", O_RDWR|O_CREAT|O_EXCL, 0600) = 9
fallocate(9, 0, 0, 4096) = 0
unlink("/tmp/dqlite-demo/1/.probe-2ZN7v3") = 0
fcntl(9, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
fcntl(9, F_SETFL, O_RDWR|O_DIRECT|O_LARGEFILE) = 0
write(9, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
io_setup(1, [0x7fc25ef7d000]) = 0
io_submit(0x7fc25ef7d000, 1, [{aio_lio_opcode=IOCB_CMD_PWRITE, aio_fildes=9, aio_buf="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., aio_nbytes=4096, aio_offset=0}]) = -1 EINVAL (Invalid argument)
io_destroy(0x7fc25ef7d000) = 0

Testing is done in simple Docker containers that vary only in the release used for the image base (ubuntu:xenial vs. ubuntu:bionic).

Node ID overflow/truncate

The node ID in go-dqlite is a uint64, but it seems that if you use a very large number, the ID later returned by the client.Cluster() call is different.

Example:

$ dqlite-demo start 1 &
# dqlite-demo start 1210777256789200816 -a 127.0.0.1:9182 &
$ dqlite-demo add 1210777256789200816 -a 127.0.0.1:9182
$ dqlite-demo cluster
ID 	Leader 	Address
1 	true 	127.0.0.1:9181
84594608 	false 	127.0.0.1:9182

I believe 84594608 is what you get from 1210777256789200816 if you zero the upper 32 bits (i.e. truncate it to a uint32).
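
A quick check in plain Go confirms the arithmetic: truncating the 64-bit ID to its low 32 bits yields exactly the ID reported by the cluster command.

package main

import "fmt"

func main() {
	id := uint64(1210777256789200816)
	fmt.Println(uint32(id)) // prints 84594608
}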

Add fuzzing

SQLite has fuzzing integrated into it. See https://www.sqlite.org/src/artifact/ad79e867fb504338
I think dqlite would also benefit from fuzzing. Is there any interest in adding this to the project?

If so, would OSS-Fuzz be a candidate fuzzing tool?

Just looking to start the discussion and see if there is interest in adding fuzzing to this project.

Cross platform support

In the readme it says that the Linux AIO API is used for disk I/O, which prevents this library from being cross-platform. However, if the library already uses libuv, is there a reason why libuv isn't used as an abstraction for disk I/O?

Help using the library?

I'm writing a Rust crate with bindings to dqlite. In theory that should be very easy given the small surface area of this library.

However, I keep getting DQLITE_MISUSE errors trying to bind to an address. I'm just passing "127.0.0.1:5005" as soon as I create the node (I didn't try starting it or anything).

This is the current minimal state of the project: https://github.com/jeromegn/libdqlite-sys/blob/master/src/lib.rs

See the e_to_e for an example on how I'm expecting it to be used.

What am I doing wrong?

It's unclear whether libdqlite expects NUL-terminated strings. I tried both, but kept getting the same error code, so I figure it must be something else.

Edit: Our use case will be replicating state across many different servers.

SIGSEGV on Raft initial configuration

Hello,

I am not sure if this is the right project to submit the issue to, but here goes: I tried to install LXD using dqlite and a replication-enabled sqlite. Unfortunately, running the daemon results in a segfault: http://ix.io/1saK

dqlite was built using a standard autoreconf / configure / make.

sqlite-replication was built with the following configure arguments:

--enable-replication --enable-threadsafe --enable-dynamic-extensions --enable-fts5

and the following CFLAGS:

-DSQLITE_ENABLE_DBSTAT_VTAB
-DSQLITE_ENABLE_COLUMN_METADATA
-DSQLITE_ENABLE_UNLOCK_NOTIFY
-DSQLITE_SECURE_DELETE
-DSQLITE_ENABLE_JSON1
-DSQLITE_ENABLE_FTS3
-DSQLITE_ENABLE_FTS3_PARENTHESIS
-DSQLITE_ENABLE_FTS4
-DSQLITE_ENABLE_FTS3_TOKENIZER=1
-DSQLITE_ENABLE_BATCH_ATOMIC_WRITE=1
-DHAVE_FDATASYNC

Please let me know if there is anything I can do to get more information about this segfault.

Some background on the host:

  • distro: Void Linux
  • libedit: 20180525.3.1_2
  • libuv: 1.24.0
  • sqlite-replication: 3.25.3
  • dqlite: 0.2.5
  • lxd: tried on both 3.7 and 3.0.2

Feature request: load an existing database into the cluster

It seems that dqlite currently has no way of loading an existing database into a cluster. A key use case for this would be disaster recovery where one has a dump of the database.

Proposal: modify client protocol request #3 (below) to take an optional file entry and restore that into the named database. I have done a PoC for this (using absolute file names for the DB to trigger the restore). @freeekanayaka I would be willing to do the coding if you're OK with the proposal.

3 - Open a database (current)

Type    Value
text    The name of the database
uint64  Currently unused
text    Currently unused

3 - Open a database (proposed)

Type    Value
text    The name of the database
file    Optional existing database to load

failed to set bind address

I killed a node and then tried to start it again, and it failed with "failed to set bind address". I'm using abstract Unix sockets ("@....") for the bind address. I'm not a C programmer, so forgive me if this is way off base, but is it possible that when the bind address is set, the FD is not set to FD_CLOEXEC? My process that runs dqlite also spawns children, and I suspected that a child process was holding onto the Unix socket. After killing all orphan children it released the socket and I was able to start again.

Wrong url in FAQ on dqlite.io

In the FAQ at dqlite.io, in the 'Who’s using dqlite?' section, there is a link to clustering which doesn't work.

It links to https://github.com/lxc/lxd/blob/master/doc/clustering instead of https://github.com/lxc/lxd/blob/master/doc/clustering.md.

panic: restore failure: database 'test.db' has 1 leader connections

This crash happens occasionally in the demo program. It happens when a follower has been down long enough to fall so far behind with applied raft logs that the leader no longer has the individual logs that the follower would need to catch up (the amount of trailing logs that a node keeps is set by the raft.Raft.TrailingLogs parameter). In this case the leader sends the follower its last snapshot, instead of the individual missing logs, and the follower restores it using replication.FSM.restoreDatabase(). However, restoreDatabase() checks for open leader connections and finds one, because the demo code on the follower has already reached the tx, err := db.Begin() line in the insert loop and is blocked on it.

To reproduce easily, apply a patch like:

modified   raft.go
@@ -39,9 +39,9 @@ func newRaft(config *Config, join string, fsm *replication.FSM, logger *log.Logg
 		MaxAppendEntries:           64,
 		ShutdownOnRemove:           true,
 		DisableBootstrapAfterElect: true,
-		TrailingLogs:               256,
+		TrailingLogs:               10,
 		SnapshotInterval:           500 * time.Millisecond,
-		SnapshotThreshold:          64,
+		SnapshotThreshold:          5,
 		LeaderLeaseTimeout:         config.LeaderLeaseTimeout,
 		EnableSingleNode:           enableSingleNode,
 		Logger:                     logger,

and just stop a follower long enough that it falls behind by at least 10 logs. When it restarts, it will crash with the panic reported in the issue title.

Sample output:

10.204.119.242:9980: 2017/07/12 21:32:02 [INFO] demo: start
10.204.119.242:9980: 2017/07/12 21:32:02 [DEBUG] dqlite: restore snapshot
10.204.119.242:9980: 2017/07/12 21:32:02 [DEBUG] dqlite: snapshot database size 385024
10.204.119.242:9980: 2017/07/12 21:32:02 [DEBUG] dqlite: snapshot wal size 391432
10.204.119.242:9980: 2017/07/12 21:32:02 [DEBUG] dqlite: snapshot txid 
10.204.119.242:9980: 2017/07/12 21:32:02 [ERR] sqlite: recovered 95 frames from WAL file
/root/data/test.db-wal (283)
10.204.119.242:9980: 2017/07/12 21:32:02 [INFO] raft: Restored from snapshot
40-2432-1499894688444
10.204.119.242:9980: 2017/07/12 21:32:02 [INFO] raft: Node at 10.204.119.242:9980 [Follower]
entering Follower state (Leader: "")
10.204.119.242:9980: 2017/07/12 21:32:04 [WARN] raft: Heartbeat timeout from "" reached,
starting election
10.204.119.242:9980: 2017/07/12 21:32:04 [INFO] raft: Node at 10.204.119.242:9980
[Candidate] entering Candidate state
10.204.119.242:9980: 2017/07/12 21:32:04 [ERR] raft: Failed to make RequestVote RPC to
10.204.119.99:9980: dialing failed: dial tcp 10.204.119.99:9980: getsockopt: connection
refused
10.204.119.242:9980: 2017/07/12 21:32:04 [ERR] raft: Failed to make RequestVote RPC to
10.204.119.128:9980: dialing failed: dial tcp 10.204.119.128:9980: getsockopt: connection
refused
10.204.119.242:9980: 2017/07/12 21:32:04 [DEBUG] raft: Votes needed: 2
10.204.119.242:9980: 2017/07/12 21:32:04 [DEBUG] raft: Vote granted from
10.204.119.242:9980. Tally: 1
10.204.119.242:9980: 2017/07/12 21:32:06 [WARN] raft: Election timeout reached, restarting
election
10.204.119.242:9980: 2017/07/12 21:32:06 [INFO] raft: Node at 10.204.119.242:9980
[Candidate] entering Candidate state
10.204.119.242:9980: 2017/07/12 21:32:06 [ERR] raft: Failed to make RequestVote RPC to
10.204.119.128:9980: dialing failed: dial tcp 10.204.119.128:9980: getsockopt: connection
refused
10.204.119.242:9980: 2017/07/12 21:32:06 [DEBUG] raft: Votes needed: 2
10.204.119.242:9980: 2017/07/12 21:32:06 [DEBUG] raft: Vote granted from
10.204.119.242:9980. Tally: 1
10.204.119.242:9980: 2017/07/12 21:32:06 [DEBUG] raft-net: 10.204.119.242:9980 accepted
connection from: 10.204.119.99:54206
10.204.119.242:9980: 2017/07/12 21:32:06 [INFO] raft: Node at 10.204.119.242:9980 [Follower]
entering Follower state (Leader: "")
10.204.119.242:9980: 2017/07/12 21:32:06 [WARN] raft: Failed to get previous log: 3218 log
not found (last: 2451)
10.204.119.242:9980: 2017/07/12 21:32:06 [INFO] snapshot: Creating new snapshot at
/root/data/snapshots/42-3215-1499895126707.tmp
10.204.119.242:9980: 2017/07/12 21:32:06 [INFO] snapshot: reaping snapshot
/root/data/snapshots/40-2368-1499894658052
10.204.119.242:9980: 2017/07/12 21:32:06 [INFO] raft: Copied 1068172 bytes to local snapshot
10.204.119.242:9980: 2017/07/12 21:32:06 [DEBUG] dqlite: restore snapshot
10.204.119.242:9980: 2017/07/12 21:32:06 [DEBUG] dqlite: snapshot database size 532480
10.204.119.242:9980: 2017/07/12 21:32:06 [DEBUG] dqlite: snapshot wal size 535632
panic: restore failure: database 'test.db' has 1 leader connections

goroutine 28 [running]:
github.com/dqlite/dqlite/replication.(*FSM).restoreDatabase(0xc42010a900, 0xc9c3a0,
0xc42010d950, 0x0, 0x0, 0x0)
    /root/go/src/github.com/dqlite/dqlite/replication/fsm.go:394 +0x1146
github.com/dqlite/dqlite/replication.(*FSM).Restore(0xc42010a900, 0xc9c3a0, 0xc42010d950,
0xc420123e00, 0xc9c3a0)
    /root/go/src/github.com/dqlite/dqlite/replication/fsm.go:307 +0x84
github.com/hashicorp/raft.(*Raft).runFSM(0xc42013c800)
    /root/go/src/github.com/hashicorp/raft/raft.go:552 +0x383
github.com/hashicorp/raft.(*Raft).(github.com/hashicorp/raft.runFSM)-fm()
    /root/go/src/github.com/hashicorp/raft/raft.go:259 +0x2a
github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc42013c800, 0xc42010ce00)
    /root/go/src/github.com/hashicorp/raft/state.go:142 +0x53
created by github.com/hashicorp/raft.(*raftState).goFunc
    /root/go/src/github.com/hashicorp/raft/state.go:143 +0x66

runtime: pointer 0xc420203000 to unused region of span idx=0x101 span.base()=0xc420314000

This crash happens occasionally in the demo program.

The reason is a violation of the cgo rules for passing pointers from Go to C: the pages[i].Fill(page.Data, uint16(page.Flags), page.Number) call stores a Go pointer directly in C memory, which is forbidden. See also:

golang/go#19135
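
As a minimal illustration of the rule (a hypothetical cgo snippet, not the actual dqlite/go-sqlite3x code): instead of storing a pointer to a Go-allocated buffer in a C structure, the data has to be copied into C-owned memory.

package main

/*
#include <stdlib.h>

typedef struct {
	void   *data;
	size_t  len;
} page_t;
*/
import "C"

import "fmt"

// fillPage copies a Go byte slice into C-allocated memory, so no Go pointer
// is ever stored in C memory (which the cgo pointer-passing rules forbid).
func fillPage(p *C.page_t, buf []byte) {
	p.data = C.CBytes(buf) // malloc()s and copies
	p.len = C.size_t(len(buf))
}

func freePage(p *C.page_t) {
	C.free(p.data)
	p.data = nil
	p.len = 0
}

func main() {
	var p C.page_t
	fillPage(&p, []byte{1, 2, 3, 4})
	fmt.Println("page length:", p.len)
	freePage(&p)
}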

Stacktrace:

10.204.119.242:9980: 2017/07/14 08:24:11 [INFO] demo: start
runtime: pointer 0xc420203000 to unused region of span idx=0x101 span.base()=0xc420314000
span.limit=0xc420316000 span.state=1
fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?)

runtime stack:
runtime.throw(0x8e8a89, 0x3e)
    /usr/lib/go-1.8/src/runtime/panic.go:596 +0x95
runtime.heapBitsForObject(0xc420203000, 0x0, 0x0, 0xc41ffdd0ff, 0xc400000000,
0x7ff18c52d410, 0xc42001d228, 0x0)
    /usr/lib/go-1.8/src/runtime/mbitmap.go:433 +0x2bb
runtime.shade(0xc420203000)
    /usr/lib/go-1.8/src/runtime/mgcmark.go:1340 +0x41
runtime.gcmarkwb_m(0x7ff160000a20, 0xc42045e000)
    /usr/lib/go-1.8/src/runtime/mbarrier.go:154 +0xea
runtime.writebarrierptr_prewrite1.func1()
    /usr/lib/go-1.8/src/runtime/mbarrier.go:188 +0x64
runtime.systemstack(0xc420164ba0)
    /usr/lib/go-1.8/src/runtime/asm_amd64.s:327 +0x79
runtime.mstart()
    /usr/lib/go-1.8/src/runtime/proc.go:1132

goroutine 10 [running]:
runtime.systemstack_switch()
    /usr/lib/go-1.8/src/runtime/asm_amd64.s:281 fp=0xc420049a18 sp=0xc420049a10
runtime.writebarrierptr_prewrite1(0x7ff160000a20, 0xc42045e000)
    /usr/lib/go-1.8/src/runtime/mbarrier.go:189 +0xbf fp=0xc420049a58 sp=0xc420049a18
runtime.writebarrierptr(0x7ff160000a20, 0xc42045e000)
    /usr/lib/go-1.8/src/runtime/mbarrier.go:211 +0x4d fp=0xc420049a90 sp=0xc420049a58
github.com/dqlite/go-sqlite3x.(*ReplicationPage).Fill(0x7ff160000a20, 0xc42045e000, 0x1000,
0x1000, 0x200000006)
    /root/go/src/github.com/dqlite/go-sqlite3x/replication.go:39 +0xd6 fp=0xc420049ae0
sp=0xc420049a90
github.com/dqlite/dqlite/replication.(*FSM).applyWalFrames(0xc420016a50, 0xc420317f60, 0x0,
0x0)
    /root/go/src/github.com/dqlite/dqlite/replication/fsm.go:157 +0x1e5 fp=0xc420049b80
sp=0xc420049ae0
github.com/dqlite/dqlite/replication.(*FSM).Apply(0xc420016a50, 0xc4204e0280, 0xd01980,
0xed0fa7a01)
    /root/go/src/github.com/dqlite/dqlite/replication/fsm.go:58 +0x3d2 fp=0xc420049c68
sp=0xc420049b80
github.com/hashicorp/raft.(*Raft).runFSM(0xc420150600)
    /root/go/src/github.com/hashicorp/raft/raft.go:596 +0xc79 fp=0xc420049f90
sp=0xc420049c68
github.com/hashicorp/raft.(*Raft).(github.com/hashicorp/raft.runFSM)-fm()
    /root/go/src/github.com/hashicorp/raft/raft.go:259 +0x2a fp=0xc420049fa8 sp=0xc420049f90
github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc420150600, 0xc420012a80)
    /root/go/src/github.com/hashicorp/raft/state.go:142 +0x53 fp=0xc420049fd0
sp=0xc420049fa8
runtime.goexit()
    /usr/lib/go-1.8/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc420049fd8 sp=0xc420049fd0
created by github.com/hashicorp/raft.(*raftState).goFunc
    /root/go/src/github.com/hashicorp/raft/state.go:143 +0x66

goroutine 1 [chan receive, 14 minutes]:
github.com/dqlite/dqlite.(*Driver).WaitLeadership(0xc4200172f0, 0x0, 0x0)
    /root/go/src/github.com/dqlite/dqlite/driver.go:124 +0x70
github.com/dqlite/dqlite.(*Driver).Open(0xc4200172f0, 0x8d0196, 0x7, 0xc4200ada00, 0x75e636,
0x456710, 0xc4200adab0)
    /root/go/src/github.com/dqlite/dqlite/driver.go:89 +0x2f
database/sql.(*DB).conn(0xc4201600a0, 0xcdc4c0, 0xc420070190, 0x1, 0xc4200adce0, 0x413982,
0xc420017350)
    /usr/lib/go-1.8/src/database/sql/sql.go:965 +0x146
database/sql.(*DB).begin(0xc4201600a0, 0xcdc4c0, 0xc420070190, 0x0, 0x1, 0xc4201600a0, 0x0,
0x0)
    /usr/lib/go-1.8/src/database/sql/sql.go:1360 +0x8a
database/sql.(*DB).BeginTx(0xc4201600a0, 0xcdc4c0, 0xc420070190, 0x0, 0xc4200add88,
0xc4200add78, 0x4da202)
    /usr/lib/go-1.8/src/database/sql/sql.go:1342 +0x70
database/sql.(*DB).Begin(0xc4201600a0, 0x8eb6e0, 0xc4200addd8, 0x4dab9a)
    /usr/lib/go-1.8/src/database/sql/sql.go:1356 +0x4c
main.insertForever(0xc4201600a0)
    /root/go/src/github.com/dqlite/dqlite/testdata/demo.go:105 +0x40
main.main()
    /root/go/src/github.com/dqlite/dqlite/testdata/demo.go:92 +0x7fe

goroutine 17 [syscall, 14 minutes, locked to thread]:
runtime.goexit()
    /usr/lib/go-1.8/src/runtime/asm_amd64.s:2197 +0x1

goroutine 3 [syscall, 14 minutes]:
os/signal.signal_recv(0x6e09cb)
    /usr/lib/go-1.8/src/runtime/sigqueue.go:116 +0x104
os/signal.loop()
    /usr/lib/go-1.8/src/os/signal/signal_unix.go:22 +0x22
created by os/signal.init.1
    /usr/lib/go-1.8/src/os/signal/signal_unix.go:28 +0x41

goroutine 4 [IO wait, 2 minutes]:
net.runtime_pollWait(0x7ff18c5c6f70, 0x72, 0x0)
    /usr/lib/go-1.8/src/runtime/netpoll.go:164 +0x59
net.(*pollDesc).wait(0xc420134068, 0x72, 0x0, 0xc4202452e0)
    /usr/lib/go-1.8/src/net/fd_poll_runtime.go:75 +0x38
net.(*pollDesc).waitRead(0xc420134068, 0xffffffffffffffff, 0x0)
    /usr/lib/go-1.8/src/net/fd_poll_runtime.go:80 +0x34
net.(*netFD).accept(0xc420134000, 0x0, 0xcd7240, 0xc4202452e0)
    /usr/lib/go-1.8/src/net/fd_unix.go:430 +0x1e5
net.(*TCPListener).accept(0xc42000e030, 0xc42016eed8, 0x67eb3e, 0x456710)
    /usr/lib/go-1.8/src/net/tcpsock_posix.go:136 +0x2e
net.(*TCPListener).Accept(0xc42000e030, 0x8ec6a0, 0xc4201b4000, 0xcdc540, 0xc420073a10)
    /usr/lib/go-1.8/src/net/tcpsock.go:228 +0x49
net/http.(*Server).Serve(0xc42012e160, 0xcdb980, 0xc42000e030, 0x0, 0x0)
    /usr/lib/go-1.8/src/net/http/server.go:2643 +0x228
created by main.main
    /root/go/src/github.com/dqlite/dqlite/testdata/demo.go:63 +0x4a7

goroutine 5 [chan receive, 2 minutes]:
github.com/dqlite/raft-http.(*Layer).Accept(0xc420016900, 0x8ec208, 0xc420134070, 0xcdedc0,
0xc42018c260)
    /root/go/src/github.com/dqlite/raft-http/layer.go:39 +0x5e
github.com/hashicorp/raft.(*NetworkTransport).listen(0xc420134070)
    /root/go/src/github.com/hashicorp/raft/net_transport.go:362 +0x49
created by github.com/hashicorp/raft.NewNetworkTransportWithLogger
    /root/go/src/github.com/hashicorp/raft/net_transport.go:154 +0x192

goroutine 9 [select]:
github.com/hashicorp/raft.(*Raft).runFollower(0xc420150600)
    /root/go/src/github.com/hashicorp/raft/raft.go:646 +0xa1a
github.com/hashicorp/raft.(*Raft).run(0xc420150600)
    /root/go/src/github.com/hashicorp/raft/raft.go:630 +0xa4
github.com/hashicorp/raft.(*Raft).(github.com/hashicorp/raft.run)-fm()
    /root/go/src/github.com/hashicorp/raft/raft.go:258 +0x2a
github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc420150600, 0xc420012a70)
    /root/go/src/github.com/hashicorp/raft/state.go:142 +0x53
created by github.com/hashicorp/raft.(*raftState).goFunc
    /root/go/src/github.com/hashicorp/raft/state.go:143 +0x66

goroutine 11 [select]:
github.com/hashicorp/raft.(*Raft).runSnapshots(0xc420150600)
    /root/go/src/github.com/hashicorp/raft/raft.go:1744 +0x367
github.com/hashicorp/raft.(*Raft).(github.com/hashicorp/raft.runSnapshots)-fm()
    /root/go/src/github.com/hashicorp/raft/raft.go:260 +0x2a
github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc420150600, 0xc420012a90)
    /root/go/src/github.com/hashicorp/raft/state.go:142 +0x53
created by github.com/hashicorp/raft.(*raftState).goFunc
    /root/go/src/github.com/hashicorp/raft/state.go:143 +0x66

goroutine 12 [chan receive, 14 minutes]:
github.com/dqlite/raft-membership.HandleChangeRequests(0xc420150600, 0xc4200100c0)
    /root/go/src/github.com/dqlite/raft-membership/handle.go:27 +0x51
created by github.com/dqlite/dqlite.NewDriver
    /root/go/src/github.com/dqlite/dqlite/driver.go:81 +0x573

goroutine 13 [chan receive, 14 minutes]:
database/sql.(*DB).connectionOpener(0xc4201600a0)
    /usr/lib/go-1.8/src/database/sql/sql.go:837 +0x4a
created by database/sql.Open
    /usr/lib/go-1.8/src/database/sql/sql.go:582 +0x212

goroutine 531 [IO wait]:
net.runtime_pollWait(0x7ff18c5c6d30, 0x72, 0xb)
    /usr/lib/go-1.8/src/runtime/netpoll.go:164 +0x59
net.(*pollDesc).wait(0xc4204f15d8, 0x72, 0xcd85c0, 0xcd4740)
    /usr/lib/go-1.8/src/net/fd_poll_runtime.go:75 +0x38
net.(*pollDesc).waitRead(0xc4204f15d8, 0xc42025a000, 0x1000)
    /usr/lib/go-1.8/src/net/fd_poll_runtime.go:80 +0x34
net.(*netFD).Read(0xc4204f1570, 0xc42025a000, 0x1000, 0x1000, 0x0, 0xcd85c0, 0xcd4740)
    /usr/lib/go-1.8/src/net/fd_unix.go:250 +0x1b7
net.(*conn).Read(0xc4201224d8, 0xc42025a000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/lib/go-1.8/src/net/net.go:181 +0x70
bufio.(*Reader).fill(0xc4203cf440)
    /usr/lib/go-1.8/src/bufio/bufio.go:97 +0x117
bufio.(*Reader).ReadByte(0xc4203cf440, 0xc420198240, 0xc420489cd0, 0xc420489ca0)
    /usr/lib/go-1.8/src/bufio/bufio.go:239 +0x5b
github.com/hashicorp/raft.(*NetworkTransport).handleCommand(0xc420134070, 0xc4203cf440,
0xc4203cf4a0, 0xc4204f1650, 0x0, 0x0)
    /root/go/src/github.com/hashicorp/raft/net_transport.go:402 +0x43
github.com/hashicorp/raft.(*NetworkTransport).handleConn(0xc420134070, 0xcdedc0,
0xc4201224d8)
    /root/go/src/github.com/hashicorp/raft/net_transport.go:386 +0x221
created by github.com/hashicorp/raft.(*NetworkTransport).listen
    /root/go/src/github.com/hashicorp/raft/net_transport.go:373 +0x1d0

goroutine 1696 [IO wait]:
net.runtime_pollWait(0x7ff18c5c6eb0, 0x72, 0xa)
    /usr/lib/go-1.8/src/runtime/netpoll.go:164 +0x59
net.(*pollDesc).wait(0xc4204f0998, 0x72, 0xcd85c0, 0xcd4740)
    /usr/lib/go-1.8/src/net/fd_poll_runtime.go:75 +0x38
net.(*pollDesc).waitRead(0xc4204f0998, 0xc4203fc000, 0x1000)
    /usr/lib/go-1.8/src/net/fd_poll_runtime.go:80 +0x34
net.(*netFD).Read(0xc4204f0930, 0xc4203fc000, 0x1000, 0x1000, 0x0, 0xcd85c0, 0xcd4740)
    /usr/lib/go-1.8/src/net/fd_unix.go:250 +0x1b7
net.(*conn).Read(0xc42018c260, 0xc4203fc000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/lib/go-1.8/src/net/net.go:181 +0x70
bufio.(*Reader).fill(0xc420259ce0)
    /usr/lib/go-1.8/src/bufio/bufio.go:97 +0x117
bufio.(*Reader).ReadByte(0xc420259ce0, 0x300000002, 0xc420165a00, 0xc420035ca0)
    /usr/lib/go-1.8/src/bufio/bufio.go:239 +0x5b
github.com/hashicorp/raft.(*NetworkTransport).handleCommand(0xc420134070, 0xc420259ce0,
0xc420259d40, 0xc4204f0a10, 0x0, 0x0)
    /root/go/src/github.com/hashicorp/raft/net_transport.go:402 +0x43
github.com/hashicorp/raft.(*NetworkTransport).handleConn(0xc420134070, 0xcdedc0,
0xc42018c260)
    /root/go/src/github.com/hashicorp/raft/net_transport.go:386 +0x221
created by github.com/hashicorp/raft.(*NetworkTransport).listen
    /root/go/src/github.com/hashicorp/raft/net_transport.go:373 +0x1d0

recover failed with error code 1

I'm testing error scenarios, and calling Node.Recover() just returns the error "recover failed with error code 1". I'm not sure where to go from here. I have a two-node cluster and am trying to reset it back to just one node.

segfault on first access

I'm sure there is a simple fix here, but I'm getting a segfault whenever I try to use dqlite. dqlite-demo seems to work fine, but from my application it always fails with the backtrace below (from gdb):

#0  pcacheManageDirtyList (pPage=pPage@entry=0x7fffc8001498, addRemove=addRemove@entry=1 '\001') at sqlite3-binding.c:47466
#1  0x0000000006915f5e in sqlite3PcacheMakeClean (p=0x7fffc8001498) at sqlite3-binding.c:47863
#2  0x0000000006915fde in sqlite3PcacheTruncate (pCache=0x7fffcc005b48, pgno=0) at sqlite3-binding.c:47946
#3  0x0000000006953000 in pagerBeginReadTransaction (pPager=0x7fffcc005a18) at sqlite3-binding.c:53344
#4  sqlite3PagerSharedLock (pPager=0x7fffcc005a18) at sqlite3-binding.c:55436
#5  0x0000000006953b40 in lockBtree (pBt=0x7fffcc000b68) at sqlite3-binding.c:65656
#6  sqlite3BtreeBeginTrans (p=p@entry=0x7fffcc005598, wrflag=1, pSchemaVersion=pSchemaVersion@entry=0x7fffec596ff0) at sqlite3-binding.c:494
#7  0x000000000697a703 in sqlite3VdbeExec (p=<optimized out>) at sqlite3-binding.c:85912
#8  0x000000000698285f in sqlite3Step (p=0x7fffcc008418) at sqlite3-binding.c:81041
#9  sqlite3_step (pStmt=<optimized out>) at sqlite3-binding.c:15568
#10 sqlite3_step (pStmt=<optimized out>) at sqlite3-binding.c:15556
#11 0x00007ffff7f77b40 in ?? () from /usr/lib/x86_64-linux-gnu/libdqlite.so.0
#12 0x00007ffff7c7e180 in ?? () from /usr/lib/x86_64-linux-gnu/libco.so.0
#13 0x0000000000000000 in ?? ()

I am running:

libdqlite-dev:amd64                   1.0.0-0~201908281439~ubuntu19.10.1              
libdqlite0:amd64                      1.0.0-0~201908281439~ubuntu19.10.1              
libsqlite3-0:amd64                    3.29.0+replication3-8~201908190856~ubuntu19.10.1
libsqlite3-dev:amd64                  3.29.0+replication3-8~201908190856~ubuntu19.10.1

The last time I had this working I was on Ubuntu 19.04, and I have since upgraded to 19.10. I don't know if that is related or whether I broke something else. I'll keep debugging sqlite to figure out what I did wrong, but any help would be great. Thanks.

Setting SQLITE_CONFIG_SINGLETHREAD can lead to races

It seems that the Go client for some reason might issue concurrent queries, although the database/sql package should prevent that.

This is a C program to roughly reproduce the race:

#include <pthread.h>
#include <sqlite3.h>
#include <stdio.h>
#include <unistd.h>

void *run(void *arg) {
  sqlite3 *db = arg;
  int rv;
  int i;

  printf("inserting\n");

  for (i = 0; i < 1; i++) {
    rv = sqlite3_exec(db, "INSERT INTO test(n) SELECT n FROM test2", NULL, NULL, NULL);
    if (rv != 0) {
      return (void *)1;
    }
  }

  printf("inserted\n");

  return 0;
}

int main() {
  int rv;
  pthread_t thread;
  void *retval = 0;
  sqlite3 *db;
  int i;

  rv = sqlite3_config(SQLITE_CONFIG_SINGLETHREAD);
  if (rv != SQLITE_OK) {
    return rv;
  }

  rv = sqlite3_open_v2(":memory:", &db,
                       SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE, NULL);
  if (rv != 0) {
    return rv;
  }

  rv = sqlite3_exec(db, "CREATE TABLE test (n INT)", NULL, NULL, NULL);
  if (rv != 0) {
    return rv;
  }

  rv = sqlite3_exec(db, "CREATE TABLE test2 (n INT)", NULL, NULL, NULL);
  if (rv != 0) {
    return rv;
  }

  for (i = 0; i < 1000; i++) {
    rv = sqlite3_exec(db, "INSERT INTO test2(n) VALUES(1)", NULL, NULL, NULL);
    if (rv != 0) {
      return rv;
    }
  }

  rv = sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
  if (rv != 0) {
    return rv;
  }

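  /* Start a second thread that writes on the same connection while the main
     thread holds an explicit transaction: with SQLITE_CONFIG_SINGLETHREAD this
     concurrent use of the same handle is exactly what races. */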
  rv = pthread_create(&thread, 0, &run, db);
  if (rv != 0) {
    return rv;
  }

  usleep(500);

  rv = sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
  if (rv != 0) {
    printf("error: %s\n", sqlite3_errmsg(db));
  }
  rv = sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
  if (rv != 0) {
    printf("error: %s\n", sqlite3_errmsg(db));
    return rv;
  }
  printf("committed\n");

  rv = pthread_join(thread, &retval);
  if (rv != 0) {
    return rv;
  }

  if (retval) {
    return 1;
  }

  return 0;
}
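
For reference, the program above should build with something like the following (singlethread.c is just a placeholder file name):

cc -o singlethread singlethread.c -lsqlite3 -lpthread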

Question: Any plans for a cross platform library

Is there any plan for having a cross-platform dqlite?

We know that libuv and SQLite are cross-platform. Yet C-raft and libco are Linux-only libraries. Is there a future plan to replace them with cross-platform equivalents, either existing or newly created?

database is locked errors

I have a Go app running on sqlite today using the configuration "./db/state.db?_journal=WAL&cache=shared". This is essentially a multithreaded app and it runs with no apparent issues. I've switched it to using dqlite and it immediately gets "database is locked" errors. Is there something I can do to allow concurrency without getting "database is locked" errors?

sqlite CLI?

I don't know if this makes technical sense, but would it be possible to have a sqlite CLI that works off the state on disk? The biggest thing I miss running dqlite over a sqlite setup is that I don't know how to just run the sqlite CLI and run ad hoc queries to troubleshoot.

How to backup

What's the best approach to do backups on a live cluster?

what to do when quorum is lost

When quorum is lost, what is the procedure to restore the cluster? What I'm currently trying to do is manually pick one master and just remove all peers from it. That doesn't seem to go anywhere until I actually restore the peers. I understand that a membership change probably requires quorum (as it should), but I need some "emergency reset" mechanism.

Lose leader on down cluster

I'm having trouble recovering from this scenario.

  1. Have a three node cluster
  2. Kill all three nodes, so cluster is fully down
  3. Start the two nodes that are not the leader.

In this situation I'd expect the two started nodes to elect a leader and then the db would run with two nodes. Right now, after the restart, client.FindLeader() never returns on either of the two running nodes.

understanding Node ID

I'm trying to understand the use and importance of the node ID. Two observations I have that I'd like a bit of clarification on are:

  1. Nodes started with ID 1 seem to be special in that they automatically become the leader. So I assume 1 is a special value. Is the accepted way to build a cluster to always start with ID 1 and then add nodes to that master?

  2. In go-dqlite the sqlite-based NodeStore does not seem to respect IDs. This is what threw me off and makes me question whether the ID is really important at all and used by dqlite internals. So what is the purpose of the node ID?

configure fail with syntax error

When I tried to build LXD, make deps failed when running ./configure:

./configure: line 9746: syntax error near unexpected token `SQLITE,'
./configure: line 9746: `PKG_CHECK_MODULES(SQLITE, sqlite3 >= 3.22.0, , )'

the code ref is https://github.com/lxc/lxd/blob/master/Makefile#L59

Where should this SQLITE token be initialized/created?

I tried commenting out those PKG_CHECK_MODULES lines in the configure file, and the configuration then succeeded, but it's complicated to do this when packaging automatically.

By the way, all the deps except sqlite3 had already been installed; sqlite was compiled manually and is going to be statically linked.

OS: ArchLinux in lxd (newest version)

Maybe I should not compile it in the container?

sqlcipher

Hello,

Is it possible to use dqlite with sqlcipher?

Question: Is it possible to have one DB replicated and another non replicated in one app?

We have different SLOs for different data types, based on a trade-off between performance and data reliability. Some data requires replication for higher reliability, while other data can tolerate possible loss but requires low latency. If we could configure three types of databases at the same time (replicated, non-replicated, and in-memory only), that would be great! Can I achieve that? Thanks!

detect if cluster is stable?

I ran into an issue where I had flapping servers. I had three nodes and they were all randomly rebooting. Now, fundamentally the issue is caused by me, because I run the dqlite server and client in the same process. So what would happen is that the process would start the dqlite server, then my client logic would try to connect and fail, so we abort the process startup. To mitigate this issue, the intention was to start the dqlite server, wait until the server seems healthy (quorum is reached), and then start the client. It seemed that logic would avoid the flapping.

The problem is trying to detect whether the cluster is stable. At first I was just waiting for the FindLeader method to return successfully, but that didn't seem sufficient. I then tried to do a conn.Ping() from the dqlite Go SQL driver, but that seems to be a no-op. I then did a "SELECT 1" on startup, but as soon as I did that in this error situation I would get an assertion failure in the dqlite C code that would exit the program.

assert(rc == 0);
In the end I couldn't figure out the right way to detect that the cluster was healthy.
