Comments (21)

freeekanayaka commented on May 25, 2024

Yeah, indeed queuing up clients is something I have been wanting to do. This is sort of a first-pass implementation. I need to check whether an existing transaction can be recovered after SQLITE_BUSY is thrown, as I'm not entirely sure what the internals are there. Will get back once I gather more info.

freeekanayaka commented on May 25, 2024
  1. No, it shouldn't.
  2. That is something to fix indeed. I'll look at it soon; it should be trivial. I'll use your program as a reproducer.

freeekanayaka commented on May 25, 2024

> Yeah, indeed queuing up clients is something I have been wanting to do. This is sort of a first-pass implementation. I need to check whether an existing transaction can be recovered after SQLITE_BUSY is thrown, as I'm not entirely sure what the internals are there. Will get back once I gather more info.

> Hi, I'm curious if you found out more and what the current state of affairs is?

If I understand correctly, retrying is the only sensible approach: once a transaction fails with SQLITE_BUSY because another transaction is writing, the failed transaction must be trashed and can't be recovered, since there's no guarantee that its modifications won't conflict with those of the transaction holding the write lock. This is basically a consequence of the fact that SQLite is a single-writer engine.
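
To make the trash-and-replay idea concrete, here is a minimal sketch (not a dqlite API; for brevity it retries on any failure, where a real caller would check for driver.ErrBusy as in the programs below, and it assumes the usual database/sql import). The point is that a failed transaction is rolled back and rebuilt from scratch, never resumed:

// runTx runs fn inside a transaction, retrying with a fresh transaction
// when an attempt fails: a transaction that hit SQLITE_BUSY cannot be
// resumed, only discarded and replayed.
func runTx(db *sql.DB, attempts int, fn func(*sql.Tx) error) error {
	var err error
	for i := 0; i < attempts; i++ {
		var tx *sql.Tx
		if tx, err = db.Begin(); err != nil {
			return err
		}
		if err = fn(tx); err != nil {
			tx.Rollback() // trash the failed transaction
			continue      // ...and replay it from scratch
		}
		if err = tx.Commit(); err == nil {
			return nil
		}
	}
	return err
}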

freeekanayaka commented on May 25, 2024

The _journal=WAL option is specific to the go-sqlite3 SQL driver; the dqlite SQL driver neither needs nor supports it (all connections are in WAL mode anyway).

Why do you need the cache=shared setting? IIRC it should be used only if you have lots of connections and constrained memory.

Anyway, the small test program below (which uses the connection string you report) works here, even with a little concurrency, so more details would be needed (maybe tweak the test program until it matches your situation).

package main

import (
	"context"
	"database/sql"
	"fmt"
	"io/ioutil"
	"log"
	"os"
	"sync"

	dqlite "github.com/canonical/go-dqlite"
	"github.com/canonical/go-dqlite/client"
	"github.com/canonical/go-dqlite/driver"
)

func main() {
	dir, err := ioutil.TempDir("", "dqlite-issue-164")
	if err != nil {
		log.Fatal(err)
	}
	defer os.RemoveAll(dir)

	// Start a single dqlite node listening on an abstract Unix socket.
	info := client.NodeInfo{ID: uint64(1), Address: "@1"}
	server, err := dqlite.New(info.ID, info.Address, dir, dqlite.WithBindAddress(info.Address))
	if err != nil {
		log.Fatal(err)
	}
	defer server.Close()

	if err := server.Start(); err != nil {
		log.Fatal(err)
	}

	// Register the node in an in-memory store that the driver will consult.
	store, err := client.DefaultNodeStore(":memory:")
	if err != nil {
		log.Fatal(err)
	}
	if err := store.Set(context.Background(), []client.NodeInfo{info}); err != nil {
		log.Fatal(err)
	}
	drv, err := driver.New(store)
	if err != nil {
		log.Fatal(err)
	}
	sql.Register("dqlite", drv)

	db1, err := sql.Open("dqlite", "./db/state.db?_journal=WAL&cache=shared")
	if err != nil {
		log.Fatal(err)
	}

	db2, err := sql.Open("dqlite", "./db/state.db?_journal=WAL&cache=shared")
	if err != nil {
		log.Fatal(err)
	}

	if _, err := db1.Exec("CREATE TABLE test (n INT); INSERT INTO test(n) VALUES(1)"); err != nil {
		log.Fatal(err)
	}

	// Query the same table concurrently from the two sql.DB objects.
	wg := sync.WaitGroup{}
	wg.Add(2)
	query := func(db *sql.DB) {
		rows, err := db.Query("SELECT n FROM test")
		if err != nil {
			log.Fatal(err)
		}
		for rows.Next() {
			var n int64
			if err := rows.Scan(&n); err != nil {
				log.Fatal(err)
			}
			fmt.Println("N", n)
		}
		rows.Close()
		wg.Done()
	}

	go query(db1)
	go query(db2)
	wg.Wait()
}

ibuildthecloud commented on May 25, 2024

Thanks for the response. I'll try to put together a test case to reproduce this. To give a tiny bit more context, this is for k3s. We rewrote the sqlite backend so we don't use kvsql anymore, but the SQL is quite similar. It's in https://github.com/ibuildthecloud/kine.

I'll push my branch that supports dqlite too if you care to see it. At the moment it's quite ugly.

freeekanayaka commented on May 25, 2024

Yeah, I noticed that you switched from kvsql to kine. Out of curiosity, what was the reason behind that? Perhaps the difficulty of getting a new storage interface implementation merged upstream in k8s?

And regarding the dqlite backend: if you are going to have more than one apiserver, how are you going to handle cross-node broadcast of watch events?

Sorry if this is off topic :)

ibuildthecloud commented on May 25, 2024

@freeekanayaka kine handles cross-node notification; that is the one technical reason for the rewrite. The other reason was to find an approach that works with any Kubernetes distribution, not just a modified one (k3s). From that perspective our only reasonable approach was to create an etcd shim layer, so kine implements the bare minimum of the etcd API needed to work with k8s. You can run kine as a separate process or embed it in the same process as your client.

ibuildthecloud commented on May 25, 2024

@freeekanayaka Hopefully this isn't too involved. If it is, I can write a small program to try to reproduce, but I think it's easy enough to just run kine + k3s to reproduce the issue. Here's how.

Environment
Ubuntu 19.04, go 1.12.6 with:

ii  dqlite                                1.0.0-0~201908281439~ubuntu19.04.1               amd64        dqlite command line demo application
ii  libdqlite-dev:amd64                   1.0.0-0~201908281439~ubuntu19.04.1               amd64        dqlite development files
ii  libdqlite0:amd64                      1.0.0-0~201908281439~ubuntu19.04.1               amd64        dqlite shared library
ii  libsqlite3-0:amd64                    3.29.0+replication3-8~201908190856~ubuntu19.04.1 amd64        SQLite 3 shared library
ii  libsqlite3-dev:amd64                  3.29.0+replication3-8~201908190856~ubuntu19.04.1 amd64        SQLite 3 development files
ii  sqlite3                               3.29.0+replication3-8~201908190856~ubuntu19.04.1 amd64        Command line interface for SQLite 3

Run the following:

#!/bin/bash

mkdir tmp
cd tmp

# Download k3s v0.9.1
curl -fL -o k3s https://github.com/rancher/k3s/releases/download/v0.9.1/k3s
chmod +x k3s

# Run dqlite-demo
dqlite-demo start 1 -d ./test &

# Clone, build, and run kine
git clone -b dqlite https://github.com/ibuildthecloud/kine kine
cd ./kine
go build -tags libsqlite3
cd ..
./kine/kine --endpoint 'dqlite://?peer=1:127.0.0.1:9181' &

# Run k3s
./k3s server --storage-endpoint http://127.0.0.1:2379 --disable-agent --data-dir ./test

Delete ./tmp to clean up.

You should see a bunch of the errors below, and k3s will most likely crash on startup.

ERRO[0091] error in txn: database is locked

freeekanayaka commented on May 25, 2024

I looked quickly at the diff. It seems you are creating the dqlite SQL driver correctly (with driver.New()), but you are not starting the actual dqlite.Node server engine. The driver is just a client that connects to the server engine via either an abstract Unix socket or TCP.

You'll need to add the equivalent of the example snippet I pasted above, in particular:

	info := client.NodeInfo{ID: uint64(1), Address: "@1"}
	server, err := dqlite.New(info.ID, info.Address, dir, dqlite.WithBindAddress(info.Address))
	if err != nil {
		log.Fatal(err)
	}
	defer server.Close()

	if err := server.Start(); err != nil {
		log.Fatal(err)
	}

and you'll want to use a TCP address for dqlite.WithBindAddress that matches what you pass to the driver (e.g. 127.0.0.1:9181).
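
For illustration, a minimal sketch of that suggestion with a TCP bind address (the address is a placeholder; dir and error handling as in the programs above):

	// Bind the node to TCP so external clients such as kine can connect.
	addr := "127.0.0.1:9181"
	server, err := dqlite.New(1, addr, dir, dqlite.WithBindAddress(addr))
	if err != nil {
		log.Fatal(err)
	}
	defer server.Close()

	if err := server.Start(); err != nil {
		log.Fatal(err)
	}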

freeekanayaka commented on May 25, 2024

Hm, actually I just noticed that you are also starting the dqlite-demo process in the snippet you pasted, so I assume your intent is to connect to it (and I guess in general you don't want to start the dqlite server node in kine itself, but to let consumers decide).

Just as a quick check, can you try launching kine with:

./kine/kine --endpoint 'dqlite://demo.db?peer=1:127.0.0.1:9181'

or whatever DSN string ends up passing demo.db as the dataSourceName argument of NewVariant()?

ibuildthecloud commented on May 25, 2024

@freeekanayaka This isn't the full integration of dqlite. We will be running dqlite much the same way as LXD, with an HTTP proxy, certs, heartbeats, etc. All of that code is in k3s, not kine, and I haven't pushed the branch yet. Probably will today.

Running with ./kine/kine --endpoint 'dqlite://demo.db?peer=1:127.0.0.1:9181' gives the same result. If you don't put demo.db, it just defaults to what we use in sqlite, which is ./db/state.db?_journal=WAL&cache=shared. Also, I'm not really an expert on sqlite, so I'm not even sure why I ever put cache=shared there.

ibuildthecloud commented on May 25, 2024

@freeekanayaka I modified your test program to reproduce, but I now realize I might not have been clear in my description: I specifically get "database is locked" errors on concurrent writes. Reads work fine in parallel. Are concurrent writes not expected to work?

package main

import (
	"context"
	"database/sql"
	"io/ioutil"
	"log"
	"os"

	dqlite "github.com/canonical/go-dqlite"
	"github.com/canonical/go-dqlite/client"
	"github.com/canonical/go-dqlite/driver"
)

func main() {
	dir, err := ioutil.TempDir("", "dqlite-issue-164")
	if err != nil {
		log.Fatal(err)
	}
	defer os.RemoveAll(dir)

	info := client.NodeInfo{ID: uint64(1), Address: "@1"}
	server, err := dqlite.New(info.ID, info.Address, dir, dqlite.WithBindAddress(info.Address))
	if err != nil {
		log.Fatal(err)
	}
	defer server.Close()

	if err := server.Start(); err != nil {
		log.Fatal(err)
	}

	store, err := client.DefaultNodeStore(":memory:")
	if err != nil {
		log.Fatal(err)
	}
	if err := store.Set(context.Background(), []client.NodeInfo{info}); err != nil {
		log.Fatal(err)
	}
	drv, err := driver.New(store)
	if err != nil {
		log.Fatal(err)
	}
	sql.Register("dqlite", drv)

	db, err := sql.Open("dqlite", "./db/state.db?_journal=WAL&cache=shared")
	if err != nil {
		log.Fatal(err)
	}

	if _, err := db.Exec("CREATE TABLE test (n INT); INSERT INTO test(n) VALUES(1)"); err != nil {
		log.Fatal("create table: ", err)
	}

	// Hammer inserts from two goroutines; this quickly fails with
	// "database is locked" once the writes collide.
	query := func(db *sql.DB) {
		for {
			_, err := db.Exec("INSERT INTO test(n) VALUES(1)")
			if err != nil {
				log.Fatal(err)
			}
		}
	}

	go query(db)
	query(db)
}

freeekanayaka commented on May 25, 2024

Oh yeah, I see what you mean. Multiple writers should get serialized (SQLite itself is single-writer, as is etcd, since it's Raft-based), but it's an area I probably need to improve, and it might currently be buggy when you issue Exec statements from the same sql.DB object with the same underlying dqlite Driver instance. Most probably things will be fine if you just add:

	db.SetMaxOpenConns(1)
	db.SetMaxIdleConns(1)

before running any Exec/Query.

I don't have time to test that right now, but I will do it tomorrow, unless you confirm that this workaround works.
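
For reference, a short sketch of where those calls would go, right after opening the database and before any statements run:

	db, err := sql.Open("dqlite", "state.db")
	if err != nil {
		log.Fatal(err)
	}
	// Funnel every statement through a single connection, serializing
	// access to the one underlying dqlite driver connection.
	db.SetMaxOpenConns(1)
	db.SetMaxIdleConns(1)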

freeekanayaka commented on May 25, 2024

> @freeekanayaka This isn't the full integration of dqlite. We will be running dqlite much the same way as LXD, with an HTTP proxy, certs, heartbeats, etc. All of that code is in k3s, not kine, and I haven't pushed the branch yet. Probably will today.

That's interesting. I've been wanting to extract a lot of that logic from lxd and put it in some higher-level/extension package that sits on top of go-dqlite, since most Go applications consuming dqlite need that stuff. However, I'm not entirely sure what the API and functionality should look like. Having another real-world case where it's needed might help define that package.

> Running with ./kine/kine --endpoint 'dqlite://demo.db?peer=1:127.0.0.1:9181' gives the same result. If you don't put demo.db, it just defaults to what we use in sqlite, which is ./db/state.db?_journal=WAL&cache=shared. Also, I'm not really an expert on sqlite, so I'm not even sure why I ever put cache=shared there.

The _journal=WAL option is definitely good for the sqlite backend; as I said, it's irrelevant for the dqlite one. You shouldn't need cache=shared in either backend, unless you have really constrained memory.
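
A hedged sketch of the two DSNs this implies (paths are placeholders, error handling as in the programs above):

	// sqlite backend (go-sqlite3): WAL must be requested in the DSN.
	sqliteDB, err := sql.Open("sqlite3", "./db/state.db?_journal=WAL")
	if err != nil {
		log.Fatal(err)
	}

	// dqlite backend: connections are always in WAL mode, so a plain
	// database name is enough; no _journal or cache options needed.
	dqliteDB, err := sql.Open("dqlite", "state.db")
	if err != nil {
		log.Fatal(err)
	}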

ibuildthecloud commented on May 25, 2024

I'll try setting the connection params, but I have two concerns with that approach.

  1. I will have multiple clients writing to the server at the same time, so three different processes. Will they hit the same concurrency issue?
  2. Capping things at one connection prevents me from pipelining requests. I understand that sqlite/etcd/raft is single-writer, but with one connection I'm forced to do request, response, request, response, which will be much slower than request, request, response, response (see the sketch after this list).
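
Purely to illustrate the pipelining concern in item 2 (these names are hypothetical, not a dqlite API), a client-side queue that lets callers submit writes concurrently while a single goroutine executes them in order:

// writeQueue serializes all writes through one goroutine, so callers
// line up instead of racing for SQLite's single write lock.
type writeQueue struct {
	db   *sql.DB
	jobs chan func()
}

func newWriteQueue(db *sql.DB) *writeQueue {
	q := &writeQueue{db: db, jobs: make(chan func(), 128)}
	go func() {
		for job := range q.jobs {
			job()
		}
	}()
	return q
}

// Exec enqueues a statement and waits for its result; many callers can
// enqueue at once, while execution stays strictly sequential.
func (q *writeQueue) Exec(stmt string, args ...interface{}) error {
	done := make(chan error, 1)
	q.jobs <- func() {
		_, err := q.db.Exec(stmt, args...)
		done <- err
	}
	return <-done
}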

freeekanayaka commented on May 25, 2024

Okay, I actually think this might not be a bug, but normal behavior. By default the Go SQLite driver configures your connection to retry when SQLITE_BUSY (aka "database is locked") is returned, which happens when there are two or more concurrent write attempts and only one of them wins and succeeds.

In the dqlite Go driver I didn't want to implement the same behavior, to let clients decide their own retry policy. See the one for LXD, for example.

I modified the test program to retry failed exec statements (in the real world you should probably avoid a busy loop and add a small sleep); see below.

So my hint about db.SetMaxOpenConns(1) should actually not be needed for this specific issue.

Please let me know if you think adding some default retry policy would be worthwhile (it could be overridden with an option).

package main

import (
	"context"
	"database/sql"
	"io/ioutil"
	"log"
	"os"
	"sync"

	dqlite "github.com/canonical/go-dqlite"
	"github.com/canonical/go-dqlite/client"
	"github.com/canonical/go-dqlite/driver"
)

func main() {
	dir, err := ioutil.TempDir("", "dqlite-issue-164")
	if err != nil {
		log.Fatal(err)
	}
	defer os.RemoveAll(dir)

	info := client.NodeInfo{ID: uint64(1), Address: "@1"}
	server, err := dqlite.New(info.ID, info.Address, dir, dqlite.WithBindAddress(info.Address))
	if err != nil {
		log.Fatal(err)
	}
	defer server.Close()

	if err := server.Start(); err != nil {
		log.Fatal(err)
	}

	store, err := client.DefaultNodeStore(":memory:")
	if err != nil {
		log.Fatal(err)
	}
	if err := store.Set(context.Background(), []client.NodeInfo{info}); err != nil {
		log.Fatal(err)
	}
	drv, err := driver.New(store)
	if err != nil {
		log.Fatal(err)
	}
	sql.Register("dqlite", drv)

	db, err := sql.Open("dqlite", "./db/state.db?_journal=WAL&cache=shared")
	if err != nil {
		log.Fatal(err)
	}

	if _, err := db.Exec("CREATE TABLE test (n INT); INSERT INTO test(n) VALUES(1)"); err != nil {
		log.Fatal("create table: ", err)
	}

	wg := sync.WaitGroup{}
	wg.Add(2)
	query := func(db *sql.DB) {
		count := 0
		for i := 0; i < 100; i++ {
			_, err := db.Exec("INSERT INTO test(n) VALUES(1)")
			// A busy error means another writer won the lock: skip this
			// attempt and carry on with the next one.
			if err, ok := err.(driver.Error); ok {
				if err.Code == driver.ErrBusy {
					continue
				}
			}
			if err != nil {
				log.Fatalf("err %#v\n", err)
			}
			count++
		}
		log.Printf("%d successful inserts\n", count)
		wg.Done()
	}

	go query(db)
	go query(db)
	wg.Wait()
}

ibuildthecloud commented on May 25, 2024

I tried this approach and it is working, but I'm a bit concerned about it. I struggle to find proper retry logic that won't kill performance or overwhelm the CPU. Right now I put in a naive loop that just retries 20 times with no sleep. Doing this, I see it's very common for a statement to take 3-5 attempts to succeed under fairly light load. So my concern is that under heavier load this fail-and-retry approach might end up in a storm of failures and retries, and ultimately in starvation scenarios. Now, this is speculation and I don't have real numbers; I'll do some proper performance testing at some point.

It seems to me that the dqlite client should be able to specify what to do on ErrBusy, and the dqlite server should handle that properly. In practice I'm not sure why, on ErrBusy, I wouldn't always want to just get in line and execute when the lock is available. From that perspective, the go-sqlite bindings might already be doing the right thing by masking this error from the user. But unlike the go-sqlite bindings, I think this retry should be handled in the dqlite server, not the client, to avoid network latency issues.
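
A hedged sketch of one client-side middle ground between a tight loop and no retry at all: back off exponentially with jitter on driver.ErrBusy (the helper name and limits are made up; driver is github.com/canonical/go-dqlite/driver as in the test programs above, and math/rand, time, and fmt would also need importing):

// execRetry retries a statement on busy errors, sleeping with
// exponential backoff plus jitter so contending writers don't spin.
func execRetry(db *sql.DB, stmt string, args ...interface{}) error {
	delay := time.Millisecond
	for attempt := 0; attempt < 20; attempt++ {
		_, err := db.Exec(stmt, args...)
		derr, ok := err.(driver.Error)
		if !ok || derr.Code != driver.ErrBusy {
			return err // nil on success, or a non-busy error
		}
		time.Sleep(delay + time.Duration(rand.Int63n(int64(delay))))
		if delay < 100*time.Millisecond {
			delay *= 2 // cap the backoff at roughly 100ms
		}
	}
	return fmt.Errorf("still busy after 20 attempts: %s", stmt)
}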

schu commented on May 25, 2024

> Yeah, indeed queuing up clients is something I have been wanting to do. This is sort of a first-pass implementation. I need to check whether an existing transaction can be recovered after SQLITE_BUSY is thrown, as I'm not entirely sure what the internals are there. Will get back once I gather more info.

Hi, I'm curious if you found out more and what the current state of affairs is?

stgraber commented on May 25, 2024

@freeekanayaka what's your current feeling on this one?

freeekanayaka commented on May 25, 2024

> @freeekanayaka what's your current feeling on this one?

It's what I said in the last comment: if you get a SQLITE_BUSY error, the only thing the client can do is retry; there's no way to queue up clients.

stgraber commented on May 25, 2024

Okay, so we should close this here; it needs to be handled at a higher level, where a retry can happen.
