Go key-value store using BadgerDB
go get github.com/ChainSafe/chaindb
License: GNU Lesser General Public License v3.0
In a codebase of mine, I saw huge performance wins by directly using https://pkg.go.dev/github.com/dgraph-io/badger/v2?tab=doc#DB.NewWriteBatch, instead of the current mechanism of first storing the operations in a map[string][]byte.
We do use a WriteBatch when flushing via the Write method, but making every operation go through a map first is IMO missing the point of using a write batch in the first place.
For example, it seems like badger's write batching flushes the writes to the database in large chunks. That is, writing a million values in a batch shouldn't mean keeping a million key-values in memory, as it might be flushing them in chunks of hundreds or thousands.
However, our current implementation always keeps the entire batch in memory, so it doesn't work well at all for large numbers of writes. Handling large numbers of writes is the main purpose of batching in the first place, so I think we should remove the map entirely.
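For illustration, here's a minimal sketch of writing directly through badger's WriteBatch, so badger can flush in chunks rather than us accumulating everything in a map first:

```go
package main

import (
	"fmt"

	badger "github.com/dgraph-io/badger/v2"
)

// writeMany streams writes through badger's WriteBatch. Badger commits
// the batch to disk in chunks internally, so we never need to hold all
// of the key-values in our own memory at once.
func writeMany(db *badger.DB, value []byte) error {
	wb := db.NewWriteBatch()
	defer wb.Cancel() // a no-op if Flush already committed everything

	for i := 0; i < 1_000_000; i++ {
		key := []byte(fmt.Sprintf("key-%08d", i))
		if err := wb.Set(key, value); err != nil {
			return err
		}
	}
	return wb.Flush() // commit whatever is still buffered
}
```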
One feature we will lose because of this is the Reset method, which empties the batch. We can't implement this with "pure" write batches, because once a Put/Del has been done, it might already have been flushed to disk. I don't think we should support this kind of "undo" semantics in write batches.
If gossamer really needs that feature, I think you could continue using write batches, but keep the "reset" logic in your own code. That is, only use Database.NewBatch once the entire final set of writes has been computed. That has the same memory-usage disadvantage as the current approach, but it keeps write batches in this package fast, so it doesn't force the added complexity and slow-down onto everyone else.
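A rough sketch of what that could look like on the caller's side. The exact Batch method set is an assumption here, based on the Put/Write methods mentioned above:

```go
package main

import "github.com/ChainSafe/chaindb"

// flushPending keeps the "reset" semantics in the caller: the pending
// map can be discarded (i.e. reset) at any time, and a write batch is
// only created once the final set of writes is known. This assumes
// chaindb's Database.NewBatch and Batch.Put/Write signatures.
func flushPending(db chaindb.Database, pending map[string][]byte) error {
	batch := db.NewBatch()
	for k, v := range pending {
		if err := batch.Put([]byte(k), v); err != nil {
			return err
		}
	}
	return batch.Write()
}
```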
See https://pkg.go.dev/github.com/dgraph-io/badger#Iterator.Seek; this would be useful for seeking to specific prefixes.
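For instance, a minimal sketch of seeking to a prefix with badger's iterator, assuming direct access to the underlying *badger.DB:

```go
package main

import (
	"fmt"

	badger "github.com/dgraph-io/badger/v2"
)

// iteratePrefix walks every key that starts with prefix, using Seek to
// jump directly to the first matching key instead of scanning from the
// start of the keyspace.
func iteratePrefix(db *badger.DB, prefix []byte) error {
	return db.View(func(txn *badger.Txn) error {
		it := txn.NewIterator(badger.DefaultIteratorOptions)
		defer it.Close()
		for it.Seek(prefix); it.ValidForPrefix(prefix); it.Next() {
			fmt.Printf("key=%s\n", it.Item().Key())
		}
		return nil
	})
}
```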
Right now this module is a single package, which is OK since we only have one implementation with badger.
However, it would be better long-term design to have the interface and generic code/tests at the root package, and one sub-package for each implementation that pulls in heavy dependencies like badger.
If we don't do that, we could easily end up in a situation where trying to use badger also forces importing (and thus linking into the binary) other database software.
The first suggestion that comes to mind is chaindb/<dbname>, like chaindb/badger. However, this would be unfortunate, as it clashes with the name of the upstream badger package itself, and it's entirely reasonable to want to use both at the same time (e.g. when using our package along with database options from upstream).
Another idea is to use slightly different names, like chaindb/badgerdb. Though that doesn't really avoid the confusion between the two names.
The real difference here is that our package is an implementation, or a wrapper around upstream. Perhaps chaindb/badgerimpl? It's a bit ugly, but I think it's the least confusing.
Another option is to just do chaindb/badger, and then let the importer rename it as they please, like import badgerimpl "github.com/ChainSafe/chaindb/badger". Perhaps this is the simplest option.
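To illustrate, here's a hypothetical caller under that chaindb/badger layout; badgerimpl.New is a made-up constructor name, not an existing API:

```go
package main

import (
	"github.com/ChainSafe/chaindb"
	badgerimpl "github.com/ChainSafe/chaindb/badger" // hypothetical sub-package
	badger "github.com/dgraph-io/badger/v2"          // upstream, no name clash
)

// open shows that renaming the import lets the wrapper and upstream
// badger coexist in the same file. badgerimpl.New is a hypothetical
// constructor that accepts upstream options.
func open(path string) (chaindb.Database, error) {
	opts := badger.DefaultOptions(path)
	return badgerimpl.New(opts)
}
```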
BadgerDB can be operated in an in-memory mode using the WithInMemory option: opt := badger.DefaultOptions("").WithInMemory(true). Using this functionality would simplify the codebase, eliminating the need for the existing memoryDB.go.
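A minimal sketch of opening badger purely in memory:

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v2"
)

func main() {
	// With WithInMemory(true), no directory is needed and nothing is
	// persisted to disk; the empty dir string is required by the API.
	opts := badger.DefaultOptions("").WithInMemory(true)
	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```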
I think the interface should do a better job of clarifying what the semantics of each method are, and what the concurrency guarantees are.
In terms of concurrency, I think we could start with some basics:
- All methods should be safe to use concurrently; at a minimum, the tests should pass under -race.
- Is Close OK to call concurrently with Put?
- Two concurrent Put calls can only result in either of them being applied in the end, or the original value, and never in corrupted data.
- Even though Get and Put are thread safe, data consistency outside of atomic operations is not guaranteed. That is, if Get("foo") is run concurrently with Put("foo", "bar"), one can't predict the result of the Get.
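As a sketch of that last point, assuming chaindb's Get/Put signatures, the Get below may legitimately observe either the old value or "bar":

```go
package main

import (
	"sync"

	"github.com/ChainSafe/chaindb"
)

// raceExample is race-free under -race, yet its outcome is still
// unpredictable: the Get may run before or after the Put. Assumes
// chaindb's Put(key, value []byte) error and Get(key []byte) ([]byte, error).
func raceExample(db chaindb.Database) {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		_ = db.Put([]byte("foo"), []byte("bar"))
	}()
	go func() {
		defer wg.Done()
		value, _ := db.Get([]byte("foo")) // old value or "bar": unpredictable
		_ = value
	}()
	wg.Wait()
}
```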
The idea sounds fine to me, but I think it's wrong to hard-code this into the badger implementation.
Instead, I think we should allow the badger constructor to pass options to upstream. For example, https://pkg.go.dev/github.com/dgraph-io/badger/v2?tab=doc#Options.WithCompression could be used to accomplish the same, and probably in a more performant way, since it's handled by upstream.
In the future, if we're also interested in compression for DBs that don't implement it themselves, we could also add a DB overlay similar to the "with prefix" one we already have. In any case, it should not be coupled with any of the implementations.
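For example, a sketch of enabling upstream compression via badger's own options; the idea that our constructor would accept and forward badger.Options is an assumption here:

```go
package main

import (
	badger "github.com/dgraph-io/badger/v2"
	"github.com/dgraph-io/badger/v2/options"
)

// newCompressedOptions builds upstream badger options with ZSTD block
// compression enabled. A chaindb constructor that accepts badger.Options
// could forward these directly (such a constructor is hypothetical).
func newCompressedOptions(dir string) badger.Options {
	return badger.DefaultOptions(dir).WithCompression(options.ZSTD)
}
```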