Go key-value store using BadgerDB
go get github.com/ChainSafe/chaindb
License: GNU Lesser General Public License v3.0
In a codebase of mine, I saw huge performance wins by directly using https://pkg.go.dev/github.com/dgraph-io/badger/v2?tab=doc#DB.NewWriteBatch, instead of the current mechanism of first storing the operations in a map[string][]byte.
We do use a WriteBatch when flushing via the Write method, but making every operation go through a map first is IMO missing the point of using a write batch in the first place.
For example, it seems like badger's write batching flushes the writes to the database in large chunks. That is, writing a million values in a batch shouldn't mean keeping a million key-values in memory, as it might be flushing them in chunks of hundreds or thousands.
However, our current implementation always keeps the entire batch in memory, so it doesn't work well at all for large numbers of writes. Handling large numbers of writes is the main purpose of batching in the first place, so I think we should remove the map entirely.
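For illustration, here's a minimal sketch of writing directly through badger's WriteBatch, so badger can flush in chunks rather than us accumulating everything in a map first:

```go
package main

import (
	"fmt"

	badger "github.com/dgraph-io/badger/v2"
)

// writeMany streams writes through badger's WriteBatch. Badger commits
// the batch to disk in chunks internally, so we never need to hold all
// of the key-values in our own memory at once.
func writeMany(db *badger.DB, value []byte) error {
	wb := db.NewWriteBatch()
	defer wb.Cancel() // a no-op if Flush already committed everything

	for i := 0; i < 1_000_000; i++ {
		key := []byte(fmt.Sprintf("key-%08d", i))
		if err := wb.Set(key, value); err != nil {
			return err
		}
	}
	return wb.Flush() // commit whatever is still buffered
}
```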
One feature we will lose because of this is the Reset method, which empties the batch. We can't implement this with "pure" write batches, because once a Put/Del has been done, it might already have been flushed to disk. I don't think we should support this kind of "undo" semantics in write batches.
If gossamer really needs that feature, I think you could continue using write batches, but keep the "reset" logic in your own code. That is, only use Database.NewBatch once the entire final set of writes has been computed. That has the same memory-usage disadvantage as the current approach, but it keeps write batches in this package fast, so it doesn't force the added complexity and slow-down onto everyone else.
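A rough sketch of what that could look like on the caller's side. The exact Batch method set is an assumption here, based on the Put/Write methods mentioned above:

```go
package main

import "github.com/ChainSafe/chaindb"

// flushPending keeps the "reset" semantics in the caller: the pending
// map can be discarded (i.e. reset) at any time, and a write batch is
// only created once the final set of writes is known. This assumes
// chaindb's Database.NewBatch and Batch.Put/Write signatures.
func flushPending(db chaindb.Database, pending map[string][]byte) error {
	batch := db.NewBatch()
	for k, v := range pending {
		if err := batch.Put([]byte(k), v); err != nil {
			return err
		}
	}
	return batch.Write()
}
```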
See https://pkg.go.dev/github.com/dgraph-io/badger#Iterator.Seek; this would be useful for seeking to specific prefixes.
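For instance, a minimal sketch of seeking to a prefix with badger's iterator, assuming direct access to the underlying *badger.DB:

```go
package main

import (
	"fmt"

	badger "github.com/dgraph-io/badger/v2"
)

// iteratePrefix walks every key that starts with prefix, using Seek to
// jump directly to the first matching key instead of scanning from the
// start of the keyspace.
func iteratePrefix(db *badger.DB, prefix []byte) error {
	return db.View(func(txn *badger.Txn) error {
		it := txn.NewIterator(badger.DefaultIteratorOptions)
		defer it.Close()
		for it.Seek(prefix); it.ValidForPrefix(prefix); it.Next() {
			fmt.Printf("key=%s\n", it.Item().Key())
		}
		return nil
	})
}
```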
Right now this module is a single package, which is OK since we only have one implementation with badger.
However, it would be better long-term design to have the interface and generic code/tests at the root package, and one sub-package for each implementation that pulls in heavy dependencies like badger.
If we don't do that, we could easily end up in a situation where trying to use badger also forces importing (and thus linking into the binary) other database software.
The first suggestion that comes to mind is chaindb/<dbname>, like chaindb/badger. However, this would be unfortunate, as it clashes with the name of the upstream badger package itself, and it's entirely reasonable to want to use both at the same time (e.g. when using our package along with database options from upstream).
Another idea is to use slightly different names, like chaindb/badgerdb. Though that doesn't really avoid the confusion between the two names.
The real difference here is that our package is an implementation, or a wrapper around upstream. Perhaps chaindb/badgerimpl? It's a bit ugly, but I think it's the least confusing.
Another option is to just do chaindb/badger, and then let the importer rename it as they please, like import badgerimpl "github.com/ChainSafe/chaindb/badger". Perhaps this is the simplest option.
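To illustrate, here's a hypothetical caller under that chaindb/badger layout; badgerimpl.New is a made-up constructor name, not an existing API:

```go
package main

import (
	"github.com/ChainSafe/chaindb"
	badgerimpl "github.com/ChainSafe/chaindb/badger" // hypothetical sub-package
	badger "github.com/dgraph-io/badger/v2"          // upstream, no name clash
)

// open shows that renaming the import lets the wrapper and upstream
// badger coexist in the same file. badgerimpl.New is a hypothetical
// constructor that accepts upstream options.
func open(path string) (chaindb.Database, error) {
	opts := badger.DefaultOptions(path)
	return badgerimpl.New(opts)
}
```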
BadgerDB can be operated in an in-memory mode using the WithInMemory option: opt := badger.DefaultOptions("").WithInMemory(true). Using this functionality would simplify the codebase, eliminating the need for the existing memoryDB.go.
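A minimal sketch of opening badger purely in memory:

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v2"
)

func main() {
	// With WithInMemory(true), no directory is needed and nothing is
	// persisted to disk; the empty dir string is required by the API.
	opts := badger.DefaultOptions("").WithInMemory(true)
	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```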
I think the interface should do a better job of clarifying what the semantics of each method are, and what the concurrency guarantees are.
In terms of concurrency, I think we could start with some basics:
- All methods should be safe to use concurrently; at a minimum, the tests should pass under -race.
- Is Close OK to call concurrently with Put?
- Two concurrent Put calls can only result in either of them being applied in the end, or the original value, and never in corrupted data.
- Even though Get and Put are thread safe, data consistency outside of atomic operations is not guaranteed. That is, if Get("foo") is run concurrently with Put("foo", "bar"), one can't predict the result of the Get.
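As a sketch of that last point, assuming chaindb's Get/Put signatures, the Get below may legitimately observe either the old value or "bar":

```go
package main

import (
	"sync"

	"github.com/ChainSafe/chaindb"
)

// raceExample is race-free under -race, yet its outcome is still
// unpredictable: the Get may run before or after the Put. Assumes
// chaindb's Put(key, value []byte) error and Get(key []byte) ([]byte, error).
func raceExample(db chaindb.Database) {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		_ = db.Put([]byte("foo"), []byte("bar"))
	}()
	go func() {
		defer wg.Done()
		value, _ := db.Get([]byte("foo")) // old value or "bar": unpredictable
		_ = value
	}()
	wg.Wait()
}
```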
The idea sounds fine to me, but I think it's wrong to hard-code this into the badger implementation.
Instead, I think we should allow the badger constructor to pass options to upstream. For example, https://pkg.go.dev/github.com/dgraph-io/badger/v2?tab=doc#Options.WithCompression could be used to accomplish the same, and probably in a more performant way, since it's handled by upstream.
In the future, if we're also interested in compression for DBs that don't implement it themselves, we could also add a DB overlay similar to the "with prefix" one we already have. In any case, it should not be coupled with any of the implementations.
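For example, a sketch of enabling upstream compression via badger's own options; the idea that our constructor would accept and forward badger.Options is an assumption here:

```go
package main

import (
	badger "github.com/dgraph-io/badger/v2"
	"github.com/dgraph-io/badger/v2/options"
)

// newCompressedOptions builds upstream badger options with ZSTD block
// compression enabled. A chaindb constructor that accepts badger.Options
// could forward these directly (such a constructor is hypothetical).
func newCompressedOptions(dir string) badger.Options {
	return badger.DefaultOptions(dir).WithCompression(options.ZSTD)
}
```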