openacid / slim Goto Github PK

View Code? Open in Web Editor NEW

1.9K 39.0 105.0 18.51 MB

Surprisingly space efficient trie in Golang(11 bits/key; 100 ns/get).

Home Page: https://openacid.github.io/

License: MIT License

Go 86.59% Shell 8.41% Makefile 1.20% Python 2.42% Jinja 1.39%

go golang memory compacted compress datastructure trie tree

slim's Introduction

Slim - surprisingly space efficient data types in Golang

Slim is collection of surprisingly space efficient data types, with corresponding serialization APIs to persisting them on-disk or for transport.

Why slim
Performance and memory overhead
Synopsis
Filter mode and KV mode.
Try it
- Versions
Who are using slim
Feedback and contributions
Authors
License

Why slim

As data on internet keeps increasing exponentially, the capacity gap between memory and disk becomes greater.

Most of the time, a data itself does not need to be loaded into expensive main memory. Only the much more important information, WHERE-A-DATA-IS, deserve a seat in main memory.

This is what slim does, keeps as little information as possible in main memory, as a minimized index of huge amount external data.

SlimIndex: is a common index structure, building on top of SlimTrie.
SlimTrie is the underlying index data structure, evolved from trie.

Features:
- Minimized: 11 bits per key(far less than an 64-bits pointer!!).
- Stable: memory consumption is stable in various scenarios. The Worst case converges to average consumption tightly. See benchmark.
- Loooong keys: You can have VERY long keys(16K bytes), without any waste of memory(and money). Do not waste your life writing another prefix compression:). (aws-s3 limits key length to 1024 bytes). Memory consumption only relates to key count, not to key length.
- Ordered: like btree, keys are stored. Range-scan will be ready in 0.6.0.
- Fast: ~150 ns per Get(). Time complexity for a get is O(log(n) + k); n: key count; k: key length.
- Ready for transport: a single proto.Marshal() is all it requires to serialize, transport or persisting on disk etc.

Performance and memory overhead

3.3 times faster than the btree.
2.3 times faster than binary search.

Memory overhead is about 11 bit per key.

The data struct in this benchmark is a slice of key-value pairs with a SlimTrie serving as the index. The slim itself is built in the filter mode, to maximize memory reduction and performance. The whole struct slimKV is a fully functional kv-store, just like a static btree.

type slimKV struct {
    slim *trie.SlimTrie
    Elts []*KVElt
}
type KVElt struct {
    Key string
    Val int32
}

You can find the benchmark code in benchmark;

Synopsis

1. Index on-disk key-values

One of the typical usages of slim is to index serialized data on disk(e.g., key value records in a SSTable). By keeping a slim in memory, one can quickly find the on-disk offset of the record by a key.

Show me the code ......

package index_test

import (
	"fmt"
	"strings"

	"github.com/openacid/slim/index"
)

type Data string

func (d Data) Read(offset int64, key string) (string, bool) {
	kv := strings.Split(string(d)[offset:], ",")[0:2]
	if kv[0] == key {
		return kv[1], true
	}
	return "", false
}

func Example() {

	// Accelerate external data accessing (in memory or on disk) by indexing
	// them with a SlimTrie:

	// `data` is a sample of some unindexed data. In our example it is a comma
	// separated key value series.
	//
	// In order to let SlimTrie be able to read data, `data` should have
	// a `Read` method:
	//     Read(offset int64, key string) (string, bool)
	data := Data("Aaron,1,Agatha,1,Al,2,Albert,3,Alexander,5,Alison,8")

	// keyOffsets is a prebuilt index that stores key and its offset in data accordingly.
	keyOffsets := []index.OffsetIndexItem{
		{Key: "Aaron", Offset: 0},
		{Key: "Agatha", Offset: 8},
		{Key: "Al", Offset: 17},
		{Key: "Albert", Offset: 22},
		{Key: "Alexander", Offset: 31},
		{Key: "Alison", Offset: 43},
	}

	// `SlimIndex` is simply a container of SlimTrie and its data.
	st, err := index.NewSlimIndex(keyOffsets, data)
	if err != nil {
		fmt.Println(err)
	}

	// Lookup
	v, found := st.Get("Alison")
	fmt.Printf("key: %q\n  found: %t\n  value: %q\n", "Alison", found, v)

	v, found = st.Get("foo")
	fmt.Printf("key: %q\n  found: %t\n  value: %q\n", "foo", found, v)

	// Output:
	// key: "Alison"
	//   found: true
	//   value: "8"
	// key: "foo"
	//   found: false
	//   value: ""
}

2. Sparse index

Create an index item for every 4(or more as you wish) keys.

Let several adjacent keys share one index item reduces a lot memory cost if there are huge amount keys in external data. Such as to index billions of 4KB objects on a 4TB disk(because one disk IO costs 20ms for either reading 4KB or reading 1MB).

Show me the code ......

package index_test

import (
	"fmt"
	"strings"

	"github.com/openacid/slim/index"
)

type RangeData string

func (d RangeData) Read(offset int64, key string) (string, bool) {
	for i := 0; i < 4; i++ {
		if int(offset) >= len(d) {
			break
		}

		kv := strings.Split(string(d)[offset:], ",")[0:2]
		if kv[0] == key {
			return kv[1], true
		}
		offset += int64(len(kv[0]) + len(kv[1]) + 2)

	}
	return "", false
}

func Example_indexRanges() {

	// Index ranges instead of keys:
	// In this example at most 4 keys shares one index item.

	data := RangeData("Aaron,1,Agatha,1,Al,2,Albert,3,Alexander,5,Alison,8")

	// keyOffsets is a prebuilt index that stores range start, range end and its offset.
	keyOffsets := []index.OffsetIndexItem{
		// Aaron  +--> 0
		// Agatha |
		// Al     |
		// Albert |

		// Alexander +--> 31
		// Alison    |

		{Key: "Aaron", Offset: 0},
		{Key: "Agatha", Offset: 0},
		{Key: "Al", Offset: 0},
		{Key: "Albert", Offset: 0},

		{Key: "Alexander", Offset: 31},
		{Key: "Alison", Offset: 31},
	}

	st, err := index.NewSlimIndex(keyOffsets, data)
	if err != nil {
		panic(err)
	}

	v, found := st.RangeGet("Aaron")
	fmt.Printf("key: %q\n  found: %t\n  value: %q\n", "Aaron", found, v)

	v, found = st.RangeGet("Al")
	fmt.Printf("key: %q\n  found: %t\n  value: %q\n", "Al", found, v)

	v, found = st.RangeGet("foo")
	fmt.Printf("key: %q\n  found: %t\n  value: %q\n", "foo", found, v)

	// Output:
	// key: "Aaron"
	//   found: true
	//   value: "1"
	// key: "Al"
	//   found: true
	//   value: "2"
	// key: "foo"
	//   found: false
	//   value: ""
}

3. Range scan

Slim can also be used as a traditional in-memory kv-store: Building a slim with Opt{ Complete: Bool(true) }, it won't strip out any information(e.g., it won't eliminate single-branch labels) and it will functions the same as a btree. This snippet shows how to iterate key values.

Show me the code ......

package trie

import (
	"fmt"

	"github.com/openacid/slim/encode"
)

func ExampleSlimTrie_ScanFrom() {
	var keys = []string{
		"",
		"`",
		"a",
		"ab",
		"abc",
		"abca",
		"abcd",
		"abcd1",
		"abce",
		"be",
		"c",
		"cde0",
		"d",
	}
	values := makeI32s(len(keys))

	codec := encode.I32{}
	st, _ := NewSlimTrie(codec, keys, values, Opt{
		Complete: Bool(true),
	})

	// untilD stops when encountering "d".
	untilD := func(k, v []byte) bool {
		if string(k) == "d" {
			return false
		}

		_, i32 := codec.Decode(v)
		fmt.Println(string(k), i32)
		return true
	}

	fmt.Println("scan (ab, +∞):")
	st.ScanFrom("ab", false, true, untilD)

	fmt.Println()
	fmt.Println("scan [be, +∞):")
	st.ScanFrom("be", true, true, untilD)

	fmt.Println()
	fmt.Println("scan (ab, be):")
	st.ScanFromTo(
		"ab", false,
		"be", false,
		true, untilD)

	// Output:
	//
	// scan (ab, +∞):
	// abc 4
	// abca 5
	// abcd 6
	// abcd1 7
	// abce 8
	// be 9
	// c 10
	// cde0 11
	//
	// scan [be, +∞):
	// be 9
	// c 10
	// cde0 11
	//
	// scan (ab, be):
	// abc 4
	// abca 5
	// abcd 6
	// abcd1 7
	// abce 8
}

Filter mode and KV mode.

Slim can be built into either a filter(like bloom filter but with key order preserved.) or a real kv-store(like btree) There is an option in NewSlimTrie(..., option) to control the building behavior. Ref: Opt

To use slim as a kv-store, set the option to Complete then there won't be false positives.

To use it as a filter, set InnerPrefix, LeafPrefix to false(Complete implies InnerPrefix==true and LeafPrefix==true). Then slim won't store any single branch label in the trie it builds.

With InnerPrefix==true, it does not reduce a single label branch that leads to an inner node.

With LeafPrefix==true, it does not reduce a single label branch that leads to a leaf node.

E.g.:

// Complete
InnerPrefix: true
LeafPrefix: true
^ -a-> 1 -b-> $
 `-c-> 2 -x-> 3 -y-> $
               `-z-> $

InnerPrefix: true
LeafPrefix: false
^ -a-> $
 `-c-> 2 -x-> 3 -y-> $
               `-z-> $

InnerPrefix: false
LeafPrefix: true
^ -a-> 1 -b-> $
 `-c-> 3 -y-> $
        `-z-> $

InnerPrefix: false
LeafPrefix: false
^ -a-> $
 `-c-> 3 -y-> $
        `-z-> $

The memory consumption in filter mode and kv mode differs significantly. The following chart shows memory consumption by 1 million var-length string, 10 to 20 byte in different mode:

-	size	gzip-size
sample data size	15.0M	14.0M
Complete:true	14.0M	10.0M
InnerPrefix:ture	1.3M	0.9M
all false	1.3M	0.8M

Try it

Install

go get github.com/openacid/slim/trie

Change-log: Change-log

Versions

A newer version y being compatible with an older version x means y can load data serialized by x. But x should never try to load data serialized by a newer version y.

v0.5.* is compatible with 0.2.*, 0.3.*, 0.4.*, 0.5.*.
v0.4.* is compatible with 0.2.*, 0.3.*, 0.4.*.
v0.3.* is compatible with 0.2.*, 0.3.*.
v0.2.* is compatible with 0.2.*.

Who are using slim

baishancloud

Feedback and contributions

Feedback and Contributions are greatly appreciated.

At this stage, the maintainers are most interested in feedback centered on:

Do you have a real life scenario that slim supports well, or doesn't support at all?
Do any of the APIs fulfill your needs well?

Let us know by filing an issue, describing what you did or wanted to do, what you expected to happen, and what actually happened:

Or other type of issue.

Authors

刘保海 marshaling
吴义谱 array
张炎泼 slimtrie design
李文博 trie-compressing, trie-search
李树龙 marshaling

See also the list of contributors who participated in this project.

License

This project is licensed under the MIT License - see the LICENSE file for details.

slim's People

Contributors

Stargazers

Watchers

Forkers

wenbobuaa spongedu nanne007 ngaut abangdd richardliao isgasho jzoe xuexila liukun4515 flavorzyb zhanglei 5up3rc amoshappy abitduck sckelemen influx6 gatarelib cheneydeng zhangyi100 htlszlh gaoyingie ariesdevil junxie6 crixalis2013 b-xiang feng-y martinroad crain-cn koalacxr leonliu315 hhy5277 mahongguang shaunstanislauslau wangxujun163163 bigbigwei 5ace boniface ksharpdabu pythonpeixun wfxiang08 hydrophis jokerzh ceh-forks campeloguedes keyboard1989 lwllvyb a-lucas feng2012 daniel-007 junftnt kokizzu chaitanyaphalak m00zh33 lujieha wanhuo batermj forkkit jimchenglin tangzhe7 jackkakaya miss-bug digidis holajiawei undertreetech amasser glycerine lideen999 xielongdragon eliasyaoyc zhaoshiling1017 gomodules changtailiang dolanor-galaxy go-for-development dumpmemory fadhilkurnia wnz27 silvaq andypeng2015 baajarmeh trendingtechnology sejust macro-huge realm2go pkafma-aon junli1026 imfantuan gitsrc fatelei distributed-system-lab altoplano liyoujunlizi didip gismo-stack eamon-cai honshow iq-scm ethananro hohice

slim's Issues

Range support

Range support

A range SlimTrie is same as a key-value SlimTrie internally.
The only difference is a range SlimTrie uses two keys(starting and ending key) to represent a range.

// Create a range SlimTrie with 3 range:
// ["a", "p"] --> 1
// ["q", "q"] --> 5 // a range with only one item.
// ["r", "z"] --> 3
st, err := NewSlimTrie(xxx, []string{"a", "p", "q", "r" "z"}, []int32{1,1, 5, 3, 3})

Creating a SlimTrie with range support is the same as creating a normal SlimTrie, except that two adjacent keys(start, end for a range) have the same value.
Add example_range_test.go to illustrate how to create/use a range SlimTrie.
Rewrite tests for the new range support. Also the max number of keys in a range SlimTrie can be determined before creating it.
Add trie/README.md explaining how range SlimTrie works.
trie.Node.Append(): remove argument isStartLeaf and attribute Node.isStartLeaf, which is not required anymore according to latest design.

Push to branch "release-0.3"

XXX_sizecache undefined

Describe the bug

running make test outputted errors below

# github.com/openacid/slim/array_test [github.com/openacid/slim/array.test]
array/int_test.go:196:4: a.XXX_sizecache undefined (type *array.U16 has no field or method XXX_sizecache)
array/int_test.go:235:3: a.XXX_sizecache undefined (type *array.U16 has no field or method XXX_sizecache)
array/int_test.go:428:4: a.XXX_sizecache undefined (type *array.U32 has no field or method XXX_sizecache)
array/int_test.go:467:3: a.XXX_sizecache undefined (type *array.U32 has no field or method XXX_sizecache)
array/int_test.go:660:4: a.XXX_sizecache undefined (type *array.U64 has no field or method XXX_sizecache)
array/int_test.go:699:3: a.XXX_sizecache undefined (type *array.U64 has no field or method XXX_sizecache)
array/marshal_test.go:73:4: a.XXX_sizecache undefined (type *array.U16 has no field or method XXX_sizecache)
array/marshal_test.go:112:3: a.XXX_sizecache undefined (type *array.U16 has no field or method XXX_sizecache)

To Reproduce

go get github.com/openacid/slim
make gen
mak test
See error

Environment (please complete the following information):

OS: mac
Version 10.14

Language (please complete the following information):

Language: Go
Version: 1.12

serialize/: use interface ReaderAt/WriterAt to replace io.File

related functions:
MarshalAt
UnMarshalAt

Imported serialize is contained in branch "serialize".
And its merge target should be "serialize", which contains unmodified imported codes from ec.

serialize: potential issue of valid read byte count

n += cnt should be placed after error checking?

func readFull(reader io.Reader, buf []byte) (cnt int, err error) {
	n, cnt, toRead := 0, 0, len(buf)

	for n < toRead {
		cnt, err = reader.Read(buf[n:])
		n += cnt

		if err != nil {
			break
		}
	}

	return n, err
}

import version

simplify Search() and neighborBranches()

Describe what to do

Let st.neighborBranches not deal with leaf branch, it should be dealt with outside this function.
This function should only deal with non-leaf branch.
Thus the searching logic would be more clear.

The code shows that: if word == LeafWord then:

Return value ltIdx is always -1
Return value eqIdx is always eqIdx
Return value rtIdx is the first child of current node, or -1(if no child).
Return value ltLeaf(indicate if ltIdx is a leaf) is always false

Thus moving logic about leaf out of this function seems better and clear.


func (st *SlimTrie) neighborBranches(idx uint16, word byte) (ltIdx, eqIdx, rtIdx int32, ltLeaf bool) {
	ltIdx, eqIdx, rtIdx = int32(-1), int32(-1), int32(-1)
	ltLeaf = false

	isLeaf := st.Leaves.Has(int32(idx))

	if word == LeafWord {
		if isLeaf {
			eqIdx = int32(idx)
		}
	} else {
		if isLeaf {
			ltIdx = int32(idx)
			ltLeaf = true
		}
	}

	bm, of, found := st.getChild(idx)
	if !found {
		return
	}

	if (bm >> word & 1) == 1 {
		eqIdx = int32(getChildIdx(bm, of, uint16(word)))
	}

	ltStart := word & WordMask
	for i := int8(ltStart) - 1; i >= 0; i-- {
		if (bm >> uint8(i) & 1) == 1 {
			ltIdx = int32(getChildIdx(bm, of, uint16(i)))
			ltLeaf = false
			break
		}
	}

	rtStart := word + 1
	if word == LeafWord {
		rtStart = uint8(0)
	}

	for i := rtStart; i < LeafWord; i++ {
		if (bm >> i & 1) == 1 {
			rtIdx = int32(getChildIdx(bm, of, uint16(i)))
			break
		}
	}

	return
}

README: add description about protobuf data structure, usage, how to update etc.

array/ try to optimize

pre-store popcount for every 8-bit word.
use a 256-elts array to store popcount.
use static mask.

Add online document

Describe the solution

In slim there are several sub package.
Add multiple links to their url at godoc.com

Weekly Digest (10 February, 2020 - 17 February, 2020)

Here's the Weekly Digest for openacid/slim:

ISSUES

Last week, no issues were created.

PULL REQUESTS

Last week, no pull requests were created, updated or merged.

COMMITS

Last week there were no commits.

CONTRIBUTORS

Last week there were no contributors.

STARGAZERS

Last week there was 1 stargazer.
⭐ x0rzkov
You are the star! 🌟

RELEASES

Last week there were no releases.

That's all for last week, please 👀 Watch and ⭐ Star the repository openacid/slim to receive next weekly updates. 😃

You can also view all Weekly Digests by clicking here.

Your Weekly Digest bot. 📆

Trie Node: add field `squash`

Trie Node: add field squash

Currently the (not slim) Trie node is defined as follow:

type Node struct {
	Children map[int]*Node
	Branches []int
	Step     uint16
	Value    interface{}

	isStartLeaf bool

	NodeCnt int
}

Add a field squash bool, to indicate this trie to squash preceding branches every time after Append a new key.

When creating child node, the child should inherit the squash value from its parent.

Benifit:

Thus this simplifies the method Append(): user does not need to pass a squash argument any more.

add travis auto test

add config
add the "passed" icon into README

readme: add personal contribution in section "Authors" at the bottom

error struct proposal

定义一个描述错误的数据格式, 用来实现golang在运行时的错误传递, 目标:

设计数据结构的前提假设:
- 模块的用户, 90% 只判断有没有err, 不判断err是什么.
- 每个模块定义自己本层的错误, 但提供接口允许用户在需要时拿到最底层的错误(例如errno等).
- 避免直接向上传递错误(上层模块拿到底层错误用处不大, 例如创建block时只得到一个permission deny的errno, 无法有效的定位到到底是哪个文件/目录出现了问题).
- error的设计目标是为了简化查错, 提供完整的日志, 方便统计分析, 出错时的整条调用链的检查.
可以用于API层的错误描述, 一般API处理时收到错误并需要将错误返回给客户端, 要求错误有:
- 具体唯一确定的error code, 便于客户端判断
- 人类可读的error message.
- 发生错误的相关的东西是什么.
Message通过Code和Resource生成出来. 只提供一个Message接口用来输出message.
希望这个模块可以作为go的errors包的无修改替代品, 和 https://github.com/pkg/errors 的无修改替代品.
提供一个方便的接口让用户直接取得最底层的错误.
记录引发错误的底层错误是什么. 类似在存储服务中, 一个底层的IO错误导致API失败, 如果能在日志中记录引发错误的错误, 可以方便定位问题.

有一类上层错误由几个下层错误引起, 例如多数派写中, nw = 5,3, 这时写失败3个后端会引起整个API调用失败, 这时需要记录多个下层错误.
error 结构里可选的带有stacktrace 信息,方便打印日志(参考了 https://github.com/pkg/errors )

type Error struct {
  Code       string  // error code
  Cause    []error   // 0 or seveal error that cause this error.
  Resource   string  // optional
  *stack             // optional traceback info of this error
}


// 实现系统error interface
func (e *Error) Error() string // 直接返回Code的string

// 兼容 https://github.com/pkg/errors 的接口:
func (e *Error) Cause() string // 返回第一个cause
// 其他接口没差别不列出了


// 扩展的接口
func (e *Error) AllCause() string // 返回所有cause的slice
func (e *Error) RootCause() string // 返回最初的cause
func (e *Error) AllRootCause() string // 返回所有的最底层的cause; cause的树的叶子节点.
func (e *Error) Message() string  // 给人看的, 通过Code和Resource拼装起来.

例子🌰: s2, group not found的一个可能的错误信息

以s2中场景为例, 描述下如何表示一个具体的错误,
假设一个错误是group没读取到.
而引发这个错误的是通过dbproxy读取group信息失败引发的.
而读dbproxy时重试了2次都失败了, 一次是mysql被置位readonly, 一次是socket读取超时:

Code: GroupNotFound
Resource: "group_id://123"
Cause: 
    - Code: DBProxyReadError
      Resource: "dbproxy://127.0.0.1:3303"
      Cause:
        - Code: MysqlReadonly
          Resource: "mysql://192.168.2.2:3306"
        - Code: InvalidResponse
          Resource: "mysql://192.168.9.9:3306"
          Cause:
              - Code: SocketReadTimeout
                Resource: "tcp4://192.168.9.9:3306"

例子🌰: ec, block build error的一个可能的错误信息

Code: BlockBuildError
Resource: "block://aabbccddeeff"
Cause:
    - Code: NeedleWriteError
      Resource: "needle://3cp/foo/bar.jpg"
      Cause:
          - Code: FSIsReadonly
            Resource: "file:///ecdrives/bbb" # schema of local file url in browser
            Cause:
                - <a native fs error> # may be an error with errno.

Standardise compact array API

compacted array provides 3 method: Init, Get and Has;

It'd be better if compacted array adopts existent popular map-like datatype API.

One to the uncomfortable spot is that Get() returns nil when not found.
But golang map returns a second bool value ok to indicate if it is found.

Describe the solution you'd like

Find a popular key-value implementation. Adopt its API.

add .github settings from repo gh-config

version/: add version comparison and backward compatibility guarantee functions

array/: move report generator output bench. make it a standalone program.

Testing imports leak into trie package

Looking at the imports of the github.com/openacid/slim/trie package, I see multiple packages that appear to be used for testing. Among them are testing, unsafe, github.com/openacid/slim/trie/benchmark, github.com/google/btree and more. This appears to be because of the file trie/test_utils.go. If this file were called trie/utils_test.go these imports disappear.

To Reproduce

https://godoc.org/github.com/openacid/slim/trie?imports

Expected behavior
I expected a list of imports without testing

Actual behavior
Includes testing and a few others that probably don't belong there.

I think this can be fixed by renaming the file, but I'm not 100% sure, because I don't know the code too well.

add `trie/docs/squash.md` to explain how "squash" works

any plan for c++ version?

create a dir for all kinds of bench/profiling report/charts

bit: rename to bits; and should respect golang `math/bits` APIs

Make package bit a drop-in replacer of math/bits.

Describe what to do
bit: rename to bits; and should respect golang math/bits APIs

Add a migration guide to help migrate ec to use this repo

What document to add

Additional context

把slimtrie 导入进来

几个相关的数据结构:

slim.baseTrie 普通的trie
slim.SlimTrie 压缩后的16 branch的trie
slim.CompactedArray 压缩数组
slim.bit bit操作函数包

各位看看下, 名字ok不?

Need memory usage comparison with standard map

so people know what's benefit this library provide compared to map other than ordered data.

import compacted-array

增加doc
修改名字
调整引用的deps的路径(之前跟xec相关)

Add code coverage test and badge

Range-scan is supported!(under development)

Range-scan is supported!(under development)
When to release this feature?
This feature will be very useful！

introduce iohelper; extract common io function out of `serialize`

serialize.readFull -> io.ReadFull

Use io.SectionReader to unify Unmarshal and UnmarshalAt

serialize: add doc

Imported serialize is contained in branch "serialize".
And its merge target should be "serialize-base", which contains unmodified imported codes from ec.

serialize: fix bug: direct string type version compare

Version comparison should be implemented according to version semantics, not with plain string comparison.

func makeDataHeader(verStr string, headerSize uint64, dataSize uint64) *DataHeader {
        ...
	if verStr > version.VERSION { // <-- Bug
		panic("forward compatibility is not supported")
	}

A great practical version semantics package is here: https://github.com/Masterminds/semver

semantic version:

https://semver.org/

Given a version number MAJOR.MINOR.PATCH, increment the:
MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards-compatible manner, and
PATCH version when you make backwards-compatible bug fixes.
Additional labels for pre-release and build metadata are available as extensions to the > MAJOR.MINOR.PATCH format.

marshaling support for SlimTrie

Support proto.Marshal(SlimTrie)

SlimTrie should implement two interface: proto.Marshaler and proto.Unmarhsaler.

type Marshaler interface {
	Marshal() ([]byte, error)
}
type Unmarshaler interface {
	Unmarshal([]byte) error
}

To let the upper level apps easily marshal/unmarshal a SlimTrie, just like operating on a standard proto.Message.

To utilize protobuf functionalities.

API changes:

Currently SlimTrie provides Marshal and Unmarshal but the return value are different: func (st *SlimTrie) marshal(writer io.Writer) (cnt int64, err error)

On-disk data structure changes:

Currently on disk format is:

Children-header
Children-array
Steps-header
Steps-array
Leaves-header
Leaves-array

The header is written by package serialize.

After adapting to proto.Marshaler, the header is not stored any more:

Children-array
Steps-array
Leaves-array

It should be responsibility of upper level to add the header.
E.g. by calling serialize.Marshal(slimtrie) instead of calling slimtrie.Marshal.

serialize.Marshal writes a header(consists of version and length of the marshaled object), then finaly the on-disk layout should be:

SlimTrie-header
Children-array
Steps-array
Leaves-array

remove searchWords

add "contribution.md"

To describe:

Weekly Digest (19 April, 2020 - 26 April, 2020)

Here's the Weekly Digest for openacid/slim:

ISSUES

Last week, no issues were created.

PULL REQUESTS

Last week, no pull requests were created, updated or merged.

COMMITS

Last week there were no commits.

CONTRIBUTORS

Last week there were no contributors.

STARGAZERS

Last week there were 4 stagazers.
⭐ TheBits
⭐ gnomeria
⭐ eeelinc
⭐ vislee
You all are the stars! 🌟

RELEASES

Last week there were no releases.

That's all for last week, please 👀 Watch and ⭐ Star the repository openacid/slim to receive next weekly updates. 😃

You can also view all Weekly Digests by clicking here.

Your Weekly Digest bot. 📆

Create a better README

There is a great sample of README:
https://gist.github.com/PurpleBooth/109311bb0361f32d87a2

Trie are missing keys

I created a trie with ~70k unique keys and corresponding integer values with trie.NewSlimTrie(encode.Int{}, keys, values)
I then measured how many of my keys were in the trie. I was lacking roughly 14%

To Reproduce
This doesn't return the same number of keys, but shows the problem:

func TestSlimTrie(t *testing.T) {
	var keys []string
	var indexes []int
	for i := 0; i < 70000; i++ {
		keys = append(keys, randStringRunes(10))
		indexes = append(indexes, i)
	}
	sort.Strings(keys)
	tr, err := trie.NewSlimTrie(encode.Int{}, keys, indexes)
	if err != nil {
		t.Error(err)
		return
	}
	count := 0
	for _, key := range keys {
		_, found := tr.Get(key)
		if !found {
			count++
		}
	}
	if count > 0 {
		t.Errorf("Couldn't find %f%% of the keys entered. Missing %d keys", float64(count)/float64(len(keys))*100, count)
	}
}

Environment:

MacOS
openacid/slim v0.5.5

Language:

go 1.12

Weekly Digest (16 February, 2020 - 23 February, 2020)

Here's the Weekly Digest for openacid/slim:

ISSUES

Last week 3 issues were created.
Of these, 1 issues have been closed and 2 issues are still open.

OPEN ISSUES

💚 #116 build(deps): bump github.com/stretchr/testify from 1.3.0 to 1.5.1, by dependabot-preview[bot]
💚 #115 [Snyk] Security upgrade pyyaml from 5.1 to 5.2, by snyk-bot

CLOSED ISSUES

❤️ #114 build(deps): bump github.com/stretchr/testify from 1.3.0 to 1.5.0, by dependabot-preview[bot]

NOISY ISSUE

🔈 #114 build(deps): bump github.com/stretchr/testify from 1.3.0 to 1.5.0, by dependabot-preview[bot]
It received 2 comments.

PULL REQUESTS

Last week, 2 pull requests were created, updated or merged.

UPDATED PULL REQUEST

Last week, 2 pull requests were updated.
💛 #116 build(deps): bump github.com/stretchr/testify from 1.3.0 to 1.5.1, by dependabot-preview[bot]
💛 #115 [Snyk] Security upgrade pyyaml from 5.1 to 5.2, by snyk-bot

COMMITS

Last week there were no commits.

CONTRIBUTORS

Last week there were no contributors.

STARGAZERS

Last week there were 5 stagazers.
⭐ Demonstrandum
⭐ vrischmann
⭐ niciyan
⭐ an63
⭐ gvelo
You all are the stars! 🌟

RELEASES

Last week there were no releases.

That's all for last week, please 👀 Watch and ⭐ Star the repository openacid/slim to receive next weekly updates. 😃

You can also view all Weekly Digests by clicking here.

Your Weekly Digest bot. 📆

refine error definition

Create a struct that implements interface error instead of wrapping struct error.

pkg/errors seems like a good choice:
https://github.com/pkg/errors

array/: add doc

go doc guide:

https://blog.golang.org/godoc-documenting-go-code

Document package, types, public function/method/variables.

Find a better test framework

go native if x != y { t.Fatalf... is ugly.
Need a new one with more human readable syntax.

Is the requested enhancement related to a problem? Please describe.

Describe the solution

Affect other component or side effect

It's better if a new test framework compatible with go native testing.
Otherwise tests need to be rewritten.

readme: add synopsis

Synopsis should show how to create and query a slimtrie

serialize.bytesToString do not need to return an err

func bytesToString(buf []byte, delimter byte) (str string, err error) {
	delimPos := bytes.IndexByte(buf, delimter)
	if delimPos == -1 {
		delimPos = len(buf)
	}

	return string(buf[:delimPos]), err
}

Weekly Digest (12 April, 2020 - 19 April, 2020)

Here's the Weekly Digest for openacid/slim:

ISSUES

Last week 1 issue was created.
It is still open.

OPEN ISSUES

💚 #122 build(deps): bump github.com/golang/protobuf from 1.3.1 to 1.4.0, by dependabot-preview[bot]

PULL REQUESTS

Last week, 1 pull request was created, updated or merged.

UPDATED PULL REQUEST

Last week, 1 pull request was updated.
💛 #122 build(deps): bump github.com/golang/protobuf from 1.3.1 to 1.4.0, by dependabot-preview[bot]

COMMITS

Last week there were no commits.

CONTRIBUTORS

Last week there were no contributors.

STARGAZERS

Last week there was 1 stargazer.
⭐ shijiayun
You are the star! 🌟

RELEASES

Last week there were no releases.

That's all for last week, please 👀 Watch and ⭐ Star the repository openacid/slim to receive next weekly updates. 😃

You can also view all Weekly Digests by clicking here.

Your Weekly Digest bot. 📆

Weekly Digest (5 April, 2020 - 12 April, 2020)

Here's the Weekly Digest for openacid/slim:

ISSUES

This week, no issues have been created or closed.

PULL REQUESTS

This week, no pull requests has been proposed by the users.

CONTRIBUTORS

This week, no user has contributed to this repository.

STARGAZERS

This week, no user has starred this repository.

COMMITS

This week, there have been no commits.

RELEASES

This week, no releases were published.

That's all for this week, please watch 👀 and star ⭐ openacid/slim to receive next weekly updates. 😃

trie.Trie.ToStrings() need doc, test, exampe etc

Explain the computation of space complexity

What document to add
裁剪之后的SlimTrie：
每个节点都有大于1个子节点, 除LeafParent节点之外。
SlimTrie的大小跟key的长度无关. 也就是说任意长度的key的集合都可以使用有限大小的SlimTrie来索引: SlimTrie最多有2n-1个节点: 最多节点是出现在所有inner节点都是二分的时候（空间复杂度为O(n)）

//why the most node number of SlimTrie is 2n-1. Could you plase add more comments?

Additional context

support multiple versions of impl for one data-type

Add version constrained marshal/unmarshal to array

A data-type such as array would evolve and finally there are several implementation in a package.
A versionizing mechanism should be introduced to:

During serialize, dump version.
During deserialize, check version and de-serialize data to correct data-type.

TODO

Introduce version API to let data-types implement it and provide its version.

type Version string
type VersionGetter interface {
   func GetVer() Version
}

Introduce Marshaller interface to let data-types provides versioned marshal API.
Marshaller.Marshal() should call serialize.xxx to dump version.
And also Marshaller.Unmarshal() should check version and choose a data type with correct version.

type Marshaller interface {
    Marshal(w io.Writer) (written int64, o interface{}, err error)
}

type UnMarshaller interface {
    Unmarshal(r io.Reader) (read int64, o interface{}, err error)
}

type MarshallerAt interface {
    MarshalAt(w io.Writer, offset int64) (written int64, o interface{}, err error)
}

type UnMarshallerAt interface {
    UnmarshalAt(r io.Reader, offset int64) (read int64, o interface{}, err error)
}

Introduce MarshalHelper struct to simplify Marshaling by converting MarshalAt to Marshal(see iohelper).

type MarshalHelper struct {
    Marshaller
    UnMarshaller
}
func (m *MarshalHelper) MarshalAt(offset int64) (written int64, err error)

array should then implement 4 marshal interface.

openacid / slim Goto Github PK

slim's Introduction

Slim - surprisingly space efficient data types in Golang

Why slim

Performance and memory overhead

Synopsis

1. Index on-disk key-values

2. Sparse index

3. Range scan

Filter mode and KV mode.

Try it

Versions

Who are using slim

Feedback and contributions

Authors

License

slim's People

Contributors

Stargazers

Watchers

Forkers

slim's Issues

ISSUES

PULL REQUESTS

COMMITS

CONTRIBUTORS

STARGAZERS

RELEASES

例子🌰: s2, group not found的一个可能的错误信息

例子🌰: ec, block build error的一个可能的错误信息

semantic version:

ISSUES

PULL REQUESTS

COMMITS

CONTRIBUTORS

STARGAZERS

RELEASES

ISSUES

OPEN ISSUES

CLOSED ISSUES

NOISY ISSUE

PULL REQUESTS

UPDATED PULL REQUEST

COMMITS

CONTRIBUTORS

STARGAZERS

RELEASES

ISSUES

OPEN ISSUES

PULL REQUESTS

UPDATED PULL REQUEST

COMMITS

CONTRIBUTORS

STARGAZERS

RELEASES

ISSUES

PULL REQUESTS

CONTRIBUTORS

STARGAZERS

COMMITS

RELEASES

Add version constrained marshal/unmarshal to array

TODO

Recommend Projects

Recommend Topics

Recommend Org