Code Monkey home page Code Monkey logo

slim's Introduction

Slim - surprisingly space efficient data types in Golang

Travis test

Report card Coverage Status

GoDoc PkgGoDev Sourcegraph

Slim is collection of surprisingly space efficient data types, with corresponding serialization APIs to persisting them on-disk or for transport.

Why slim

As data on internet keeps increasing exponentially, the capacity gap between memory and disk becomes greater.

Most of the time, a data itself does not need to be loaded into expensive main memory. Only the much more important information, WHERE-A-DATA-IS, deserve a seat in main memory.

This is what slim does, keeps as little information as possible in main memory, as a minimized index of huge amount external data.

  • SlimIndex: is a common index structure, building on top of SlimTrie.

    GoDoc

  • SlimTrie is the underlying index data structure, evolved from trie.

    GoDoc

    Features:

    • Minimized: 11 bits per key(far less than an 64-bits pointer!!).

    • Stable: memory consumption is stable in various scenarios. The Worst case converges to average consumption tightly. See benchmark.

    • Loooong keys: You can have VERY long keys(16K bytes), without any waste of memory(and money). Do not waste your life writing another prefix compression:). (aws-s3 limits key length to 1024 bytes). Memory consumption only relates to key count, not to key length.

    • Ordered: like btree, keys are stored. Range-scan will be ready in 0.6.0.

    • Fast: ~150 ns per Get(). Time complexity for a get is O(log(n) + k); n: key count; k: key length.

    • Ready for transport: a single proto.Marshal() is all it requires to serialize, transport or persisting on disk etc.

Performance and memory overhead

  • 3.3 times faster than the btree.
  • 2.3 times faster than binary search.

  • Memory overhead is about 11 bit per key.

The data struct in this benchmark is a slice of key-value pairs with a SlimTrie serving as the index. The slim itself is built in the filter mode, to maximize memory reduction and performance. The whole struct slimKV is a fully functional kv-store, just like a static btree.

type slimKV struct {
    slim *trie.SlimTrie
    Elts []*KVElt
}
type KVElt struct {
    Key string
    Val int32
}

You can find the benchmark code in benchmark;

Read more about Performance

Synopsis

1. Index on-disk key-values

One of the typical usages of slim is to index serialized data on disk(e.g., key value records in a SSTable). By keeping a slim in memory, one can quickly find the on-disk offset of the record by a key.

Show me the code ......
package index_test

import (
	"fmt"
	"strings"

	"github.com/openacid/slim/index"
)

type Data string

func (d Data) Read(offset int64, key string) (string, bool) {
	kv := strings.Split(string(d)[offset:], ",")[0:2]
	if kv[0] == key {
		return kv[1], true
	}
	return "", false
}

func Example() {

	// Accelerate external data accessing (in memory or on disk) by indexing
	// them with a SlimTrie:

	// `data` is a sample of some unindexed data. In our example it is a comma
	// separated key value series.
	//
	// In order to let SlimTrie be able to read data, `data` should have
	// a `Read` method:
	//     Read(offset int64, key string) (string, bool)
	data := Data("Aaron,1,Agatha,1,Al,2,Albert,3,Alexander,5,Alison,8")

	// keyOffsets is a prebuilt index that stores key and its offset in data accordingly.
	keyOffsets := []index.OffsetIndexItem{
		{Key: "Aaron", Offset: 0},
		{Key: "Agatha", Offset: 8},
		{Key: "Al", Offset: 17},
		{Key: "Albert", Offset: 22},
		{Key: "Alexander", Offset: 31},
		{Key: "Alison", Offset: 43},
	}

	// `SlimIndex` is simply a container of SlimTrie and its data.
	st, err := index.NewSlimIndex(keyOffsets, data)
	if err != nil {
		fmt.Println(err)
	}

	// Lookup
	v, found := st.Get("Alison")
	fmt.Printf("key: %q\n  found: %t\n  value: %q\n", "Alison", found, v)

	v, found = st.Get("foo")
	fmt.Printf("key: %q\n  found: %t\n  value: %q\n", "foo", found, v)

	// Output:
	// key: "Alison"
	//   found: true
	//   value: "8"
	// key: "foo"
	//   found: false
	//   value: ""
}

2. Sparse index

Create an index item for every 4(or more as you wish) keys.

Let several adjacent keys share one index item reduces a lot memory cost if there are huge amount keys in external data. Such as to index billions of 4KB objects on a 4TB disk(because one disk IO costs 20ms for either reading 4KB or reading 1MB).

Show me the code ......
package index_test

import (
	"fmt"
	"strings"

	"github.com/openacid/slim/index"
)

type RangeData string

func (d RangeData) Read(offset int64, key string) (string, bool) {
	for i := 0; i < 4; i++ {
		if int(offset) >= len(d) {
			break
		}

		kv := strings.Split(string(d)[offset:], ",")[0:2]
		if kv[0] == key {
			return kv[1], true
		}
		offset += int64(len(kv[0]) + len(kv[1]) + 2)

	}
	return "", false
}

func Example_indexRanges() {

	// Index ranges instead of keys:
	// In this example at most 4 keys shares one index item.

	data := RangeData("Aaron,1,Agatha,1,Al,2,Albert,3,Alexander,5,Alison,8")

	// keyOffsets is a prebuilt index that stores range start, range end and its offset.
	keyOffsets := []index.OffsetIndexItem{
		// Aaron  +--> 0
		// Agatha |
		// Al     |
		// Albert |

		// Alexander +--> 31
		// Alison    |

		{Key: "Aaron", Offset: 0},
		{Key: "Agatha", Offset: 0},
		{Key: "Al", Offset: 0},
		{Key: "Albert", Offset: 0},

		{Key: "Alexander", Offset: 31},
		{Key: "Alison", Offset: 31},
	}

	st, err := index.NewSlimIndex(keyOffsets, data)
	if err != nil {
		panic(err)
	}

	v, found := st.RangeGet("Aaron")
	fmt.Printf("key: %q\n  found: %t\n  value: %q\n", "Aaron", found, v)

	v, found = st.RangeGet("Al")
	fmt.Printf("key: %q\n  found: %t\n  value: %q\n", "Al", found, v)

	v, found = st.RangeGet("foo")
	fmt.Printf("key: %q\n  found: %t\n  value: %q\n", "foo", found, v)

	// Output:
	// key: "Aaron"
	//   found: true
	//   value: "1"
	// key: "Al"
	//   found: true
	//   value: "2"
	// key: "foo"
	//   found: false
	//   value: ""
}

3. Range scan

Slim can also be used as a traditional in-memory kv-store: Building a slim with Opt{ Complete: Bool(true) }, it won't strip out any information(e.g., it won't eliminate single-branch labels) and it will functions the same as a btree. This snippet shows how to iterate key values.

Show me the code ......
package trie

import (
	"fmt"

	"github.com/openacid/slim/encode"
)

func ExampleSlimTrie_ScanFrom() {
	var keys = []string{
		"",
		"`",
		"a",
		"ab",
		"abc",
		"abca",
		"abcd",
		"abcd1",
		"abce",
		"be",
		"c",
		"cde0",
		"d",
	}
	values := makeI32s(len(keys))

	codec := encode.I32{}
	st, _ := NewSlimTrie(codec, keys, values, Opt{
		Complete: Bool(true),
	})

	// untilD stops when encountering "d".
	untilD := func(k, v []byte) bool {
		if string(k) == "d" {
			return false
		}

		_, i32 := codec.Decode(v)
		fmt.Println(string(k), i32)
		return true
	}

	fmt.Println("scan (ab, +∞):")
	st.ScanFrom("ab", false, true, untilD)

	fmt.Println()
	fmt.Println("scan [be, +∞):")
	st.ScanFrom("be", true, true, untilD)

	fmt.Println()
	fmt.Println("scan (ab, be):")
	st.ScanFromTo(
		"ab", false,
		"be", false,
		true, untilD)

	// Output:
	//
	// scan (ab, +∞):
	// abc 4
	// abca 5
	// abcd 6
	// abcd1 7
	// abce 8
	// be 9
	// c 10
	// cde0 11
	//
	// scan [be, +∞):
	// be 9
	// c 10
	// cde0 11
	//
	// scan (ab, be):
	// abc 4
	// abca 5
	// abcd 6
	// abcd1 7
	// abce 8
}

Filter mode and KV mode.

Slim can be built into either a filter(like bloom filter but with key order preserved.) or a real kv-store(like btree) There is an option in NewSlimTrie(..., option) to control the building behavior. Ref: Opt

  • To use slim as a kv-store, set the option to Complete then there won't be false positives.

  • To use it as a filter, set InnerPrefix, LeafPrefix to false(Complete implies InnerPrefix==true and LeafPrefix==true). Then slim won't store any single branch label in the trie it builds.

    With InnerPrefix==true, it does not reduce a single label branch that leads to an inner node.

    With LeafPrefix==true, it does not reduce a single label branch that leads to a leaf node.

    E.g.:

    // Complete
    InnerPrefix: true
    LeafPrefix: true
    ^ -a-> 1 -b-> $
     `-c-> 2 -x-> 3 -y-> $
                   `-z-> $
    
    InnerPrefix: true
    LeafPrefix: false
    ^ -a-> $
     `-c-> 2 -x-> 3 -y-> $
                   `-z-> $
    
    InnerPrefix: false
    LeafPrefix: true
    ^ -a-> 1 -b-> $
     `-c-> 3 -y-> $
            `-z-> $
    
    InnerPrefix: false
    LeafPrefix: false
    ^ -a-> $
     `-c-> 3 -y-> $
            `-z-> $
    

The memory consumption in filter mode and kv mode differs significantly. The following chart shows memory consumption by 1 million var-length string, 10 to 20 byte in different mode:

- size gzip-size
sample data size 15.0M 14.0M
Complete:true 14.0M 10.0M
InnerPrefix:ture 1.3M 0.9M
all false 1.3M 0.8M

Try it

Install

go get github.com/openacid/slim/trie

Change-log: Change-log

Versions

A newer version y being compatible with an older version x means y can load data serialized by x. But x should never try to load data serialized by a newer version y.

  • v0.5.* is compatible with 0.2.*, 0.3.*, 0.4.*, 0.5.*.
  • v0.4.* is compatible with 0.2.*, 0.3.*, 0.4.*.
  • v0.3.* is compatible with 0.2.*, 0.3.*.
  • v0.2.* is compatible with 0.2.*.

Who are using slim

baishancloud

Feedback and contributions

Feedback and Contributions are greatly appreciated.

At this stage, the maintainers are most interested in feedback centered on:

  • Do you have a real life scenario that slim supports well, or doesn't support at all?
  • Do any of the APIs fulfill your needs well?

Let us know by filing an issue, describing what you did or wanted to do, what you expected to happen, and what actually happened:

Or other type of issue.

Authors

See also the list of contributors who participated in this project.

License

This project is licensed under the MIT License - see the LICENSE file for details.

slim's People

Contributors

dependabot-preview[bot] avatar dependabot[bot] avatar drmingdrmer avatar lishulong avatar sckelemen avatar snyk-bot avatar templexxx avatar wenbobuaa avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

slim's Issues

Range support

Range support

A range SlimTrie is same as a key-value SlimTrie internally.
The only difference is a range SlimTrie uses two keys(starting and ending key) to represent a range.

// Create a range SlimTrie with 3 range:
// ["a", "p"] --> 1
// ["q", "q"] --> 5 // a range with only one item.
// ["r", "z"] --> 3
st, err := NewSlimTrie(xxx, []string{"a", "p", "q", "r" "z"}, []int32{1,1, 5, 3, 3})
  • Creating a SlimTrie with range support is the same as creating a normal SlimTrie, except that two adjacent keys(start, end for a range) have the same value.

  • Add example_range_test.go to illustrate how to create/use a range SlimTrie.

  • Rewrite tests for the new range support. Also the max number of keys in a range SlimTrie can be determined before creating it.

  • Add trie/README.md explaining how range SlimTrie works.

  • trie.Node.Append(): remove argument isStartLeaf and attribute Node.isStartLeaf, which is not required anymore according to latest design.

Push to branch "release-0.3"

XXX_sizecache undefined

Describe the bug

running make test outputted errors below

# github.com/openacid/slim/array_test [github.com/openacid/slim/array.test]
array/int_test.go:196:4: a.XXX_sizecache undefined (type *array.U16 has no field or method XXX_sizecache)
array/int_test.go:235:3: a.XXX_sizecache undefined (type *array.U16 has no field or method XXX_sizecache)
array/int_test.go:428:4: a.XXX_sizecache undefined (type *array.U32 has no field or method XXX_sizecache)
array/int_test.go:467:3: a.XXX_sizecache undefined (type *array.U32 has no field or method XXX_sizecache)
array/int_test.go:660:4: a.XXX_sizecache undefined (type *array.U64 has no field or method XXX_sizecache)
array/int_test.go:699:3: a.XXX_sizecache undefined (type *array.U64 has no field or method XXX_sizecache)
array/marshal_test.go:73:4: a.XXX_sizecache undefined (type *array.U16 has no field or method XXX_sizecache)
array/marshal_test.go:112:3: a.XXX_sizecache undefined (type *array.U16 has no field or method XXX_sizecache)

To Reproduce

  1. go get github.com/openacid/slim
  2. make gen
  3. mak test
  4. See error

Environment (please complete the following information):

  • OS: mac
  • Version 10.14

Language (please complete the following information):

  • Language: Go
  • Version: 1.12

serialize: potential issue of valid read byte count

n += cnt should be placed after error checking?

func readFull(reader io.Reader, buf []byte) (cnt int, err error) {
	n, cnt, toRead := 0, 0, len(buf)

	for n < toRead {
		cnt, err = reader.Read(buf[n:])
		n += cnt

		if err != nil {
			break
		}
	}

	return n, err
}

simplify Search() and neighborBranches()

Describe what to do

Let st.neighborBranches not deal with leaf branch, it should be dealt with outside this function.
This function should only deal with non-leaf branch.
Thus the searching logic would be more clear.

The code shows that: if word == LeafWord then:

  • Return value ltIdx is always -1
  • Return value eqIdx is always eqIdx
  • Return value rtIdx is the first child of current node, or -1(if no child).
  • Return value ltLeaf(indicate if ltIdx is a leaf) is always false

Thus moving logic about leaf out of this function seems better and clear.


func (st *SlimTrie) neighborBranches(idx uint16, word byte) (ltIdx, eqIdx, rtIdx int32, ltLeaf bool) {
	ltIdx, eqIdx, rtIdx = int32(-1), int32(-1), int32(-1)
	ltLeaf = false

	isLeaf := st.Leaves.Has(int32(idx))

	if word == LeafWord {
		if isLeaf {
			eqIdx = int32(idx)
		}
	} else {
		if isLeaf {
			ltIdx = int32(idx)
			ltLeaf = true
		}
	}

	bm, of, found := st.getChild(idx)
	if !found {
		return
	}

	if (bm >> word & 1) == 1 {
		eqIdx = int32(getChildIdx(bm, of, uint16(word)))
	}

	ltStart := word & WordMask
	for i := int8(ltStart) - 1; i >= 0; i-- {
		if (bm >> uint8(i) & 1) == 1 {
			ltIdx = int32(getChildIdx(bm, of, uint16(i)))
			ltLeaf = false
			break
		}
	}

	rtStart := word + 1
	if word == LeafWord {
		rtStart = uint8(0)
	}

	for i := rtStart; i < LeafWord; i++ {
		if (bm >> i & 1) == 1 {
			rtIdx = int32(getChildIdx(bm, of, uint16(i)))
			break
		}
	}

	return
}

array/ try to optimize

  • pre-store popcount for every 8-bit word.

  • use a 256-elts array to store popcount.

  • use static mask.

Add online document

Describe the solution

In slim there are several sub package.
Add multiple links to their url at godoc.com

Weekly Digest (10 February, 2020 - 17 February, 2020)

Here's the Weekly Digest for openacid/slim:


ISSUES

Last week, no issues were created.


PULL REQUESTS

Last week, no pull requests were created, updated or merged.


COMMITS

Last week there were no commits.


CONTRIBUTORS

Last week there were no contributors.


STARGAZERS

Last week there was 1 stargazer.
x0rzkov
You are the star! 🌟


RELEASES

Last week there were no releases.


That's all for last week, please 👀 Watch and Star the repository openacid/slim to receive next weekly updates. 😃

You can also view all Weekly Digests by clicking here.

Your Weekly Digest bot. 📆

Trie Node: add field `squash`

Trie Node: add field squash

Currently the (not slim) Trie node is defined as follow:

type Node struct {
	Children map[int]*Node
	Branches []int
	Step     uint16
	Value    interface{}

	isStartLeaf bool

	NodeCnt int
}

Add a field squash bool, to indicate this trie to squash preceding branches every time after Append a new key.

When creating child node, the child should inherit the squash value from its parent.

Benifit:

  • Thus this simplifies the method Append(): user does not need to pass a squash argument any more.

error struct proposal

定义一个描述错误的数据格式, 用来实现golang在运行时的错误传递, 目标:

  • 设计数据结构的前提假设:

    • 模块的用户, 90% 只判断有没有err, 不判断err是什么.
    • 每个模块定义自己本层的错误, 但提供接口允许用户在需要时拿到最底层的错误(例如errno等).
    • 避免直接向上传递错误(上层模块拿到底层错误用处不大, 例如创建block时只得到一个permission deny的errno, 无法有效的定位到到底是哪个文件/目录出现了问题).
    • error的设计目标是为了简化查错, 提供完整的日志, 方便统计分析, 出错时的整条调用链的检查.
  • 可以用于API层的错误描述, 一般API处理时收到错误并需要将错误返回给客户端, 要求错误有:

    • 具体唯一确定的error code, 便于客户端判断
    • 人类可读的error message.
    • 发生错误的相关的东西是什么.
  • Message通过Code和Resource生成出来. 只提供一个Message接口用来输出message.

  • 希望这个模块可以作为go的errors包的无修改替代品, 和 https://github.com/pkg/errors 的无修改替代品.

  • 提供一个方便的接口让用户直接取得最底层的错误.

  • 记录引发错误的底层错误是什么. 类似在存储服务中, 一个底层的IO错误导致API失败, 如果能在日志中记录引发错误的错误, 可以方便定位问题.

    有一类上层错误由几个下层错误引起, 例如多数派写中, nw = 5,3, 这时写失败3个后端会引起整个API调用失败, 这时需要记录多个下层错误.

  • error 结构里可选的带有stacktrace 信息,方便打印日志(参考了 https://github.com/pkg/errors )

type Error struct {
  Code       string  // error code
  Cause    []error   // 0 or seveal error that cause this error.
  Resource   string  // optional
  *stack             // optional traceback info of this error
}


// 实现系统error interface
func (e *Error) Error() string // 直接返回Code的string

// 兼容 https://github.com/pkg/errors 的接口:
func (e *Error) Cause() string // 返回第一个cause
// 其他接口没差别不列出了


// 扩展的接口
func (e *Error) AllCause() string // 返回所有cause的slice
func (e *Error) RootCause() string // 返回最初的cause
func (e *Error) AllRootCause() string // 返回所有的最底层的cause; cause的树的叶子节点.
func (e *Error) Message() string  // 给人看的, 通过Code和Resource拼装起来.

例子🌰: s2, group not found的一个可能的错误信息

以s2中场景为例, 描述下如何表示一个具体的错误,
假设一个错误是group没读取到.
而引发这个错误的是通过dbproxy读取group信息失败引发的.
而读dbproxy时重试了2次都失败了, 一次是mysql被置位readonly, 一次是socket读取超时:

Code: GroupNotFound
Resource: "group_id://123"
Cause: 
    - Code: DBProxyReadError
      Resource: "dbproxy://127.0.0.1:3303"
      Cause:
        - Code: MysqlReadonly
          Resource: "mysql://192.168.2.2:3306"
        - Code: InvalidResponse
          Resource: "mysql://192.168.9.9:3306"
          Cause:
              - Code: SocketReadTimeout
                Resource: "tcp4://192.168.9.9:3306"

例子🌰: ec, block build error的一个可能的错误信息

Code: BlockBuildError
Resource: "block://aabbccddeeff"
Cause:
    - Code: NeedleWriteError
      Resource: "needle://3cp/foo/bar.jpg"
      Cause:
          - Code: FSIsReadonly
            Resource: "file:///ecdrives/bbb" # schema of local file url in browser
            Cause:
                - <a native fs error> # may be an error with errno.

Standardise compact array API

compacted array provides 3 method: Init, Get and Has;

It'd be better if compacted array adopts existent popular map-like datatype API.

One to the uncomfortable spot is that Get() returns nil when not found.
But golang map returns a second bool value ok to indicate if it is found.

Describe the solution you'd like

Find a popular key-value implementation. Adopt its API.

Testing imports leak into trie package

Looking at the imports of the github.com/openacid/slim/trie package, I see multiple packages that appear to be used for testing. Among them are testing, unsafe, github.com/openacid/slim/trie/benchmark, github.com/google/btree and more. This appears to be because of the file trie/test_utils.go. If this file were called trie/utils_test.go these imports disappear.

To Reproduce

  1. https://godoc.org/github.com/openacid/slim/trie?imports

Expected behavior
I expected a list of imports without testing

Actual behavior
Includes testing and a few others that probably don't belong there.

I think this can be fixed by renaming the file, but I'm not 100% sure, because I don't know the code too well.

把slimtrie 导入进来

几个相关的数据结构:

  • slim.baseTrie 普通的trie
  • slim.SlimTrie 压缩后的16 branch的trie
  • slim.CompactedArray 压缩数组
  • slim.bit bit操作函数包

各位看看下, 名字ok不?

serialize: add doc

Imported serialize is contained in branch "serialize".
And its merge target should be "serialize-base", which contains unmodified imported codes from ec.

serialize: fix bug: direct string type version compare

Version comparison should be implemented according to version semantics, not with plain string comparison.

func makeDataHeader(verStr string, headerSize uint64, dataSize uint64) *DataHeader {
        ...
	if verStr > version.VERSION { // <-- Bug
		panic("forward compatibility is not supported")
	}

A great practical version semantics package is here: https://github.com/Masterminds/semver

semantic version:

https://semver.org/

Given a version number MAJOR.MINOR.PATCH, increment the:
MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards-compatible manner, and
PATCH version when you make backwards-compatible bug fixes.
Additional labels for pre-release and build metadata are available as extensions to the > MAJOR.MINOR.PATCH format.

marshaling support for SlimTrie

Support proto.Marshal(SlimTrie)

SlimTrie should implement two interface: proto.Marshaler and proto.Unmarhsaler.

type Marshaler interface {
	Marshal() ([]byte, error)
}
type Unmarshaler interface {
	Unmarshal([]byte) error
}

To let the upper level apps easily marshal/unmarshal a SlimTrie, just like operating on a standard proto.Message.

To utilize protobuf functionalities.

API changes:

Currently SlimTrie provides Marshal and Unmarshal but the return value are different: func (st *SlimTrie) marshal(writer io.Writer) (cnt int64, err error)

On-disk data structure changes:

Currently on disk format is:

Children-header
Children-array
Steps-header
Steps-array
Leaves-header
Leaves-array

The header is written by package serialize.

After adapting to proto.Marshaler, the header is not stored any more:

Children-array
Steps-array
Leaves-array

It should be responsibility of upper level to add the header.
E.g. by calling serialize.Marshal(slimtrie) instead of calling slimtrie.Marshal.

serialize.Marshal writes a header(consists of version and length of the marshaled object), then finaly the on-disk layout should be:

SlimTrie-header
Children-array
Steps-array
Leaves-array

add "contribution.md"

To describe:

  • How to contribute code.
  • code style
  • document style
  • how to test
  • purpose of this project, what to do and what not to do.

Weekly Digest (19 April, 2020 - 26 April, 2020)

Here's the Weekly Digest for openacid/slim:


ISSUES

Last week, no issues were created.


PULL REQUESTS

Last week, no pull requests were created, updated or merged.


COMMITS

Last week there were no commits.


CONTRIBUTORS

Last week there were no contributors.


STARGAZERS

Last week there were 4 stagazers.
TheBits
gnomeria
eeelinc
vislee
You all are the stars! 🌟


RELEASES

Last week there were no releases.


That's all for last week, please 👀 Watch and Star the repository openacid/slim to receive next weekly updates. 😃

You can also view all Weekly Digests by clicking here.

Your Weekly Digest bot. 📆

Trie are missing keys

I created a trie with ~70k unique keys and corresponding integer values with trie.NewSlimTrie(encode.Int{}, keys, values)
I then measured how many of my keys were in the trie. I was lacking roughly 14%

To Reproduce
This doesn't return the same number of keys, but shows the problem:

func TestSlimTrie(t *testing.T) {
	var keys []string
	var indexes []int
	for i := 0; i < 70000; i++ {
		keys = append(keys, randStringRunes(10))
		indexes = append(indexes, i)
	}
	sort.Strings(keys)
	tr, err := trie.NewSlimTrie(encode.Int{}, keys, indexes)
	if err != nil {
		t.Error(err)
		return
	}
	count := 0
	for _, key := range keys {
		_, found := tr.Get(key)
		if !found {
			count++
		}
	}
	if count > 0 {
		t.Errorf("Couldn't find %f%% of the keys entered. Missing %d keys", float64(count)/float64(len(keys))*100, count)
	}
}

Environment:

  • MacOS
  • openacid/slim v0.5.5

Language:

  • go 1.12

Weekly Digest (16 February, 2020 - 23 February, 2020)

Here's the Weekly Digest for openacid/slim:


ISSUES

Last week 3 issues were created.
Of these, 1 issues have been closed and 2 issues are still open.

OPEN ISSUES

💚 #116 build(deps): bump github.com/stretchr/testify from 1.3.0 to 1.5.1, by dependabot-preview[bot]
💚 #115 [Snyk] Security upgrade pyyaml from 5.1 to 5.2, by snyk-bot

CLOSED ISSUES

❤️ #114 build(deps): bump github.com/stretchr/testify from 1.3.0 to 1.5.0, by dependabot-preview[bot]

NOISY ISSUE

🔈 #114 build(deps): bump github.com/stretchr/testify from 1.3.0 to 1.5.0, by dependabot-preview[bot]
It received 2 comments.


PULL REQUESTS

Last week, 2 pull requests were created, updated or merged.

UPDATED PULL REQUEST

Last week, 2 pull requests were updated.
💛 #116 build(deps): bump github.com/stretchr/testify from 1.3.0 to 1.5.1, by dependabot-preview[bot]
💛 #115 [Snyk] Security upgrade pyyaml from 5.1 to 5.2, by snyk-bot


COMMITS

Last week there were no commits.


CONTRIBUTORS

Last week there were no contributors.


STARGAZERS

Last week there were 5 stagazers.
Demonstrandum
vrischmann
niciyan
an63
gvelo
You all are the stars! 🌟


RELEASES

Last week there were no releases.


That's all for last week, please 👀 Watch and Star the repository openacid/slim to receive next weekly updates. 😃

You can also view all Weekly Digests by clicking here.

Your Weekly Digest bot. 📆

Find a better test framework

go native if x != y { t.Fatalf... is ugly.
Need a new one with more human readable syntax.

Is the requested enhancement related to a problem? Please describe.

No

Describe the solution

Affect other component or side effect

It's better if a new test framework compatible with go native testing.
Otherwise tests need to be rewritten.

Weekly Digest (12 April, 2020 - 19 April, 2020)

Here's the Weekly Digest for openacid/slim:


ISSUES

Last week 1 issue was created.
It is still open.

OPEN ISSUES

💚 #122 build(deps): bump github.com/golang/protobuf from 1.3.1 to 1.4.0, by dependabot-preview[bot]


PULL REQUESTS

Last week, 1 pull request was created, updated or merged.

UPDATED PULL REQUEST

Last week, 1 pull request was updated.
💛 #122 build(deps): bump github.com/golang/protobuf from 1.3.1 to 1.4.0, by dependabot-preview[bot]


COMMITS

Last week there were no commits.


CONTRIBUTORS

Last week there were no contributors.


STARGAZERS

Last week there was 1 stargazer.
shijiayun
You are the star! 🌟


RELEASES

Last week there were no releases.


That's all for last week, please 👀 Watch and Star the repository openacid/slim to receive next weekly updates. 😃

You can also view all Weekly Digests by clicking here.

Your Weekly Digest bot. 📆

Weekly Digest (5 April, 2020 - 12 April, 2020)

Here's the Weekly Digest for openacid/slim:

ISSUES

This week, no issues have been created or closed.

PULL REQUESTS

This week, no pull requests has been proposed by the users.

CONTRIBUTORS

This week, no user has contributed to this repository.

STARGAZERS

This week, no user has starred this repository.

COMMITS

This week, there have been no commits.

RELEASES

This week, no releases were published.

That's all for this week, please watch 👀 and star ⭐ openacid/slim to receive next weekly updates. 😃

Explain the computation of space complexity

What document to add
裁剪之后的SlimTrie:
每个节点都有大于1个子节点, 除LeafParent节点之外。
SlimTrie的大小跟key的长度无关. 也就是说任意长度的key的集合都可以使用有限大小的SlimTrie来索引: SlimTrie最多有2n-1个节点: 最多节点是出现在所有inner节点都是二分的时候(空间复杂度为O(n))

//why the most node number of SlimTrie is 2n-1. Could you plase add more comments?

Additional context

support multiple versions of impl for one data-type

Add version constrained marshal/unmarshal to array

A data-type such as array would evolve and finally there are several implementation in a package.
A versionizing mechanism should be introduced to:

  • During serialize, dump version.
  • During deserialize, check version and de-serialize data to correct data-type.

TODO

  • Introduce version API to let data-types implement it and provide its version.
type Version string
type VersionGetter interface {
   func GetVer() Version
}
  • Introduce Marshaller interface to let data-types provides versioned marshal API.
    Marshaller.Marshal() should call serialize.xxx to dump version.
    And also Marshaller.Unmarshal() should check version and choose a data type with correct version.
type Marshaller interface {
    Marshal(w io.Writer) (written int64, o interface{}, err error)
}

type UnMarshaller interface {
    Unmarshal(r io.Reader) (read int64, o interface{}, err error)
}

type MarshallerAt interface {
    MarshalAt(w io.Writer, offset int64) (written int64, o interface{}, err error)
}

type UnMarshallerAt interface {
    UnmarshalAt(r io.Reader, offset int64) (read int64, o interface{}, err error)
}
  • Introduce MarshalHelper struct to simplify Marshaling by converting MarshalAt to Marshal(see iohelper).
type MarshalHelper struct {
    Marshaller
    UnMarshaller
}
func (m *MarshalHelper) MarshalAt(offset int64) (written int64, err error)
  • array should then implement 4 marshal interface.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.