Code Monkey home page Code Monkey logo

btree's Introduction

btree's People

Contributors

adamcolton avatar gauntface avatar gconnell avatar joe2far avatar jonboulle avatar julienrbrt avatar keep94 avatar luffbee avatar nolouch avatar tidwall avatar tuananh avatar urandom2 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

btree's Issues

Persistent version of btree

This btree data structure is an ephemeral data structure. Changes to a particular btree instance are destructive. Would be nice if there was a persistent, immutable counter part to this data structure. Adding to or deleting from, the persistent version of btree creates a new btree with the original intact. We could have the ability to batch together mutations to cut down on the creation of intermediate instances.

A persistent version of btree would make transactional processing very easy as one could quickly revert to an older version. A persistent, immutable btree helps with concurrency too. On goroutine could read the btree while another is modifying it. A persistent btree helps with undo / redo operations too.

I figure the persistent version would be almost like the ephemeral version except instead of mutating nodes in place, we do copy-on-write. Doing a single mutation to a persistent btree, one add or one delete, would be cheap. The old and new versions would share many of the same nodes. Only log(n) nodes would be different.

With the persistent version of btree, there would be no free list or recycling of nodes. Each node in the btree would be immutable as it could be shared by many different versions of the same btree.

If this sounds interesting to you, I'd be willing to take on the work.

Generic approach

With support for generics, we can replace interfaces with generic types.
The use of generics avoids variables escaping to heap when calling the B-tree functions yielding 20-30% improvement.

Below are some comparisons between benchmarks (taken from original repository) done with benchstat:

name                                 old time/op    new time/op    delta
Insert-8                                121ns ± 1%      89ns ± 1%   -27.04%  (p=0.008 n=5+5)
Seek-8                                  115ns ± 1%      78ns ± 0%   -31.56%  (p=0.008 n=5+5)
DeleteInsert-8                          248ns ± 0%     185ns ± 1%   -25.51%  (p=0.008 n=5+5)
DeleteInsertCloneEachTime-8            1.10µs ± 5%    0.61µs ± 1%   -44.45%  (p=0.008 n=5+5)
Delete-8                                138ns ± 1%     101ns ± 1%   -26.62%  (p=0.008 n=5+5)
Get-8                                   102ns ± 1%      71ns ± 0%   -30.46%  (p=0.008 n=5+5)
Ascend-8                               40.2µs ± 1%    31.7µs ± 0%   -21.18%  (p=0.008 n=5+5)
Descend-8                              39.3µs ± 1%    30.7µs ± 1%   -21.91%  (p=0.008 n=5+5)
AscendRange-8                          72.3µs ± 1%    57.6µs ± 1%   -20.39%  (p=0.008 n=5+5)
DescendRange-8                         92.9µs ± 1%    77.6µs ± 1%   -16.45%  (p=0.008 n=5+5)

name                                 old alloc/op   new alloc/op   delta
Insert-8                                35.6B ± 4%     18.4B ± 3%   -48.31%  (p=0.008 n=5+5)
Seek-8                                  7.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
DeleteInsert-8                          0.00B          0.00B           ~     (all equal)
DeleteInsertCloneEachTime-8            2.98kB ± 5%    1.91kB ± 1%   -36.16%  (p=0.008 n=5+5)
Delete-8                                0.00B          0.00B           ~     (all equal)
Get-8                                   0.00B          0.00B           ~     (all equal)
Ascend-8                                0.00B          0.00B           ~     (all equal)
Descend-8                               0.00B          0.00B           ~     (all equal)
AscendRange-8                           0.00B          0.00B           ~     (all equal)
DescendRange-8                          0.00B          0.00B           ~     (all equal)

name                                 old allocs/op  new allocs/op  delta
Insert-8                                 0.00           0.00           ~     (all equal)
Seek-8                                   0.00           0.00           ~     (all equal)
DeleteInsert-8                           0.00           0.00           ~     (all equal)
DeleteInsertCloneEachTime-8              11.0 ± 0%      11.0 ± 0%      ~     (all equal)
Delete-8                                 0.00           0.00           ~     (all equal)
Get-8                                    0.00           0.00           ~     (all equal)
Ascend-8                                 0.00           0.00           ~     (all equal)
Descend-8                                0.00           0.00           ~     (all equal)
AscendRange-8                            0.00           0.00           ~     (all equal)
DescendRange-8                           0.00           0.00           ~     (all equal)

In case of plain Insert Benchmark, the results are even better:

name      old time/op    new time/op    delta
Insert-8     200ns ± 2%     137ns ± 1%   -31.20%  (p=0.008 n=5+5)

name      old alloc/op   new alloc/op   delta
Insert-8     60.0B ± 0%     27.0B ± 0%   -55.00%  (p=0.008 n=5+5)

name      old allocs/op  new allocs/op  delta
Insert-8      1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
func BenchmarkInsert(b *testing.B) {
    b.StopTimer()
    tr := btree.New(32)
    b.StartTimer()
    for i := 0; i < b.N; i++ {
        tr.ReplaceOrInsert(btree.Int(i))
    }
}

Unfortunately the functions that return nil do not work ex. func (t *BTree) Delete(item Item) Item.
We can't simply return zero value for generic type, because it's still a valid value, so I decided to change the API, so that:

func (t *BTree[T]) Delete(item T) (T, bool) { ... }

we also return bool which indicates whether anything was really deleted.
Of course, we can implement B-tree on pointers to generic types, but it harms performance in many cases (like having B-tree for numeric types).

Changes of API destroy backward compatibility, but because of reasons mentioned above, I believe that this approach is justified.

Here is link to forked repository with generic B-tree:
https://github.com/Michal-Leszczynski/btree

How do you suggest we further proceed with this effort.

Worked unexpected in go rountine?

Wrote some test codes but got fatal error

	bt := btree.New(32)
	var wg sync.WaitGroup
	//var lock sync.Mutex
	for i:=1;i<10000;i++ { 
		wg.Add(1)
		go func(n int) {
			//lock.Lock()
			bt.ReplaceOrInsert(btree.Int(n))
			bt.Delete(btree.Int(n))
			//lock.Unlock()
			wg.Done()
		}(i)
	}
	wg.Wait()
	t.Log(bt.Len())
	bt.Clear(true)

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x5264b1]

goroutine 94 [running]:
github.com/google/btree.items.find(0xc000050010, 0x1, 0x1, 0x7a6820, 0xc00008e1e0, 0x0, 0x0)
/mnt/d/develop/go/pkg/mod/github.com/google/[email protected]/btree.go:191 +0xc1

Incorrect deletion result from current B-Tree implementation

Description

Deletion process produces unexpected result based on current implementation.

Steps to reproduce the issue:
1 Prepare the following test case in btree_test.go

package btree

import (
    "flag"
    "fmt"
    "testing"
    "os"
)

var btreeDegree = flag.Int("degree", 3, "B-Tree degree")

func TestForBTreeDeletion(t *testing.T) {
    tree := New(*btreeDegree)
    fmt.Println(tree)

    for i := 0; i < 21; i++ {
        if x := tree.ReplaceOrInsert(Int(i)); x != nil {
            t.Fatal("insert found item", i)
        }
    }
    fmt.Println("Before deletion")
    tree.Print(os.Stdout)

    for i := 0; i < 3; i++ {
        if x := tree.Delete(Int(i)); x == nil {
            t.Fatal("insert didn't find item", i)
        }
        fmt.Println("After deleting ", i)
        tree.Print(os.Stdout)
    }

}

2 Add a tree.Print method for debugging purpose in btree.go

// Print is for debugging purpose on CLI
func (t *BTree) Print(w io.Writer) {
    t.root.print(w, 0)
}

3 Run the test suite

go test -v

Describe the results you received:

Before deletion  *[<===== This is the original tree with degree of 3 after inserting 0 to 20]*
NODE:[8]
  NODE:[2 5]
    NODE:[0 1]
    NODE:[3 4]
    NODE:[6 7]
  NODE:[11 14 17]
    NODE:[9 10]
    NODE:[12 13]
    NODE:[15 16]
    NODE:[18 19 20]

After deleting  0 *[<====== After deleting 0, `NODE[0 1]` underflows and causes redistribution ]*
NODE:[11]
  NODE:[5 8]
    NODE:[1 2 3 4]     **[<====== This NODE has enough items so that deleting 1 shouldn't cause an underflow]**
    NODE:[6 7]
    NODE:[9 10]
  NODE:[14 17]
    NODE:[12 13]
    NODE:[15 16]
    NODE:[18 19 20]

After deleting  1  [<===== WRONG: The underlying B-tree is redistributed]
NODE:[5 8 11 14 17]
  NODE:[2 3 4]
  NODE:[6 7]
  NODE:[9 10]
  NODE:[12 13]
  NODE:[15 16]
  NODE:[18 19 20]

After deleting  2
NODE:[5 8 11 14 17]
  NODE:[3 4]
  NODE:[6 7]
  NODE:[9 10]
  NODE:[12 13]
  NODE:[15 16]
  NODE:[18 19 20]

Describe the results you expected:

See my comment inline in the above section.

To double check the behavior, try to run a visualized b-tree insertion from this website.
1 Selecting max degree as 6
2 Manually insert 0 to 20 (inclusively). [Same underlying tree as above]
3 Manually delete 0 [Correct: Underflow causes redistribution]
4 Delete 1 [Correct: No redistribution. Whereas in the issue reported, the tree is redistributed]

Additional information you deem important (e.g. issue happens only occasionally):

Output of go version:
go version go1.6.2 linux/amd64

Additional environment details (AWS, VirtualBox, physical, etc.):

What is the meaning of `copyOnWriteContext`?

What is the meaning of copyOnWriteContext ?

Only when tree is created, the only one copyOnWriteContext is initiated. There is no other time that create a newcopyOnWriteContext.

In function mutableFor, the n.cow always equals cow.

Question on iteration

I'm using btree to provide an in-memory key-value store for use as a non-persistent substitute for other key-value databases in a cacheing application. The interface that I need to implement using this package requires that I provide an iterator which exposes Next() and Prev() methods.

I have done this using the AscendGreaterOrEqual and DescendLessOrEqual methods as follows: I store a pointer to the current item, and use it as a reference to ascend or descend from, when Next() or Prev() are called, respectively.

While, I believe that internally, iteration within the provided method calls is O(1), in my case, I believe that the lookup of the current item is always O(logN). Since there is not another obvious, O(1) way (to me at least) to restart iteration from a given Item, and move forward or backward one position in the manner I need, and which is common in most key-value store interfaces, I was wondering if the authors thought it might be useful to expose another way to control iteration in a step-by-step manner - O(1) - which keeps track of the current item.

Bounds issue in insertAt

btree/btree.go

Line 204 in be84af9

if index < len(*s) {

I'm curious if this bounds issue is important? I'm using this implementation as a learning opportunity for Go and ran into this while studying the code.

The check whether the index is less than the length seems like it would prevent an unnecessary copy. But instead, this would result in a bounds error when attempting to set the item.

Marshaling btree on disk

Hey there Google Team!
Are there any plans to support the BinaryMarshaler interface? This would be super useful if someone want's to store B-Tree's to disk.

panic: runtime error: index out of range using Ascend iterator

I'm hitting a Index error using Ascend to walk the tree.

panic: runtime error: index out of range

goroutine 20 [running]:
github.com/google/btree.(*node).iterate(0xc420115d00, 0x1, 0x0, 0x0, 0x0, 0x0, 0x440000, 0xc42008ef00, 0x300000002)
        /media/scratch/demo/go/src/github.com/google/btree/btree.go:524 +0x7df
github.com/google/btree.(*node).iterate(0xc4201780c0, 0x1, 0x0, 0x0, 0x0, 0x0, 0xc420030000, 0xc42008ef00, 0xc4200b00d8)
        /media/scratch/demo/go/src/github.com/google/btree/btree.go:512 +0x519
github.com/google/btree.(*node).iterate(0xc4201f4e00, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc42008ef00, 0xc42008ed08)
        /media/scratch/demo/go/src/github.com/google/btree/btree.go:512 +0x519
github.com/google/btree.(*node).iterate(0xc420382900, 0x1, 0x0, 0x0, 0x0, 0x0, 0xc420080000, 0xc42008ef00, 0xc4200100a0)
        /media/scratch/demo/go/src/github.com/google/btree/btree.go:512 +0x519
github.com/google/btree.(*node).iterate(0xc4207c14c0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc42008ef00, 0xc420103200)
        /media/scratch/demo/go/src/github.com/google/btree/btree.go:512 +0x519
github.com/google/btree.(*node).iterate(0xc420d11e40, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc42008ef00, 0x0)
        /media/scratch/demo/go/src/github.com/google/btree/btree.go:512 +0x519
github.com/google/btree.(*node).iterate(0xc423456040, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc42008ef00, 0x0)
        /media/scratch/demo/go/src/github.com/google/btree/btree.go:512 +0x519
github.com/google/btree.(*BTree).Ascend(0xc420132020, 0xc42008ef00)
        /media/scratch/demo/go/src/github.com/google/btree/btree.go:777 +0x5b
main.stage.func1(0xc420018330, 0xc4201280c0, 0x5a6520, 0xc4201180c0, 0xc420128060, 0xc420116100)
        /media/scratch/demo/demo.go:166 +0x28a
created by main.stage
        /media/scratch/demo/demo.go:130 +0xd3

My stage function is putting O(0.5M) object pointers into the BTree:

Sketched implementation:

const (
    BTREE_DEGREE = 7
)

type myTreeItem struct {
    Object *myObject
    Order int
}

func (a myTreeItem) Less(b btree.Item) bool {
    return a.Order < b.Order
}

processChannel := make(chan *myObject, 1000)
bt := btree.New(BTREE_DEGREE)

// Insert Objects
order := 0
for obj := range objects {
    ci := &myTreeItem{
        Object: obj,
        Order: order,
    }
    i := bt.ReplaceOrInsert(ci)
    order += 1
}

// Traverse
bt.Ascend(func (i btree.Item) bool {
    processChannel <- i.(*myTreeItem).Object
    return true
})

Any ideas?

Q: how do I delete a range?

Suppose I have a simple set of intervals whose end time (endx) is stored in a btree. Lets say the interval represents the time a user's compute job completed, so I store the userid and a jobid with it too. I'll use the jobid (assume they are all unique) to break ties so that all records stick around.

Then I want to delete all those records with time <= some end time. Here's what I try:

package main

import (
	"fmt"

	"github.com/google/btree"
)

type job struct {
	endx  int64
	user  string
	jobid string
}

func (a *job) Less(than btree.Item) bool {
	b := than.(*job)
	if a.endx == b.endx {
		// simply distinguish the order,
		// so we get to keep both in the tree.
		return a.jobid < b.jobid
	}
	return a.endx < b.endx
}

func main() {
	tree := btree.New(2)

	tree.ReplaceOrInsert(&job{endx: 1, jobid: "hij"})
	tree.ReplaceOrInsert(&job{endx: 1, jobid: "abc"})
	tree.ReplaceOrInsert(&job{endx: 1, jobid: "def"})

	tree.ReplaceOrInsert(&job{endx: 2, jobid: "hij"})
	tree.ReplaceOrInsert(&job{endx: 2, jobid: "abc"})
	tree.ReplaceOrInsert(&job{endx: 2, jobid: "def"})

	tree.ReplaceOrInsert(&job{endx: 3, jobid: "hij"})
	tree.ReplaceOrInsert(&job{endx: 3, jobid: "abc"})
	tree.ReplaceOrInsert(&job{endx: 3, jobid: "def"})

	fmt.Printf("len = %v\n", tree.Len())

	// delete through the 2s
	tree.AscendGreaterOrEqual(&job{endx: 0}, func(item btree.Item) bool {
		j := item.(*job)
		if j.endx <= 2 {
			fmt.Printf("delete pass deletes %#v\n", item)
			tree.Delete(j)
			return true
		}
		fmt.Printf("delete pass ignores %#v\n", item)
		return false
	})

	fmt.Printf("\n\n tree after deleting everything with "+
		"endx <=2, is size %v; should be size 3\n",
		tree.Len())
	tree.AscendGreaterOrEqual(&job{endx: 0}, func(item btree.Item) bool {

		fmt.Printf("%#v\n", item)
		return true
	})

}

here's what I get. This strange result presents in the upstream petar/gollrb as well.

                                                                                                  
go run ex1.go                                                                                      
len = 9                                                                                            
delete pass deletes &main.job{endx:1, user:"", jobid:"abc"}                                        
delete pass deletes &main.job{endx:1, user:"", jobid:"hij"}                                        
delete pass deletes &main.job{endx:2, user:"", jobid:"abc"}                                        
delete pass ignores &main.job{endx:3, user:"", jobid:"abc"}                                        
                                                                                                   
                                                                                                   
 tree after deleting everything with endx <=2, is size 6; should be size 3                         
&main.job{endx:1, user:"", jobid:"def"}                                                            
&main.job{endx:2, user:"", jobid:"def"}                                                            
&main.job{endx:2, user:"", jobid:"hij"}                                                            
&main.job{endx:3, user:"", jobid:"abc"}                                                            
&main.job{endx:3, user:"", jobid:"def"}                                                            
&main.job{endx:3, user:"", jobid:"hij"}                                                            

BTree Iterators

This isn't an issue per se, but more of a reaching out to the btree maintainer(s) to talk about a another module that might work well with btree since PR #43 is switching btree to use generics.

I have this module at https://github.com/keep94/consume2 documentation at https://pkg.go.dev/github.com/keep94/consume2 that I use primarily to read from databases. The main actor in this module is the Consumer interface that serves a similar purpose as ItemIterator in the btree module. The Consumer interface is used to collect values out of a database query just like the ItemIterator is used to collect values out of a btree. With this consume2 module, one can build more complicated Consumers out of simpler ones ultimately constructing complex pipelines to gather various results while executing the database query just once.

I would like users of the btree module to have the option of leveraging the consume2 module when iterating over a btree.

The following code can be used to convert a Consumer instance into an ItemIterator

func AsItemIterator[T Item[T]](consumer Consumer[T]) ItemIterator[T] {
    return func(value T) bool {
        consumer.Consume(value)
        return consumer.CanConsume()
    }
}

As you can see the Consumer interface has a CanConsume() method and a Consume() method whereas the ItemIterator is a simple function that returns false when consumption is done.

I thought about changing my Consumer interface to have just one method Consume() that returns a bool, like ItemIterator, but I am not sure if that would be better or worse than what I have now. The advantage of having a CanConsume() method is you can test to see if a Consumer can Consume a value without needing to consume a value. That may offer some advantages over just having one Consume() method that returns a bool indicating whether it can accept more values. The disadvantage of having a CanConsume() method is that the Consume() method has to repeat the logic of CanConsume() in case the caller decides to call Consume() without calling CanConsume() first.

So I have three choices:

  1. Leave the Consumer interface as it is, and add a method called AsItemIterator() to consume2 that converts a Consumer instance to an ItemIterator.
  2. Change the Consumer interface so that it has one method Consume() that returns a bool similar to how ItemIterator works.
  3. Replace the Consumer interface with a function type that accepts a T value and returns a bool.

I think option 3 would be kind of confusing since I also have functions that accept a T value and return a bool that act as filters. I am leaning toward option 1 or option 2.

What do you think would be best for the community?

Thank you for your feedback.

should call Less() fewer times when iterating

when calling DescendRange methods, an iteration is performed.

All items between [lessOrEqual, greaterThan ) are iterated.

Suppose there're n items in the range, I though Less function is called log(n) times.

However, Less is called more than n times as a matter of

if stop != nil && !n.items[i].Less(stop) {
				return hit, false
			}

Why make comparisons this way? I though if parent node can decide all its children are in the range, most Less(stop) call can be optimized out.

Are there anything wrong with this idea?

Read traversal and COW

My case: I have always two clones, one for updating and the other for read traversal, every time a "transaction" is done I clone again the tree to force the next update to copy the nodes.

The problem with cow: even for this case it may occur that the reader traverses a subtree that's sill being modified (i.e. and incomplete). The cause is that mutableFor() changes t.root immediately and the changes are applied to the new root. The same happens for children derived from mutableChildren() that changes the also n.children[i].

The solution is to delay the change to the original pointer (root or children) until all changes in the subtree are finished.

I prepared a first patch wit the idea, only for t.root: master...gallir:cow

If it's correct and you agree I can prepare a patch for children and items if needed.

Tagged releases

Hi,

As Go moves towards semantic versioning and vendoring, it would be really helpful to have a tagged release of btree following semantic versioning. For instance, you could run:

git checkout master; git tag v1.0.0; git push v1.0.0

Which would define the current version to be v1.0.0.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.