emirpasic / gods
GoDS (Go Data Structures) - Sets, Lists, Stacks, Maps, Trees, Queues, and much more
License: Other
Pointed out by @wildfire810
Would be natural for some data structures, e.g. doubly linked list. For other data structures, e.g. singly linked list, it wouldn't make much sense.
Hi. I have a few questions about the tests, for example those in binaryheap_test.go.
Thank you.
A lot has changed in recent years. Go's sort and the overall language implementation have improved, so I decided to rerun a few tests and do some research to decide whether the included Timsort implementation is worth the maintenance and the additional memory allocation in GoDS:
github.com/psilva261/timsort/# go test -test.bench=.*
(Go 1.6, Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz)
Speed, for the given datasets, is still in favor of Timsort, although these xor-datasets are biased towards Timsort's adaptiveness from minrun, which is exactly the purpose for which Timsort was crafted, on the assumption that real-world datasets show similar patterns. Xor-dataset generation: 0xff & (i ^ 0xab), starting xor-dataset sequence:
171, 170, 169, 168, 175, 174, 173, 172, 163, 162, 161, 160, 167, 166, 165, 164, 187, 186, 185, 184, 191, 190, 189, 188, 179, 178, 177, 176, 183, 182, 181, 180, 139, 138, 137, 136, 143, 142, 141, 140, 131, 130, 129, 128, 135, 134, 133, 132, 155, 154, 153, 152, 159, 158, 157, 156, 147, 146, 145, 144, 151, 150, 149, 148, 235, 234, 233, 232, 239, 238, 237, 236, 227, 226, 225, 224, 231, 230, 229, 228, 251, 250, 249, 248, 255, 254, 253, 252, 243, 242, 241, 240, 247, 246, 245, 244, 203, 202, 201, 200, 207, 206, 205, 204, 195, 194, 193, 192, 199, 198, 197, 196, 219, 218, 217, 216, 223, 222, 221, 220, 211, 210, 209, 208, 215, 214, 213, 212, 43, 42, 41, 40, 47, 46, 45, 44, 35, 34, 33, 32, 39, 38, 37, 36, 59, 58, 57, 56, 63, 62, 61, 60, 51, 50, 49, 48, 55, 54, 53, 52, 11, 10, 9, 8, 15, 14, 13, 12, 3, 2, 1, 0, 7, 6, 5, 4, 27, 26, 25, 24, 31, 30, 29, 28, 19, 18, 17, 16, 23, 22, 21, 20, 107, 106, 105, 104, 111, 110, 109, 108, 99, 98, 97, 96, 103, 102, 101, 100, 123, 122, 121, 120, 127, 126, 125, 124, 115, 114, 113, 112, 119, 118, 117, 116, 75, 74, 73, 72, 79, 78, 77, 76, 67, 66, 65, 64, 71, 70, 69, 68, 91, 90, 89, 88, 95, 94, 93, 92, 83, 82, 81, 80, 87, 86, 85, 84, 171, 170, 169, 168, 175, 174, 173, 172, 163, 162, 161, 160, 167, 166, 165, 164, 187, 186, 185, 184, 191, 190, 189, 188, 179, 178, 177,176, 183, 182, 181, 180, 139, 138, 137, 136, 143, 142, 141, 140, 131, 130, 129, 128, 135, 134, 133, 132, 155, 154, 153, ...
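For anyone who wants to reproduce the sequence above, here is a minimal sketch of the generator. Only the expression 0xff & (i ^ 0xab) comes from the post; the function name xorDataset is made up for illustration. The output consists of short descending runs and repeats every 256 elements, which is the kind of pre-existing order Timsort can exploit.

```go
package main

import "fmt"

// xorDataset returns the first n values of the sequence 0xff & (i ^ 0xab).
// The name is hypothetical; only the formula is taken from the post.
func xorDataset(n int) []int {
	data := make([]int, n)
	for i := range data {
		data[i] = 0xff & (i ^ 0xab)
	}
	return data
}

func main() {
	fmt.Println(xorDataset(8)) // [171 170 169 168 175 174 173 172]
}
```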
However, looking only at the random datasets, Go's sort expectedly performs better, because Timsort's attempt to adapt to a random dataset only harms it. Go's simple two-phase approach (quicksort first for large datasets, then insertion sort for small ones) performs better.
Modifying the test parameters to show memory allocation: go test -bench . -benchmem -benchtime 1s
(bytes allocated and number of allocations per operation):
BenchmarkTimsortXor100-4 1120 B/op 4 allocs/op
BenchmarkTimsortInterXor100-4 1568 B/op 6 allocs/op
BenchmarkStandardSortXor100-4 0 B/op 0 allocs/op
BenchmarkTimsortSorted100-4 1120 B/op 4 allocs/op
BenchmarkTimsortInterSorted100-4 1568 B/op 6 allocs/op
BenchmarkStandardSortSorted100-4 0 B/op 0 allocs/op
BenchmarkTimsortRevSorted100-4 1120 B/op 4 allocs/op
BenchmarkTimsortInterRevSorted100-4 1568 B/op 6 allocs/op
BenchmarkStandardSortRevSorted100-4 0 B/op 0 allocs/op
BenchmarkTimsortRandom100-4 1120 B/op 4 allocs/op
BenchmarkTimsortInterRandom100-4 1568 B/op 6 allocs/op
BenchmarkStandardSortRandom100-4 0 B/op 0 allocs/op
BenchmarkTimsortXor1K-4 12576 B/op 5 allocs/op
BenchmarkTimsortInterXor1K-4 14656 B/op 7 allocs/op
BenchmarkStandardSortXor1K-4 0 B/op 0 allocs/op
BenchmarkTimsortSorted1K-4 4384 B/op 4 allocs/op
BenchmarkTimsortInterSorted1K-4 10560 B/op 6 allocs/op
BenchmarkStandardSortSorted1K-4 0 B/op 0 allocs/op
...
The dataset item in question (int is 64-bit on my machine):
type record struct {
key, order int
}
Go's sort as per documentation in source code does not make any memory allocations. This favors Go's sort.
Conclusion:
Even after this "quick" analysis, I am not sure whether a switch to Go's sort is a good idea, a bad idea, or (most probably) something in between. There is no such thing as an average dataset, and the choice depends on the nature of the data. When GoDS is used to build complex systems, sorting is the least likely bottleneck, so analyzing this further is a topic for academic discussion rather than a practical concern.
Timsort might be replaced in GoDS by Go's native sorting, simply to keep the code base of this library as small as possible and because of the arguments given above.
HashMap serialization does not serialize values hierarchically when a value is itself a map,
for example:
[
"a": [
"b": [:]
],
"v": ["e": 23]
]
Deserialization has the same problem.
What do you think about an Enumerator?
The official library has "container/list", whose Element type lets you take values one by one.
I think that is less complex than finding the element again every time.
I am using code that collects KV pairs in a map[string]interface{}. Because it's Go, when json.Unmarshal() iterates over the keys, they come out in a random order. This does not work for my purposes; I need the keys to iterate in the same order in which they were inserted.
Your TreeMap maintains an ordered set of keys, based on a comparator. Does a comparator exist, or can one be built, that sorts the keys based on the order of their insertion? And if so, would the JSON interface then serialize these keys in the insertion order?
Thank you.
I think it would be useful to have skip lists included in the GoDS collection.
I previously studied the publicly available Go skip list implementations and found some issues with each. I then created a very fast, threadsafe skip list of my own (under MIT license). I believe it is one of the best foundations to work from for this data structure and can easily be integrated into GoDS.
If you like, I can work on a pull request to include my skip list implementation with the appropriate interface. First, I wanted to make sure this would be useful and to ask what specific details I should watch for / include to ensure smooth compatibility.
It'd be nice to have a bulk initialization convenience method for the container interface. Example use case is creating a hash or set of a heavily used static data set, like country information.
For algorithms I expect test coverage to be close to 100%. However, the current test coverage for the red-black tree, for example, is 84.2%. Looking at the more detailed output, you can see that some algorithm branches are untested. This is a huge factor when you choose a library on GitHub.
go test -coverprofile=coverage.out
go tool cover -html=coverage.out
I'm trying to use treebidimap and noticed this strange behavior:
t := treebidimap.NewWith(utils.IntComparator, utils.StringComparator)
t.Put(1,1)
t.Put(1,1)
I checked [redblacktree.go] and found the following:
// Put inserts node into the tree.
// Key should adhere to the comparator's type assertion, otherwise method panics.
func (tree *Tree) Put(key interface{}, value interface{}) {
var insertedNode *Node
if tree.Root == nil {
tree.Root = &Node{Key: key, Value: value, color: red}
insertedNode = tree.Root
} else {
node := tree.Root
loop := true
for loop {
compare := tree.Comparator(key, node.Key)
It seems the Comparator is not used for the first node/element; it is inserted directly into the map.
Is this the expected behavior? Did I miss something in the README?
I have a small question concerning treesets. I'm applying them to bigger string-sets, kind of like ORDER BY in SQL languages for my result-sets. I ran some benchmarks with 5000 .Add(SomeStruct{123, "Name"}) calls per run, but I was wondering: how does the object do this logic in the background? Does every .Add() trigger the sorting, or does it only happen when I read from the object? The performance degrades very fast with strings, and I already have an older mechanism running in my API that does it roughly like this:
nameAsc := func(p1, p2 *SorterObject) bool {
return p1.Value < p2.Value
}
nameDesc := func(p1, p2 *SorterObject) bool {
return p1.Value > p2.Value
}
By(nameAsc).Sort(users)
This code-structure also suffers from exponential degradation.
I kind of wanted to see if the most optimal form of using treesets would be faster than the native Go solution I currently have running.
Any other tips (maybe another object would be better at this even) would be very much appreciated too! The idea of what I'm trying to achieve is that the Key keeps actively moving with the way the Name (string-value) is sorted.
Benchmarks for 10k and 100k .Add() calls per cycle:
1000 1920877 ns/op
100 19820013 ns/op
How can I find keys with a given prefix efficiently?
Do Enumerable functions like "select" and "find" always have to traverse the whole treemap?
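For comparison, this is how a prefix lookup can avoid a full scan when the keys are already in sorted order. It is a sketch over a plain sorted []string using the standard sort package, not over the GoDS treemap; keysWithPrefix is a made-up name. The idea is that all keys sharing a prefix are adjacent in sorted order, so one binary search finds where they start.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// keysWithPrefix returns all keys in the sorted slice that start with prefix.
// It binary-searches for the first key >= prefix, then scans forward only
// while the prefix still matches.
func keysWithPrefix(sortedKeys []string, prefix string) []string {
	start := sort.SearchStrings(sortedKeys, prefix)
	end := start
	for end < len(sortedKeys) && strings.HasPrefix(sortedKeys[end], prefix) {
		end++
	}
	return sortedKeys[start:end]
}

func main() {
	keys := []string{"1233", "1234", "12345", "1234a", "1235", "2000"}
	fmt.Println(keysWithPrefix(keys, "1234")) // [1234 12345 1234a]
}
```

The cost is one binary search plus the number of matches, instead of a pass over every key.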
Hi,
can a map have multiple children that are themselves maps?
Example:
map["name":"test",5:456123123,[1:5,2:7,3:["name":"ttt","age":48]]]
I have a large dataset of 32-byte hashes that I need to search. Loading and searching worked, but when I try to serialize the tree to JSON it can't be reloaded, because the keys get turned into "[1 2 3 4 5]"; I guess it uses fmt.Sprintf("%v", s). The right thing to do would be to marshal to base64.
This is bad. Why is there a custom JSON marshaller, and why not just implement https://godoc.org/encoding#BinaryMarshaler? Then you get all kinds of marshals for free as well.
These are often used in algorithms and remove the need for two lookups (one for Left()/Right() and one for Remove()). This also makes it cleaner to use a tree as a priority queue.
Hi emirpasic,
Thank you for your work. I really like the data structures in gods.
Is it possible to add a segment tree or an interval tree to gods?
Hi.
I have a question about arraylist. Here.
In go, the slice is already a dynamic array. Is it more appropriate to change it to an array?
list := arraylist.New(0, 1, 2, 3, 4, 5)
it := list.Iterator()
it.SetIndex(3)
it.Next() // 4
I could really use a Copy function on ArrayList, and it would probably make sense to add it to the other lists as well.
I can see that the Values function (on ArrayList) returns a copy of the internal array, but I would like to keep the ArrayList interface.
I think we sometimes use a treemap to search for a contiguous run of data, e.g.:
"search for 20 items larger than 50".
I think code like the following can solve it:
// GetIterator returns an iterator positioned at the given key, if it exists.
func (m *Map) GetIterator(key interface{}) (iter Iterator, found bool) {
	treeIter, ok := m.tree.GetIterator(key)
	return Iterator{treeIter}, ok
}

// GetIterator returns an iterator positioned at the node holding key, if it exists.
func (tree *Tree) GetIterator(key interface{}) (iter Iterator, found bool) {
	node := tree.lookup(key)
	if node != nil {
		return Iterator{
			tree,
			node,
			between,
		}, true
	}
	return // zero Iterator, found == false
}
Hi, how can I loop over a HashSet? I see that only TreeSet has an iterator.
Thanks!
Perhaps making New() functions of containers variadic to allow for optimized bulk insertions during initialization.
How to reproduce:
import "github.com/emirpasic/gods/sets/treeset"

var tree *treeset.Set
tree = treeset.NewWith(someComparator)
for _, v := range someItems {
	tree.Add(v)
}
items := tree.Iterator()
for items.Next() {
	// infinite loop!
}
Must change package from examples to main.
Would be nice to have a circular buffer where adding elements to the front and back take O(1). I see that there are no Queues or Deques and the circular buffer is perfect for both data structures. There can also be linked list queues or deques. Also an array based binary Heap? They are pretty easy to implement. These are just some ideas though.
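A rough sketch of what such a circular buffer could look like, to show why both ends are O(1). The Deque type and its methods are made up for illustration and are not part of GoDS; the capacity is fixed here to keep the example short, so a real implementation would also need a growth strategy.

```go
package main

import "fmt"

// Deque is a fixed-capacity circular buffer (hypothetical, not a GoDS type).
type Deque struct {
	buf   []interface{}
	head  int // index of the first element
	count int // number of stored elements
}

func NewDeque(capacity int) *Deque {
	return &Deque{buf: make([]interface{}, capacity)}
}

// PushBack appends v in O(1); it reports false when the buffer is full.
func (d *Deque) PushBack(v interface{}) bool {
	if d.count == len(d.buf) {
		return false
	}
	d.buf[(d.head+d.count)%len(d.buf)] = v
	d.count++
	return true
}

// PushFront prepends v in O(1) by moving head one slot back, wrapping around.
func (d *Deque) PushFront(v interface{}) bool {
	if d.count == len(d.buf) {
		return false
	}
	d.head = (d.head - 1 + len(d.buf)) % len(d.buf)
	d.buf[d.head] = v
	d.count++
	return true
}

// PopFront removes and returns the first element in O(1).
func (d *Deque) PopFront() (interface{}, bool) {
	if d.count == 0 {
		return nil, false
	}
	v := d.buf[d.head]
	d.buf[d.head] = nil
	d.head = (d.head + 1) % len(d.buf)
	d.count--
	return v, true
}

// PopBack removes and returns the last element in O(1).
func (d *Deque) PopBack() (interface{}, bool) {
	if d.count == 0 {
		return nil, false
	}
	i := (d.head + d.count - 1) % len(d.buf)
	v := d.buf[i]
	d.buf[i] = nil
	d.count--
	return v, true
}

func main() {
	d := NewDeque(4)
	d.PushBack(2)
	d.PushBack(3)
	d.PushFront(1)
	front, _ := d.PopFront()
	back, _ := d.PopBack()
	fmt.Println(front, back) // 1 3
}
```

Used only through PushBack and PopFront it behaves as a queue; used through both ends it behaves as a deque.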
In order to construct a "doubly" linked list while inserting, new element and old element's prev pointer have to get updated.
Solution:
In doublylinkedlist.go,
newElement.prev = beforeElement
oldNextElement.prev = beforeElement
Hi,
I use pairs of ints which are id's for my data. These are 'maps' - relations between two data. What data structure do you suggest I use? Treemap.Map would seem slow for lookup because I can only Get by Key and I'd like to Get by Key or Value.
If I use 2 lists or 2 sets the int pairs will not align.
Should I just use a 2 dimensional int slice? [][]int{}
Thanks.
It's a pleasure to see that LinkedHashMap has been added.
However, I have a different view of the iteration order: it should follow the last time a key was put in, not the first time.
The current behavior is caused by LinkedHashMap's Put function:
// Put inserts key-value pair into the map.
// Key should adhere to the comparator's type assertion, otherwise method panics.
func (m *Map) Put(key interface{}, value interface{}) {
	if _, contains := m.table[key]; !contains {
		m.ordering.Append(key)
	}
	m.table[key] = value
}
Test case as follows:
func main() {
	lhm := linkedhashmap.New()
	lhm.Put("a", 1)
	lhm.Put("b", 2)
	lhm.Put("c", 3)
	lhm.Put("a", 4)
	lhm.Each(func(k, v interface{}) { fmt.Println(k, v) })
}
I expect output like
b 2
c 3
a 4
but the result is
a 4
b 2
c 3
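For illustration, here is a sketch of the ordering being asked for, built only on the standard library. The type lastPutMap and its methods are hypothetical and do not describe how GoDS implements LinkedHashMap; the sketch keeps a container/list of keys and moves a key to the back whenever it is put again.

```go
package main

import (
	"container/list"
	"fmt"
)

// lastPutMap iterates keys in the order of their most recent Put.
// It is a hypothetical type, not the GoDS LinkedHashMap.
type lastPutMap struct {
	table    map[interface{}]interface{}
	elements map[interface{}]*list.Element // key -> its position in ordering
	ordering *list.List
}

func newLastPutMap() *lastPutMap {
	return &lastPutMap{
		table:    map[interface{}]interface{}{},
		elements: map[interface{}]*list.Element{},
		ordering: list.New(),
	}
}

// Put stores the value and moves an existing key to the end of the ordering.
func (m *lastPutMap) Put(key, value interface{}) {
	if e, ok := m.elements[key]; ok {
		m.ordering.MoveToBack(e)
	} else {
		m.elements[key] = m.ordering.PushBack(key)
	}
	m.table[key] = value
}

// Keys returns the keys from least recently to most recently put.
func (m *lastPutMap) Keys() []interface{} {
	keys := make([]interface{}, 0, m.ordering.Len())
	for e := m.ordering.Front(); e != nil; e = e.Next() {
		keys = append(keys, e.Value)
	}
	return keys
}

func main() {
	m := newLastPutMap()
	m.Put("a", 1)
	m.Put("b", 2)
	m.Put("c", 3)
	m.Put("a", 4)
	for _, k := range m.Keys() {
		fmt.Println(k, m.table[k])
	}
	// prints:
	// b 2
	// c 3
	// a 4
}
```

Whether insertion order or last-put order is the better default is a design choice; Java's LinkedHashMap, for example, keeps insertion order on re-insertion unless access order is requested.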
I want to implement a structure based on TreeMap where I have IP addresses as the key and a TreeSet as the value. I want to be able to sort the TreeMap using the size() of the TreeSet. This is currently impossible because the comparator can only access either the keys or values.
For example:
m := treemap.NewWith(myComparator)
s1 := treeset.NewWithIntComparator()
s1.Add(2, 2, 3, 4, 5)
s2 := treeset.NewWithIntComparator()
s2.Add(7, 8, 9)
m.Put("server1", s1)
m.Put("server2", s2)
Unfortunately, myComparator can only access the keys:
func myComparator(a, b interface{}) int {
	// a and b are the keys
}
What do you think about adding access to the values for key comparators used by maps?
type HashSet struct {
	set *hashset.Set
}

func NewHashSet() *HashSet {
	var result HashSet
	result.set = hashset.New()
	return &result
}

// If I call Add("abc"), it crashes and outputs "unhashable type []interface". Why?
func (s *HashSet) Add(items ...interface{}) {
	s.set.Add(items)
}
Thank you.
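The wrapper above forwards items without the trailing "...", so the inner variadic call receives one argument, the whole []interface{} slice. The sketch below reproduces that with the standard library only, under the assumption that the set stores its items as keys of a Go map; the names count, forwardWrong, forwardRight and tryAdd are made up for illustration.

```go
package main

import "fmt"

// count reports how many arguments a variadic function received.
func count(items ...interface{}) int { return len(items) }

// forwardWrong passes the slice itself as a single argument.
func forwardWrong(items ...interface{}) int { return count(items) }

// forwardRight expands the slice back into separate arguments.
func forwardRight(items ...interface{}) int { return count(items...) }

// tryAdd uses item as a map key and reports whether that panicked.
// Slices are not hashable, so using one as a key panics at run time.
func tryAdd(item interface{}) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	set := map[interface{}]struct{}{}
	set[item] = struct{}{}
	return false
}

func main() {
	fmt.Println(forwardWrong("a", "b", "c")) // 1
	fmt.Println(forwardRight("a", "b", "c")) // 3
	fmt.Println(tryAdd("abc"))               // false
	fmt.Println(tryAdd([]interface{}{"abc"})) // true
}
```

Under that assumption, writing s.set.Add(items...) in the wrapper would pass the individual items instead of the slice.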
Can your iterator be reset?
I mean reset to position 0,
and then start another pass: 1, 2, 3, 4...
type User struct {
	Priority int
	Args     []interface{}
	Method   func(args ...interface{}) bool
}

// Custom comparator (sort by priority)
func byID(a, b interface{}) int {
	// Type assertion, program will panic if this is not respected
	c1 := a.(User)
	c2 := b.(User)
	switch {
	case c1.Priority > c2.Priority:
		return 1
	case c1.Priority < c2.Priority:
		return -1
	default:
		return 0
	}
}

func TestMain(t *testing.T) {
	set := binaryheap.NewWith(byID)
	set.Push(User{1, []interface{}{1.0, 2.0}, Equal})
	set.Push(User{3, []interface{}{1.0, 2.0}, Equal})
	set.Push(User{2, []interface{}{1.0, 2.0}, Equal})
	set.Push(User{2, []interface{}{3.0, 2.0}, Equal})
	set.Push(User{2, []interface{}{3.0, 2.0}, Equal})
	t.Error(set.Values()) // [{1 [1 2] 0x51fd40} {2 [3 2] 0x51fd40} {2 [1 2] 0x51fd40} {3 [1 2] 0x51fd40} {2 [3 2] 0x51fd40}]
}
Priority list = 1 2 2 3 2 // shouldn't this be 1 2 2 2 3?
Copying a treeset.Iterator will result in panics when treeset.Iterator.Value() is called. I suspect the same holds for all Iterate() methods.
Because Iterator is stateful, I suggest returning an *Iterator from all Iterate() methods.
Why is there no Insert operation in List?
I think most people need it.
An absolute must-have.
Basic b-tree in-memory implementation.
B+ tree will be skipped, since there isn't much difference except that it holds only the indexes in the tree, but the elements in an external structure. B* tree could be a future extension to this, if packing becomes important.
In future we should see how to serialize/deserialize B tree, since the use case of B tree is generally IO related: database indexing, files, etc. Having a method to load/save to disk or simply serialize/deserialize to bytes would make this useful, i.e. Serializable interface
It would be a convenient feature for the AVL tree (or possibly the other tree implementations too) if we could have a method on each node that returns the size of the subtree rooted at that node.
I would love to see standard MarshalJSON / UnmarshalJSON (and their streaming Encode/Decode counterparts) support.
This would allow us to persist complex structs using data structures without having to manually implement JSON reading/writing
In many (most?) List implementations, a method to set a given value is often exposed.
For example, with Go slices we can do:
x := make([]int, 4)
x[2] = 7 // This method is missing for ArrayList et al.
Thoughts on this? I would be willing to do a PR.
E.g.
heap := binaryheap.NewWithIntComparator()
heap.Push(5)
heap.Push(2)
heap.Push(3)
heap.Push(1)
heap.Push(4)
fmt.Println(heap.Values())
Results in [1 2 3 5 4]
Hi guys, I'm wondering whether I should use RWMutex with the structures? I'm working on a concurrent application.
Thanks very much!
Functions like putAll(Map) in Java or the insertion hint concept in C++ make it possible to add a range of sorted values to a map more efficiently. Maybe I missed something, but currently a similar optimization cannot be achieved with the existing interface.
Is there any follow-up plan to add a "seek" function for treemap?
Like:
it := m.Iterator()
prefix := "1234"
it.Seek(prefix)
for it.Next() {
	...
}
Not sure if this is possible, but it would be nice if there were a way to implement a default json marshaling implementation so that the collection data can be exported as json.
One of the hardest data structures to implement, especially due to the splitting function, but is frequently used throughout GIS.
Vote if you'd like to see this in GoDS.
Notes:
The following example doesn't seem to work:
tm1 := treemap.NewWithIntComparator()
tm1.Put(1, "x")
tm1.Put(2, "b")
tm1.Put(1, "a")
data, _ := tm1.ToJSON()
tm2 := treemap.NewWithIntComparator()
tm2.FromJSON(data) // fails
The serialized JSON has the integer keys in quotes - {"1":"a","2":"b"} - which is, of course, the only possibility conforming to the JSON spec. However, this is counter-intuitive. Would it not be better to store an array of key/value pairs?
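The two encodings can be compared with encoding/json alone. In this sketch the pair type and the helper names are made up for illustration; it only shows what the standard library produces for a map with integer keys versus a list of key/value pairs, and says nothing about how GoDS should implement it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pair is a hypothetical key/value entry that keeps the key as a JSON number.
type pair struct {
	Key   int    `json:"key"`
	Value string `json:"value"`
}

// marshalAsObject encodes the map as a JSON object; integer keys become strings.
func marshalAsObject(m map[int]string) string {
	data, err := json.Marshal(m)
	if err != nil {
		panic(err)
	}
	return string(data)
}

// marshalAsPairs encodes the entries as a JSON array of key/value objects.
func marshalAsPairs(pairs []pair) string {
	data, err := json.Marshal(pairs)
	if err != nil {
		panic(err)
	}
	return string(data)
}

func main() {
	fmt.Println(marshalAsObject(map[int]string{1: "a", 2: "b"}))
	// {"1":"a","2":"b"}
	fmt.Println(marshalAsPairs([]pair{{1, "a"}, {2, "b"}}))
	// [{"key":1,"value":"a"},{"key":2,"value":"b"}]
}
```

The array form keeps the key type and the element order, at the cost of a more verbose document.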
I noticed you have the Map and Select methods commented out in the EnumerableWithIndex and EnumerableWithKey interfaces, since returning a container would require a type assertion at every step, which is pretty ugly for chaining.
I would suggest returning EnumerableWithIndex (or EnumerableWithKey) from those methods and adding a ToContainer method at the end.
This would allow API calls like:
f1 := func(index int, value interface{}) interface{} { /* some mapping */ }
f2 := func(index int, value interface{}) interface{} { /* some other mapping */ }
f3 := func(index int, value interface{}) bool { /* some filtering */ }
list.Map(f1).Map(f2).Select(f3).ToContainer()
At the end, the user is free to do any type assertion they want.
https://flaviocopes.com/golang-generic-generate/
it is an attractive idea
With Go's map type, it's safe to delete an element while iterating; does this same pattern hold for gods?
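For reference, this is the built-in map pattern the question refers to, using only the standard language. The function name removeEven is made up; the sketch says nothing about whether the GoDS containers allow the same thing.

```go
package main

import "fmt"

// removeEven deletes every even key while ranging over the map.
// The Go specification allows removing entries during a range loop;
// a removed entry that has not been reached yet is simply not produced.
func removeEven(m map[int]string) {
	for k := range m {
		if k%2 == 0 {
			delete(m, k)
		}
	}
}

func main() {
	m := map[int]string{1: "a", 2: "b", 3: "c", 4: "d"}
	removeEven(m)
	fmt.Println(len(m)) // 2
}
```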
Could you please add a 2-4 tree
and then a queap?
I'm happy to contribute as best I can; so far my obstacle is the
fact that I was not able to find a complete implementation of either
of the two in any language or pseudolanguage.
Thanks in advance,
LP