The atlas from mstoykov

Optimize by having pre-allocated arrays of `Node` values?

This definitely needs benchmarking, but I've been thinking that atlas will allocate quite a lot of Node values on the heap, even with optimal usage, and they persist basically forever. So the Go garbage collector will have its work cut out for it, walking through all of the references between the Node values, only to end up collecting basically none of them.

What if there were a buffer, say a slice of fixed-length arrays like [][1000]Node? We would have a couple of atomics and some light locking (around creating a new array buffer block), and instead of creating a new Node and returning its pointer here:

atlas/atlas.go

Lines 135 to 139 in 388f114

    
newNode = &Node{
	root:    n.root,
	prev:    n,
	linkKey: [2]string{key, value},
}

We just get the index of the next free Node in the buffer, fill in the details, and return a pointer to that specific array element? I am making some assumptions about how the Go GC works, so this idea might be complete nonsense... 😓 If someone knows better, please share. Otherwise, I think it's worth benchmarking.
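A minimal sketch of what such a buffer could look like, to make the idea concrete. Everything here is an assumption for illustration: `nodeArena`, `alloc`, and `blockSize` are hypothetical names that do not exist in atlas, and a plain mutex stands in for the "atomics and light locking" mentioned above. The blocks are stored as pointers to arrays rather than as a `[][1000]Node`, because appending to a slice of arrays can copy the arrays and leave previously returned pointers referring to stale copies.

```go
package main

import (
	"fmt"
	"sync"
)

// blockSize is a hypothetical block length, taken from the [][1000]Node idea.
const blockSize = 1000

// Node mirrors the fields visible in the snippet above.
type Node struct {
	root    *Node
	prev    *Node
	linkKey [2]string
}

// nodeArena hands out pointers into pre-allocated blocks of Node values.
// It is a hypothetical type and not part of atlas.
type nodeArena struct {
	mu     sync.Mutex
	blocks []*[blockSize]Node // pointers, so growing the slice never moves a Node
	next   int                // index of the next free slot in the last block
}

// alloc returns a pointer to the next unused Node, adding a block when needed.
func (a *nodeArena) alloc() *Node {
	a.mu.Lock()
	defer a.mu.Unlock()
	if len(a.blocks) == 0 || a.next == blockSize {
		a.blocks = append(a.blocks, new([blockSize]Node))
		a.next = 0
	}
	n := &a.blocks[len(a.blocks)-1][a.next]
	a.next++
	return n
}

func main() {
	arena := &nodeArena{}

	root := arena.alloc()
	root.root = root

	// Equivalent of the &Node{...} literal, but filled in place.
	child := arena.alloc()
	*child = Node{root: root, prev: root, linkKey: [2]string{"key", "value"}}

	fmt.Println(child.prev == root, len(arena.blocks))
}
```

This reduces the number of separate heap objects, but the collector still has to scan the pointers inside each block, so whether it helps at all is exactly what a benchmark would need to show.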

Drop the `root` reference from every `Node`?

I am not sure if this brings in any value:

atlas/atlas.go

Line 19 in 23cc720

root *Node // immutable

The IsRoot() check:

atlas/atlas.go

Lines 33 to 36 in 23cc720

    
// IsRoot checks if the current Node is the root.
func (n *Node) IsRoot() bool {
	return n.root == n
}

This could also be written as follows (assuming the root's prev points back to the root itself), right?

func (n *Node) IsRoot() bool {
	return n.prev == n
}

Maybe we can add a level int property in its place instead? That can tell us how deep the current Node is in the graph, i.e. how many key=value tags it contains. This will be useful for informational purposes, or if we want to pre-allocate a map or a slice container to put them in. Or if we want to iterate over the list more easily, though that can also be done with for !n.IsRoot() { } 🤔
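As a rough sketch of the proposal, here is a Node without the root field and with a level counter. It assumes the root's prev points back at itself, as the alternative IsRoot() above implies. The names `newRoot`, `add`, and `toMap` are hypothetical helpers for this example and are not taken from atlas.

```go
package main

import "fmt"

// Node drops the root pointer and carries its depth instead.
type Node struct {
	prev    *Node
	level   int // number of key=value tags between the root and this Node
	linkKey [2]string
}

// newRoot creates a root whose prev points to itself (assumption).
func newRoot() *Node {
	n := &Node{}
	n.prev = n
	return n
}

// IsRoot relies only on prev, so no root field is needed.
func (n *Node) IsRoot() bool {
	return n.prev == n
}

// add appends a tag without any sorting or deduplication; it only
// demonstrates how level would be maintained.
func (n *Node) add(key, value string) *Node {
	return &Node{prev: n, level: n.level + 1, linkKey: [2]string{key, value}}
}

// toMap uses level to size the map exactly once, then walks to the root.
func (n *Node) toMap() map[string]string {
	m := make(map[string]string, n.level)
	for ; !n.IsRoot(); n = n.prev {
		m[n.linkKey[0]] = n.linkKey[1]
	}
	return m
}

func main() {
	n := newRoot().add("method", "GET").add("status", "200")
	fmt.Println(n.level, n.toMap())
}
```

Replacing a pointer with an int keeps the struct the same size on 64-bit platforms, so the gain would mainly be one fewer pointer per Node for the GC to follow.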

Alternatives evaluation

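Apparently, the second version [TODO: link here when merged], which compares the two elements individually, is faster than if n.linkKey == sub.linkKey (roughly 3.4 ns/op versus 6.0 ns/op in the benchmarks below). Discover why and evaluate the difference with LinkKey{Key: "abc", Value: "xyz"} or other alternative solutions.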

if n.linkKey == sub.linkKey

goos: linux
goarch: amd64
pkg: github.com/mstoykov/atlas
cpu: AMD Ryzen 7 4800H with Radeon Graphics
BenchmarkContains/1000-16         	203409088	         5.927 ns/op	       0 B/op	       0 allocs/op
BenchmarkContains/10000-16        	198273142	         6.027 ns/op	       0 B/op	       0 allocs/op
BenchmarkContains/100000-16       	193312813	         6.105 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	github.com/mstoykov/atlas	7.983s

if n.linkKey[0] == sub.linkKey[0] && n.linkKey[1] == sub.linkKey[1]

goos: linux
goarch: amd64
pkg: github.com/mstoykov/atlas
cpu: AMD Ryzen 7 4800H with Radeon Graphics
BenchmarkContains/1000-16         	345073720	         3.344 ns/op	       0 B/op	       0 allocs/op
BenchmarkContains/10000-16        	348490562	         3.390 ns/op	       0 B/op	       0 allocs/op
BenchmarkContains/100000-16       	338773561	         3.402 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	github.com/mstoykov/atlas	7.036s
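One way to start the evaluation is a small standalone harness that times the three comparison styles in isolation. This is only a sketch: the `LinkKey` struct follows the shape mentioned above, while `eqArray`, `eqElems`, `eqStruct`, and the package-level test values are hypothetical names for this example. It uses `testing.Benchmark`, which can be called from a regular program, instead of the repository's BenchmarkContains.

```go
package main

import (
	"fmt"
	"testing"
)

// LinkKey is the struct alternative mentioned above.
type LinkKey struct {
	Key, Value string
}

// Package-level values so the compiler cannot fold the comparisons away.
var (
	arrA = [2]string{"abc", "xyz"}
	arrB = [2]string{"abc", "xyz"}
	keyA = LinkKey{Key: "abc", Value: "xyz"}
	keyB = LinkKey{Key: "abc", Value: "xyz"}
	sink bool // receives every result so the loops are not eliminated
)

func eqArray(a, b [2]string) bool { return a == b }
func eqElems(a, b [2]string) bool { return a[0] == b[0] && a[1] == b[1] }
func eqStruct(a, b LinkKey) bool  { return a == b }

func main() {
	fmt.Println("array ==      ", testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = eqArray(arrA, arrB)
		}
	}))
	fmt.Println("element-wise  ", testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = eqElems(arrA, arrB)
		}
	}))
	fmt.Println("struct ==     ", testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = eqStruct(keyA, keyB)
		}
	}))
}
```

The numbers from a microbenchmark like this depend heavily on the Go version, inlining decisions, and the inputs used, so they would need to be confirmed against the real Contains() benchmark before drawing conclusions.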

Think about per node "storage"

A thing that came up is using *Node as a key in a map to cache something. But if that something will be needed in multiple places, it might be easier to have it in the *Node.

Having multiple key/values will be harder, but maybe just a simple sync.Map will be enough 🤔

This might not be so great in general, as the current use case is caching the marshalling of the node in different formats: JSON, CSV, whatever else we need. But adding this will mean that at least one more sync.Map will always need to be allocated.
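To illustrate the trade-off, here is a minimal sketch of a per-node cache keyed by format name. The `cache` field and the `cached` method are hypothetical and not part of atlas; the Node shape is reduced to what the example needs.

```go
package main

import (
	"fmt"
	"sync"
)

// Node carries its own cache. The zero sync.Map is usable, but it still
// adds to the size of every Node, whether or not anything is cached.
type Node struct {
	prev    *Node
	linkKey [2]string
	cache   sync.Map // hypothetical: format name -> marshalled []byte
}

// cached returns the stored bytes for a format, building them on first use.
// Concurrent callers may both run build; LoadOrStore keeps whichever result
// was stored first, so every caller sees the same value.
func (n *Node) cached(format string, build func(*Node) []byte) []byte {
	if v, ok := n.cache.Load(format); ok {
		return v.([]byte)
	}
	v, _ := n.cache.LoadOrStore(format, build(n))
	return v.([]byte)
}

func main() {
	n := &Node{linkKey: [2]string{"key", "value"}}

	// A stand-in marshaller producing JSON-like output for one tag.
	toJSON := func(n *Node) []byte {
		return []byte(fmt.Sprintf("{%q:%q}", n.linkKey[0], n.linkKey[1]))
	}

	fmt.Printf("%s\n", n.cached("json", toJSON))
	fmt.Printf("%s\n", n.cached("json", toJSON)) // served from the cache
}
```

An external map keyed by *Node avoids the per-Node cost for callers that never cache anything, which is the tension described above.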

Optimize lookups by considering key sorting

Node.ValueByKey() and Node.Contains() can probably be improved by taking into account that the keys are sorted. So we don't always need to reach the root: if the key of the current Node is "lower" than the one we are comparing it against, we don't need to dig any deeper, right?
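A sketch of the early exit, assuming keys strictly increase from the root toward the leaf, so they decrease while walking along prev. The function `lookupSorted` and the helper `newChain` are hypothetical names for this example; the step counter exists only to make the saving visible and would not be part of a real API.

```go
package main

import "fmt"

// Node mirrors the fields shown in the snippets above.
type Node struct {
	root    *Node
	prev    *Node
	linkKey [2]string
}

// IsRoot checks if the current Node is the root.
func (n *Node) IsRoot() bool {
	return n.root == n
}

// newChain builds root -> pairs[0] -> pairs[1] -> ... and returns the leaf.
// The caller must pass pairs already sorted by key (assumption).
func newChain(pairs ...[2]string) *Node {
	root := &Node{}
	root.root = root
	n := root
	for _, p := range pairs {
		n = &Node{root: root, prev: n, linkKey: p}
	}
	return n
}

// lookupSorted walks toward the root and stops as soon as the current key
// is lower than the wanted one, since all remaining keys are lower still.
func (n *Node) lookupSorted(key string) (value string, ok bool, steps int) {
	for ; !n.IsRoot(); n = n.prev {
		steps++
		if n.linkKey[0] == key {
			return n.linkKey[1], true, steps
		}
		if n.linkKey[0] < key {
			return "", false, steps // everything closer to the root is lower
		}
	}
	return "", false, steps
}

func main() {
	leaf := newChain(
		[2]string{"a", "1"},
		[2]string{"m", "2"},
		[2]string{"z", "3"},
	)
	fmt.Println(leaf.lookupSorted("m")) // found after visiting z and m
	fmt.Println(leaf.lookupSorted("n")) // gives up at m without visiting a
}
```

The extra string comparison per node costs something too, so the benefit would mostly show up for misses and for keys that sort late on long chains.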

