Code Monkey home page Code Monkey logo

swiss's Introduction

SwissMap

SwissMap is a hash table adapated from the "SwissTable" family of hash tables from Abseil. It uses AES instructions for fast-hashing and performs key lookups in parallel using SSE instructions. Because of these optimizations, SwissMap is faster and more memory efficient than Golang's built-in map. If you'd like to learn more about its design and implementation, check out this blog post announcing its release.

Example

SwissMap exposes the same interface as the built-in map. Give it a try using this Go playground.

package main

import "github.com/dolthub/swiss"

func main() {
	m := swiss.NewMap[string, int](42)

	m.Put("foo", 1)
	m.Put("bar", 2)

	m.Iter(func(k string, v int) (stop bool) {
		println("iter", k, v)
		return false // continue
	})

	if x, ok := m.Get("foo"); ok {
		println(x)
	}
	if m.Has("bar") {
		x, _ := m.Get("bar")
		println(x)
	}

	m.Put("foo", -1)
	m.Delete("bar")

	if x, ok := m.Get("foo"); ok {
		println(x)
	}
	if m.Has("bar") {
		x, _ := m.Get("bar")
		println(x)
	}

	m.Clear()

	// Output:
	// iter foo 1
	// iter bar 2
	// 1
	// 2
	// -1
}

swiss's People

Contributors

andy-wm-arthur avatar elliotwutingfeng avatar reltuk avatar rfyiamcool avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

swiss's Issues

Clear() only clears crtl and doesn't clear the groups

Clear() is not working as intended. g is a copy of the group, not the group itself. The tests pass because ctrl is cleared, which tricks the test into thinking that clear has worked because the keys can no longer be found. I also noticed that Clear() does not use newEmptyMetadata() so there is an optimisation to cache an empty meta and then just assign it control. The same optimisation could be applied to the groups by creating a zero group and the applying it to each group.

Incorrect result?

Hi, I was trying to reproduce map benchmark using swiss
https://github.com/kokizzu/kokizzu-benchmark/blob/master/assoc/go-swiss/swiss.go
https://github.com/kokizzu/kokizzu-benchmark/blob/master/assoc/map.go

https://pastebin.com/diff/SBZXSPFb
but I got different result when using swiss

time go run map.go
6009354 6009348 611297
36186112 159701682 23370001

CPU: 14.36s     Real: 12.60s    RAM: 2269936KB

time go run go-swiss/swiss.go  
6009354 6009348 611297
1 6 1 --> should be the same as above

CPU: 13.06s     Real: 12.14s    RAM: 1837384KB

report problems

  1. see:

    b := simd.MatchMetadata((*[16]int8)(m), int8(h))

    The group size is 8, but when use simd, it convert to 16.
    If the locate is the last one, programe might be coredump, or result not correct.

  2. compare with https://github.com/rust-lang/hashbrown, group size should be 16.

  3. group count should be one more.
    when the locate is ths last one, and use unalign load, the last one more group can avoid out of range.

Sorry, I think this library still has a lot of bugs and is far from reaching a level that can be actually used.

using `uint64` for number of elements

I used Swiss to test out a utility I had written. Initially, it used native maps but employed a lot of memory. I can confirm that the switch to Swiss memory usage (and the graph from the blog) is correct.

I had to make a change, however. I had more items that the uint32 could hold a reference to. I changed the types within the library to uint64. I should have chosen a different data structure, a separate issue.

Would you accept a PR that allows two swiss versions of uint32 and uint64? I was thinking of being through generics.

Advice: I think there are many ways to make go-swiss better

Give some advice about this lib:

  1. ctrl byte is everything, so no need to set kv when delete.
    those lines are not need:

    swiss/map.go

    Line 141 in c0064a0

    m.groups[g].keys[s] = key

swiss/map.go

Line 190 in c0064a0

m.groups[g].keys[s] = k

swiss/map.go

Line 246 in c0064a0

for i := range m.groups {

2.when init ctrl bytes, we can use simd or unsafe code, not one by one.
see:

swiss/map.go

Line 69 in c0064a0

for i := range m.ctrl {

3.If the group's ctrl bytes are all empty, Iter() can be faster.
see:

swiss/map.go

Line 220 in c0064a0

for n := 0; n < len(groups); n++ {

we can add: if metaMatchEmpty(&m.ctrl[g])==0xffff{continue;}

4.Let ctrl bytes slice address is align to 32, so we can use _mm_load_128() to replace __mm_loadu_128():
see:

swiss/map.go

Line 64 in c0064a0

ctrl: make([]metadata, groups),

alloc more 32 bytes, and use unsafe code to choose a start address of 128bit alignment.

  1. If group count is power of 2, not need to use if to check range.
    see:

    swiss/map.go

    Line 96 in c0064a0

    if g >= uint32(len(m.groups)) {

    g = g & 0xffff // if group count is 65536

6.try groupSize=32, and use avx2 to match 32 bytes each time.

  1. XXHash might be better than golang map hasher.

Can Iter function return a boolean if it's stopped?

Hi,

Currently, I'm developing the concurrent version of swiss-map https://github.com/mhmtszr/concurrent-swiss-map

While implementing Range functions https://github.com/mhmtszr/concurrent-swiss-map/blob/master/concurrent_swiss_map.go#L119 I couldn't stop the other shard because I didn't know the result of the callback functions thus I couldn't stop other shard iterations.

If the Iter function returns a boolean would be great for here what do you think?

Also, I'm open to your feedback about my concurrent-swiss-map implementation too.

Clear() does not compile

I compiled the quickstart program and the compiler issued the error message m.Clear()
"m.Clear undefined (type *swiss.Map[string, int] has no field or method Clear)"

In the github code, Clear() is a method.

I don't really understand: the downloaded version must be different from the github current code.

Ability to import and export maps

I have a long running process that uses swissmaps to store values before they're committed to a datastore. In the event that the process is stopped, I would like to write a gob file with the swissmaps, and upon restart decode the gob file in to the swissmaps.

Currently, this is not possible because swiss.Map doesn't have any exported fields. To remedy this, I have to convert the swissmap to a regular map and then save it to the gob file, and on restart read the gob file maps and put the values into a swissmap.

The performance of swiss Get() is worse than go map at capacity 1~500.

Here is the benchmark.

I found the performance of Get() method is worse than go map get when the capacity is 1~500.

And in my project, call times of Get() method is much more than Put(). And the capacity of map is often less than 500.

So the performance of swiss is not good enough. Wish to improve it.

goos: windows
goarch: amd64
pkg: web/util/listutil
cpu: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
BenchmarkCreateSwissMap
BenchmarkCreateSwissMap-8         499999              3184 ns/op           10128
 B/op          3 allocs/op
BenchmarkCreateGoMap
BenchmarkCreateGoMap-8            249999              5692 ns/op           20504
 B/op          2 allocs/op
BenchmarkSwissMapPut
BenchmarkSwissMapPut-8             35820             37102 ns/op           34624
 B/op         15 allocs/op
BenchmarkGoMapPut
BenchmarkGoMapPut-8                28435             44501 ns/op           43173
 B/op         42 allocs/op
BenchmarkSwissMapGet
BenchmarkSwissMapGet-8            226414              5092 ns/op               0
 B/op          0 allocs/op
BenchmarkGoMapGet
BenchmarkGoMapGet-8               292682              4278 ns/op               0
 B/op          0 allocs/op
BenchmarkSwissMapIterate
BenchmarkSwissMapIterate-8        750000              1616 ns/op               0
 B/op          0 allocs/op
BenchmarkGoMapIterate
BenchmarkGoMapIterate-8           249999              4900 ns/op               0
 B/op          0 allocs/op
PASS
package listutil_test

import (
	"github.com/dolthub/swiss"
	"testing"
)

const capacity = 500

func CreateSwissMap() *swiss.Map[int, int] {
	m := swiss.NewMap[int, int](0)

	for i := 1; i <= capacity; i++ {
		m.Put(i, i)
	}

	return m
}

func CreateMap() map[int]int {
	m := map[int]int{}

	for i := 1; i <= capacity; i++ {
		m[i] = i
	}

	return m
}

func BenchmarkCreateSwissMap(b *testing.B) {
	b.ReportAllocs()
	b.ResetTimer()

	for n := 0; n < b.N; n++ {
		m := swiss.NewMap[int, int](capacity)
		if m != nil {

		}
	}
}

func BenchmarkCreateGoMap(b *testing.B) {
	b.ReportAllocs()
	b.ResetTimer()

	for n := 0; n < b.N; n++ {
		m := make(map[int]int, capacity)
		if m != nil {

		}
	}
}

func BenchmarkSwissMapPut(b *testing.B) {
	b.ReportAllocs()
	b.ResetTimer()

	for n := 0; n < b.N; n++ {
		CreateSwissMap()
	}
}

func BenchmarkGoMapPut(b *testing.B) {
	b.ReportAllocs()
	b.ResetTimer()

	for n := 0; n < b.N; n++ {
		CreateMap()
	}
}

func BenchmarkSwissMapGet(b *testing.B) {
	m := CreateSwissMap()
	size := m.Count()

	b.ReportAllocs()
	b.ResetTimer()

	for n := 0; n < b.N; n++ {
		for i := 1; i <= size; i++ {
			v, ok := m.Get(i)
			if !ok || v == 0 {
				panic(`not found`)
			}
		}
	}
}

func BenchmarkGoMapGet(b *testing.B) {
	m := CreateMap()
	size := len(m)

	b.ReportAllocs()
	b.ResetTimer()

	for n := 0; n < b.N; n++ {
		for i := 1; i <= size; i++ {
			v, ok := m[i]
			if !ok || v == 0 {
				panic(`not found`)
			}
		}
	}
}

func BenchmarkSwissMapIterate(b *testing.B) {
	m := CreateSwissMap()

	b.ReportAllocs()
	b.ResetTimer()

	for n := 0; n < b.N; n++ {
		m.Iter(func(k int, v int) (stop bool) {
			if k <= 0 {
				panic(1)
			}

			if v == 0 {
				panic(1)
			}

			return false
		})
	}
}

func BenchmarkGoMapIterate(b *testing.B) {
	m := CreateMap()

	b.ReportAllocs()
	b.ResetTimer()

	for n := 0; n < b.N; n++ {
		for k, v := range m {
			if k <= 0 {
				panic(1)
			}

			if v == 0 {
				panic(1)
			}
		}
	}
}

Getting Error in Usage of swisstable

I'm getting the following error:

../../go/pkg/mod/github.com/thepudds/[email protected]/match_stub.go:5:6: missing function body

Whenever I try to run the tests.

I took a peek at the package "thepudds/swisstable" and I noticed this file:

match_amd64.s

which looks like it should be providing the stub for AMD64 - but doesn't seem to be. I am running on a Mac with AMD64.

In regards to what I've tried - before I tried running the tests - I ran the following:

go mod tidy

which did install the following:

go: downloading github.com/thepudds/fzgen v0.4.2
go: downloading gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405
go: downloading github.com/sanity-io/litter v1.5.1

Then when I tried to run the tests and I got the same error - I also ran the following:

go install

I did check out the README.md here:
https://github.com/thepudds/swisstable

But, didn't see anything with regards to special instructions for AMD64.

Thanks!

[feature request] generic over the hash function

If you look at the Rust api for std::collections::HashMap it is generic over the hashing function.
This is very useful for people to be able to choose, based on whether they care more about speed than security in the particular functionality they are using the map in.

I think, this would be a good feature to have here.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.