kelindar / bitmap Goto Github PK

Simple dense bitmap index in Go with binary operators

License: MIT License

Go 29.24% Assembly 63.79% Shell 0.14% C 6.83%

bitmap bitset index

bitmap's Introduction

Hey 👋, I'm Roman!

I love building stuff, currently working as a Principal Engineer in Careem on experimentation, optimization and machine learning platforms. Prior to Careem, I worked as Head of Data Science in AirAsia and as a Principal Engineer in Grab, the super app of South East Asia where we built Product Insights & Experimentation Platforms amongst other things. I also obtained a PhD in Computer Science and Human-Computer Interaction with Trinity College Dublin & IBM Research, and worked as an engineer at various successful companies in Europe, building things like online gaming platforms, autonomous helicopters, or particle/matter collision simulators!

🚀 Distributed Systems I have designed and open-sourced

emitter-io/emitter - high performance, distributed and low latency publish-subscribe platform
kelindar/talaria - distributed, highly available, and low latency time-series database for Presto

📦 Golang Libraries I made to help me in building software faster or explore a certain idea

kelindar/async - concurrency patterns library using promises for Go
kelindar/approx - probabilistic data structures (e.g. morris counting, count-min sketch, topk) in Go
kelindar/binary - generic and fast binary serializer for Go
kelindar/bitmap - zero-allocation dense bitmap index for Go
kelindar/column - columnar in-memory db (structures-of-arrays) with querying and indexing in Go
kelindar/goap - goal oriented action planning library in Go
kelindar/intmap - fast uint32-to-uint32 hash map, about 20-50% faster than Go map
kelindar/tile - 2D grid engine, built with data and cache friendly ways, includes pathfinding and observers
kelindar/timeline - Scheduler of events for near real-time systems in Go
kelindar/evolve - Go implementation of a binary genetic algorithm with random binary crossover & mutation
kelindar/loader - Go library that allows for downloading things from a URL (http, s3, ...)
kelindar/ecs - high-performance entity component system in pure Go
kelindar/event - simple internal event bus for Go applications
kelindar/lua - wrapper around LUA script executor for long-running scripts
kelindar/tcp - simple TCP server in Go
kelindar/process - Go library for process usage retrieval
kelindar/simd - auto-vectorized SIMD math functions for Go
kelindar/smutex - simple sharded mutual exclusion in Go
kelindar/xxrand - XXH3-Based Pseudorandom Number Generator
kelindar/iostream - Simple binary reader and writer for io.Reader/io.Writer streams
kelindar/simplex - Simplex noise in Go
emitter-io/address - Go library providing a set of utilities for dealing with network addressing
emitter-io/stats - high-performance library for stats and metric aggregation
emitter-io/config - library for hierarchical config management

🧪 Experiments in which I tried with various ideas

kelindar/gocc - tool to transpile C to Go assembly using clang & objdump
kelindar/hashbench - benchmarking various hash functions in Go
kelindar/etop - command-line utility for emitter.io cluster status
kelindar/spiral - procedurally generating a spiral galaxy in Go
kelindar/markov - experimenting with markov-chain procgen in Go
kelindar/pmml2lua - trying to convert PMML decision trees into LUA code
kelindar/spike-box - what if we can data-bind js/DOM transparently over the network?
kelindar/misakai-baker - static site generator for C# / .Net people
manycore/dashboard - profiling your parallelized app by categorizing issues and visually browsing them
manycore/matmul - matrix multiplication benchmark suite for parallel performance measurement
manycore/harvester - operating system and hardware performance counter collector utility

🎨 Emitter Demos I have prepared for the project

chat - building a chat with emitter
actor - distributed actor model with emitter
client-server - how to create a client/server application with emitter
platformer - making an online platformer with emitter
retain - how to use message retention in emitter
share - how to use shared subscriptions in emitter
iss - tracking international space station in real-time
presence - demo of the channel presence for emitter

📚 Blogs & Papers I have written in the past

Technical Blog - My random blog posts around experimentation, performance and open source
Ph.D Thesis - Supporting visual diagnosis of performance problems in multi-core and parallel software
SIGCHI'14 Paper - Design considerations for parallel performance tools
IEEE Journal Paper - Parallel Performance Problems on Shared-Memory Multicore Systems: Taxonomy and Observation

bitmap's People

Contributors

Stargazers

Watchers

bitmap's Issues

support `not` operation

is it possible to support not operation? https://en.wikipedia.org/wiki/Bitwise_operation#NOT

range over zeros

i'm intending to use this as a free-map for a filesystem.

is there a reason there's no range function to find zeros?
am i holding it wrong?

sync.Pool usage

Hi,
I was testing different bitmap libraries (by the way as a replacement for hash map)
The results i received for 8 and operations for 500k entities where first bitmap has 10k entities in a row with 1 bit, Clone of the bitmap is used on each operation

goos: linux
goarch: amd64
pkg: bitmap-usage/benchmark/500k-large-groups/kelindar
cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkFindPriceV2_11position            	   26214	     41542 ns/op	   67760 B/op	       8 allocs/op
BenchmarkFindPriceV2_3824position          	   45291	     40293 ns/op	   73904 B/op	       8 allocs/op
BenchmarkFindPriceV2_3824position_OptStats 	   43438	     35956 ns/op	   73952 B/op	      13 allocs/op
BenchmarkFindPriceV2_9701position          	   41796	     36919 ns/op	   65712 B/op	       7 allocs/op
BenchmarkFindPriceV2_MultiplePricesErr     	   44133	     24858 ns/op	   73856 B/op	       6 allocs/op
PASS
ok  	bitmap-usage/benchmark/500k-large-groups/kelindar	51.610s

This is what i received for https://github.com/kelindar/column

goos: linux
goarch: amd64
pkg: bitmap-usage/benchmark/500k-large-groups/kelindar-column
cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkFindPriceV2_11position            	   40641	     25217 ns/op	     216 B/op	       8 allocs/op
BenchmarkFindPriceV2_3824position          	   45633	     25901 ns/op	     216 B/op	       8 allocs/op
BenchmarkFindPriceV2_9701position          	   44162	     26637 ns/op	     217 B/op	       8 allocs/op
BenchmarkFindPriceV2_MultiplePricesErr     	   48385	     24083 ns/op	     177 B/op	       6 allocs/op
PASS
ok  	bitmap-usage/benchmark/500k-large-groups/kelindar-column	107.145s

In your kelindar-column you have fill bitmap.Bitmap // The fill-list and Txn with sync.Pool usage which allow to avoid bitmap allocation. I believe the technique is worth mentioning for potential users of this library

Results with sync.Pool:

goos: linux
goarch: amd64
pkg: bitmap-usage/benchmark/500k-large-groups/kelindar
cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkFindPriceV2_11position            	   79882	     15122 ns/op	     176 B/op	       6 allocs/op
BenchmarkFindPriceV2_3824position          	   85920	     14792 ns/op	     176 B/op	       6 allocs/op
BenchmarkFindPriceV2_3824position_OptStats 	   68959	     15563 ns/op	     225 B/op	      11 allocs/op
BenchmarkFindPriceV2_9701position          	   56557	     19224 ns/op	     177 B/op	       6 allocs/op
BenchmarkFindPriceV2_MultiplePricesErr     	  106196	     11107 ns/op	     128 B/op	       4 allocs/op
PASS
ok  	bitmap-usage/benchmark/500k-large-groups/kelindar	61.455s

if you are curious here is hashmap results - iteration and comparison for 10k entities

goos: linux
goarch: amd64
pkg: bitmap-usage/benchmark/500k-large-groups/map
cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkHttpClientServer_FindPrice            	    5232	    220943 ns/op	    8283 B/op	     112 allocs/op
BenchmarkFindPrice_11position                  	    9979	    118803 ns/op	      64 B/op	       1 allocs/op
BenchmarkFindPrice_3824position                	    9849	    119427 ns/op	      64 B/op	       1 allocs/op
BenchmarkFindPrice_3824position_Optimized      	   25146	     46990 ns/op	      64 B/op	       1 allocs/op
BenchmarkFindPrice_9701position                	    9825	    120534 ns/op	      64 B/op	       1 allocs/op
BenchmarkFindPrice_9701position_Optimized      	    9945	    117033 ns/op	      64 B/op	       1 allocs/op
BenchmarkFindPrice_MultiplePricesErr           	    9799	    119765 ns/op	       0 B/op	       0 allocs/op
BenchmarkFindPrice_MultiplePricesErr_Optimized 	    9814	    119057 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	bitmap-usage/benchmark/500k-large-groups/map	47.843s

feature request: new method `OnesTo(n)` like `CountTo(n)`, `Ones` with a upper boundary?

currently need to call Grow then ones, them remove items >= n

Can we add a new method OnesTo(n) set Ones with a upper boundary?

A impl:

func (dst *Bitmap) OnesTo(n uint32) {
	blkAt := n >> 6

	b := make(Bitmap, blkAt+1)

	for i := uint32(0); i < blkAt; i++ {
		b[i] = math.MaxUint64 // saidly golang's compiler can't optimize this
	}

	b[blkAt] = uint64(math.MaxUint64) >> (64 - n%64)

	*dst = b
}

AVX2. And operation produce wrong result when bitmaps have different sizes. V2

Hi,
one more failing test: bigger, smaller, bigger

func TestAnd_ConsecutiveAnd_DifferentBitmapSizes(t *testing.T) {
	a, b, c := Bitmap{}, Bitmap{}, Bitmap{}
	for i := uint32(0); i < 200; i += 2 {
		a.Set(i)
	}

	for i := uint32(0); i < 100; i += 2 {
		b.Set(i)
	}

	for i := uint32(0); i < 200; i += 2 {
		c.Set(i)
	}

	a.And(b)
	a.And(c)

	for i := uint32(0); i < 200; i++ {
		assert.Equal(t, a.Contains(i), b.Contains(i), "for "+strconv.Itoa(int(i)))
	}
	assert.Equal(t, 50, a.Count())
}

  bitmap_test.go:380: 
        	Error Trace:	bitmap_test.go:380
        	Error:      	Not equal: 
        	            	expected: true
        	            	actual  : false
        	Test:       	TestAnd_DifferentBitmapSizes
        	Messages:   	for 128
    bitmap_test.go:380: 
        	Error Trace:	bitmap_test.go:380
        	Error:      	Not equal: 
        	            	expected: true
        	            	actual  : false
        	Test:       	TestAnd_DifferentBitmapSizes
        	Messages:   	for 130

Why uint64 instead of byte?

That is, what is the reason of having a slice of uint64 instead of a slice of byte?

syntax error

the syntax var books := bitmap.Bitmap in Example Usage should change to var books bitmap.Bitmap

Kelindar fails under highload. 1-3/400k requests

Hi,
After upgrade from v1.1.4 to v1.2.1 I got errors under highload tests

Test Case:

git clone https://github.com/marniks7/bitmap-usage/
cd bitmap-usage
git checkout update-dependencies
go test ./runner/... -run PerformanceWrkExperiments -covermode=atomic -short -test.v -count=1

AR: (wrk, client side), 3 errors per 390k+ requests

  391214 requests in 10.10s, 76.38MB read
  Non-2xx or 3xx responses: 3

ER: Non-2xx or 3xx responses should be absent

AR: (errors on application side)

2022-04-10T11:01:28-04:00 ERR Unable to find price id error="unable find price, no default and >1 found"
2022-04-10T11:01:32-04:00 ERR Unable to find price id error="unable find price, no default and >1 found"
2022-04-10T11:01:34-04:00 ERR Unable to find price id error="unable find price, no default and >1 found"

ER: there should be 0 errors on application side

Code:
https://github.com/marniks7/bitmap-usage/blob/update-dependencies/index-kelindar/searchv2.go
Code for breakpoint:

 else {
		return 0, ErrUnableToFindPriceMoreThenOneNoDefault
	}

List of used operations:

tx.index.And
tx.index.Count()
bm.Clone

Debug:

KELINDAR32=true FIBER=true go run main.go
make wrk-kelindar-t2-c20

FromBytes with binary.ByteOrder

I have a bitmap over some protocol encoded as BigEndian []byte, is there a way to create bitmap from []byte with endian specified?

No benefit with new bulk approach for `And` ?

Hi, based on the release v1.2.0 i have tried to use bulk approach for And operation for 1 + 6 extra bitmaps.
Usage: index.And(bms[0], bms[1:]...)
I do understand that there is 1 + 3 extra handled in bulk, and all next extra are handled one-by-one, but still

Timings for bitmap.And: 5.31s before vs 5.38s after (though this vary). I did few other tests and I don't see benefits.

Another test:

name              old time/op    new time/op    delta
FindPriceV2_Real    10.4µs ± 3%    10.3µs ± 3%     ~     (p=1.000 n=5+5)

name              old alloc/op   new alloc/op   delta
FindPriceV2_Real      361B ± 0%      537B ± 0%  +48.75%  (p=0.008 n=5+5)

name              old allocs/op  new allocs/op  delta
FindPriceV2_Real      8.00 ± 0%      9.00 ± 0%  +12.50%  (p=0.008 n=5+5)

Non-bulk approach:

Bulk approach:

AVX2. And operation produce wrong result when bitmaps have different sizes

Hi there,
a.And(b) and b.And(a) should be commutative when bitmap sizes are different, but they are not.

Here is the test:

func TestAnd_DifferentBitmapSizes(t *testing.T) {
	a, b := Bitmap{}, Bitmap{}
	for i := uint32(0); i < 100; i += 2 {
		a.Set(i)
	}

	for i := uint32(0); i < 200; i += 2 {
		b.Set(i)
	}

	a.And(b)
	cnt1 := a.Count()

	a, b = Bitmap{}, Bitmap{}
	for i := uint32(0); i < 100; i += 2 {
		a.Set(i)
	}

	for i := uint32(0); i < 200; i += 2 {
		b.Set(i)
	}
	b.And(a)
	cnt2 := b.Count()

	assert.Equal(t, cnt1, cnt2)
}

Expected :50
Actual   :86

Another test variation

func TestAnd_DifferentBitmapSizes(t *testing.T) {
	a, b := Bitmap{}, Bitmap{}
	for i := uint32(0); i < 100; i += 2 {
		a.Set(i)
	}

	for i := uint32(0); i < 200; i += 2 {
		b.Set(i)
	}

	a.And(b)

	c, d := Bitmap{}, Bitmap{}
	for i := uint32(0); i < 100; i += 2 {
		c.Set(i)
	}
	for i := uint32(0); i < 200; i += 2 {
		d.Set(i)
	}
	d.And(c)

	for i := uint32(0); i < 200; i++ {
		assert.Equal(t, a.Contains(i), d.Contains(i), "for " + strconv.Itoa(int(i)))
	}
	assert.Equal(t, 50, a.Count())
	assert.Equal(t, 50, d.Count())
}

=== RUN   TestAnd_DifferentBitmapSizes
    bitmap_test.go:178: 
        	Error Trace:	bitmap_test.go:178
        	Error:      	Not equal: 
        	            	expected: false
        	            	actual  : true
        	Test:       	TestAnd_DifferentBitmapSizes
        	Messages:   	for 128
    bitmap_test.go:178: 
        	Error Trace:	bitmap_test.go:178
        	Error:      	Not equal: 
        	            	expected: false
        	            	actual  : true
        	Test:       	TestAnd_DifferentBitmapSizes
        	Messages:   	for 130
    bitmap_test.go:178: 
        	Error Trace:	bitmap_test.go:178
        	Error:      	Not equal: 
        	            	expected: false
        	            	actual  : true
        	Test:       	TestAnd_DifferentBitmapSizes
        	Messages:   	for 132
    bitmap_test.go:178: 
        	Error Trace:	bitmap_test.go:178
        	Error:      	Not equal: 
        	            	expected: false
        	            	actual  : true
        	Test:       	TestAnd_DifferentBitmapSizes
        	Messages:   	for 134

....etc....

Virtual memory leak while using bitmap

When i am using bitmap and as we set the bits in bitmap virtual memory shoots up (verified using pmap command) , but after clearing the bitmap using "bitmap.clear" virtual memory does not releases. I want to know the method to clear bitmap along with all it's instance and used bytes in virtual memory.Memory allocation before using bitmap and after clearing bitmap should be same. Can you pls help me in achieving the same ..

FromBytes() has bug about length of buffer

// FromBytes reads a bitmap from a byte buffer without copying the buffer.
func FromBytes(buffer []byte) (out Bitmap) {
	hdr := (*reflect.SliceHeader)(unsafe.Pointer(&out))
	hdr.Len = len(buffer) >> 3
	hdr.Cap = hdr.Len
	hdr.Data = uintptr(unsafe.Pointer(&(buffer)[0]))
	return out
}

While length of buffer less than 8, hdr.Len is always 0. And length of buffer must be a multiple of 8.

Give credit to the original author?

Thanks for creating a usable Go bitmap index library! One thing I noticed while I was reading through the code: parts of the heavy lifting (AVX code generation) seems similar to code from a talk given by @mkevac (https://m.youtube.com/watch?v=WvlUH6MjUuI) and his examples https://github.com/mkevac/gopherconrussia2019/tree/master/simplesimd - with many identical lines (https://github.com/mkevac/gopherconrussia2019/blob/master/simplesimd/asm.go#L106 vs https://github.com/kelindar/bitmap/blob/main/generate/amd64.go#L104). If it was actually copied I think it would be fair to give the original author some credit, especially given that he has not put the code under any official licence and the files in this repo all contain Copyright (c) Roman Atachiants 😏. If it wasn't mea culpa.

Bitmap.Or(shortBitmap, longBitmap) result length same as the first one

When using multiple bitmaps as Bitmap.Or() parameters, the length of destination Bitmap will be same to the first bitmap, if first one is shorter than others, the result will be truncate to start bytes of the result.

Here is sample code to reproduce, see only in last case, call Or() one by one, get the right result.

package main

import (
	"fmt"
	"github.com/kelindar/bitmap"
)

func main() {
	bm := new(bitmap.Bitmap)
	bm1 := bitmap.FromBytes([]byte{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01})
	bm2 := bitmap.FromBytes([]byte{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01})
	bm3 := bitmap.FromBytes([]byte{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01})
	bm.Or(bm1, bm2, bm3)
	fmt.Println("Excepted:3, got:", bm.Count())
	bm.Clear()
	bm.Or(bm1, bm3, bm2)
	fmt.Println("Excepted:3, got:", bm.Count())
	bm.Clear()
	bm.Or(bm3, bm2, bm1)
	fmt.Println("Excepted:3, got:", bm.Count())
	fmt.Printf("bm:%+v\n", bm.ToBytes())
	bm.Clear()
	bm.Or(bm3, bm1, bm2)
	fmt.Println("Excepted:3, got:", bm.Count())
	fmt.Printf("bm:%+v\n", bm.ToBytes())
	bm.Clear()
	bm.Or(bm2, bm3, bm1)
	fmt.Println("Excepted:3, got:", bm.Count())
	bm.Clear()
	bm.Or(bm2, bm1, bm3)
	fmt.Println("Excepted:3, got:", bm.Count())
	bm.Clear()
	bm.Or(bm3)
	bm.Or(bm2)
	bm.Or(bm1)
	fmt.Println("Excepted:3, got:", bm.Count())
}

outputs,

Excepted:3, got: 1
Excepted:3, got: 1
Excepted:3, got: 32
bm:[0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 69 120 99 101 112 116 101 101]
Excepted:3, got: 32
bm:[0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 69 120 99 101 112 116 101 101]
Excepted:3, got: 2
Excepted:3, got: 2
Excepted:3, got: 3

Affected lib versions: v1.2.1, v1.4.1 (I found this issue on v1.2.1 and reproduced on v1.4.1)
Affected Go versions: go1.17.6 darwin/amd64, go1.18.5 darwin/amd64
Affected OS: macOS Monterey 12.5.1

kelindar / bitmap Goto Github PK

bitmap's Introduction

Hey 👋, I'm Roman!

bitmap's People

Contributors

Stargazers

Watchers

Forkers

bitmap's Issues

Recommend Projects

Recommend Topics

Recommend Org