Code Monkey home page Code Monkey logo

bitmap's Introduction

Hey 👋, I'm Roman!

I love building stuff, currently working as a Principal Engineer in Careem on experimentation, optimization and machine learning platforms. Prior to Careem, I worked as Head of Data Science in AirAsia and as a Principal Engineer in Grab, the super app of South East Asia where we built Product Insights & Experimentation Platforms amongst other things. I also obtained a PhD in Computer Science and Human-Computer Interaction with Trinity College Dublin & IBM Research, and worked as an engineer at various successful companies in Europe, building things like online gaming platforms, autonomous helicopters, or particle/matter collision simulators!


🚀 Distributed Systems I have designed and open-sourced

  • emitter-io/emitter - high performance, distributed and low latency publish-subscribe platform
  • kelindar/talaria - distributed, highly available, and low latency time-series database for Presto

📦 Golang Libraries I made to help me in building software faster or explore a certain idea

🧪 Experiments in which I tried with various ideas

🎨 Emitter Demos I have prepared for the project

  • chat - building a chat with emitter
  • actor - distributed actor model with emitter
  • client-server - how to create a client/server application with emitter
  • platformer - making an online platformer with emitter
  • retain - how to use message retention in emitter
  • share - how to use shared subscriptions in emitter
  • iss - tracking international space station in real-time
  • presence - demo of the channel presence for emitter

📚 Blogs & Papers I have written in the past

  • Technical Blog - My random blog posts around experimentation, performance and open source
  • Ph.D Thesis - Supporting visual diagnosis of performance problems in multi-core and parallel software
  • SIGCHI'14 Paper - Design considerations for parallel performance tools
  • IEEE Journal Paper - Parallel Performance Problems on Shared-Memory Multicore Systems: Taxonomy and Observation

visitors

bitmap's People

Contributors

developerdave17 avatar dtchanpura avatar florimond avatar hasheddan avatar kelindar avatar marniks7 avatar mfat7y avatar mhr3 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

bitmap's Issues

AVX2. And operation produce wrong result when bitmaps have different sizes. V2

Hi,
one more failing test: bigger, smaller, bigger

func TestAnd_ConsecutiveAnd_DifferentBitmapSizes(t *testing.T) {
	a, b, c := Bitmap{}, Bitmap{}, Bitmap{}
	for i := uint32(0); i < 200; i += 2 {
		a.Set(i)
	}

	for i := uint32(0); i < 100; i += 2 {
		b.Set(i)
	}

	for i := uint32(0); i < 200; i += 2 {
		c.Set(i)
	}

	a.And(b)
	a.And(c)

	for i := uint32(0); i < 200; i++ {
		assert.Equal(t, a.Contains(i), b.Contains(i), "for "+strconv.Itoa(int(i)))
	}
	assert.Equal(t, 50, a.Count())
}
  bitmap_test.go:380: 
        	Error Trace:	bitmap_test.go:380
        	Error:      	Not equal: 
        	            	expected: true
        	            	actual  : false
        	Test:       	TestAnd_DifferentBitmapSizes
        	Messages:   	for 128
    bitmap_test.go:380: 
        	Error Trace:	bitmap_test.go:380
        	Error:      	Not equal: 
        	            	expected: true
        	            	actual  : false
        	Test:       	TestAnd_DifferentBitmapSizes
        	Messages:   	for 130

Bitmap.Or(shortBitmap, longBitmap) result length same as the first one

When using multiple bitmaps as Bitmap.Or() parameters, the length of destination Bitmap will be same to the first bitmap, if first one is shorter than others, the result will be truncate to start bytes of the result.

Here is sample code to reproduce, see only in last case, call Or() one by one, get the right result.

package main

import (
	"fmt"
	"github.com/kelindar/bitmap"
)

func main() {
	bm := new(bitmap.Bitmap)
	bm1 := bitmap.FromBytes([]byte{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01})
	bm2 := bitmap.FromBytes([]byte{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01})
	bm3 := bitmap.FromBytes([]byte{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01})
	bm.Or(bm1, bm2, bm3)
	fmt.Println("Excepted:3, got:", bm.Count())
	bm.Clear()
	bm.Or(bm1, bm3, bm2)
	fmt.Println("Excepted:3, got:", bm.Count())
	bm.Clear()
	bm.Or(bm3, bm2, bm1)
	fmt.Println("Excepted:3, got:", bm.Count())
	fmt.Printf("bm:%+v\n", bm.ToBytes())
	bm.Clear()
	bm.Or(bm3, bm1, bm2)
	fmt.Println("Excepted:3, got:", bm.Count())
	fmt.Printf("bm:%+v\n", bm.ToBytes())
	bm.Clear()
	bm.Or(bm2, bm3, bm1)
	fmt.Println("Excepted:3, got:", bm.Count())
	bm.Clear()
	bm.Or(bm2, bm1, bm3)
	fmt.Println("Excepted:3, got:", bm.Count())
	bm.Clear()
	bm.Or(bm3)
	bm.Or(bm2)
	bm.Or(bm1)
	fmt.Println("Excepted:3, got:", bm.Count())
}

outputs,

Excepted:3, got: 1
Excepted:3, got: 1
Excepted:3, got: 32
bm:[0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 69 120 99 101 112 116 101 101]
Excepted:3, got: 32
bm:[0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 69 120 99 101 112 116 101 101]
Excepted:3, got: 2
Excepted:3, got: 2
Excepted:3, got: 3

Affected lib versions: v1.2.1, v1.4.1 (I found this issue on v1.2.1 and reproduced on v1.4.1)
Affected Go versions: go1.17.6 darwin/amd64, go1.18.5 darwin/amd64
Affected OS: macOS Monterey 12.5.1

AVX2. And operation produce wrong result when bitmaps have different sizes

Hi there,
a.And(b) and b.And(a) should be commutative when bitmap sizes are different, but they are not.

Here is the test:

func TestAnd_DifferentBitmapSizes(t *testing.T) {
	a, b := Bitmap{}, Bitmap{}
	for i := uint32(0); i < 100; i += 2 {
		a.Set(i)
	}

	for i := uint32(0); i < 200; i += 2 {
		b.Set(i)
	}

	a.And(b)
	cnt1 := a.Count()

	a, b = Bitmap{}, Bitmap{}
	for i := uint32(0); i < 100; i += 2 {
		a.Set(i)
	}

	for i := uint32(0); i < 200; i += 2 {
		b.Set(i)
	}
	b.And(a)
	cnt2 := b.Count()

	assert.Equal(t, cnt1, cnt2)
}
Expected :50
Actual   :86

Another test variation

func TestAnd_DifferentBitmapSizes(t *testing.T) {
	a, b := Bitmap{}, Bitmap{}
	for i := uint32(0); i < 100; i += 2 {
		a.Set(i)
	}

	for i := uint32(0); i < 200; i += 2 {
		b.Set(i)
	}

	a.And(b)

	c, d := Bitmap{}, Bitmap{}
	for i := uint32(0); i < 100; i += 2 {
		c.Set(i)
	}
	for i := uint32(0); i < 200; i += 2 {
		d.Set(i)
	}
	d.And(c)

	for i := uint32(0); i < 200; i++ {
		assert.Equal(t, a.Contains(i), d.Contains(i), "for " + strconv.Itoa(int(i)))
	}
	assert.Equal(t, 50, a.Count())
	assert.Equal(t, 50, d.Count())
}
=== RUN   TestAnd_DifferentBitmapSizes
    bitmap_test.go:178: 
        	Error Trace:	bitmap_test.go:178
        	Error:      	Not equal: 
        	            	expected: false
        	            	actual  : true
        	Test:       	TestAnd_DifferentBitmapSizes
        	Messages:   	for 128
    bitmap_test.go:178: 
        	Error Trace:	bitmap_test.go:178
        	Error:      	Not equal: 
        	            	expected: false
        	            	actual  : true
        	Test:       	TestAnd_DifferentBitmapSizes
        	Messages:   	for 130
    bitmap_test.go:178: 
        	Error Trace:	bitmap_test.go:178
        	Error:      	Not equal: 
        	            	expected: false
        	            	actual  : true
        	Test:       	TestAnd_DifferentBitmapSizes
        	Messages:   	for 132
    bitmap_test.go:178: 
        	Error Trace:	bitmap_test.go:178
        	Error:      	Not equal: 
        	            	expected: false
        	            	actual  : true
        	Test:       	TestAnd_DifferentBitmapSizes
        	Messages:   	for 134

....etc....

Kelindar fails under highload. 1-3/400k requests

Hi,
After upgrade from v1.1.4 to v1.2.1 I got errors under highload tests

Test Case:

git clone https://github.com/marniks7/bitmap-usage/
cd bitmap-usage
git checkout update-dependencies
go test ./runner/... -run PerformanceWrkExperiments -covermode=atomic -short -test.v -count=1

AR: (wrk, client side), 3 errors per 390k+ requests

  391214 requests in 10.10s, 76.38MB read
  Non-2xx or 3xx responses: 3

ER: Non-2xx or 3xx responses should be absent

AR: (errors on application side)

2022-04-10T11:01:28-04:00 ERR Unable to find price id error="unable find price, no default and >1 found"
2022-04-10T11:01:32-04:00 ERR Unable to find price id error="unable find price, no default and >1 found"
2022-04-10T11:01:34-04:00 ERR Unable to find price id error="unable find price, no default and >1 found"

ER: there should be 0 errors on application side

Code:
https://github.com/marniks7/bitmap-usage/blob/update-dependencies/index-kelindar/searchv2.go
Code for breakpoint:

 else {
		return 0, ErrUnableToFindPriceMoreThenOneNoDefault
	}

List of used operations:

tx.index.And
tx.index.Count()
bm.Clone

Debug:

  1. KELINDAR32=true FIBER=true go run main.go
  2. make wrk-kelindar-t2-c20

sync.Pool usage

Hi,
I was testing different bitmap libraries (by the way as a replacement for hash map)
The results i received for 8 and operations for 500k entities where first bitmap has 10k entities in a row with 1 bit, Clone of the bitmap is used on each operation

goos: linux
goarch: amd64
pkg: bitmap-usage/benchmark/500k-large-groups/kelindar
cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkFindPriceV2_11position            	   26214	     41542 ns/op	   67760 B/op	       8 allocs/op
BenchmarkFindPriceV2_3824position          	   45291	     40293 ns/op	   73904 B/op	       8 allocs/op
BenchmarkFindPriceV2_3824position_OptStats 	   43438	     35956 ns/op	   73952 B/op	      13 allocs/op
BenchmarkFindPriceV2_9701position          	   41796	     36919 ns/op	   65712 B/op	       7 allocs/op
BenchmarkFindPriceV2_MultiplePricesErr     	   44133	     24858 ns/op	   73856 B/op	       6 allocs/op
PASS
ok  	bitmap-usage/benchmark/500k-large-groups/kelindar	51.610s

This is what i received for https://github.com/kelindar/column

goos: linux
goarch: amd64
pkg: bitmap-usage/benchmark/500k-large-groups/kelindar-column
cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkFindPriceV2_11position            	   40641	     25217 ns/op	     216 B/op	       8 allocs/op
BenchmarkFindPriceV2_3824position          	   45633	     25901 ns/op	     216 B/op	       8 allocs/op
BenchmarkFindPriceV2_9701position          	   44162	     26637 ns/op	     217 B/op	       8 allocs/op
BenchmarkFindPriceV2_MultiplePricesErr     	   48385	     24083 ns/op	     177 B/op	       6 allocs/op
PASS
ok  	bitmap-usage/benchmark/500k-large-groups/kelindar-column	107.145s

In your kelindar-column you have fill bitmap.Bitmap // The fill-list and Txn with sync.Pool usage which allow to avoid bitmap allocation. I believe the technique is worth mentioning for potential users of this library

Results with sync.Pool:

goos: linux
goarch: amd64
pkg: bitmap-usage/benchmark/500k-large-groups/kelindar
cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkFindPriceV2_11position            	   79882	     15122 ns/op	     176 B/op	       6 allocs/op
BenchmarkFindPriceV2_3824position          	   85920	     14792 ns/op	     176 B/op	       6 allocs/op
BenchmarkFindPriceV2_3824position_OptStats 	   68959	     15563 ns/op	     225 B/op	      11 allocs/op
BenchmarkFindPriceV2_9701position          	   56557	     19224 ns/op	     177 B/op	       6 allocs/op
BenchmarkFindPriceV2_MultiplePricesErr     	  106196	     11107 ns/op	     128 B/op	       4 allocs/op
PASS
ok  	bitmap-usage/benchmark/500k-large-groups/kelindar	61.455s

if you are curious here is hashmap results - iteration and comparison for 10k entities

goos: linux
goarch: amd64
pkg: bitmap-usage/benchmark/500k-large-groups/map
cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkHttpClientServer_FindPrice            	    5232	    220943 ns/op	    8283 B/op	     112 allocs/op
BenchmarkFindPrice_11position                  	    9979	    118803 ns/op	      64 B/op	       1 allocs/op
BenchmarkFindPrice_3824position                	    9849	    119427 ns/op	      64 B/op	       1 allocs/op
BenchmarkFindPrice_3824position_Optimized      	   25146	     46990 ns/op	      64 B/op	       1 allocs/op
BenchmarkFindPrice_9701position                	    9825	    120534 ns/op	      64 B/op	       1 allocs/op
BenchmarkFindPrice_9701position_Optimized      	    9945	    117033 ns/op	      64 B/op	       1 allocs/op
BenchmarkFindPrice_MultiplePricesErr           	    9799	    119765 ns/op	       0 B/op	       0 allocs/op
BenchmarkFindPrice_MultiplePricesErr_Optimized 	    9814	    119057 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	bitmap-usage/benchmark/500k-large-groups/map	47.843s

No benefit with new bulk approach for `And` ?

Hi, based on the release v1.2.0 i have tried to use bulk approach for And operation for 1 + 6 extra bitmaps.
Usage: index.And(bms[0], bms[1:]...)
I do understand that there is 1 + 3 extra handled in bulk, and all next extra are handled one-by-one, but still

Timings for bitmap.And: 5.31s before vs 5.38s after (though this vary). I did few other tests and I don't see benefits.

Another test:

name              old time/op    new time/op    delta
FindPriceV2_Real    10.4µs ± 3%    10.3µs ± 3%     ~     (p=1.000 n=5+5)

name              old alloc/op   new alloc/op   delta
FindPriceV2_Real      361B ± 0%      537B ± 0%  +48.75%  (p=0.008 n=5+5)

name              old allocs/op  new allocs/op  delta
FindPriceV2_Real      8.00 ± 0%      9.00 ± 0%  +12.50%  (p=0.008 n=5+5)

Non-bulk approach:
image

Bulk approach:
image

syntax error

the syntax var books := bitmap.Bitmap in Example Usage should change to var books bitmap.Bitmap

Virtual memory leak while using bitmap

When i am using bitmap and as we set the bits in bitmap virtual memory shoots up (verified using pmap command) , but after clearing the bitmap using "bitmap.clear" virtual memory does not releases. I want to know the method to clear bitmap along with all it's instance and used bytes in virtual memory.Memory allocation before using bitmap and after clearing bitmap should be same. Can you pls help me in achieving the same ..

Give credit to the original author?

Thanks for creating a usable Go bitmap index library! One thing I noticed while I was reading through the code: parts of the heavy lifting (AVX code generation) seems similar to code from a talk given by @mkevac (https://m.youtube.com/watch?v=WvlUH6MjUuI) and his examples https://github.com/mkevac/gopherconrussia2019/tree/master/simplesimd - with many identical lines (https://github.com/mkevac/gopherconrussia2019/blob/master/simplesimd/asm.go#L106 vs https://github.com/kelindar/bitmap/blob/main/generate/amd64.go#L104). If it was actually copied I think it would be fair to give the original author some credit, especially given that he has not put the code under any official licence and the files in this repo all contain Copyright (c) Roman Atachiants 😏. If it wasn't mea culpa.

FromBytes() has bug about length of buffer

// FromBytes reads a bitmap from a byte buffer without copying the buffer.
func FromBytes(buffer []byte) (out Bitmap) {
	hdr := (*reflect.SliceHeader)(unsafe.Pointer(&out))
	hdr.Len = len(buffer) >> 3
	hdr.Cap = hdr.Len
	hdr.Data = uintptr(unsafe.Pointer(&(buffer)[0]))
	return out
}

While length of buffer less than 8, hdr.Len is always 0. And length of buffer must be a multiple of 8.

range over zeros

i'm intending to use this as a free-map for a filesystem.

is there a reason there's no range function to find zeros?
am i holding it wrong?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.