bitly / dablooms Goto Github PK
View Code? Open in Web Editor NEWscaling, counting, bloom filter library
License: MIT License
scaling, counting, bloom filter library
License: MIT License
It's a really minor issue, but the correct name is "Bloom filter", not "bloom filter", since Bloom is the inventor's last name.
There is a serious error in free_counting_bloom, and memory leak will happen.
Please modify the code as follows:
don't free bloom->bitmap directly.
call free_bitmap( bloom->bitmap ) instead.
//
int free_counting_bloom(counting_bloom_t *bloom)
{
if (bloom != NULL) {
free(bloom->hashes);
bloom->hashes = NULL;
//
//free(bloom->bitmap);
free_bitmap( bloom->bitmap );
//
free(bloom);
bloom = NULL;
}
return 0;
}
Dablooms is very benefit for understanding the bloom filter. The Makefile of dablooms is very good and professional, but it is a little bit too professional for beginners :)
I believe that dablooms prefer to let people concentrate on the bloom filter itself rather than how to read and spend time on understanding the Makefile.
I wish their would be a very simple version of the Makefile.
function name for "from_file" version doesn't match between header and c file
all untested and probably don't work
could use refactoring
I'm getting a linker error. GCC and Python versions included. Any recommendations?
(bloom)09:26:30 dablooms@master$ make
CC build/dablooms.o
CC build/murmur.o
AR build/libdablooms.a
SO build/libdablooms.1.1.dylib
SYMLNK build/libdablooms.dylib
SYMLNK build/libdablooms.1.dylib
(bloom)09:26:31 dablooms@master$ make pydablooms
PY_BUILD build/python/pydablooms.so
clang: warning: argument unused during compilation: '-mno-fused-madd'
ld: warning: ignoring file build/libdablooms.a, file was built for archive which is not the architecture being linked (i386): build/libdablooms.a
(bloom)09:26:34 dablooms@master$
(bloom)09:27:51 dablooms@master$ gcc --version
i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
(bloom)09:27:53 dablooms@master$ python --version
Python 2.7.2
(bloom)09:27:55 dablooms@master$
Is there currently no way to delete a Go bloom filter?
https://github.com/bitly/dablooms/blob/master/godablooms/dablooms.go#L44
Am I also correct in thinking this memory will never get GC'ed, because it's in C?
trying to make this package so I can use it with golang, keep getting these errors
$ make test
CC build/test_dablooms.o
CC build/dablooms.o
CC build/murmur.o
AR build/libdablooms.a
LD build/test_dablooms
build/libdablooms.a(dablooms.o): In function `counting_bloom_init':
/work/dablooms/src/dablooms.c:226: undefined reference to `log'
/work/dablooms/src/dablooms.c:226: undefined reference to `ceil'
/work/dablooms/src/dablooms.c:227: undefined reference to `log'
/work/dablooms/src/dablooms.c:227: undefined reference to `ceil'
build/libdablooms.a(dablooms.o): In function `new_counting_bloom_from_scale':
/work/dablooms/src/dablooms.c:327: undefined reference to `pow'
collect2: error: ld returned 1 exit status
Makefile:124: recipe for target 'build/test_dablooms' failed
make: *** [build/test_dablooms] Error 1
some googling tells me to add -lm
but make test ALL_LDFLAGS="-lm"
doesn't seem to do the trick.
sorry if this is supposed to be something obvious, but I'm not too familiar with compiling C packages and the README.md doesn't mention anything along these lines.
$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
$ uname -a
Linux sakura 4.4.0-53-generic #74-Ubuntu SMP Fri Dec 2 15:59:10 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ make pydablooms
PY_BUILD build/python/pydablooms.so
pydablooms/pydablooms.c:15:11: error: no member named 'ob_type' in 'Dablooms'
self->ob_type->tp_free((PyObject )self);
~~~~ ^
pydablooms/pydablooms.c:50:10: warning: implicit declaration of function 'PyString_Check' is invalid in C99 [-Wimplicit-function-declaration]
if (!PyString_Check(key)) {
^
pydablooms/pydablooms.c:54:32: warning: implicit declaration of function 'PyString_AsString' is invalid in C99 [-Wimplicit-function-declaration]
PyString_AsString(key),
^
pydablooms/pydablooms.c:55:37: warning: implicit declaration of function 'PyString_Size' is invalid in C99 -Wimplicit-function-declarationPyString_Size(key));
^
pydablooms/pydablooms.c:54:32: warning: incompatible integer to pointer conversion passing 'int' to parameter of type 'const char *' [-Wint-conversion]
PyString_AsString(key),
^~~~~~~~~~~~~~~~~~~~~~
src/dablooms.h:76:61: note: passing argument to parameter 's' here
int scaling_bloom_check(scaling_bloom_t *bloom, const char *s, size_t len);
^
pydablooms/pydablooms.c:138:5: warning: suggest braces around initialization of subobject [-Wmissing-braces]
PyObject_HEAD_INIT(NULL)
^~~~~~~~~~~~~~~~~~~~~~~~
/Library/Frameworks/Python.framework/Versions/3.4/include/python3.4m/object.h:86:5: note: expanded from macro 'PyObject_HEAD_INIT'
1, type },
^~~~~~~
pydablooms/pydablooms.c:140:5: warning: incompatible pointer to integer conversion initializing 'Py_ssize_t' (aka 'long') with an expression of type 'char [20]' [-Wint-conversion]
"pydablooms.Dablooms", /tp_name/
^~~~~~~~~~~~~~~~~~~~~
pydablooms/pydablooms.c:143:5: warning: incompatible pointer types initializing 'printfunc' (aka 'int ()(PyObject , FILE *, int)') with an expression of type 'destructor' (aka 'void ()(PyObject _)')
-Wincompatible-pointer-typesDablooms_dealloc, /tp_dealloc/
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
pydablooms/pydablooms.c:150:5: warning: incompatible pointer types initializing 'PyMappingMethods *' with an expression of type 'PySequenceMethods *' [-Wincompatible-pointer-types]
&Dablooms_sequence, /tp_as_sequence/
^~~~~~~~~~~~~~~~~~
pydablooms/pydablooms.c:158:5: warning: incompatible integer to pointer conversion initializing 'const char ' with an expression of type 'unsigned long' [-Wint-conversion]
Py_TPFLAGS_DEFAULT, /_tp_flags/
^~~~~~~~~~~~~~~~~~
/Library/Frameworks/Python.framework/Versions/3.4/include/python3.4m/object.h:643:29: note: expanded from macro 'Py_TPFLAGS_DEFAULT'
^~~
pydablooms/pydablooms.c:159:5: warning: incompatible pointer types initializing 'traverseproc' (aka 'int ()(PyObject *, visitproc, void *)') with an expression of type 'char [17]'
[-Wincompatible-pointer-types]
"Dablooms objects", /tp_doc/
^~~~~~~~~~~~~~~~~~
pydablooms/pydablooms.c:166:5: warning: incompatible pointer types initializing 'struct PyMemberDef *' with an expression of type 'PyMethodDef [7]' [-Wincompatible-pointer-types]
Dablooms_methods, /tp_methods/
^~~~~~~~~~~~~~~~
pydablooms/pydablooms.c:167:5: warning: incompatible pointer types initializing 'struct PyGetSetDef *' with an expression of type 'PyMemberDef [1]' [-Wincompatible-pointer-types]
Dablooms_members, /tp_members/
^~~~~~~~~~~~~~~~
pydablooms/pydablooms.c:174:5: warning: incompatible pointer types initializing 'allocfunc' (aka 'PyObject *()(struct _typeobject , Py_ssize_t)') with an expression of type 'initproc'
(aka 'int ()(PyObject , PyObject *, PyObject *)') -Wincompatible-pointer-typesDablooms_init, /tp_init/
^~~~~~~~~~~~~~~~~~~~~~~
pydablooms/pydablooms.c:176:5: warning: incompatible pointer types initializing 'freefunc' (aka 'void ()(void _)') with an expression of type 'PyObject *(PyTypeObject *, PyObject *, PyObject )'
[-Wincompatible-pointer-types]
Dablooms_new, /_tp_new/
^~~~~~~~~~~~
pydablooms/pydablooms.c:210:9: error: non-void function 'initpydablooms' should return a value [-Wreturn-type]
return;
^
pydablooms/pydablooms.c:213:9: warning: implicit declaration of function 'Py_InitModule3' is invalid in C99 [-Wimplicit-function-declaration]
m = Py_InitModule3("pydablooms", pydablooms_methods, "Dablooms module");
^
pydablooms/pydablooms.c:213:7: warning: incompatible integer to pointer conversion assigning to 'PyObject *' (aka 'struct _object *') from 'int' [-Wint-conversion]
m = Py_InitModule3("pydablooms", pydablooms_methods, "Dablooms module");
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pydablooms/pydablooms.c:216:9: error: non-void function 'initpydablooms' should return a value [-Wreturn-type]
return;
^
pydablooms/pydablooms.c:219:42: warning: implicit declaration of function 'PyString_FromString' is invalid in C99 [-Wimplicit-function-declaration]
PyModule_AddObject(m, "version", PyString_FromString(dablooms_version()));
^
pydablooms/pydablooms.c:219:42: warning: incompatible integer to pointer conversion passing 'int' to parameter of type 'PyObject _' (aka 'struct object *') [-Wint-conversion]
PyModule_AddObject(m, "version", PyString_FromString(dablooms_version()));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Library/Frameworks/Python.framework/Versions/3.4/include/python3.4m/modsupport.h:47:72: note: passing argument to parameter here
PyAPI_FUNC(int) PyModule_AddObject(PyObject *, const char *, PyObject *);
^
18 warnings and 3 errors generated.
error: command '/usr/bin/clang' failed with exit status 1
make: ** [build/python/pydablooms.so] Error 1
$ uname -a
Darwin .local 13.4.0 Darwin Kernel Version 13.4.0: Wed Dec 17 19:05:52 PST 2014; root:xnu-2422.115.10~1/RELEASE_X86_64 x86_64
make pydablooms
PY_BUILD build/python/pydablooms.so
pydablooms/pydablooms.c: In function ‘Dablooms_dealloc’:
pydablooms/pydablooms.c:15:9: error: ‘Dablooms’ has no member named ‘ob_type’
pydablooms/pydablooms.c: In function ‘contains’:
pydablooms/pydablooms.c:50:5: warning: implicit declaration of function ‘PyString_Check’ [-Wimplicit-function-declaration]
pydablooms/pydablooms.c:54:32: warning: implicit declaration of function ‘PyString_AsString’ [-Wimplicit-function-declaration]
pydablooms/pydablooms.c:55:32: warning: implicit declaration of function ‘PyString_Size’ [-Wimplicit-function-declaration]
pydablooms/pydablooms.c:55:32: warning: passing argument 2 of ‘scaling_bloom_check’ makes pointer from integer without a cast [enabled by default]
src/dablooms.h:76:5: note: expected ‘const char ’ but argument is of type ‘int’
pydablooms/pydablooms.c: At top level:
pydablooms/pydablooms.c:138:5: warning: missing braces around initializer [-Wmissing-braces]
pydablooms/pydablooms.c:138:5: warning: (near initialization for ‘DabloomsType.ob_base.ob_base’) [-Wmissing-braces]
pydablooms/pydablooms.c:140:5: warning: initialization makes integer from pointer without a cast [enabled by default]
pydablooms/pydablooms.c:140:5: warning: (near initialization for ‘DabloomsType.tp_basicsize’) [enabled by default]
pydablooms/pydablooms.c:143:5: warning: initialization from incompatible pointer type [enabled by default]
pydablooms/pydablooms.c:143:5: warning: (near initialization for ‘DabloomsType.tp_print’) [enabled by default]
pydablooms/pydablooms.c:150:5: warning: initialization from incompatible pointer type [enabled by default]
pydablooms/pydablooms.c:150:5: warning: (near initialization for ‘DabloomsType.tp_as_mapping’) [enabled by default]
pydablooms/pydablooms.c:158:5: warning: initialization makes pointer from integer without a cast [enabled by default]
pydablooms/pydablooms.c:158:5: warning: (near initialization for ‘DabloomsType.tp_doc’) [enabled by default]
pydablooms/pydablooms.c:159:5: warning: initialization from incompatible pointer type [enabled by default]
pydablooms/pydablooms.c:159:5: warning: (near initialization for ‘DabloomsType.tp_traverse’) [enabled by default]
pydablooms/pydablooms.c:166:5: warning: initialization from incompatible pointer type [enabled by default]
pydablooms/pydablooms.c:166:5: warning: (near initialization for ‘DabloomsType.tp_members’) [enabled by default]
pydablooms/pydablooms.c:167:5: warning: initialization from incompatible pointer type [enabled by default]
pydablooms/pydablooms.c:167:5: warning: (near initialization for ‘DabloomsType.tp_getset’) [enabled by default]
pydablooms/pydablooms.c:174:5: warning: initialization from incompatible pointer type [enabled by default]
pydablooms/pydablooms.c:174:5: warning: (near initialization for ‘DabloomsType.tp_alloc’) [enabled by default]
pydablooms/pydablooms.c:176:5: warning: initialization from incompatible pointer type [enabled by default]
pydablooms/pydablooms.c:176:5: warning: (near initialization for ‘DabloomsType.tp_free’) [enabled by default]
pydablooms/pydablooms.c: In function ‘initpydablooms’:
pydablooms/pydablooms.c:210:9: warning: ‘return’ with no value, in function returning non-void [-Wreturn-type]
pydablooms/pydablooms.c:213:5: warning: implicit declaration of function ‘Py_InitModule3’ [-Wimplicit-function-declaration]
pydablooms/pydablooms.c:213:7: warning: assignment makes pointer from integer without a cast [enabled by default]
pydablooms/pydablooms.c:216:9: warning: ‘return’ with no value, in function returning non-void [-Wreturn-type]
pydablooms/pydablooms.c:219:5: warning: implicit declaration of function ‘PyString_FromString’ [-Wimplicit-function-declaration]
pydablooms/pydablooms.c:219:5: warning: passing argument 3 of ‘PyModule_AddObject’ makes pointer from integer without a cast [enabled by default]
/usr/include/python3.4m/modsupport.h:47:17: note: expected ‘struct PyObject *’ but argument is of type ‘int’
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
make: ** [build/python/pydablooms.so] Error 1
Linux luna 3.2.0-70-generic #105-Ubuntu SMP Wed Sep 24 19:49:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
If I initialize a new dabloom filter with the incorrect keyword arguments, I get a segfault:
>>> b = pydablooms.Dablooms(capactiy=1000, error_rate=0.05, filepath='/tmp/bloom.bin')
Segmentation fault: 11
The segfault happens in line 15 of the python module.
I have tried to install dablooms under my home directory on Linux,
$ git clone https://github.com/bitly/dablooms.git Cloning into dablooms... remote: Counting objects: 391, done. remote: Compressing objects: 100% (193/193), done. remote: Total 391 (delta 209), reused 363 (delta 193) Receiving objects: 100% (391/391), 86.44 KiB, done. Resolving deltas: 100% (209/209), done. $ cd dablooms/ $ ls godablooms LICENSE Makefile phpdablooms pydablooms README.md src $ make CC build/dablooms.o CC build/murmur.o AR build/libdablooms.a SO build/libdablooms.so.1.0 SYMLNK build/libdablooms.so SYMLNK build/libdablooms.so.1 $ make install PREFIX=$HOME INSTALL /home/pjcock/lib/libdablooms.a INSTALL /home/pjcock/lib/libdablooms.so.1.0 SYMLNK /home/pjcock/lib/libdablooms.so SYMLNK /home/pjcock/lib/libdablooms.so.1 INSTALL /home/pjcock/include/dablooms.h
That all seems fine, but for the Python bindings the prefix seems to be ignored:
$ make install_pydablooms PYTHON=python2.7 PREFIX=$HOME PY_INSTALL /usr/local/lib/python2.7/site-packages/pydablooms.so install: cannot change permissions of `/usr/local/lib/python2.7/site-packages/': Operation not permitted make: *** [/usr/local/lib/python2.7/site-packages/pydablooms.so] Error 1
There appears to be nothing in the Makefile to attempt to add --prefix=$PREFIX or similar to the call to setup.py (as is typically supported, see http://docs.python.org/install/index.html for an example), and furthermore attempting to call pydablooms/setup.py directly with this optional argument fails.
What is the max count in the scalable bloom filter
... and also, this repo is unmaintained (I left the bitly org and lost push privileges)
but I've pushed up a fix to https://github.com/ploxiln/dablooms ( ploxiln@a9cc856 )
While testing your counting bloom filter implementation, I got an ugly segfault (the scaling bloom filter works fine btw).
This is the snippet which segfaults for me:
https://gist.github.com/3444985
Unfortunately I don't get more information out of gdb as the following:
Program received signal SIGSEGV, Segmentation fault.
0x00000000004018bd in counting_bloom_add (bloom=0x604010, s=0x402a00 "test", len=4) at src/dablooms.c:336
336 (*bloom->header->count)++;
undefined reference to pow
it seems to be a problem with the linking and the math lib
In the paper referenced for the bloom filter hashing algorithm used (https://github.com/bitly/dablooms/blob/master/src/dablooms.c#L185) (Kirsch, Mitzenmacher [2006]) - the recommended algorithm is given by:
gi(x)=h1(x)+ih2(x)(mod p)
From my reading it seems that their theoretical estimate for collision probabilities will only hold provided that counts_per_func is prime. It is also notable that 64 bits of entropy from MurmurHash3_x64_128 is discarded without being used. I'm not sure how significant the difference is on average, but it may make sense to either:
a) force counts_per_func to the nearest prime greater than counts_per_func
or
b) use a triple hashing algorithm taking advantage of the unused entropy provided by murmur hash
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.