Code Monkey home page Code Monkey logo

redis-lua-scaling-bloom-filter's Introduction

redis-lua-scaling-bloom-filter

add.lua, cas.lua and check.lua are three lua scripts for a scaling bloom filter for Redis

layer-add.lua and later-check.lua are two lua scripts for a scaling layered bloom filter for Redis

The scripts are to be executed using the EVAL command in Redis.

These scripts will probably not work on Redis cluster since the keys used inside the script aren't all passed as arguments!

The layered filter has a maximum number of 32 layers. You can modify this in the source.

add.lua, cas.lua and layer-add.lua

The add.lua script adds a new element to the filter. It will create the filter when it doesn't exist yet.

cas.lua does a Check And Set, this will not add the element if it already exist. cas.lua will return 0 if the element is added, or 1 if the element was already in the filter. Since we use a scaling filter adding an element using add.lua might cause the element to exist in multiple parts of the filter at the same time. cas.lua prevents this. Using only cas.lua the :count key of the filter will accurately count the number of elements added to the filter. Only using cas.lua will also lower the number of false positives by a small amount (less duplicates in the filter means less bits set).

layer-add.lua does a similar thing to cas.lua since this is necessary for the layer part to work (need to check all the filters in a layer to see if it already exists in the layer). layer-add.lua will return the layer number the element was added to.

These scripts expects 4 arguments.

  1. The base name of the keys to use.
  2. The initial size of the bloom filter (in number of elements).
  3. The probability of false positives.
  4. The element to add to the filter.

For example the following call would add "something" to a filter named test which will initially be able to hold 10000 elements with a probability of false positives of 1%.

eval "add.lua source here" 0 test 10000 0.01 something

check.lua and layer-check.lua

The check.lua and layer-check.lua scripts check if an element is contained in the bloom filter.

layer-check.lua returns the layer the element was found in.

These scripts expects 4 arguments.

  1. The base name of the keys to use.
  2. The initial size of the bloom filter (in number of elements).
  3. The probability of false positives.
  4. The element to check for.

For example the following call would check if "something" is part of the filter named test which will initially be able to hold 10000 elements with a probability of false positives of 1%.

eval "check.lua source here" 0 test 10000 0.01 something

Tests

$ npm install redis srand
$ node add.js
$ node cas.js
$ node check.js
$ # or/and
$ node layer-add.js
$ node layer-check.js

add.js and layer-add.js will add elements to a filter named test and then check if the elements are part of the filter.

check.js and layer-check.js will test random elements against the filter build by add.js or layer-add.js to find the probability of false positives.

Both script assume Redis is running on the default port.

Benchmark

You can run ./benchmark.sh and ./layer-benchmark.sh to see how fast the scripts are.

This script assumes Redis is running on the default port and redis-cli and redis-benchmark are installed.

This is the outputs on my 2.3GHz 2012 MacBook Pro Retina:

add.lua
====== evalsha ab31647b3931a68b3b93a7354a297ed273349d39 0 HSwVBmHECt 1000000 0.01 :rand:000000000000 ======
  200000 requests completed in 8.27 seconds
  20 parallel clients
  3 bytes payload
  keep alive: 1

94.57% <= 1 milliseconds
100.00% <= 2 milliseconds
24175.03 requests per second


check.lua
====== evalsha 437a3b0c6a452b5f7a1f10487974c002d41f4a04 0 HSwVBmHECt 1000000 0.01 :rand:000000000000 ======
  200000 requests completed in 8.54 seconds
  20 parallel clients
  3 bytes payload
  keep alive: 1

92.52% <= 1 milliseconds
100.00% <= 8 milliseconds
23419.20 requests per second


layer-add.lua
====== evalsha 7ae29948e3096dd064c22fcd8b628a5c77394b0c 0 ooPb5enskU 1000000 0.01 :rand:000000000000 ======
  20000 requests completed in 12.61 seconds
  20 parallel clients
  3 bytes payload
  keep alive: 1

55.53% <= 12 milliseconds
75.42% <= 13 milliseconds
83.71% <= 14 milliseconds
91.48% <= 15 milliseconds
97.76% <= 16 milliseconds
99.90% <= 24 milliseconds
100.00% <= 24 milliseconds
1586.04 requests per second


layer-check.lua
====== evalsha c1386438944daedfc4b5c06f79eadb6a83d4b4ea 0 ooPb5enskU 1000000 0.01 :rand:000000000000 ======
  20000 requests completed in 11.13 seconds
  20 parallel clients
  3 bytes payload
  keep alive: 1

0.00% <= 9 milliseconds
74.12% <= 11 milliseconds
80.43% <= 12 milliseconds
83.93% <= 13 milliseconds
97.43% <= 14 milliseconds
99.89% <= 15 milliseconds
100.00% <= 15 milliseconds
1797.59 requests per second

redis-lua-scaling-bloom-filter's People

Contributors

erikdubbelboer avatar glendc avatar jib avatar rhowardiv avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

redis-lua-scaling-bloom-filter's Issues

Missleading documentation

In the readme:
"cas.lua will return 1 if the element is added, 0 otherwise."

But cas return 0 if element not in the filter and 1 otherwise.

cas.lua still increase count for existed item

Just wonder that cas function still increase the filter count when checked item already exists.
I think filter count's only increased if item not exist and we need to add it to the filter.

bloom:count incorrect after reload from dump

I loaded about two billion strings into sharded bloom filters, then dumped the database and loaded it on another machine.

On the first machine the result of:

redis-cli KEYS '*:count' | xargs -IREPLACE redis-cli GET "REPLACE" | paste -s -d+ | bc

was correct, while on the second machine it reported about 43 million.

However, the bloom filters work correctly when tested against a random subset of the two billion strings.

Error with large count: bit offset is not an integer or out of range

I'd like to have a bloom filter with a count of several billion.

However, with values above ~420,000,000 I get this error:
@user_script:81: ERR bit offset is not an integer or out of range

I've implemented a sharding scheme for my own purposes, but perhaps it should be auto-sharding behind the scenes?

Either way, the maximum size and performance implications should probably be documented.

Possible typo in README

Hi, I'm wondering if this line from the README is a typo: "cas.lua does a Check And Set, this will not add the element if it doesn't already exist".

From reviewing the code, it looks like cas.lua will add the element if it doesn't already exist; or, put differently, it will not add the element if it already exists.

Apologies if my understanding is incorrect.

Licence information

Can you explicitly add licencing information to this repository to clarify usage rights?

Unable to add and check for string with special characters

Hi guys,
First of all, I would like to thank you for providing this excellent data structure for redis. Previously, we are storing the alphanumeric string. In that case, it's working perfect. But, now we need to store the alphanumeric string with special characters. However, these string with special character gets stored in bloom filter, but, whenever checking for that particular string, it's always returning false.

  Example, 
      Case 1 : 7c1663ed4f54dbab  (Working)
      Case 2 : mBCPNjXUkOoU7jILeXPUxQ==  (Not working)

Please, help me to resolve this issue or let me know, whether it is the limitation of this bloom filter.

Lua sub bugs

I was using this as a bases for a redis bloom filter (though ended up just roughly borrowing some code and stripping out the scaling filter expansion). Testing showed 10x more false positives than expected all the time. I noticed the hash handling code was broken:

local h = { }
h[0] = tonumber(string.sub(hash, 0 , 8 ), 16)
h[1] = tonumber(string.sub(hash, 8 , 16), 16)
h[2] = tonumber(string.sub(hash, 16, 24), 16)
h[3] = tonumber(string.sub(hash, 24, 32), 16)

See http://www.lua.org/pil/20.html, the strings start at 0 and sub() is inclusive. When I fixed this things worked (no false negatives and the right amount of false positives). E.g:

local h = { }
h[0] = tonumber(string.sub(hash, 1 , 8 ), 16)
h[1] = tonumber(string.sub(hash, 9 , 16), 16)
h[2] = tonumber(string.sub(hash, 17, 24), 16)
h[3] = tonumber(string.sub(hash, 25, 32), 16)

Also maybe the line that sets bits could use ceil() instead of floor()? Not a huge deal though.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.