Update: this project has been suspended and superceded by http://rebloom.io
This is trivial implementation of the Count Min Sketch (CMS) from Graham Cormode and S. Muthukrishnan as described in (Approximating Data with the Count-Min Data Structure)[http://dimacs.rutgers.edu/~graham/pubs/papers/cmsoft.pdf].
CMS is a probabilistic data structure that can be used to keep values associated with items, such as the counts of observations of samples in a stream (e.g. IP addresses, URLs, etc...). The counts are only approximated, i.e. there is a probability for error, but the data structure requires considerably less resources compared to storing all samples and their counts.
This implementation employs Redis' Strings for storing the data structure using direct memory access and 16-bit registers.
- Build a Redis server with support for modules.
- Build the countminsketch module:
make
- To load a module, Start Redis with the
--loadmodule /path/to/module.so
option, add it as a directive to the configuration file or send aMODULE LOAD
command.
Time complexity: O(N), where N is the number of items.
Increments the count for item
by value
.
If key
does not exist, the sketch is initialized using a default width of 2000 and a depth of 10 (the defaults provide a %0.01 error at a probability of %0.01).
Reply: Simple String, "OK".
Time complexity: O(N), where N is the number of items.
Returns the estimated count for item
.
Reply: Array of Integers, nil if key or element not found
Time complexity: O(N), where N is the number of keys.
Merges numkeys
of key
into destkey
. By default, destkey
holds the sum of the sketches associated with the specified key
, where all key
should exist and have the same width and depth. Like Redis's builtin ZUNIONSTORE
and ZINTERSTORE
commands, it is possible to weight each sketch using the WEIGHTS
option.
Reply: Simple String, "OK".
Time complexity: O(1)
Initializes a CMS to the dimensions specified by width
and depth
.
Reply: Simple String, "OK".
Time complexity: O(1)
Initializes a CMS with a maximum error
at a given probability
.
error
and probability
should be given as doubles. For example, to specify an error of at most %0.01 at probability of %0.01, use CMS.INITBYERR key 0.001 0.001
.
Reply: Simple String, "OK".
Time complexity: O(1)
Debugging helper - shows the sketch's properties.
Reply: Bulk Array.
Issue reports, pull and feature requests are welcome.
AGPLv3 - see LICENSE
Uses Redis Copyright (C) 2009-2012, Salvatore Sanfilippo, BSD 3-Clause License. Uses xxHash Copyright (C) 2012-2014, Yann Collet, BSD 2-Clause License.