Code Monkey home page Code Monkey logo

countminsketch's Introduction

Update: this project has been suspended and superceded by http://rebloom.io

countminsketch: an approximate items counter Redis module

This is trivial implementation of the Count Min Sketch (CMS) from Graham Cormode and S. Muthukrishnan as described in (Approximating Data with the Count-Min Data Structure)[http://dimacs.rutgers.edu/~graham/pubs/papers/cmsoft.pdf].

CMS is a probabilistic data structure that can be used to keep values associated with items, such as the counts of observations of samples in a stream (e.g. IP addresses, URLs, etc...). The counts are only approximated, i.e. there is a probability for error, but the data structure requires considerably less resources compared to storing all samples and their counts.

This implementation employs Redis' Strings for storing the data structure using direct memory access and 16-bit registers.

Quick start guide

  1. Build a Redis server with support for modules.
  2. Build the countminsketch module: make
  3. To load a module, Start Redis with the --loadmodule /path/to/module.so option, add it as a directive to the configuration file or send a MODULE LOAD command.

CMS API

CMS.INCRBY key item value [item value ...]

Time complexity: O(N), where N is the number of items.

Increments the count for item by value.

If key does not exist, the sketch is initialized using a default width of 2000 and a depth of 10 (the defaults provide a %0.01 error at a probability of %0.01).

Reply: Simple String, "OK".

CMS.QUERY key item [item ...]

Time complexity: O(N), where N is the number of items.

Returns the estimated count for item.

Reply: Array of Integers, nil if key or element not found

CMS.MERGE destkey numkeys key [key ...] [WEIGHTS weight [weight ...]]

Time complexity: O(N), where N is the number of keys.

Merges numkeys of key into destkey. By default, destkey holds the sum of the sketches associated with the specified key, where all key should exist and have the same width and depth. Like Redis's builtin ZUNIONSTORE and ZINTERSTORE commands, it is possible to weight each sketch using the WEIGHTS option.

Reply: Simple String, "OK".

CMS.INITBYDIM key width depth

Time complexity: O(1)

Initializes a CMS to the dimensions specified by width and depth.

Reply: Simple String, "OK".

CMS.INITBYERR key error probabilty

Time complexity: O(1)

Initializes a CMS with a maximum error at a given probability.

error and probability should be given as doubles. For example, to specify an error of at most %0.01 at probability of %0.01, use CMS.INITBYERR key 0.001 0.001.

Reply: Simple String, "OK".

CMS.DEBUG key

Time complexity: O(1)

Debugging helper - shows the sketch's properties.

Reply: Bulk Array.

Contributing

Issue reports, pull and feature requests are welcome.

License

AGPLv3 - see LICENSE

Uses Redis Copyright (C) 2009-2012, Salvatore Sanfilippo, BSD 3-Clause License. Uses xxHash Copyright (C) 2012-2014, Yann Collet, BSD 2-Clause License.

countminsketch's People

Contributors

itamarhaber avatar yusaku avatar gkorland avatar dvirsky avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.