
streamingalgorithms's Introduction


A set of streaming algorithms. Types include:

  • Bloom Filters

    • Basic
    • Counting
    • Spectral
  • Count-Min Sketch

  • Karp-Papadimitriou-Shenker

  • Misra-Gries

  • Space Saving/Stream Summary

The majority are in C++ (one is in Python and Go), and there are plans to port all of them to Python, Ruby, Java, Scala, and Go.

The C++ implementations use templated classes and are single header files. To use one, simply include its header file; no makefiles or other build steps are needed.
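As an illustration of the templated, single-header style, here is a minimal sketch of the Misra-Gries frequent-items algorithm from the list above. The file name `misra_gries.hpp`, the class `MisraGries`, and its `add`/`estimate` methods are hypothetical and are not the repository's actual interface. With k - 1 counters, any item occurring more than n/k times in a stream of length n is guaranteed to still be tracked.

```cpp
// misra_gries.hpp: hypothetical header-only sketch, not the repository's API.
#ifndef MISRA_GRIES_HPP
#define MISRA_GRIES_HPP

#include <cassert>
#include <cstddef>
#include <unordered_map>

template <typename T>
class MisraGries {
public:
    // Keeps at most k - 1 counters.
    explicit MisraGries(std::size_t k) : k_(k) { assert(k > 1); }

    void add(const T& item) {
        auto it = counters_.find(item);
        if (it != counters_.end()) {
            ++it->second;                    // already tracked: increment
        } else if (counters_.size() < k_ - 1) {
            counters_.emplace(item, 1);      // free slot: start tracking
        } else {
            // No free slot: decrement every counter and drop those that hit zero.
            for (auto c = counters_.begin(); c != counters_.end();) {
                if (--c->second == 0) {
                    c = counters_.erase(c);
                } else {
                    ++c;
                }
            }
        }
    }

    // Estimated count (never above the true frequency); 0 if not tracked.
    std::size_t estimate(const T& item) const {
        auto it = counters_.find(item);
        return it == counters_.end() ? 0 : it->second;
    }

private:
    std::size_t k_;
    std::unordered_map<T, std::size_t> counters_;
};

#endif  // MISRA_GRIES_HPP
```

A caller would only need `#include "misra_gries.hpp"` and, for example, `MisraGries<std::string> mg(10);` followed by `mg.add(word)` for each item in the stream.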

streamingalgorithms's People

Contributors

bmoscon

streamingalgorithms's Issues

StreamSummary - save and load for consistency

I love the Python implementation of StreamSummary.
StreamSummary is usually applied to incremental data.
For example, in my case I am doing a top-K page-view analysis with this StreamSummary code on Google Analytics data from, say, the past month.
When I get the next day's page-view data, I should be able to apply the script incrementally.

Please add save and load methods to the class.

save could store the necessary parameters and values as JSON.
load could reconstruct the buckets and the StreamSummary object from the loaded JSON.
This way, we can use it for incremental training.

Thanks
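The requested save/load round trip could look roughly like the C++17 sketch below. It swaps JSON for a plain-text format (capacity on the first line, then one `item count` pair per line) to stay within the standard library, and it assumes items contain no whitespace. `SummarySnapshot`, `save`, `load`, and `buckets` are hypothetical names, not part of the existing StreamSummary code.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <ostream>
#include <set>
#include <sstream>  // std::stringstream, handy for in-memory round trips
#include <stdexcept>
#include <string>

// Hypothetical snapshot of a Stream Summary: capacity plus each item's counter.
struct SummarySnapshot {
    std::size_t capacity = 0;
    std::map<std::string, std::size_t> counts;

    // Writes the capacity, then one "item count" line per monitored item.
    void save(std::ostream& out) const {
        out << capacity << '\n';
        for (const auto& [item, count] : counts) {
            out << item << ' ' << count << '\n';
        }
    }

    // Reads the format written by save().
    static SummarySnapshot load(std::istream& in) {
        SummarySnapshot s;
        if (!(in >> s.capacity)) {
            throw std::runtime_error("missing capacity");
        }
        std::string item;
        std::size_t count;
        while (in >> item >> count) {
            s.counts[item] = count;
        }
        return s;
    }

    // Rebuilds the bucket structure: one bucket per distinct counter value,
    // holding every item that currently has that count.
    std::map<std::size_t, std::set<std::string>> buckets() const {
        std::map<std::size_t, std::set<std::string>> b;
        for (const auto& [item, count] : counts) {
            b[count].insert(item);
        }
        return b;
    }
};
```

Saving to a `std::ofstream` at the end of one run and loading from a `std::ifstream` at the start of the next would give the incremental behaviour the issue describes.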

Stream Summary lacks a method to access data

Stream Summary lacks a method to actually access the data or otherwise make use of it. It should probably provide a way to export the current elements in the stream and to test whether an element exists in the dataset.
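The two accessors the issue asks for could be sketched as follows, on top of a minimal Space-Saving counter. `SpaceSaving`, `contains`, and `elements` are hypothetical names. For brevity this sketch finds the minimum counter with a linear scan instead of maintaining the bucket list the real Stream Summary uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Minimal Space-Saving counter with an existence test and an export method.
template <typename T>
class SpaceSaving {
public:
    explicit SpaceSaving(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    void add(const T& item) {
        auto it = counts_.find(item);
        if (it != counts_.end()) {
            ++it->second;
        } else if (counts_.size() < capacity_) {
            counts_.emplace(item, 1);
        } else {
            // Evict the item with the smallest counter; the newcomer gets min + 1.
            auto min_it = std::min_element(counts_.begin(), counts_.end(),
                [](const auto& a, const auto& b) { return a.second < b.second; });
            std::size_t min_count = min_it->second;
            counts_.erase(min_it);
            counts_.emplace(item, min_count + 1);
        }
    }

    // Existence test: is the item currently monitored?
    bool contains(const T& item) const { return counts_.count(item) != 0; }

    // Export the monitored items with their counters, highest count first.
    std::vector<std::pair<T, std::size_t>> elements() const {
        std::vector<std::pair<T, std::size_t>> out(counts_.begin(), counts_.end());
        std::sort(out.begin(), out.end(),
            [](const auto& a, const auto& b) { return a.second > b.second; });
        return out;
    }

private:
    std::size_t capacity_;
    std::unordered_map<T, std::size_t> counts_;
};
```

Note that `contains` only reports whether an item is monitored; an item evicted earlier may still have appeared in the stream.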

various bugs in bloom array

The bloom array is based on BitArray, which recently had several critical bug fixes. These need to be ported into bloom_array.

assertion failed

Running the StreamSummary test program with test2.dat (not yet committed, but it will be committed later) and size 3 yields the following assertion failure:

$ ./ss_test 3 test2.dat
ss_test: ../stream_summary.hpp:78: void Bucket::remove(const T&) [with T = std::basic_string]: Assertion `it != obj_map.end()' failed.
Aborted (core dumped)

update

Need to convert the text to markup.

StreamSummary - str representation

The string representation (__str) currently shows the bucket number and the members of that bucket.
The bucket number is not of much importance.

Please modify it to show
the bucket counter instead of (or in addition to) the bucket number.

This way, the user can see the counts of the most frequently occurring items.
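The requested change is in the Python code, but the idea can be sketched in C++17 for consistency with the other examples: label each bucket with its counter rather than its position, highest counter first. The `Buckets` alias and `describe_buckets` function are hypothetical and only model the bucket list as a map from counter to items.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical bucket list: counter value -> items sharing that counter,
// ordered so the largest counter comes first.
using Buckets = std::map<std::size_t, std::set<std::string>, std::greater<std::size_t>>;

// One line per bucket, labelled with its counter instead of its index.
std::string describe_buckets(const Buckets& buckets) {
    std::ostringstream out;
    for (const auto& [count, items] : buckets) {
        out << "count " << count << ':';
        for (const auto& item : items) {
            out << ' ' << item;
        }
        out << '\n';
    }
    return out.str();
}
```

For buckets `{5: {home, blog}, 2: {about}}` this produces `count 5: blog home` on the first line and `count 2: about` on the second.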

improve stream summary (python)

The code is very inefficient. It should implement a binary tree for the bucket list so that inserts and searches are more efficient, or at least use binary search to locate elements (and then a more efficient method to insert them).
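The "at least use binary search" option can be sketched as below, in C++ rather than Python to match the other examples. The `Bucket` struct and `find_or_insert_bucket` are hypothetical. `std::lower_bound` locates a bucket in O(log n), but inserting into a `std::vector` still shifts elements in O(n); a balanced tree such as `std::map` (the binary-tree option) makes both steps O(log n).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Bucket {
    std::size_t count;
    std::vector<std::string> items;
};

// Buckets are kept sorted by ascending count. Binary search finds the bucket
// for `count`; if none exists, a new one is inserted at the returned position,
// which keeps the vector sorted. The returned reference is invalidated by any
// later insertion into `buckets`.
Bucket& find_or_insert_bucket(std::vector<Bucket>& buckets, std::size_t count) {
    auto it = std::lower_bound(buckets.begin(), buckets.end(), count,
        [](const Bucket& b, std::size_t c) { return b.count < c; });
    if (it == buckets.end() || it->count != count) {
        it = buckets.insert(it, Bucket{count, {}});
    }
    return *it;
}
```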

bad commit

Somehow the Count-Min Sketch commit failed to actually commit any code; the file needs to be recommitted.

static analysis bug in KPS

Checking StreamingAlgorithms/KPS/kps.hpp...
[StreamingAlgorithms/KPS/kps.hpp:76] -> [StreamingAlgorithms/KPS/kps.hpp:79]: (error) Iterator 'map_it' used after element has been erased.
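The code at kps.hpp lines 76 to 79 is not reproduced here, so the following is only the generic fix for this class of warning, shown on a hypothetical `decrement_all` helper of the kind KPS needs (decrement every counter, drop those that reach zero). `erase` invalidates the erased iterator, so the loop continues from the iterator `erase` returns instead of incrementing the invalidated one.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Decrement every counter and remove entries that reach zero, without
// touching an iterator after its element has been erased.
void decrement_all(std::map<std::string, std::size_t>& counts) {
    for (auto map_it = counts.begin(); map_it != counts.end();) {
        if (--map_it->second == 0) {
            map_it = counts.erase(map_it);  // erase returns the next valid iterator
        } else {
            ++map_it;
        }
    }
}
```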

inserts not checked

In Stream Summary (and perhaps others), insertions into the STL containers can fail. The returned pair needs to be checked, and if its second element is false, an error needs to be thrown.
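A minimal C++17 sketch of such a check, using a hypothetical `checked_insert` helper. For unique-key containers like `std::map` and `std::unordered_map`, `emplace` (like `insert`) returns a `std::pair<iterator, bool>` whose `bool` is false when the key already existed and nothing was inserted.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Inserts key/value into a unique-key map and throws if the key was already
// present (i.e. the returned pair's bool is false).
template <typename Map>
typename Map::iterator checked_insert(Map& map,
                                      const typename Map::key_type& key,
                                      const typename Map::mapped_type& value) {
    auto [it, inserted] = map.emplace(key, value);
    if (!inserted) {
        throw std::logic_error("insert failed: key already present");
    }
    return it;
}
```

Because the key and value parameters are taken from `Map::key_type` and `Map::mapped_type`, a call such as `checked_insert(m, "page", 1)` converts the string literal to `std::string` without template-deduction conflicts.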
