
dspa's People

Contributors

danalex97, matthewbrookes

dspa's Issues

Windowing

Todo:

  • Add a last-active timestamp to each post (updated from likes, comments, and replies).
  • Add an operator that outputs all active posts, i.e. posts whose last-active timestamp is >= time - 12 hours (like a filter).
  • Count the number of comments, the number of replies, and the number of people who interacted with each post in the last 30 minutes.
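
The three items above can be sketched outside of Timely as plain state-keeping logic. This is only an illustration under assumptions: the names `PostTracker`, `PostState`, `Kind`, and `PostStats` are hypothetical, timestamps are assumed to be event-time seconds, and likes are assumed to count as an interaction for the "people" figure.

```rust
use std::collections::{HashMap, HashSet};

/// Event time in seconds (an assumption; the issue does not fix a unit).
type Time = u64;

const ACTIVE_WINDOW: Time = 12 * 60 * 60; // 12 hours
const STATS_WINDOW: Time = 30 * 60; // 30 minutes

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Kind {
    Like,
    Comment,
    Reply,
}

/// Hypothetical per-post state: last activity plus the raw interactions.
#[derive(Default)]
struct PostState {
    last_active: Time,
    events: Vec<(Time, Kind, u64)>, // (timestamp, kind, person id)
}

#[derive(Default)]
struct PostTracker {
    posts: HashMap<u64, PostState>,
}

#[derive(Debug, PartialEq, Eq)]
struct PostStats {
    comments: usize,
    replies: usize,
    unique_people: usize,
}

impl PostTracker {
    /// Record an interaction and bump the post's last-active timestamp.
    fn record(&mut self, post_id: u64, person_id: u64, kind: Kind, time: Time) {
        let state = self.posts.entry(post_id).or_default();
        state.last_active = state.last_active.max(time);
        state.events.push((time, kind, person_id));
    }

    /// Filter: ids of posts with last_active >= now - 12 hours, sorted.
    fn active_posts(&self, now: Time) -> Vec<u64> {
        let cutoff = now.saturating_sub(ACTIVE_WINDOW);
        let mut ids: Vec<u64> = self
            .posts
            .iter()
            .filter(|(_, state)| state.last_active >= cutoff)
            .map(|(id, _)| *id)
            .collect();
        ids.sort_unstable();
        ids
    }

    /// Counts over the interactions that fall in the last 30 minutes.
    fn stats(&self, post_id: u64, now: Time) -> PostStats {
        let cutoff = now.saturating_sub(STATS_WINDOW);
        let mut stats = PostStats { comments: 0, replies: 0, unique_people: 0 };
        let mut people = HashSet::new();
        if let Some(state) = self.posts.get(&post_id) {
            for &(time, kind, person) in &state.events {
                if time < cutoff || time > now {
                    continue;
                }
                match kind {
                    Kind::Comment => stats.comments += 1,
                    Kind::Reply => stats.replies += 1,
                    Kind::Like => {}
                }
                people.insert(person);
            }
        }
        stats.unique_people = people.len();
        stats
    }
}
```

In a real operator the `events` vector would need pruning once entries fall out of the 30-minute window; the sketch keeps everything for simplicity.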

UTF-16 records

Some records in the data are UTF-16, but csv's StringRecord supports only UTF-8. For now, these records are ignored and the following error is logged:

Error: Error(Utf8 { pos: Some(Position { byte: 9271, line: 123, record: 122 }), err: Utf8Error { field: 2, valid_up_to: 7 } })
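
One possible alternative to dropping these records is to read raw bytes (the csv crate offers `ByteRecord` for this) and decode each field manually. The sketch below uses only the standard library and shows the decoding step alone. The helper name `decode_field` is hypothetical, and treating data without a byte-order mark as little-endian is an assumption that would need checking against the actual files.

```rust
/// Decode a raw field: try UTF-8 first, then fall back to UTF-16.
/// Returns None when neither interpretation is valid.
fn decode_field(bytes: &[u8]) -> Option<String> {
    if let Ok(text) = std::str::from_utf8(bytes) {
        return Some(text.to_owned());
    }
    // UTF-16 code units are two bytes each.
    if bytes.len() % 2 != 0 {
        return None;
    }
    // Honour a byte-order mark if present; otherwise assume
    // little-endian (an assumption, not confirmed by the data set).
    let (big_endian, payload) = match bytes {
        [0xFE, 0xFF, rest @ ..] => (true, rest),
        [0xFF, 0xFE, rest @ ..] => (false, rest),
        _ => (false, bytes),
    };
    let units: Vec<u16> = payload
        .chunks_exact(2)
        .map(|pair| {
            let pair = [pair[0], pair[1]];
            if big_endian {
                u16::from_be_bytes(pair)
            } else {
                u16::from_le_bytes(pair)
            }
        })
        .collect();
    // Fails on unpaired surrogates, in which case the field is rejected.
    String::from_utf16(&units).ok()
}
```

Note that the fallback only triggers when the bytes are not valid UTF-8, which matches the case reported by the error above; UTF-16 text that happens to be valid UTF-8 would pass through undetected.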

Watermarks & Tests

-- What if we do watermarking at the producer?

  • Watermarking in the producer is done correctly. We need to explain why in detail.
  • In the current implementation, a watermark guarantees delivery of all elements with timestamps strictly below its own. That is because the stash is non-inclusive.
  • Watermarks need to move out of the Buffer operator: if watermarks are eliminated at the buffer, then for the next operators it is as if we did not have them at all, hence no periodic outputs.
  • The rest of the operators currently account for delays and synchronization themselves. They should not, since the watermarks offer all the needed guarantees.
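
To make the "strictly below" guarantee concrete, here is a minimal sketch of a non-inclusive stash in plain Rust. It is not the project's Buffer operator: the type `Stash` and the method `release_below` are hypothetical names, and timestamps are assumed to be plain `u64` values.

```rust
use std::collections::BTreeMap;

/// Hypothetical stash: holds elements until a watermark passes them.
struct Stash<T> {
    pending: BTreeMap<u64, Vec<T>>,
}

impl<T> Stash<T> {
    fn new() -> Self {
        Stash { pending: BTreeMap::new() }
    }

    fn insert(&mut self, timestamp: u64, item: T) {
        self.pending.entry(timestamp).or_default().push(item);
    }

    /// Release every element with timestamp strictly below `watermark`,
    /// in timestamp order. Elements at exactly `watermark` stay stashed,
    /// which is what makes the stash non-inclusive.
    fn release_below(&mut self, watermark: u64) -> Vec<(u64, T)> {
        // split_off returns the entries with key >= watermark.
        let kept = self.pending.split_off(&watermark);
        let released = std::mem::replace(&mut self.pending, kept);
        released
            .into_iter()
            .flat_map(|(time, items)| items.into_iter().map(move |item| (time, item)))
            .collect()
    }
}
```

With elements stashed at 5, 7, 10, and 12, a watermark of 10 releases only those at 5 and 7; the element at 10 waits for a later watermark. A downstream operator that still receives the watermark can rely on this ordering instead of tracking delays itself.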

-- Is watermarking at the producer a good idea in the end? It may make more sense to just add the periodic extra events without any guarantees and then use the logic we already have. The producer is also only one worker and is not an actual part of the streaming system, which is a bit weird considering the task 0 formulation...

-- For tests: input for tests needs to be provided via a source operator, since probes don't downgrade capabilities (at least in the Timely version we use).
