flink-flux's People

Contributors

dependabot[bot], walterddr

Forkers

aslan001

flink-flux's Issues

DSL Compiler Redesign

See:
https://docs.google.com/document/d/1B0O9FbfwC_XDjoB2mL8Dtmv8VeRxhkGX7090dZGbhJ4/edit#

The new compiler needs to be flexible enough to handle different DSLs, but it should stay clean in how it compiles each vertex and the job graph:

  1. Each vertex should be compiled from its incoming edges and component definition only (#17).
  2. Any additional transformation (e.g. keyBy, repartition) is applied by the downstream vertex.
  3. Support references. This can be done either in #13 or here.
    [BONUS] Components stored intermediately in each vertex during compilation (e.g. the DataStream built up from the source to the current vertex) should be stored in a more generic manner.
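
As a rough illustration of points 1 and 2, the sketch below compiles each vertex from nothing but its own id (standing in for its component definition) and its incoming edges, and lets the downstream vertex apply the edge-level transformation. All names here (SketchStream, SketchEdge, SketchVertexCompiler) are hypothetical and use plain Java strings in place of real Flink objects; this is not the project's actual API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a compiled stream (the role a DataStream would play).
final class SketchStream {
    final String description;
    SketchStream(String description) { this.description = description; }
}

// Hypothetical edge; "transform" names the transformation the downstream vertex applies.
final class SketchEdge {
    final String from;
    final String to;
    final String transform;
    SketchEdge(String from, String to, String transform) {
        this.from = from;
        this.to = to;
        this.transform = transform;
    }
}

final class SketchVertexCompiler {
    // Compiles vertex ids given in topological order. Each vertex sees only
    // its own id and its incoming edges.
    static Map<String, SketchStream> compile(List<String> orderedIds, List<SketchEdge> edges) {
        Map<String, SketchStream> compiled = new LinkedHashMap<>();
        for (String id : orderedIds) {
            List<String> inputs = new ArrayList<>();
            for (SketchEdge edge : edges) {
                if (!edge.to.equals(id)) {
                    continue;
                }
                SketchStream upstream = compiled.get(edge.from);
                if (upstream == null) {
                    throw new IllegalStateException("Upstream not compiled yet: " + edge.from);
                }
                // The downstream vertex applies the edge-level transformation.
                inputs.add(upstream.description + "." + edge.transform + "()");
            }
            String description = inputs.isEmpty()
                    ? id
                    : String.join(" + ", inputs) + " -> " + id;
            compiled.put(id, new SketchStream(description));
        }
        return compiled;
    }
}
```

With vertices "src" and "op" and a single edge from "src" to "op" carrying a "keyBy" transform, the compiled description of "op" is "src.keyBy() -> op".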

Optionally,
Code-Restructure: create a flux-level compilation graph generator in flux-core and remove it from the flink-flux compilation core level (Under discussion whether it should fit in #14 or here)

Restructure compilation framework

Currently the compilation framework makes many assumptions:

  1. A StreamExecutionEnvironment is passed in.
  2. A source definition always reflectively creates a SourceFunction object, and a sink definition always creates a SinkFunction object.
  3. Type inference for sources and sinks always succeeds without error.
  4. An operator always converts to a OneInputStreamOperator and is always chained onto a DataStream through the transform API.
  5. An operator always comes with correct type information.
  6. An operator always uses a basic type.

This POC works well in the simplest cases but lacks the flexibility the system is trying to support.

The improvement needs several parts:

  1. Reflective compilation of Source/Sink/Operator needs a wrapper on top of Flink objects, e.g. a wrapping API to get the actual Flink runtime objects:
    1. interconnecting components: DataStreamSource / DataStream
    2. actual operating components: SingleInputStreamOperator, AbstractRichFunction, etc.
  2. Separate components (reference enabled) from vertex objects (source/sink/operator).
  3. Create a proper type system that handles type inference vs. explicit type definition (consider Avro?).
  4. Consider a plugin compiler, e.g. one that compiles not only operators but also higher-level constructs. This relates to (1) and depends on how the wrapper API works.
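
One possible shape for the wrapper in item 1 is sketched below: a compiled component holds some runtime object and hands it out only when the caller asks for a compatible type. The names SketchRuntimeWrapper and SketchSimpleWrapper are hypothetical, and a plain String stands in for a Flink runtime object so that the example stays self-contained.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical wrapper API: callers ask for the runtime object by type.
interface SketchRuntimeWrapper {
    // Returns the wrapped object if it is an instance of the requested type.
    <T> Optional<T> unwrap(Class<T> type);
}

final class SketchSimpleWrapper implements SketchRuntimeWrapper {
    private final Object runtimeObject;

    SketchSimpleWrapper(Object runtimeObject) {
        this.runtimeObject = Objects.requireNonNull(runtimeObject);
    }

    @Override
    public <T> Optional<T> unwrap(Class<T> type) {
        return type.isInstance(runtimeObject)
                ? Optional.of(type.cast(runtimeObject))
                : Optional.empty();
    }
}
```

A compiler built on such a wrapper would not need to assume that every source yields a SourceFunction or that every operator is a one-input operator; it would ask for the type it can handle and fall back or fail explicitly when the result is empty.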

[UPDATE]
Reference:
https://docs.google.com/document/d/1B0O9FbfwC_XDjoB2mL8Dtmv8VeRxhkGX7090dZGbhJ4/

Add new interpreter factory

#8 fixes the compiler extensibility problem by introducing the compiler factory and an implementation of the basic compiler. However, it also mixed in the compiler graph implementation, which is supposed to belong to the interpreter factory (the glue between the individually compiled operators/components and the actual job graph).

We need to spin off a separate "interpreter" concept.
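
The split could look roughly like the sketch below, where the factory only knows how to compile one component of a given kind and the interpreter is the glue that walks the components and assembles a plan. Every name here (SketchComponent, SketchCompilerFactory, SketchInterpreter) is hypothetical, and strings stand in for compiled Flink objects and for the job graph.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical component definition: a kind ("source", "sink", ...) and an id.
final class SketchComponent {
    final String kind;
    final String id;
    SketchComponent(String kind, String id) {
        this.kind = kind;
        this.id = id;
    }
}

// Compiler factory: knows how to compile one component of a given kind.
final class SketchCompilerFactory {
    private final Map<String, Function<SketchComponent, String>> compilers = new HashMap<>();

    void register(String kind, Function<SketchComponent, String> compiler) {
        compilers.put(kind, compiler);
    }

    Function<SketchComponent, String> compilerFor(String kind) {
        Function<SketchComponent, String> compiler = compilers.get(kind);
        if (compiler == null) {
            throw new IllegalArgumentException("No compiler registered for kind: " + kind);
        }
        return compiler;
    }
}

// Interpreter: glue from individually compiled components to the overall plan.
final class SketchInterpreter {
    private final SketchCompilerFactory factory;

    SketchInterpreter(SketchCompilerFactory factory) {
        this.factory = factory;
    }

    // Expects components in topological order; returns the assembled plan.
    List<String> interpret(List<SketchComponent> components) {
        List<String> plan = new ArrayList<>();
        for (SketchComponent component : components) {
            plan.add(factory.compilerFor(component.kind).apply(component));
        }
        return plan;
    }
}
```

Keeping the two apart means a new DSL can swap in its own interpreter while reusing the registered per-component compilers.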

Separate flux component from flink component

Currently the Flux topology definition uses Flink's internal components (Source, Sink, Operator) as the base from which the main components derive.

This is problematic and violates the DSL extensibility design: tying the definition of the most basic component directly to the target language makes it impossible to extend.

See how Streamline differentiates these two components [1][2].

Given that this is derived from the Streamline UI (e.g. json->flux), a similar architecture should be provided in our case, going from flux->flink.

Fix type system issue

Follow-up to the initial commit. The current problem with connecting an operator directly to the DataStream API is that the type system does not work properly, e.g. no explicit type information is passed along.

To support the type system, a more formal operator-level definition is needed.

  • OperatorDef should contain methods to extract the type information of the acceptable input and of the possible output.
  • Possibly create methods to derive output type information during compilation, since non-static type information is very likely (given that UDFs do not support dynamic type information, compilation with dynamic types is very useful).
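
A minimal sketch of the two bullets, assuming plain Java Class objects in place of Flink's type information: the operator definition declares which input types it accepts and derives its output type from the actual upstream type at compile time. SketchTypedOperatorDef, SketchToStringOperatorDef, and SketchTypeChecker are hypothetical names, not the existing OperatorDef.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical operator-level definition that exposes type information.
interface SketchTypedOperatorDef {
    // Input types this operator can accept.
    List<Class<?>> acceptedInputTypes();

    // Output type, derived from the actual input type during compilation.
    Class<?> deriveOutputType(Class<?> inputType);
}

// Example definition: accepts any Number and always emits String.
final class SketchToStringOperatorDef implements SketchTypedOperatorDef {
    @Override
    public List<Class<?>> acceptedInputTypes() {
        return Collections.<Class<?>>singletonList(Number.class);
    }

    @Override
    public Class<?> deriveOutputType(Class<?> inputType) {
        return String.class;
    }
}

final class SketchTypeChecker {
    // Checks the upstream type against the definition and returns the output type.
    static Class<?> check(Class<?> upstreamType, SketchTypedOperatorDef def) {
        for (Class<?> accepted : def.acceptedInputTypes()) {
            if (accepted.isAssignableFrom(upstreamType)) {
                return def.deriveOutputType(upstreamType);
            }
        }
        throw new IllegalArgumentException(
                "Operator does not accept input type: " + upstreamType.getName());
    }
}
```

The compiler can then carry the returned output type forward as the upstream type of the next vertex, so a mismatch surfaces during compilation instead of at job runtime.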

DSL redesign

See:
https://docs.google.com/document/d/1L_HMYS1MSkN7CYpPjNjI1QJ9QaWMMTbH53V1J7gs3Y4/edit#

Several components needed:

  1. restructure of the Flux API module (#11)
  2. move the current runner to the new module (#12)
  3. create an ordered edge system needed by Flink (#15)
  4. create a flux-level compilation graph generator in flux-core and remove it from the flink-flux compilation core level (Under discussion whether it should fit in #14 or here)

Additional components can be adjusted but will not be included in this DSL redesign.
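
For item 3, one plausible reading is that a vertex with several inputs needs them in a stable, explicit order (for example, to tell the first input of a two-input operator from the second). The sketch below assumes that reading; SketchOrderedEdge, its inputIndex field, and SketchOrderedEdges are hypothetical and not taken from #15.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical edge that records its position among the target's inputs.
final class SketchOrderedEdge {
    final String from;
    final String to;
    final int inputIndex;

    SketchOrderedEdge(String from, String to, int inputIndex) {
        this.from = from;
        this.to = to;
        this.inputIndex = inputIndex;
    }
}

final class SketchOrderedEdges {
    // Returns the upstream ids of a target vertex, sorted by input index.
    static List<String> orderedInputs(List<SketchOrderedEdge> edges, String target) {
        List<SketchOrderedEdge> incoming = new ArrayList<>();
        for (SketchOrderedEdge edge : edges) {
            if (edge.to.equals(target)) {
                incoming.add(edge);
            }
        }
        incoming.sort(Comparator.comparingInt(edge -> edge.inputIndex));

        List<String> upstreamIds = new ArrayList<>();
        for (SketchOrderedEdge edge : incoming) {
            upstreamIds.add(edge.from);
        }
        return upstreamIds;
    }
}
```

Storing the index on the edge keeps the ordering independent of the order in which edges happen to appear in the topology file.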
