
flink-dataflow's People

Contributors

aljoscha, kl0u, mbalassi, mxm, rmetzger, smarthi, stephanewen


flink-dataflow's Issues

Checkpointing of custom sources and sinks

When users use native Flink sources or sinks, they can rely on Flink's checkpointing mechanism. However, there is currently no way to checkpoint custom sources and sinks.
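
Not part of the issue, but for reference: a minimal sketch of what a checkpointable custom source looks like on the Flink side, written against the Flink 0.10-era Checkpointed interface. The counter state here is hypothetical; a wrapper for Dataflow sources would have to expose hooks like these.

import org.apache.flink.streaming.api.checkpoint.Checkpointed;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class CheckpointedCounterSource implements SourceFunction<Long>, Checkpointed<Long> {

    private volatile boolean running = true;
    private long counter = 0;

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        while (running) {
            // Emit under the checkpoint lock so emitted records and state stay consistent.
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(counter++);
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    @Override
    public Long snapshotState(long checkpointId, long checkpointTimestamp) {
        return counter; // captured on every checkpoint
    }

    @Override
    public void restoreState(Long state) {
        counter = state; // handed back on recovery
    }
}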

What is a default pass-through Transformation in UnboundedFlinkSource?

Hello,
I am trying to execute a statement like this:
PCollection lines = p
    .apply(Read.named("StreamingMyData").from(UnboundedFlinkSource.of(kafkaConsumer)))
    .apply(Window.into(FixedWindows.of(Duration.standardSeconds(5)))
        .triggering(AfterWatermark.pastEndOfWindow())
        .withAllowedLateness(Duration.ZERO)
        .discardingFiredPanes());
But I really don't want to transform/alter the incoming data. I just want to print or use it "as is" coming from Kafka.
How can I bypass the need for a transform like Read.named("StreamingMyData") and just print the data as it's coming in from Kafka?
Thanks so much for your help.
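
Not an answer from the thread, but for context: the string passed to Read.named(...) is only a display name for the read step and does not alter the data. A minimal pass-through ParDo, sketched against the Dataflow 1.x SDK (element type assumed to be String), prints each element as it arrives and forwards it unchanged:

import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.cloud.dataflow.sdk.values.PCollection;

// "lines" is the windowed PCollection from the snippet above.
PCollection<String> echoed = lines.apply(ParDo.of(new DoFn<String, String>() {
    @Override
    public void processElement(ProcessContext c) {
        System.out.println(c.element()); // print the record as it arrives
        c.output(c.element());           // pass it through unchanged
    }
}));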

Modules were resolved with conflicting cross-version suffixes

When depending on flink-dataflow:

[error] Modules were resolved with conflicting cross-version suffixes in:
[error]    org.apache.flink:flink-clients _2.10, <none>
[error]    org.apache.flink:flink-runtime _2.10, <none>
[error]    org.apache.flink:flink-optimizer _2.10, <none>
[error]    org.apache.flink:flink-avro _2.10, <none>

Is the underlying Flink version 1.0-SNAPSHOT?
Any hint on how to fix this?

Add support for Flink and custom sinks.

Currently the Runner has no support for sinks, apart from the console sink, which is only for testing and debugging. We have to create wrappers for both the Dataflow sinks and the Flink ones, along the lines of the sketch below.
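
A rough sketch of the kind of wrapper this would take (hypothetical, not in the runner): a DoFn that forwards every element to a Flink SinkFunction. A real implementation would also have to wire up lifecycle hooks such as open/close for rich sinks.

import com.google.cloud.dataflow.sdk.transforms.DoFn;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

public class FlinkSinkDoFn<T> extends DoFn<T, Void> {

    private final SinkFunction<T> sink;

    public FlinkSinkDoFn(SinkFunction<T> sink) {
        this.sink = sink; // assumption: the wrapped sink is serializable
    }

    @Override
    public void processElement(ProcessContext c) throws Exception {
        sink.invoke(c.element()); // delegate the actual write to the Flink sink
    }
}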

Write to jdbc/database

Hi, possibly related to #3, could you provide a hint on where to start to write a JDBC output? Ideally I'd like to read from a streaming source, do some computation, and write to a database instead of using TextIO.

I have a running Dataflow pipeline (with output via TextIO), and I saw examples of Flink's DataSet writing to a database via JDBC. I feel I'm close, but I need a hint on how to connect the dots, e.g. how do I extract a dataset from a (streaming) pipeline?

Thank you very much, E.
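
One possible starting point, sketched rather than taken from the project (the connection URL and table are hypothetical): since there is no dedicated JDBC sink yet, write from inside a DoFn, opening a connection per bundle.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import com.google.cloud.dataflow.sdk.transforms.DoFn;

public class JdbcWriteFn extends DoFn<String, Void> {

    private transient Connection conn;

    @Override
    public void startBundle(Context c) throws Exception {
        conn = DriverManager.getConnection("jdbc:postgresql://localhost/mydb"); // hypothetical URL
    }

    @Override
    public void processElement(ProcessContext c) throws Exception {
        try (PreparedStatement st =
                 conn.prepareStatement("INSERT INTO lines (text) VALUES (?)")) { // hypothetical table
            st.setString(1, c.element());
            st.executeUpdate();
        }
    }

    @Override
    public void finishBundle(Context c) throws Exception {
        conn.close();
    }
}

It would then be applied with something like lines.apply(ParDo.of(new JdbcWriteFn())).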

Flink-Dataflow with custom sources

I was trying to run a program that works with Google Dataflow, but with the FlinkPipelineRunner. I use custom sources and sinks to read and write data. It throws an exception saying "java.lang.UnsupportedOperationException: The transform Window.Into() is currently not supported."

Do you support custom sources/sinks yet?

Unable to serialize exception running KafkaWindowedWordCountExample

I'm trying to run KafkaWindowedWordCountExample, but I get an "unable to serialize" exception.
I tried with both Flink 0.10.2 and 0.10.1.

Basically what I did was create a fat jar including all dependencies, then run it with Flink:

$ flink run -C file:///path/to/fat.jar -c com.dataartisans.flink.dataflow.examples.streaming.KafkaWindowedWordCountExample /path/to/flink-dataflow-0.2.jar

...

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:512)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:395)
    at org.apache.flink.client.program.Client.runBlocking(Client.java:252)
    at org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:676)
    at org.apache.flink.client.CliFrontend.run(CliFrontend.java:326)
    at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:978)
    at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1028)
Caused by: java.lang.IllegalArgumentException: unable to serialize com.dataartisans.flink.dataflow.translation.wrappers.streaming.io.UnboundedFlinkSource@6e5479cb
    at com.google.cloud.dataflow.sdk.util.SerializableUtils.serializeToByteArray(SerializableUtils.java:51)
    at com.google.cloud.dataflow.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:81)
    at com.google.cloud.dataflow.sdk.io.Read$Unbounded.<init>(Read.java:169)
    at com.google.cloud.dataflow.sdk.io.Read$Unbounded.<init>(Read.java:164)
    at com.google.cloud.dataflow.sdk.io.Read.from(Read.java:66)
    at com.dataartisans.flink.dataflow.examples.streaming.KafkaWindowedWordCountExample.main(KafkaWindowedWordCountExample.java:123)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:497)
    ... 6 more
Caused by: java.io.NotSerializableException: com.google.cloud.dataflow.sdk.options.ProxyInvocationHandler
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1180)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:346)
    at com.google.cloud.dataflow.sdk.util.SerializableUtils.serializeToByteArray(SerializableUtils.java:47)
    ... 16 more

Any idea? Thanks, E.

The transform BigQueryIO.Read is currently not supported.

[main] INFO com.dataartisans.flink.dataflow.FlinkPipelineRunner - Executing pipeline using FlinkPipelineRunner.
[main] INFO com.dataartisans.flink.dataflow.FlinkPipelineRunner - Translating pipeline to Flink program.
enterCompositeTransform- a5f5b96null
|   visitTransform- 40ab1491BigQueryIO.Read
class com.google.cloud.dataflow.sdk.io.BigQueryIO$Read$Bound
java.lang.UnsupportedOperationException: The transform BigQueryIO.Read is currently not supported.
  at com.dataartisans.flink.dataflow.translation.FlinkBatchPipelineTranslator.visitTransform(FlinkBatchPipelineTranslator.java:107)
  at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:219)
  at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:215)
  at com.google.cloud.dataflow.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:102)
  at com.google.cloud.dataflow.sdk.Pipeline.traverseTopologically(Pipeline.java:259)
  at com.dataartisans.flink.dataflow.translation.FlinkPipelineTranslator.translate(FlinkPipelineTranslator.java:32)
  at com.dataartisans.flink.dataflow.FlinkPipelineExecutionEnvironment.translate(FlinkPipelineExecutionEnvironment.java:128)
  at com.dataartisans.flink.dataflow.FlinkPipelineRunner.run(FlinkPipelineRunner.java:115)
  at com.dataartisans.flink.dataflow.FlinkPipelineRunner.run(FlinkPipelineRunner.java:48)

It would be nice to support BigQuery IO.
