
flink-dataflow's People

Contributors

aljoscha, kl0u, mbalassi, mxm, rmetzger, smarthi, stephanewen


flink-dataflow's Issues

Checkpointing of custom sources and sinks

When users use native Flink sources or sinks, they can rely on Flink's checkpointing mechanism. However, there is currently no way to checkpoint custom sources and sinks.
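
Not part of the issue, but for reference: a minimal sketch of what a checkpointable custom source looks like on the Flink side, written against the Flink 0.10-era Checkpointed interface. The counter state here is hypothetical; a wrapper for Dataflow sources would have to expose hooks like these.

import org.apache.flink.streaming.api.checkpoint.Checkpointed;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class CheckpointedCounterSource implements SourceFunction<Long>, Checkpointed<Long> {

    private volatile boolean running = true;
    private long counter = 0;

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        while (running) {
            // Emit under the checkpoint lock so emitted records and state stay consistent.
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(counter++);
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    @Override
    public Long snapshotState(long checkpointId, long checkpointTimestamp) {
        return counter; // captured on every checkpoint
    }

    @Override
    public void restoreState(Long state) {
        counter = state; // handed back on recovery
    }
}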

What is a default pass-through Transformation in UnboundedFlinkSource?

Hello,
I am trying to execute a statement like this:
PCollection lines = p
    .apply(Read.named("StreamingMyData").from(UnboundedFlinkSource.of(kafkaConsumer)))
    .apply(Window.into(FixedWindows.of(Duration.standardSeconds(5)))
        .triggering(AfterWatermark.pastEndOfWindow())
        .withAllowedLateness(Duration.ZERO)
        .discardingFiredPanes());
But I really don't want to transform/alter the incoming data. I just want to print or use it "as is" coming from Kafka.
How can I bypass the need for a transform like Read.named("StreamingMyData") and just print the data as it's coming in from Kafka?
Thanks so much for your help.
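
Not an answer from the thread, but for context: the string passed to Read.named(...) is only a display name for the read step and does not alter the data. A minimal pass-through ParDo, sketched against the Dataflow 1.x SDK (element type assumed to be String), prints each element as it arrives and forwards it unchanged:

import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.cloud.dataflow.sdk.values.PCollection;

// "lines" is the windowed PCollection from the snippet above.
PCollection<String> echoed = lines.apply(ParDo.of(new DoFn<String, String>() {
    @Override
    public void processElement(ProcessContext c) {
        System.out.println(c.element()); // print the record as it arrives
        c.output(c.element());           // pass it through unchanged
    }
}));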

Modules were resolved with conflicting cross-version suffixes

When depending on flink-dataflow:

[error] Modules were resolved with conflicting cross-version suffixes in:
[error]    org.apache.flink:flink-clients _2.10, <none>
[error]    org.apache.flink:flink-runtime _2.10, <none>
[error]    org.apache.flink:flink-optimizer _2.10, <none>
[error]    org.apache.flink:flink-avro _2.10, <none>

Is the underlying Flink version 1.0-SNAPSHOT?
Any hint on how to fix this?

Add support for Flink and custom sinks.

Currently the Runner has no support for sinks, apart from the console sink, which is only for testing and debugging. We have to create wrappers for both the Dataflow sinks and the Flink ones, along the lines of the sketch below.
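
A rough sketch of the kind of wrapper this would take (hypothetical, not in the runner): a DoFn that forwards every element to a Flink SinkFunction. A real implementation would also have to wire up lifecycle hooks such as open/close for rich sinks.

import com.google.cloud.dataflow.sdk.transforms.DoFn;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

public class FlinkSinkDoFn<T> extends DoFn<T, Void> {

    private final SinkFunction<T> sink;

    public FlinkSinkDoFn(SinkFunction<T> sink) {
        this.sink = sink; // assumption: the wrapped sink is serializable
    }

    @Override
    public void processElement(ProcessContext c) throws Exception {
        sink.invoke(c.element()); // delegate the actual write to the Flink sink
    }
}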

Write to jdbc/database

Hi, possibly related to #3, could you provide a hint on where to start to write a JDBC output? Ideally I'd like to read from a streaming source, do some computation, and write to a database instead of using TextIO.

I have a running Dataflow pipeline (with output via TextIO), and I saw examples of Flink's DataSet writing to a database via JDBC. I feel I'm close, but I need a hint on how to connect the dots, e.g. how do I extract a dataset from a (streaming) pipeline?

Thank you very much, E.
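
One possible starting point, sketched rather than taken from the project (the connection URL and table are hypothetical): since there is no dedicated JDBC sink yet, write from inside a DoFn, opening a connection per bundle.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import com.google.cloud.dataflow.sdk.transforms.DoFn;

public class JdbcWriteFn extends DoFn<String, Void> {

    private transient Connection conn;

    @Override
    public void startBundle(Context c) throws Exception {
        conn = DriverManager.getConnection("jdbc:postgresql://localhost/mydb"); // hypothetical URL
    }

    @Override
    public void processElement(ProcessContext c) throws Exception {
        try (PreparedStatement st =
                 conn.prepareStatement("INSERT INTO lines (text) VALUES (?)")) { // hypothetical table
            st.setString(1, c.element());
            st.executeUpdate();
        }
    }

    @Override
    public void finishBundle(Context c) throws Exception {
        conn.close();
    }
}

It would then be applied with something like lines.apply(ParDo.of(new JdbcWriteFn())).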

Flink-Dataflow with custom sources

I was trying to run a program that works with Google Dataflow, but with the FlinkPipelineRunner. I use custom sources and sinks to read and write data. It throws an exception saying "java.lang.UnsupportedOperationException: The transform Window.Into() is currently not supported."

Do you support custom sources/sinks yet?

Unable to serialize exception running KafkaWindowedWordCountExample

I'm trying to run KafkaWindowedWordCountExample, but I get an "unable to serialize" exception.
I tried with both Flink 0.10.2 and 0.10.1.

Basically what I did was create a fat jar including all dependencies, then run it with Flink:

$ flink run -C file:///path/to/fat.jar -c com.dataartisans.flink.dataflow.examples.streaming.KafkaWindowedWordCountExample /path/to/flink-dataflow-0.2.jar

...

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:512)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:395)
    at org.apache.flink.client.program.Client.runBlocking(Client.java:252)
    at org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:676)
    at org.apache.flink.client.CliFrontend.run(CliFrontend.java:326)
    at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:978)
    at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1028)
Caused by: java.lang.IllegalArgumentException: unable to serialize com.dataartisans.flink.dataflow.translation.wrappers.streaming.io.UnboundedFlinkSource@6e5479cb
    at com.google.cloud.dataflow.sdk.util.SerializableUtils.serializeToByteArray(SerializableUtils.java:51)
    at com.google.cloud.dataflow.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:81)
    at com.google.cloud.dataflow.sdk.io.Read$Unbounded.<init>(Read.java:169)
    at com.google.cloud.dataflow.sdk.io.Read$Unbounded.<init>(Read.java:164)
    at com.google.cloud.dataflow.sdk.io.Read.from(Read.java:66)
    at com.dataartisans.flink.dataflow.examples.streaming.KafkaWindowedWordCountExample.main(KafkaWindowedWordCountExample.java:123)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:497)
    ... 6 more
Caused by: java.io.NotSerializableException: com.google.cloud.dataflow.sdk.options.ProxyInvocationHandler
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1180)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:346)
    at com.google.cloud.dataflow.sdk.util.SerializableUtils.serializeToByteArray(SerializableUtils.java:47)
    ... 16 more

Any idea? Thanks, E.

The transform BigQueryIO.Read is currently not supported.

[main] INFO com.dataartisans.flink.dataflow.FlinkPipelineRunner - Executing pipeline using FlinkPipelineRunner.
[main] INFO com.dataartisans.flink.dataflow.FlinkPipelineRunner - Translating pipeline to Flink program.
enterCompositeTransform- a5f5b96null
|   visitTransform- 40ab1491BigQueryIO.Read
class com.google.cloud.dataflow.sdk.io.BigQueryIO$Read$Bound
java.lang.UnsupportedOperationException: The transform BigQueryIO.Read is currently not supported.
  at com.dataartisans.flink.dataflow.translation.FlinkBatchPipelineTranslator.visitTransform(FlinkBatchPipelineTranslator.java:107)
  at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:219)
  at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:215)
  at com.google.cloud.dataflow.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:102)
  at com.google.cloud.dataflow.sdk.Pipeline.traverseTopologically(Pipeline.java:259)
  at com.dataartisans.flink.dataflow.translation.FlinkPipelineTranslator.translate(FlinkPipelineTranslator.java:32)
  at com.dataartisans.flink.dataflow.FlinkPipelineExecutionEnvironment.translate(FlinkPipelineExecutionEnvironment.java:128)
  at com.dataartisans.flink.dataflow.FlinkPipelineRunner.run(FlinkPipelineRunner.java:115)
  at com.dataartisans.flink.dataflow.FlinkPipelineRunner.run(FlinkPipelineRunner.java:48)

It would be nice to support BigQuery IO.
