This is an example project demonstrating how to implement a custom Spark SQL Structured Streaming source.
It is based on the source implementation code from
Spark v2.3.
Documentation for the older DStream-based streaming sources exists here: Using the custom receiver in a Spark Streaming application. However, that approach does not carry over to Structured Streaming. As of this writing, I was not able to find clear-cut documentation on how to implement a source for Structured Streaming. The only clue/lead I had to start with was this Stack Overflow question:
How to create a custom streaming data source?
This example project is my take on deciphering how the built-in TextSocketSource works, using which I have implemented the following streaming sources:
- RandomAlphaStreamSource : generates a random set of alphanumeric characters and sends them via the stream. This is a stripped-down-to-the-core implementation similar to the built-in "rate" source.
- MeetupRSVPStreamSource : a simple implementation consuming the real-time Meetup RSVP stream.
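To give a feel for what such a source has to do, here is a dependency-free sketch of the offset/batch bookkeeping that Spark 2.3's v1 `Source` trait (the one TextSocketSource implements) expects: `getOffset` reports the latest available offset, `getBatch(start, end)` replays the rows between two offsets, and `commit` lets the source discard durably processed rows. The class name `SketchSource` and the use of plain `String`/`Long` in place of Spark's `Row` and `Offset` types are illustrative assumptions, not Spark's actual API.

```scala
import scala.collection.mutable.ListBuffer

// Illustrative stand-in for a structured-streaming source's state.
// Spark's real trait is org.apache.spark.sql.execution.streaming.Source;
// here Strings replace rows and Longs replace Offsets.
class SketchSource {
  // Rows received so far; `trimmed` counts rows already dropped by commit().
  private val buffer = ListBuffer.empty[String]
  private var trimmed = 0L

  // Called by the receiving thread as data arrives.
  def receive(row: String): Unit = buffer += row

  // getOffset: highest offset available, or None if no data has arrived yet.
  // The offset here is simply the running count of rows seen.
  def getOffset: Option[Long] = {
    val end = trimmed + buffer.size
    if (end == 0) None else Some(end)
  }

  // getBatch(start, end): rows after offset `start` up to and including
  // offset `end`; start = None means "from the beginning of the stream".
  def getBatch(start: Option[Long], end: Long): Seq[String] = {
    val from = start.getOrElse(0L)
    buffer.slice((from - trimmed).toInt, (end - trimmed).toInt).toList
  }

  // commit(end): Spark has durably processed everything up to `end`,
  // so those rows can be discarded.
  def commit(end: Long): Unit = {
    buffer.remove(0, (end - trimmed).toInt)
    trimmed = end
  }
}
```

In the real implementations, `getBatch` returns a DataFrame built from this buffer and the driver polls `getOffset` to decide when a new micro-batch is due; the buffer-plus-trim design is what lets a restarted query re-request an uncommitted offset range.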
Note:
- These examples are not coded to be robust for production use cases. They also do not cover all aspects of Structured Streaming (e.g. recovery, rollback); these, if needed, are left to the implementor.
- This repo will not necessarily be maintained for future releases/versions of Spark. Feel free to fork it and explore/enhance it at your own will and needs.
The theory and implementation logic are explained in wiki/Home.md.
Build
mvn compile package
This should produce the packaged shaded jar at: target/scala-2.11/jars/SparkSqlStream-1.0-SNAPSHOT.jar
Run
I have created an invocation script, bin/SubmitStreamApp.sh:
bin/SubmitStreamApp.sh
You might need to adjust the environment-related variables, such as SPARK_HOME, for your environment.