
hdinsight-spark-scala-kafka's Introduction

Use Apache Kafka with Apache Spark on HDInsight

This is a basic example of streaming data to and from Kafka on HDInsight from a Spark on HDInsight cluster. It uses Kafka DStreams and expects both Kafka and Spark to run on HDInsight 3.6.

How to create an Azure subscription using Azure Pass

Azure Pass homepage

Redemption Process Guide

How to create Apache Kafka and Spark clusters

NOTE: Apache Kafka and Spark are available as two different cluster types. HDInsight cluster types are tuned for the performance of a specific technology; in this case, Kafka and Spark. To use both together, you must create an Azure Virtual Network and then create both a Kafka cluster and a Spark cluster on that virtual network. For an example of how to do this using an Azure Resource Manager template, see https://hditutorialdata.blob.core.windows.net/armtemplates/create-linux-based-kafka-spark-cluster-in-vnet.json. For an example of using the template with this example, see Use Apache Spark with Kafka on HDInsight (preview).

Direct link to create the full environment: Deploy to Azure

Understand this example

This example uses a Scala application in a Jupyter notebook. The code in the notebook relies on the following pieces of data:

  • Kafka brokers: The broker process runs on each worker node of the Kafka cluster. The list of brokers is required by the producer component, which writes data to Kafka.

  • A Twitter app configuration: The Stream-Tweets-To-Kafka.ipynb notebook uses Twitter to populate data in Kafka. If you do not have a Twitter app set up, you must create one before running the notebook.

  • Topic name: The name of the topic that data is written to and read from. This example expects a topic named tweets.
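As a rough illustration of how the broker list and topic name fit together, the sketch below builds producer settings using only the Java standard library. The object and method names are hypothetical and not taken from the notebooks. Port 9092 is assumed as the broker port, and the property keys and serializer class name are the standard Kafka client ones.

```scala
import java.util.Properties

// Hypothetical helper; not part of the example notebooks.
object KafkaProducerSettings {
  // Assumption: brokers listen on the usual Kafka port.
  val DefaultPort = 9092

  // Join broker host names into the comma-separated host:port list Kafka expects.
  def bootstrapServers(hosts: Seq[String], port: Int = DefaultPort): String =
    hosts.map(_.trim).filter(_.nonEmpty).map(h => s"$h:$port").mkString(",")

  // Build producer properties that send both keys and values as strings.
  def producerProperties(hosts: Seq[String]): Properties = {
    val serializer = "org.apache.kafka.common.serialization.StringSerializer"
    val props = new Properties()
    props.setProperty("bootstrap.servers", bootstrapServers(hosts))
    props.setProperty("key.serializer", serializer)
    props.setProperty("value.serializer", serializer)
    props
  }
}
```

In a notebook, the resulting properties would typically be passed to a Kafka producer, and each record would be addressed to the tweets topic. The host names you supply are the worker nodes of your own Kafka cluster.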

To run this example

To use the example Jupyter notebooks, you must upload them to the Jupyter Notebook server on the Spark cluster. Use the following steps to upload them:

  1. In your web browser, use the following URL to connect to the Jupyter Notebook server on the Spark cluster. Replace CLUSTERNAME with the name of your Spark cluster.

     https://CLUSTERNAME.azurehdinsight.net/jupyter
    

    When prompted, enter the cluster login (admin) and password used when you created the cluster.

  2. From the upper right side of the page, use the Upload button to upload the Stream-Tweets-To-Kafka.ipynb file. Select the file in the file browser dialog and select Open.

  3. Find the Stream-Tweets-To-Kafka.ipynb entry in the list of notebooks, and select the Upload button beside it.

  4. Once the file has uploaded, select the Stream-Tweets-To-Kafka.ipynb entry to open the notebook. To load tweets into Kafka, follow the instructions in the notebook.

  5. Repeat steps 1-3 to upload the Spark-Streaming-From-Kafka-With-DStreams.ipynb notebook to the Jupyter Notebook server. Once the file has uploaded, select the entry to open the notebook. Follow the instructions in the notebook to read the tweets from Kafka.
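For orientation, reading the tweets back requires consumer-side settings that point at the same brokers and topic. The sketch below is a minimal illustration under assumptions, not the notebook's actual code. The object name, the group id you pass in, and the choice of `earliest` for `auto.offset.reset` are hypothetical, and the keys assume the newer Kafka consumer configuration.

```scala
// Hypothetical helper; the notebook may configure its stream differently.
object KafkaConsumerSettings {
  // Topic name this example expects.
  val Topic = "tweets"

  // Build the parameter map for a consumer that reads string keys and values.
  def consumerParams(brokers: String, groupId: String): Map[String, String] = {
    val deserializer = "org.apache.kafka.common.serialization.StringDeserializer"
    Map(
      "bootstrap.servers" -> brokers,     // comma-separated host:port list
      "group.id" -> groupId,              // assumption: any unused group name
      "key.deserializer" -> deserializer,
      "value.deserializer" -> deserializer,
      "auto.offset.reset" -> "earliest"   // assumption: start from the oldest tweets
    )
  }
}
```

A map like this, together with the topic name, is the kind of input a DStream-based Kafka reader is usually given. Check the notebook for the settings it actually uses.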
