
This project is a fork of kaiwaehner/hivemq-mqtt-tensorflow-kafka-realtime-iot-machine-learning-training-inference.


Real Time Big Data / IoT Machine Learning (Model Training and Inference) with HiveMQ (MQTT), TensorFlow IO and Apache Kafka - no additional data store like S3, HDFS or Spark required

License: Apache License 2.0



Streaming Machine Learning from IoT Devices with HiveMQ, Apache Kafka and TensorFlow

WORK IN PROGRESS... NOT FINISHED YET !!!!!!

This project implements a scenario where you can train new analytic models from streaming data - without the need for an additional data store like S3 or HDFS, or an additional processing cluster like Spark.

We use HiveMQ as an open source MQTT broker to ingest data from IoT devices, forward that data in real time into an Apache Kafka cluster for preprocessing (using Kafka Streams / KSQL), and train analytic models on it (using TensorFlow 2.0 and its Kafka I/O plugin).

Use Case and Architecture

Streaming Machine Learning with MQTT, Apache Kafka and TensorFlow I/O:

  • Data Integration
  • Data Preprocessing
  • Model Training
  • Model Deployment
  • Real Time Scoring
  • Real Time Monitoring

All steps happen in real time. No additional data store like S3 or HDFS is required:

Use Case: Streaming Machine Learning with MQTT, Kafka and TensorFlow I/O

Streaming Ingestion and Model Training with Kafka and TensorFlow-IO

Typically, analytic models are trained in batch mode: you first ingest all historical data into a data store like HDFS, AWS S3 or GCS, and then train the model using a framework like Spark MLlib, TensorFlow or Google ML.

TensorFlow I/O is a component of the TensorFlow framework that allows native integration with various technologies.

One of these integrations is tensorflow_io.kafka, which allows streaming ingestion into TensorFlow from Kafka WITHOUT the need for an additional data store! This significantly simplifies the architecture and reduces development, testing and operations costs. Yong Tang, a member of the TensorFlow I/O SIG team, gave a great presentation about this at Kafka Summit 2019 in New York (video and slide deck are available for free).
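
To make this concrete, here is a minimal sketch of such a streaming ingestion pipeline, assuming a local Kafka broker on localhost:9092, a hypothetical topic "sensor-data" with CSV-encoded values, and the tensorflow-io preview API available at the time of writing:

```python
import tensorflow as tf
import tensorflow_io.kafka as kafka_io

# Stream messages from a Kafka topic directly into a tf.data pipeline.
# The topic name and the 4-column CSV layout are illustrative assumptions.
dataset = kafka_io.KafkaDataset(
    topics=["sensor-data"],    # Kafka topic(s) to consume
    servers="localhost:9092",  # bootstrap server
    group="tf-training",       # consumer group id
    eof=True,                  # stop at end of partition for a bounded run
)

# Each element is one raw Kafka message; decode it into a float feature vector.
def decode(message):
    return tf.stack(tf.io.decode_csv(message, record_defaults=[[0.0]] * 4))

dataset = dataset.map(decode).batch(32)
```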

You can pick and choose the right components from the Apache Kafka and TensorFlow ecosystems to build your own machine learning infrastructure for data integration, data processing, model training and model deployment:

Machine Learning Workflow with TensorFlow and Apache Kafka Ecosystem

This demo will do the following steps:

  • Consume streaming data from the HiveMQ MQTT broker via a Kafka consumer
  • Preprocess the data with KSQL (filter, transform)
  • Ingest the data into TensorFlow (tf.data and tensorflow-io)
  • Build, train and save the model (TensorFlow 2.0 API; see the sketch after this list)
  • Deploy the model within a Kafka Streams application for embedded real time scoring
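
As a rough illustration of the "build, train and save" step, here is a minimal sketch using the TensorFlow 2.0 Keras API. The model architecture is purely illustrative, and `dataset` is assumed to be a streaming Kafka dataset (like the ingestion sketch above) that yields (features, label) batches:

```python
import tensorflow as tf

# Build a small illustrative classifier; layer sizes are placeholders.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Train directly on the streaming Kafka dataset (assumed to yield
# (features, label) batches - no intermediate data store involved).
model.fit(dataset, epochs=3)

# Save in the SavedModel format, so the model can later be served via
# TensorFlow Serving or embedded into a Kafka Streams / KSQL microservice.
model.save("models/iot-demo/1")
```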

Optional steps (nice to have)

  • Show IoT-specific features with HiveMQ tool stack
  • Deploy the model via TensorFlow Serving
  • Some kind of A/B testing
  • Re-train the model and update the Kafka Streams application (by sending the new model to a Kafka topic)
  • Monitoring of model training (via TensorBoard) and model deployment / inference (via some kind of Kafka integration + dashboard technology)
  • Confluent Cloud for Kafka as a Service (-> Focus on business problems, not running infrastructure)
  • Enhance demo with C / librdkafka clients and TensorFlow Lite for edge computing
  • todo - other ideas?

Why is this so awesome?

Again, you don't need another data store anymore! Just ingest the data directly from the distributed commit log of Kafka:

Model Training from the distributed commit log of Apache Kafka leveraging TensorFlow I/O

Be aware: This is still NOT Online Training

Streaming ingestion for model training is fantastic. You don't need a data store anymore. This simplifies the architecture and reduces operations and development costs.

However, one common misunderstanding has to be clarified, as this question comes up every time you talk about TensorFlow I/O and Apache Kafka: as long as machine learning / deep learning frameworks and algorithms expect data in batches, you cannot achieve real online training (i.e. re-training / optimizing the model with each new input event).

Only a few algorithms and implementations are available today, such as online clustering.

Thus, even with TensorFlow I/O and streaming ingestion via Apache Kafka, you still do batch training. However, you can configure and optimize these batches to fit your use case. Additionally, only Kafka allows ingestion at large scale for use cases like connected cars or assembly lines in factories. You cannot build a scalable, reliable, mission-critical ML infrastructure just with Python.
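
To make the batch-vs-online distinction concrete, here is a minimal sketch (reusing the hypothetical `dataset` and `model` from the sketches above, with `dataset` assumed to yield (features, labels) mini-batches): the weights are updated once per mini-batch of Kafka messages, never per individual event, and the batch size is the main knob you can tune:

```python
# Batch training over a stream: the model weights are updated once per
# mini-batch of Kafka messages, not once per incoming event.
# The mini-batch size is the tunable trade-off: larger batches favor
# throughput, smaller batches let the model react to new data faster.
for features, labels in dataset:  # one element = one mini-batch from Kafka
    loss = model.train_on_batch(features, labels)
```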

The combination of TensorFlow I/O and Apache Kafka is a great step closer to real time training of analytic models at scale!

I have posted many articles and videos about this discussion. Get started with How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka and check out my other resources if you want to learn more.

Requirements and Installation

TODO

Live Demo

Until the full demo is ready, you can already check out two working examples which use Kafka Python clients to produce data to Kafka topics and then consume the streaming data directly with TensorFlow I/O for streaming ML without an additional data store:
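
For orientation, here is a minimal sketch of the producer side of such an example, assuming the kafka-python client, a local broker and the illustrative "sensor-data" topic from above:

```python
from kafka import KafkaProducer  # pip install kafka-python

# Produce CSV-encoded sensor readings to Kafka; the TensorFlow I/O consumer
# (see the ingestion sketch above) reads them directly for training.
producer = KafkaProducer(bootstrap_servers="localhost:9092")

readings = [(0.1, 0.2, 0.3, 0.4), (0.5, 0.6, 0.7, 0.8)]  # dummy sample data
for reading in readings:
    payload = ",".join(str(v) for v in reading).encode("utf-8")
    producer.send("sensor-data", value=payload)

producer.flush()  # make sure everything is written before exiting
```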

TODO IMPLEMENT DEMO

  • Start HiveMQ
  • Start Confluent Platform
  • Start the TensorFlow Python script that consumes data to train the model
  • Create a continuous data stream (MQTT messages; see the publisher sketch after this list)
  • Finish model training
  • Use model for inference: 1) via TensorFlow-IO Python API, 2) exported to a Kafka Streams / KSQL microservice and 3) via TensorFlow Serving
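
Here is a minimal sketch of that continuous MQTT data stream, assuming the Eclipse Paho Python client and a local HiveMQ broker on the default MQTT port 1883 (topic name and payload format are illustrative):

```python
import random
import time

import paho.mqtt.client as mqtt  # pip install paho-mqtt

# Publish one simulated sensor reading per second to HiveMQ.
client = mqtt.Client()
client.connect("localhost", 1883)  # local HiveMQ broker, default MQTT port
client.loop_start()                # handle network traffic in the background

while True:
    payload = f"{random.uniform(20.0, 90.0):.2f}"     # e.g. a temperature
    client.publish("car/engine/temperature", payload)  # illustrative topic
    time.sleep(1)
```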

More Information

TensorFlow 2.0 is still beta and requires Linux

  • At the time of writing (July 2019), TensorFlow 2.0 is still beta
  • TensorFlow 2.0 I/O does NOT work on Mac or Windows (e.g. 'pip install tensorflow-io-2.0-preview' fails). You need to install it on a Linux system (VM or cloud instance) to avoid many headaches and spend your time on the problem, not the infrastructure

TensorFlow 1.x to 2.0 Migration and Embedded Keras API

  • The upgrade tool allows a smooth migration, especially if your TensorFlow 1.x code also uses Keras. I just executed: "tf_upgrade_v2 --infile Python-Tensorflow-1.x-Keras-Fraud-Detection-Autoencoder.ipynb --outfile Python-Tensorflow-2.0-Keras-Fraud-Detection-Autoencoder.ipynb"
  • Python-Tensorflow-1.x-Keras-Fraud-Detection-Autoencoder.ipynb is the initial Jupyter Notebook to create the autoencoder
  • Python-Tensorflow-2.0-Keras-Fraud-Detection-Autoencoder.ipynb is the TensorFlow 2.0 version
  • You usually still need to do some custom fixes. My migration report had no errors, but I still had to replace the standalone "keras" imports with "tensorflow.keras" (and uninstall the independent Keras package via pip to be on the safe side)
  • The Jupyter Notebook now runs well with TensorFlow 2.0 and its embedded Keras API
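
For illustration, the import fix looks like this (the specific layer imports are just an example):

```python
# Before (TensorFlow 1.x with the standalone Keras package):
# from keras.models import Model
# from keras.layers import Input, Dense

# After (TensorFlow 2.0 with the embedded Keras API):
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense
```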
