
python-notebooks-for-apache-kafka's Introduction

Python Jupyter Notebooks for Apache Kafka®

This is a series of Jupyter Notebooks on how to start with Apache Kafka® and Python. You can try these notebooks in order to learn the basic concepts of Apache Kafka in an environment containing markdown text, media and executable code on the same page.

The notebooks are based on a managed Apache Kafka instance created on Aiven's website, but can also be customised to work with any Apache Kafka instance running locally with SSL authentication. Aiven offers $300 of free credit that you can redeem by creating an account on Aiven's website.

If you have any questions or suggestions for improving the notebooks, please open an issue. Contributions are welcome!

Start JupyterLab on Docker

You can access the notebooks via JupyterLab. This example runs JupyterLab on Docker:

  1. clone the repository
  2. open a terminal
  3. go to the folder where the repository has been cloned
  4. run the following
docker run --rm -p 8888:8888 \
  -e JUPYTER_ENABLE_LAB=yes  \
  -v "$PWD":/home/jovyan/work \
  jupyter/datascience-notebook

You'll see a folder named work on the top left. Under it you'll find the list of notebooks.

Notebook Overview

This repository contains the following notebooks.

Notebook Details

The notebooks are divided per Apache Kafka functionality.

Create Managed Apache Kafka and PostgreSQL instances with Aiven.io

Create services

00 - Aiven Setup.ipynb notebook downloads Aiven's command line interface and creates an Apache Kafka and a PostgreSQL instance.

Please replace <INSERT_TOKEN_HERE> and <INSERT_EMAIL_HERE> with a valid token and email address created on Aiven's website. The notebook creates the instances and also stores all the required connection credentials locally.

Produce and read Messages to Apache Kafka

Producer

01 - Producer.ipynb creates a Python Apache Kafka Producer and produces the first messages. After the first message is produced, open the 02 - Consumer.ipynb notebook and place it alongside the Producer.
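
For orientation, here is a minimal sketch of what a Python producer with SSL authentication can look like. It assumes the kafka-python client, which may not be the library the notebook uses. The host, port, certificate folder, certificate file names, topic and message contents are all hypothetical placeholders.

```python
import json


def encode_json(obj):
    """Serialise a Python object as UTF-8 encoded JSON bytes."""
    return json.dumps(obj).encode("utf-8")


def producer_settings(host, port, cert_folder):
    """Build keyword arguments for kafka-python's KafkaProducer.

    The certificate file names below are assumptions; use whatever
    names your credentials were stored under.
    """
    return {
        "bootstrap_servers": f"{host}:{port}",
        "security_protocol": "SSL",
        "ssl_cafile": f"{cert_folder}/ca.pem",
        "ssl_certfile": f"{cert_folder}/service.cert",
        "ssl_keyfile": f"{cert_folder}/service.key",
        # Keys and values are both sent as JSON.
        "key_serializer": encode_json,
        "value_serializer": encode_json,
    }


def send_first_message(settings, topic):
    """Connect to Kafka and send one message (needs a reachable broker)."""
    # Imported here so the helpers above work without the client installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(**settings)
    producer.send(topic, key={"id": 1}, value={"name": "Frank", "pizza": "Margherita"})
    producer.flush()  # block until buffered messages have been sent
```

Calling `send_first_message(producer_settings(host, port, "certs"), "my-topic")` would send a single message; nothing connects to Kafka until that function is called.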

Place consumer alongside the producer

02 - Consumer.ipynb reads from the topic that 01 - Producer wrote to. It starts reading from the moment it attaches to Apache Kafka and does not go back through the topic's history.

Consumer

If you want to read messages created with 01 - Producer, run the last code block of 02 - Consumer.ipynb before producing any messages in 01 - Producer. Starting from the latest offset is Apache Kafka's default behaviour; you can change it by setting 'auto.offset.reset' to 'earliest' in the consumer properties.
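
The effect of this setting can be illustrated with a small in-memory model. This is a toy sketch rather than a Kafka client: the function name `starting_offset` and the list standing in for a topic are made up for illustration. It also shows that the setting only matters when the consumer group has no committed offset yet.

```python
def starting_offset(log_length, committed_offset=None, auto_offset_reset="latest"):
    """Return the offset a consumer starts reading from.

    A committed offset always wins; auto_offset_reset is only consulted
    when the consumer group has no committed position.
    """
    if committed_offset is not None:
        return committed_offset
    if auto_offset_reset == "earliest":
        return 0
    if auto_offset_reset == "latest":
        return log_length
    raise ValueError(f"unsupported auto_offset_reset: {auto_offset_reset!r}")


# Three messages exist before any consumer attaches.
log = ["msg-0", "msg-1", "msg-2"]

late = starting_offset(len(log))                                 # default: latest
early = starting_offset(len(log), auto_offset_reset="earliest")

# A new message arrives after both consumers have attached.
log.append("msg-3")

print(log[late:])   # only the message produced after attaching
print(log[early:])  # the full history plus the new message
```

In the kafka-python client the same property is passed as the `auto_offset_reset` keyword argument of `KafkaConsumer`; other clients spell it with dots.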

Understanding Apache Kafka Partitions

Partitions

Partitions in Apache Kafka are a way to divide the messages belonging to the same topic into sub-logs.

  • 03 - 00 - Partition Producer.ipynb creates a topic with two partitions using KafkaAdmin and sends a message to each partition. We can then open both 03 - 01 - Consumer - Partition 0.ipynb and 03 - 02 - Consumer - Partition 1.ipynb which will read messages from Partition 0 and Partition 1 respectively.
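
The idea of a topic split into sub-logs can be sketched with a toy in-memory model. The `Topic` class, topic name and messages below are hypothetical and only mimic the behaviour described above; they are not part of any Kafka client.

```python
class Topic:
    """A topic modelled as a list of independent, append-only sub-logs."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, value, partition):
        """Append a message to one partition and return its offset there."""
        log = self.partitions[partition]
        log.append(value)
        return len(log) - 1

    def read(self, partition, offset=0):
        """Read one partition from the given offset; other partitions are ignored."""
        return self.partitions[partition][offset:]


topic = Topic("partitioned-topic", num_partitions=2)

# One message per partition; offsets are counted per partition, not per topic.
offset_p0 = topic.send({"id": 0, "pizza": "Margherita"}, partition=0)
offset_p1 = topic.send({"id": 1, "pizza": "Hawaii"}, partition=1)

print(topic.read(partition=0))  # what a consumer assigned to partition 0 sees
print(topic.read(partition=1))  # what a consumer assigned to partition 1 sees
```

With a real client the same steps map onto creating the topic with two partitions, passing a partition number when sending, and assigning each consumer to a single partition. In kafka-python, assuming that is the library in use, these would be `NewTopic(num_partitions=2, ...)`, `producer.send(..., partition=0)` and `consumer.assign([TopicPartition(topic, 0)])`.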

New Consumer Group

Consumer groups

Messages in Apache Kafka are not deleted when a consumer reads them, so they remain available for other consumers. 04 - New Consumer Group.ipynb creates a new consumer that is part of a new consumer group and reads from the topic that 01 - Producer wrote to. We can now check, by sending a message from the 01 - Producer notebook, that it is received in both 02 - Consumer.ipynb and 04 - New Consumer Group.
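
A toy in-memory model makes the behaviour concrete: the log keeps every message, and each consumer group tracks its own position. The `TopicLog` class and the group names are hypothetical and simplify a real broker considerably (single partition, offsets committed on every poll).

```python
class TopicLog:
    """A single-partition log with one read position per consumer group."""

    def __init__(self):
        self.messages = []
        self.positions = {}  # group_id -> offset of the next message to read

    def produce(self, value):
        self.messages.append(value)

    def subscribe(self, group_id):
        """Attach a group; a new group starts at the end of the log (latest)."""
        self.positions.setdefault(group_id, len(self.messages))

    def poll(self, group_id):
        """Return unread messages for this group and advance only its position."""
        start = self.positions[group_id]
        batch = self.messages[start:]
        self.positions[group_id] = len(self.messages)
        return batch


topic = TopicLog()
topic.subscribe("group-a")  # plays the role of 02 - Consumer
topic.subscribe("group-b")  # plays the role of 04 - New Consumer Group

topic.produce("hello")

first = topic.poll("group-a")
second = topic.poll("group-b")
print(first, second)           # both groups receive the same message
print(len(topic.messages))     # reading did not remove it from the log
```

Consumers that share a group id share one position, which is why a second notebook needs a different group id to receive its own copy of each message.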

Kafka Connect


Apache Kafka Connect® is a prebuilt framework that makes it easy to integrate Apache Kafka with existing data sources or sinks. Aiven provides Kafka Connect as a managed service, making the integration a matter of a single config file. 05 - Kafka Connect.ipynb creates a new Kafka topic containing messages with both schema and payload, and then pushes them to a PostgreSQL database via Apache Kafka Connect.
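
A message carrying both schema and payload can be sketched as follows, assuming the connector reads values with Kafka Connect's JSON converter with schemas enabled. The helper `with_schema`, the record name and the field names are hypothetical; the notebook's actual messages may differ.

```python
import json


def with_schema(payload, field_types, name="example_record"):
    """Wrap a payload in the schema/payload envelope used by Kafka Connect's JSON converter.

    field_types maps each payload field to a Connect type such as "int32" or "string".
    """
    fields = [
        {"field": field, "type": connect_type, "optional": False}
        for field, connect_type in field_types.items()
    ]
    return {
        "schema": {"type": "struct", "name": name, "optional": False, "fields": fields},
        "payload": payload,
    }


message = with_schema(
    {"id": 1, "name": "Frank"},
    {"id": "int32", "name": "string"},
)

# The bytes a producer would send as the message value.
encoded = json.dumps(message).encode("utf-8")
print(json.dumps(message, indent=2))
```

Embedding the schema in every message lets a sink connector derive the target table's columns and types without a separate schema registry, at the cost of larger messages.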

Delete Aiven Services

Delete services

Once you're done, you can delete all the services created on Aiven's website by executing the code in ON - Aiven - Delete Services.ipynb.

Keep Reading

We maintain some other resources that you may also find useful:

License

This project is licensed under the Apache License, Version 2.0.

Apache Kafka is either a registered trademark or trademark of the Apache Software Foundation in the United States and/or other countries. Aiven has no affiliation with and is not endorsed by The Apache Software Foundation.

python-notebooks-for-apache-kafka's People

Contributors

ftisiot avatar


python-notebooks-for-apache-kafka's Issues

README needs a project introduction

If a developer found this repo via search, not via the blog post, what would they need to know? Let's do an overview "this is a demo, you can try these notebooks, Aiven has a free trial or link to kafka docs for running locally" ... that sort of thing. I think also a note about opening an issue with any suggested additions or questions might be nice.

Notebooks index

Let's list the notebooks in the README, with an explanation of what each one does and (if appropriate) how to configure/use it. Then if we add more, we can expand this too and make things easier for people to find.

unable to create project/service on aiven

What can we help you with?

In workbook 00, I can authenticate, but the services aren't created and error is thrown...


---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 get_ipython().run_cell_magic('bash', '', '\nsource config/profile_info.sh\n\navn service create  -p $AIVEN_PLAN_NAME \\\n                    -t kafka $KAFKA_NAME \\\n                    --cloud $CLOUD \\\n                    --project $PROJECT_NAME \\\n                    -c kafka_rest=true \\\n                    -c kafka.auto_create_topics_enable=true \\\n                    -c schema_registry=true \\\n                    -c kafka_connect=true\n\navn service create $POSTGRES_NAME -t pg -p startup-4 --cloud $CLOUD --project $PROJECT_NAME\n\navn service wait $KAFKA_NAME --project $PROJECT_NAME\n')

File /opt/conda/lib/python3.10/site-packages/IPython/core/interactiveshell.py:2358, in InteractiveShell.run_cell_magic(self, magic_name, line, cell)
   2356 with self.builtin_trap:
   2357     args = (magic_arg_s, cell)
-> 2358     result = fn(*args, **kwargs)
   2359 return result

File /opt/conda/lib/python3.10/site-packages/IPython/core/magics/script.py:153, in ScriptMagics._make_script_magic.<locals>.named_script_magic(line, cell)
    151 else:
    152     line = script
--> 153 return self.shebang(line, cell)

File /opt/conda/lib/python3.10/site-packages/IPython/core/magics/script.py:305, in ScriptMagics.shebang(self, line, cell)
    300 if args.raise_error and p.returncode != 0:
    301     # If we get here and p.returncode is still None, we must have
    302     # killed it but not yet seen its return code. We don't wait for it,
    303     # in case it's stuck in uninterruptible sleep. -9 = SIGKILL
    304     rc = p.returncode or -9
--> 305     raise CalledProcessError(rc, cell)

CalledProcessError: Command 'b'\nsource config/profile_info.sh\n\navn service create  -p $AIVEN_PLAN_NAME \\\n                    -t kafka $KAFKA_NAME \\\n                    --cloud $CLOUD \\\n                    --project $PROJECT_NAME \\\n                    -c kafka_rest=true \\\n                    -c kafka.auto_create_topics_enable=true \\\n                    -c schema_registry=true \\\n                    -c kafka_connect=true\n\navn service create $POSTGRES_NAME -t pg -p startup-4 --cloud $CLOUD --project $PROJECT_NAME\n\navn service wait $KAFKA_NAME --project $PROJECT_NAME\n'' returned non-zero exit status 1.


