
cookie_logs's Introduction

Web Data Ingestion Pipeline with Apache NiFi, Kafka, and MySQL

This project implements a scalable web data ingestion pipeline built on Apache NiFi, Kafka, and MySQL. It captures website session cookies and logs, stores them in a MySQL database, and routes specific subsets of the data to designated destinations.

Installation

Before getting started, ensure that the components used by the pipeline are installed on your system: Apache Kafka (with Zookeeper), Apache NiFi, a MySQL server, and Python with the Faker library.

Usage

Using Bash Commands:

  1. Set Up Kafka and Zookeeper:

    • Start Zookeeper and Kafka Broker.
  2. Kafka Topic and Producer:

    • Create the "cookie_logs" Kafka topic.
    • Optionally, start a Kafka console producer to publish test messages by hand.
  3. Optional: Create Kafka Consumer:

    • Create a Kafka consumer to test and monitor the data arriving on the topic.
  4. Start Apache NiFi:

    • Launch the Apache NiFi server for data consumption and processing.
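
As a rough sketch of the steps above, the following Python snippet assembles the command lines in the order they are run and prints them without executing anything. The install paths (/opt/kafka, /opt/nifi) and the broker address localhost:9092 are assumptions, not values taken from this project. The script names are the ones shipped with standard Kafka and NiFi distributions; older Kafka releases may expect --broker-list instead of --bootstrap-server for the console producer.

```python
import shlex

# Assumed install locations and broker address; adjust to your environment.
KAFKA_HOME = "/opt/kafka"
NIFI_HOME = "/opt/nifi"
BOOTSTRAP = "localhost:9092"
TOPIC = "cookie_logs"


def pipeline_commands(kafka_home=KAFKA_HOME, nifi_home=NIFI_HOME,
                      bootstrap=BOOTSTRAP, topic=TOPIC):
    """Return (description, argv) pairs in the order the steps are run."""
    kbin = f"{kafka_home}/bin"
    return [
        ("Start Zookeeper",
         [f"{kbin}/zookeeper-server-start.sh",
          f"{kafka_home}/config/zookeeper.properties"]),
        ("Start Kafka broker",
         [f"{kbin}/kafka-server-start.sh",
          f"{kafka_home}/config/server.properties"]),
        ("Create topic",
         [f"{kbin}/kafka-topics.sh", "--create", "--topic", topic,
          "--bootstrap-server", bootstrap]),
        ("Console producer (optional)",
         [f"{kbin}/kafka-console-producer.sh", "--topic", topic,
          "--bootstrap-server", bootstrap]),
        ("Console consumer (optional)",
         [f"{kbin}/kafka-console-consumer.sh", "--topic", topic,
          "--from-beginning", "--bootstrap-server", bootstrap]),
        ("Start NiFi", [f"{nifi_home}/bin/nifi.sh", "start"]),
    ]


if __name__ == "__main__":
    # Print each command so it can be copied into a terminal.
    for description, argv in pipeline_commands():
        print(f"# {description}")
        print(shlex.join(argv))
```

Zookeeper and the Kafka broker each run in the foreground, so in practice every printed command is started in its own terminal.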

Using Python kafka_prod.py:

  1. Generate Website Logs:

    • Use kafka_prod.py to create synthetic website logs with Faker.
  2. Initialize Kafka Producer:

    • Configure a Kafka Producer to send data as JSON to Kafka.
  3. Send Data to Kafka:

    • Execute the script to send 1000 generated log entries to the "cookie_logs" Kafka topic.
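
The flow above might look roughly like the sketch below. This is not the contents of kafka_prod.py: to stay dependency-free it swaps Faker for the standard library's random module, and every field name (session_id, user_id, ip_address, url, timestamp) is a hypothetical placeholder. The producer is passed in as any object with a send(topic, value) method, and the topic name is a parameter that must match the topic you created.

```python
import json
import random
import uuid
from datetime import datetime, timezone

PAGES = ["/", "/products", "/cart", "/checkout", "/login"]  # hypothetical paths


def make_log(rng):
    """Build one synthetic log record; all field names are illustrative."""
    return {
        "session_id": str(uuid.UUID(int=rng.getrandbits(128))),
        "user_id": rng.randint(1, 10_000),
        "ip_address": ".".join(str(rng.randint(1, 254)) for _ in range(4)),
        "url": rng.choice(PAGES),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def to_json_bytes(record):
    """Serialize a record as UTF-8 JSON, the payload format sent to Kafka."""
    return json.dumps(record).encode("utf-8")


def send_logs(producer, topic, count=1000, seed=None):
    """Send `count` records through any object exposing send(topic, value)."""
    rng = random.Random(seed)
    for _ in range(count):
        producer.send(topic, to_json_bytes(make_log(rng)))
    return count


class PrintingProducer:
    """Stand-in producer that prints instead of contacting a broker."""

    def send(self, topic, value):
        print(topic, value.decode("utf-8"))


if __name__ == "__main__":
    send_logs(PrintingProducer(), "cookie_logs", count=3, seed=42)
```

With the kafka-python package installed, a KafkaProducer(bootstrap_servers="localhost:9092") instance could be passed in place of PrintingProducer, since its send method accepts a topic and a bytes value; call its flush() method before the script exits so buffered messages are delivered. The broker address here is an assumption.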

Using mysql_commands:

  1. Start MySQL Server:

    • Start the MySQL server using the method appropriate for your operating system.
  2. Connect to MySQL:

    • Use the mysql command-line client to connect to the server with your credentials.
  3. Create Database and Table:

    • Execute SQL commands to create the cookie_logs database and define the logs table.
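
A minimal sketch of the schema step is shown below. Only the database name cookie_logs and the table name logs come from this README; the column names and types are assumptions chosen for illustration and should be replaced with the definitions in mysql_commands. The statements are run through any DB-API style cursor, and the stand-in cursor here just prints them so the sketch works without a database.

```python
# Database and table names come from the README; the columns are hypothetical.
SCHEMA_STATEMENTS = [
    "CREATE DATABASE IF NOT EXISTS cookie_logs",
    "USE cookie_logs",
    """CREATE TABLE IF NOT EXISTS logs (
        id INT AUTO_INCREMENT PRIMARY KEY,
        session_id VARCHAR(36) NOT NULL,
        user_id INT,
        ip_address VARCHAR(45),
        url VARCHAR(2048),
        logged_at DATETIME
    )""",
]


def apply_schema(cursor):
    """Run each statement on a DB-API cursor, e.g. one from a MySQL driver."""
    for statement in SCHEMA_STATEMENTS:
        cursor.execute(statement)
    return len(SCHEMA_STATEMENTS)


class EchoCursor:
    """Stand-in cursor that prints statements instead of executing them."""

    def execute(self, statement):
        print(statement.strip() + ";")


if __name__ == "__main__":
    apply_schema(EchoCursor())
```

The printed statements can also be pasted directly into the mysql client session opened in step 2.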

Import NiFi Pipeline XML:

  • Import the NiFi pipeline XML file cookie_logs.xml to set up the dataflow for processing and storing the ingested data.

Once these steps are complete, the pipeline is ready to run: the producer publishes logs to Kafka, and NiFi consumes them and stores them in MySQL.
