pinot-quickstart's Introduction

Pinot Getting Started Guide

Welcome to the Apache Pinot Getting Started guide. This repository will help you set up and run a demonstration that involves streaming and batch data sources. The demonstration includes a real-time stream of movie ratings and a batch data source of movies, which can be joined in Apache Pinot for querying.

Architecture Diagram

flowchart LR
    Stream-->k[Apache Kafka]-->p[Apache Pinot]
    Batch-->p
    p-->mrp[Movie Ratings]
    p-->Movies

A Quick Shortcut

To quickly see the demonstration in action, you can use the following command:

make

For a detailed step-by-step setup, please refer to the Step-by-Step Details section.

If you're ready to explore the advanced features, jump directly to the Apache Pinot Advanced Usage section to run a multi-stage join between the ratings and movies tables.

Step-by-Step Details

This section provides detailed instructions to get the demonstration up and running from scratch.

Step 1: Build and Launch with Docker

Apache Pinot ingests real-time data from streaming platforms such as Apache Kafka and makes it queryable. This setup includes a mock stream producer, written in Python, that writes data into Kafka.

First, build the producer image and start all services using the following commands:

docker compose build --no-cache

docker compose up -d

The docker-compose.yml file configures the following services:

  • ZooKeeper (dedicated to Pinot)
  • Pinot Controller, Broker, and Server
  • Kafka in KRaft mode (no ZooKeeper)
  • Python producer
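As a rough picture of what the Python producer does, the sketch below builds one mock rating event and serializes it to JSON bytes, ready to be sent to Kafka. The `movieId` and `rating` field names come from the join query later in this guide. The `ratingTime` field, the 0-to-1 rating range, the movie ID range, and the function names are assumptions for illustration, not the repository's actual producer code.

```python
import json
import random
import time


def make_rating(rng, max_movie_id=100):
    """Build one mock rating event as a plain dict.

    Assumptions (not taken from the repository): ratings are floats in
    [0, 1], movie IDs are integers in [1, max_movie_id], and the event
    time is stored in a hypothetical `ratingTime` field as epoch millis.
    """
    return {
        "movieId": rng.randint(1, max_movie_id),
        "rating": round(rng.random(), 2),
        "ratingTime": int(time.time() * 1000),
    }


def encode(event):
    """Serialize an event to UTF-8 JSON bytes, a common Kafka payload form."""
    return json.dumps(event).encode("utf-8")


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    for _ in range(3):
        print(encode(make_rating(rng)).decode("utf-8"))
```

A real producer would pass these bytes to a Kafka client in a loop, targeting the movie_ratings topic created in Step 2.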

Step 2: Create a Kafka Topic

Next, create a Kafka topic for the producer to send data to, which Pinot will then read from:

docker exec -it kafka kafka-topics.sh \
    --bootstrap-server localhost:9092 \
    --create \
    --topic movie_ratings

To verify the stream, check the data flowing into the Kafka topic:

docker exec -it kafka \
    kafka-console-consumer.sh \
    --bootstrap-server localhost:9092 \
    --topic movie_ratings

Step 3: Configure Pinot Tables

In Pinot, create two types of tables:

  1. A REALTIME table for streaming data (movie_ratings).
  2. An OFFLINE table for batch data (movies).

To query the Kafka topic in Pinot, we add the real-time table using the pinot-admin CLI, providing it with a schema and a table configuration. The table configuration contains the connection information to Kafka.

docker exec -it pinot-controller ./bin/pinot-admin.sh \
    AddTable \
    -tableConfigFile /tmp/pinot/table/ratings.table.json \
    -schemaFile /tmp/pinot/table/ratings.schema.json \
    -exec

At this point, you should be able to query the movie_ratings table, which is fed by the Kafka topic, in the Pinot console.
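To make the role of the schema and table configuration more concrete, here is a minimal sketch that builds both as Python dicts and prints the table configuration as JSON. It shows where the Kafka connection information lives (the `streamConfigs` map). The column types, the `ratingTime` time column, and the `kafka:9092` broker address are assumptions; the actual ratings.schema.json and ratings.table.json in the repository may differ.

```python
import json


def ratings_schema():
    """A hypothetical Pinot schema for the movie_ratings table."""
    return {
        "schemaName": "movie_ratings",
        "dimensionFieldSpecs": [
            {"name": "movieId", "dataType": "INT"},  # assumed type
        ],
        "metricFieldSpecs": [
            {"name": "rating", "dataType": "FLOAT"},  # assumed type
        ],
        "dateTimeFieldSpecs": [
            {
                "name": "ratingTime",  # assumed time column name
                "dataType": "LONG",
                "format": "1:MILLISECONDS:EPOCH",
                "granularity": "1:MILLISECONDS",
            }
        ],
    }


def ratings_table_config(topic="movie_ratings", brokers="kafka:9092"):
    """A hypothetical REALTIME table config; `brokers` is an assumed address."""
    return {
        "tableName": "movie_ratings",
        "tableType": "REALTIME",
        "segmentsConfig": {
            "timeColumnName": "ratingTime",
            "schemaName": "movie_ratings",
            "replication": "1",
        },
        "tenants": {},
        "tableIndexConfig": {
            "loadMode": "MMAP",
            # The Kafka connection information lives in this map.
            "streamConfigs": {
                "streamType": "kafka",
                "stream.kafka.topic.name": topic,
                "stream.kafka.broker.list": brokers,
                "stream.kafka.consumer.type": "lowlevel",
                "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
                "stream.kafka.consumer.factory.class.name":
                    "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
                "stream.kafka.decoder.class.name":
                    "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
            },
        },
        "metadata": {},
    }


if __name__ == "__main__":
    print(json.dumps(ratings_table_config(), indent=2))
```

The `timeColumnName` in the table configuration has to match a date-time field declared in the schema, which is why the two files are passed to AddTable together.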

Now do the same for the OFFLINE table, using its own schema and table configuration:

docker exec -it pinot-controller ./bin/pinot-admin.sh \
    AddTable \
    -tableConfigFile /tmp/pinot/table/movies.table.json \
    -schemaFile /tmp/pinot/table/movies.schema.json \
    -exec

Once added, the OFFLINE table will not have any data. Let's add data in the next step.

Step 4: Load Data into the Movies Table

Use the following command to load data into the OFFLINE movies table:

docker exec -it pinot-controller ./bin/pinot-admin.sh \
    LaunchDataIngestionJob \
    -jobSpecFile /tmp/pinot/table/jobspec.yaml

Now, both the REALTIME and OFFLINE tables are queryable.

Step 5: Apache Pinot Advanced Usage

To perform complex queries such as joins, open the query console in the Pinot UI and enable the Use Multi-Stage Engine option. Example query:

select
    r.rating latest_rating,
    m.rating initial_rating,
    m.title,
    m.genres,
    m.releaseYear
from movies m
left join movie_ratings r on m.movieId = r.movieId
where r.rating > 0.9
order by r.rating desc
limit 10
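The same kind of query can also be sent outside the console. The sketch below posts SQL to the Pinot broker's `/query/sql` REST endpoint with the `useMultistageEngine=true` query option and turns the returned result table into a list of dicts. The broker address `http://localhost:8099` is an assumption (8099 is Pinot's default broker port, but this setup's docker-compose.yml may not expose it), and the function names are hypothetical helpers.

```python
import json
import urllib.request

BROKER_URL = "http://localhost:8099/query/sql"  # assumed host and port


def build_request(sql, url=BROKER_URL):
    """Build a POST request that asks the broker to use the multi-stage engine."""
    body = json.dumps({
        "sql": sql,
        "queryOptions": "useMultistageEngine=true",
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_as_dicts(response):
    """Convert a broker response's resultTable into a list of row dicts."""
    table = response.get("resultTable")
    if not table:
        return []  # e.g. the query failed and only exceptions were returned
    names = table["dataSchema"]["columnNames"]
    return [dict(zip(names, row)) for row in table["rows"]]


def run_query(sql, url=BROKER_URL):
    """Send the query to a running broker and return its rows."""
    with urllib.request.urlopen(build_request(sql, url), timeout=30) as resp:
        return rows_as_dicts(json.load(resp))
```

With the services running, `run_query("select m.title, r.rating from movies m join movie_ratings r on m.movieId = r.movieId limit 5")` would return up to five joined rows, provided the broker is reachable at the assumed address.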

Clean Up

To stop and remove all services related to the demonstration, run:

docker compose down

Troubleshooting

If you encounter "No space left on device" during the Docker build process, you can free up space with:

docker system prune -f

Further Reading

For more detailed tutorials and documentation, visit the StarTree developer page.

Contributors

hdulay, gamussa