
krishnasairaj / aws-driven-sales-performance-outlook


AWS-Driven-Sales-Performance-Outlook

The project aims to establish a robust data pipeline for tracking and analyzing sales performance using several AWS services. The pipeline creates a DynamoDB table, captures changes with Change Data Capture (CDC), streams them through Kinesis, stores the results in S3, and queries them with Amazon Athena.

Architecture

Technology Used

  • Python
  • DynamoDB
  • DynamoDB Streams (CDC)
  • Kinesis Data Streams
  • EventBridge Pipes (for stream ingestion)
  • Kinesis Firehose (to batch the streaming data)
  • Lambda
  • Glue (crawler and Data Catalog)
  • Athena
  • S3

Features

  • Data Generation Script

    • A Python script is provided to generate synthetic sales data.
    • The script uses the boto3 library to connect to DynamoDB.
    • The script is included in the repository (mock_data_generator_for_dynamodb.py).
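
The generator described above could look roughly like the following. This is a minimal sketch, not the repository's mock_data_generator_for_dynamodb.py: the product list, the value ranges, and the string-typed order_id are assumptions, and only the attribute names (order_id, product_name, quantity, price) and the table name Orders-data-table come from this README.

```python
import random
import uuid
from decimal import Decimal

# Hypothetical product list; the repository's script may use different values.
PRODUCTS = ["Laptop", "Phone", "Tablet", "Headphones", "Monitor"]


def make_order(rng: random.Random) -> dict:
    """Build one synthetic order with the attribute names used in this README."""
    return {
        # Assumption: order_id is stored as a string key.
        "order_id": str(uuid.UUID(int=rng.getrandbits(128))),
        "product_name": rng.choice(PRODUCTS),
        "quantity": rng.randint(1, 5),
        # boto3 rejects Python floats for DynamoDB numbers, so prices are Decimal.
        "price": Decimal(rng.randint(1000, 100000)) / Decimal(100),
    }


def write_orders(table, count: int, seed: int = 0) -> int:
    """Put `count` synthetic orders into `table` (any object with put_item)."""
    rng = random.Random(seed)
    for _ in range(count):
        table.put_item(Item=make_order(rng))
    return count


def main() -> None:
    """Write ten orders to the real table; needs AWS credentials and a region."""
    import boto3  # imported here so make_order works without boto3 installed

    orders_table = boto3.resource("dynamodb").Table("Orders-data-table")
    write_orders(orders_table, count=10)
```

Calling main() writes to the live table. Passing a stand-in object with a put_item method to write_orders lets the generator be tried without AWS access.
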
  • DynamoDB Setup

    • Created a DynamoDB database named sales-performance-outlook.

    • Implemented Change Data Capture (CDC) to track updates and deletions in records.

    • Established a DynamoDB table named Orders-data-table with order_id as the key.

    • Enabled a DynamoDB stream to capture changes in the table (sales-performance-outlook), specifying which data to capture (old item, new item, or both).
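
The old/new item choice corresponds to the StreamViewType setting of a DynamoDB stream. The sketch below shows one way to enable the stream with boto3's update_table call. The helper names are hypothetical, and which view type the project actually selected is not stated here, so capturing both images is an assumption.

```python
def stream_view_type(capture_old: bool, capture_new: bool) -> str:
    """Map the old/new item choice to a DynamoDB StreamViewType value."""
    if capture_old and capture_new:
        return "NEW_AND_OLD_IMAGES"
    if capture_new:
        return "NEW_IMAGE"
    if capture_old:
        return "OLD_IMAGE"
    return "KEYS_ONLY"  # only the key attributes of the changed item


def stream_request(table_name: str, capture_old: bool = True,
                   capture_new: bool = True) -> dict:
    """Build the keyword arguments for update_table that turn the stream on."""
    return {
        "TableName": table_name,
        "StreamSpecification": {
            "StreamEnabled": True,
            "StreamViewType": stream_view_type(capture_old, capture_new),
        },
    }


def enable_stream(table_name: str) -> None:
    """Apply the request; needs AWS credentials and permission to update the table."""
    import boto3  # imported lazily so the helpers above run without boto3

    boto3.client("dynamodb").update_table(**stream_request(table_name))
```

With both images captured, each MODIFY record carries the item before and after the change, which is what makes updates and deletions traceable downstream.
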

  • Kinesis Stream

    • Created a Kinesis stream named kinesis-sales-order to collect streaming data.
    • As Kafka does with partitions, Kinesis uses shards to partition a data stream.
    • Configured reads to begin from the starting position of the source data.
  • EventBridge Integration

    • Configured an EventBridge pipe to connect the DynamoDB stream to the Kinesis stream.
    • Specified DynamoDB as the source and the Kinesis stream as the destination, distributing records across shards based on the eventID.
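
To make the sharding step concrete: Kinesis hashes each record's partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The sketch below assumes the eventID is used as the partition key and that the shards split the hash space evenly, which may not match a stream that has been resharded. The function names are illustrative.

```python
import hashlib

HASH_SPACE = 2 ** 128  # size of the 128-bit hash key space


def hash_key(partition_key: str) -> int:
    """Return the MD5 hash of the partition key as a 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key: str, shard_count: int) -> int:
    """Pick a shard index, assuming equal-width hash key ranges per shard."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    width = HASH_SPACE // shard_count
    # The last shard absorbs any remainder left by the integer division.
    return min(hash_key(partition_key) // width, shard_count - 1)
```

Because every stream record has its own eventID, records should spread across shards instead of piling onto one. The trade-off is that changes to the same order may land on different shards, so ordering per order_id is not guaranteed by this key choice.
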
  • Kinesis Firehose

    • Used Kinesis Firehose to deliver the streaming data in batches.
    • Set the Kinesis Firehose buffer interval to 15 seconds.
    • Used a Lambda function for data transformation: it decodes each record, parses the DynamoDB-formatted strings into JSON, and appends a newline character.
    • Configured the buffer time for efficient data processing.
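
A transformation Lambda of the kind described above might be sketched as follows. It follows the Firehose transformation contract (each input record has a recordId and base64 data, and each output record returns the same recordId with a result and base64 data). It assumes every payload is a DynamoDB stream record with a dynamodb.NewImage field, and it drops records without one. The repository's actual function may differ.

```python
import base64
import json


def _number(text: str):
    """DynamoDB sends numbers as strings; return int when possible, else float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def unwrap(attr: dict):
    """Convert one DynamoDB AttributeValue, e.g. {"S": "x"}, to a plain value."""
    tag, value = next(iter(attr.items()))
    if tag == "S":
        return value
    if tag == "N":
        return _number(value)
    if tag == "BOOL":
        return value
    if tag == "NULL":
        return None
    if tag == "M":
        return {key: unwrap(item) for key, item in value.items()}
    if tag == "L":
        return [unwrap(item) for item in value]
    raise ValueError(f"unsupported DynamoDB type tag: {tag}")


def transform(record: dict) -> dict:
    """Turn one Firehose record into a newline-terminated JSON line."""
    payload = json.loads(base64.b64decode(record["data"]))
    image = payload.get("dynamodb", {}).get("NewImage")
    if image is None:
        # Assumption: events without a new image (such as REMOVE) are dropped.
        return {"recordId": record["recordId"], "result": "Dropped",
                "data": record["data"]}
    row = {name: unwrap(attr) for name, attr in image.items()}
    line = json.dumps(row) + "\n"
    return {"recordId": record["recordId"], "result": "Ok",
            "data": base64.b64encode(line.encode("utf-8")).decode("ascii")}


def lambda_handler(event, context):
    """Entry point invoked by Firehose with a batch of records."""
    return {"records": [transform(record) for record in event["records"]]}
```

The trailing newline matters because Firehose concatenates records without a delimiter, while the Glue crawler and Athena expect one JSON object per line.
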
  • Glue

    • Created a catalog named aws-driven-sales-performance-outlook-catlog, which uses a crawler to collect metadata about the data in its data sources.
    • The crawler creates a table based on the schema of the data, which Athena can then use for analysis.
  • S3 Storage and Crawler

    • Set up S3 as the destination for Kinesis Firehose, storing the transformed data in files.

    • A sample file is stored in the repository directory (Output_Sample).

    • Set the S3 buffer interval to 60 seconds.

    • Created a crawler with a JSON classifier to identify raw data patterns in the S3 bucket.

    • Ran the crawler with an output prefix of outlook_ to create a table in the sales-data-catalog database.

    • JSON classifier pattern: "$.order_id,$.product_name,$.quantity,$.price"

    • The classifier keeps the crawler from cataloguing raw data that does not match this pattern.

  • Athena Query

    • Used Amazon Athena, a serverless analytics service, to query the table created in the sales-data-catalog database.
    • Athena queries the data directly in S3, so no dedicated infrastructure is needed.
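
As an illustration of the kind of query this enables, the sketch below builds a revenue-per-product query and submits it with boto3's start_query_execution. The column names come from the classifier pattern in this README. The table name used in the usage note, the results location, and the helper names are hypothetical.

```python
def quote_identifier(name: str) -> str:
    """Double-quote an identifier so names containing hyphens are accepted."""
    if '"' in name:
        raise ValueError("identifier must not contain a double quote")
    return f'"{name}"'


def revenue_query(database: str, table: str) -> str:
    """Build a query that totals units and revenue per product."""
    return (
        "SELECT product_name, "
        "SUM(quantity) AS units_sold, "
        "SUM(quantity * price) AS revenue "
        f"FROM {quote_identifier(database)}.{quote_identifier(table)} "
        "GROUP BY product_name "
        "ORDER BY revenue DESC"
    )


def run_query(database: str, table: str, output_location: str) -> str:
    """Submit the query and return its execution id; needs AWS credentials."""
    import boto3  # imported lazily so revenue_query runs without boto3

    response = boto3.client("athena").start_query_execution(
        QueryString=revenue_query(database, table),
        QueryExecutionContext={"Database": database},
        # Athena writes result files to this S3 location, e.g. "s3://bucket/prefix/".
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

For example, revenue_query("sales-data-catalog", "outlook_orders") returns the SQL text, where outlook_orders stands in for whatever table name the crawler generated with the outlook_ prefix.
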
  • Permissions Management

    • Added the necessary permissions to IAM users for DynamoDB, Kinesis, and EventBridge.

Key Outcomes

  • The project follows a comprehensive data pipeline architecture to capture, process, and analyze sales data efficiently.
  • The inclusion of CDC ensures that changes in records are tracked, providing a complete view of sales performance over time.
  • The use of serverless services like Athena and Lambda minimizes infrastructure management efforts.
  • The project showcases the integration of multiple AWS services for a seamless end-to-end data processing and analytics solution.
