
auth-ece-thesis

Description

Abstract

Design and implementation of a big data architecture for storage, real-time processing and batch analysis of smart metering data


As the world continues to shift towards a more sustainable future, the need for efficient management of natural resources is becoming increasingly critical. This has led to the emergence of smart meters as a key technology in the utilities sector. Smart meters are intelligent devices that can collect and analyze electricity, water and gas consumption data in real-time, allowing for better monitoring and management of resource usage. With this data, utilities can make informed decisions about how to optimize their distribution networks and reduce wastage, while also providing customers with greater insight into their usage patterns and helping them to make more informed choices about their consumption.

To meet these needs, utilities must employ scalable and fault-tolerant storage systems that can hold large amounts of data efficiently and cost-effectively. Additionally, stream processing technologies can ingest and process data in real time as it is generated by the smart meters, enabling utilities and customers to quickly detect and respond to anomalies, ultimately improving the efficiency and reliability of the service.

This dissertation focuses on the design and implementation of a scalable and extensible Lambda architecture for storing and analyzing the huge volume of data generated by smart meter devices, although it can generally be applied to any system that stores and processes time series data. In the proposed architecture, Apache Kafka is used as the messaging backbone, through which device metrics are made available to consumer applications. For real-time processing, Kafka Streams and ksqlDB are used to calculate aggregates, filter and re-route messages, and raise alarms when outliers are detected. Cassandra is selected as the database for medium-term storage, while S3 serves as the data lake for the long term. The implementation focuses on the main data ingestion, stream processing, and medium-term storage paths, with example flows presented. Finally, performance tests are conducted to collect metrics and assess the scalability of key components of the architecture.
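The outlier alarming mentioned above can be illustrated, independently of Kafka Streams or ksqlDB, as a rolling check of each reading against recent history. The class name, window size, and 3-sigma rule below are assumptions chosen for illustration, not the thesis implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of outlier alarming over a metric stream: a reading is
// flagged when it deviates more than 3 standard deviations from the mean of
// the last N readings. Window size and the 3-sigma rule are assumptions.
public class OutlierDetector {
    private final Deque<Double> window = new ArrayDeque<>();
    private final int windowSize;

    public OutlierDetector(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Returns true if the reading is an outlier relative to the current window. */
    public boolean isOutlier(double reading) {
        boolean outlier = false;
        if (window.size() == windowSize) {
            double mean = window.stream().mapToDouble(Double::doubleValue).average().orElse(0);
            double var = window.stream().mapToDouble(v -> (v - mean) * (v - mean)).average().orElse(0);
            double std = Math.sqrt(var);
            outlier = std > 0 && Math.abs(reading - mean) > 3 * std;
        }
        // Slide the window forward regardless of the verdict.
        window.addLast(reading);
        if (window.size() > windowSize) {
            window.removeFirst();
        }
        return outlier;
    }
}
```

In the actual architecture this logic would run continuously over the Kafka topic carrying device metrics, with alarms routed to a dedicated topic.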

Architecture

(general architecture diagram: general-architecture)

Modules

  • common: Classes reused by all other modules, such as converters, serdes, and DTOs
  • consumer: Implements various consuming applications, like:
    • Kafka consumer that prints metrics received to the console
    • Kafka Streams consumer that aggregates metrics in a time window and publishes the results back to Kafka
    • Kafka consumer that writes aggregated metrics into a Cassandra database
    • Kafka consumer that tracks the time until a total message count has been received, used for performance testing of the aggregator
  • persistence: Implements DAO for interfacing with Cassandra for persisting metric aggregates
  • replay: Publisher application to replay data from specific datasets into Kafka

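The windowed aggregation performed by the Kafka Streams consumer can be sketched, without a broker, as bucketing each reading into a fixed-size tumbling time window per device. The names and the choice of a running sum are illustrative assumptions, not the module's actual code:

```java
import java.util.Map;
import java.util.TreeMap;

// Kafka-free sketch of a tumbling-window aggregation: each reading
// (deviceId, timestamp, value) is assigned to the window containing its
// timestamp, and a running sum is kept per (device, window) pair.
public class TumblingWindowAggregator {
    private final long windowMillis;
    // Key: deviceId + "@" + windowStart -> running sum for that window.
    private final Map<String, Double> sums = new TreeMap<>();

    public TumblingWindowAggregator(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    public void accept(String deviceId, long timestampMillis, double value) {
        // Align the timestamp to the start of its tumbling window.
        long windowStart = (timestampMillis / windowMillis) * windowMillis;
        sums.merge(deviceId + "@" + windowStart, value, Double::sum);
    }

    public double sumFor(String deviceId, long windowStart) {
        return sums.getOrDefault(deviceId + "@" + windowStart, 0.0);
    }
}
```

In Kafka Streams the same grouping is expressed with a windowed aggregate over the metrics topic, and the results are published back to Kafka, as the consumer module description notes.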
Prerequisites

Publisher

For the publisher to be able to replay the data, place:

Redis Kafka Connector

Before starting docker compose, download the Redis Sink Connector as a zip and place it under connect/redis-connector.

Cassandra

After starting docker compose, make sure to run scripts/create-tables.cql to set up the base table topology before running the Cassandra sink(s).
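Time-series tables in Cassandra are commonly bucketed by device and day so that no partition grows without bound. The helper below illustrates that convention; it is an assumption about such a scheme, not the actual schema defined in scripts/create-tables.cql:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Illustrative helper for the day-bucketing pattern often used with Cassandra
// time-series tables: the partition key combines the device id with the UTC
// day of the reading, keeping each partition bounded to one device-day.
// This is an assumed convention, not the repository's actual schema.
public class PartitionKeys {
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    /** e.g. ("meter-1", epoch millis) -> "meter-1:1970-01-01" */
    public static String partitionKey(String deviceId, long timestampMillis) {
        return deviceId + ":" + DAY.format(Instant.ofEpochMilli(timestampMillis));
    }
}
```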

Build

Image of publisher application:

./gradlew :replay:jibDockerBuild

(Common) Image of consumer applications:

./gradlew :consumer:jibDockerBuild

Run

Follow the commands and comments in scripts/demo.sh.

To run a sample flow for ksqlDB, follow the commands and comments in ksqldb/queries.ksql.

Contributors

  • alexsah-ece
