dataengineering-youtube-analysis-project's Introduction

Data Engineering YouTube Analysis Project by Darshil Parmar

Overview

This project aims to securely manage, streamline, and analyze structured and semi-structured YouTube video data based on video categories and trending metrics.

Project Goals

  1. Data Ingestion: Build a mechanism to ingest data from different sources.
  2. ETL System: The data arrives in raw format, so we need to transform it into a proper format.
  3. Data Lake: The data comes from multiple sources, so we need a centralized repository to store it.
  4. Scalability: As the size of our data increases, the system must scale with it.
  5. Cloud: We can't process vast amounts of data on a local computer, so we need to use the cloud; in this case, AWS.
  6. Reporting: Build a dashboard to answer our questions about the data.
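
As a rough illustration of the data lake goal, raw files could be stored under one prefix per region so that later ETL and query steps can treat the region as a partition. The sketch below is an assumption, not the project's actual layout: the `raw/` prefixes, the `region=` naming, and the helper `raw_object_key` are hypothetical, and the `CAvideos.csv` / `CA_category_id.json` filename patterns are assumed from the Kaggle dataset rather than stated here.

```python
import re

# Assumed filename patterns (from the Kaggle dataset, not stated in this README):
#   "CAvideos.csv"        -> per-region trending statistics
#   "CA_category_id.json" -> per-region category reference data
_STATS = re.compile(r"^([A-Z]{2})videos\.csv$")
_CATEGORIES = re.compile(r"^([A-Z]{2})_category_id\.json$")


def raw_object_key(filename: str) -> str:
    """Return a hypothetical data-lake key for one raw dataset file."""
    match = _STATS.match(filename)
    if match:
        region = match.group(1).lower()
        # Hive-style "region=xx" prefix so region can act as a partition column.
        return f"raw/statistics/region={region}/{filename}"
    if _CATEGORIES.match(filename):
        return f"raw/reference_data/{filename}"
    raise ValueError(f"unrecognized dataset file: {filename}")
```

Keeping the region in the key rather than only in the filename means every file under a prefix shares one schema, which tends to make crawling and partitioned queries simpler.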

Services we will be using

  1. Amazon S3: An object storage service that provides industry-leading scalability, data availability, security, and performance.
  2. AWS IAM: Identity and Access Management, which lets us securely manage access to AWS services and resources.
  3. Amazon QuickSight: A scalable, serverless, embeddable, machine learning-powered business intelligence (BI) service built for the cloud.
  4. AWS Glue: A serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.
  5. AWS Lambda: A compute service that lets programmers run code without provisioning or managing servers.
  6. Amazon Athena: An interactive query service for data in S3; there is no need to load the data elsewhere, since it stays in S3.
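
To make the role of Lambda more concrete, here is a minimal sketch of a handler that is triggered when a file lands in S3 and works out which objects it was asked to process. It assumes the standard S3 event notification layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`); the helper name `objects_from_s3_event` is hypothetical, and the actual transformation step is deliberately left out.

```python
from urllib.parse import unquote_plus


def objects_from_s3_event(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for every record in an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded in S3 notifications (e.g. spaces as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def lambda_handler(event, context):
    # A real handler would read each object, transform it, and write the
    # result back to S3; this sketch only reports what it was asked to process.
    return {"objects": objects_from_s3_event(event)}
```

Decoding the key matters in practice: a Hive-style prefix such as `region=ca` reaches the handler as `region%3Dca`, and using the undecoded key to fetch the object would fail.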

Dataset Used

This Kaggle dataset contains statistics (CSV files) on daily trending YouTube videos collected over several months. Up to 200 trending videos are listed per day for each of several regions, and the data for each region is in its own file. Each record includes the video title, channel title, publication time, tags, views, likes and dislikes, description, and comment count. Each record also has a category_id field, whose meaning varies between regions; the category names for a region are in the JSON file associated with that region.

https://www.kaggle.com/datasets/datasnaek/youtube-new
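
Because the category names live in a separate JSON file per region, the CSV rows need a lookup step before they can be analyzed by category. The sketch below shows one way to do that with the standard library. It assumes the JSON follows the YouTube API layout (an `items` list whose entries have `id` and `snippet.title`); the function names and the sample rows are made up for illustration and are not real dataset values.

```python
import csv
import io
import json


def category_names(category_json: str) -> dict[str, str]:
    """Map category id -> title, assuming the YouTube API 'items' layout."""
    items = json.loads(category_json)["items"]
    return {item["id"]: item["snippet"]["title"] for item in items}


def label_videos(stats_csv: str, category_json: str) -> list[dict]:
    """Attach a category_name to each CSV row via its category_id."""
    names = category_names(category_json)
    rows = []
    for row in csv.DictReader(io.StringIO(stats_csv)):
        # Unknown ids are kept rather than dropped, since the mapping
        # differs by region and may not cover every id.
        row["category_name"] = names.get(row["category_id"], "Unknown")
        rows.append(row)
    return rows


# Tiny made-up sample, not real dataset rows.
sample_csv = (
    "video_id,title,category_id,views\n"
    "abc123,Example video,10,1000\n"
    "xyz789,Another video,99,50\n"
)
sample_json = '{"items": [{"id": "10", "snippet": {"title": "Music"}}]}'

for video in label_videos(sample_csv, sample_json):
    print(video["video_id"], video["category_name"])
```

Note that ids are compared as strings here, because both the CSV reader and the assumed JSON layout yield strings; a pipeline that casts one side to an integer would need to cast the other as well.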

Architecture Diagram

Complete Tutorial

I have created a detailed 3+ hour tutorial on this project, in which you execute everything from start to finish.

https://youtu.be/yZKJFKu49Dk

