Code Monkey home page Code Monkey logo

youtube-warehousing's Introduction

YouTube Data Harvesting and Warehousing by Hemanth Karthick

Welcome to the YouTube Data Harvesting and Warehousing project! This application enables users to access, analyze, and store data from various YouTube channels, leveraging powerful technologies like PostgreSQL, MongoDB, and Streamlit.

๐Ÿš€ Project Overview

The application provides a seamless interface for retrieving, saving, and querying YouTube channel and video data. Built with Python and a suite of robust tools, it supports the following features:

  • Data Retrieval: Fetch channel and video information using the YouTube Data API v3.
  • Data Storage: Store raw data in MongoDB, acting as a data lake.
  • Data Migration: Migrate data from MongoDB to PostgreSQL for structured querying and analysis.
  • Data Querying: Search and retrieve data from the SQL database with various options.

๐Ÿ”ง Tools and Libraries

  • Streamlit: Build interactive and user-friendly UIs for data interaction.
  • Python: Core programming language for data retrieval, processing, and visualization.
  • Google API Client: Interface with YouTube's Data API v3 to fetch data.
  • MongoDB: Document database for storing unstructured or semi-structured data.
  • PostgreSQL: Relational database for managing structured data with advanced SQL features.

๐Ÿ“ฆ Required Libraries

Ensure you have the following Python libraries installed:

pip install google-api-python-client
pip install streamlit
pip install psycopg2
pip install pymongo
pip install pandas

๐Ÿ” Features

Data Retrieval

Fetch channel and video data from YouTube using the API:

from googleapiclient.discovery import build

# Set up YouTube API client
youtube = build('youtube', 'v3', developerKey='YOUR_API_KEY')

# Example function to get channel details
def get_channel_details(channel_id):
    request = youtube.channels().list(
        part='snippet,contentDetails,statistics',
        id=channel_id
    )
    response = request.execute()
    return response

Data Storage

Store data in MongoDB:

from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient('mongodb://localhost:27017/')
db = client['youtube_data']

# Example function to insert data
def store_channel_data(channel_data):
    db.channels.insert_one(channel_data)

Data Migration

Move data from MongoDB to PostgreSQL:

import psycopg2

# Connect to PostgreSQL
conn = psycopg2.connect(dbname='youtube_db', user='user', password='password', host='localhost')
cur = conn.cursor()

# Example function to insert data into PostgreSQL
def migrate_channel_data():
    data = db.channels.find()
    for item in data:
        cur.execute("INSERT INTO channels (id, name, description) VALUES (%s, %s, %s)",
                    (item['id'], item['name'], item['description']))
    conn.commit()

Data Querying

Query and analyze data with SQL:

# Example function to query channel data
def query_channel_data(name):
    cur.execute("SELECT * FROM channels WHERE name = %s", (name,))
    results = cur.fetchall()
    return results

โš ๏ธ Ethical Considerations

Respect YouTubeโ€™s terms of service and data protection regulations. Ensure responsible data handling and prevent misuse. Prioritize privacy and integrity in your data scraping processes.

๐ŸŒ Getting Started

  1. Clone the Repository:

    git clone https://github.com/yourusername/youtube-data-warehousing.git
  2. Install Dependencies:

    pip install -r requirements.txt
  3. Run the Application:

    streamlit run app.py
  4. Configure Your API Keys:

    Update config.py with your YouTube API key and database connection details.

๐Ÿ› ๏ธ Contributing

Contributions are welcome! Please open an issue or submit a pull request on GitHub.

๐Ÿ“„ License

This project is licensed under the MIT License. See the LICENSE file for details.

youtube-warehousing's People

Contributors

barkhavarshney avatar phoenix-mp3 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.