Code Monkey home page Code Monkey logo

data-modeling-with-postgres's Introduction

Data Modeling with Postgres

Scope

Data Modeling with Postgres is a project designed for Sparkify with the goal to streamline their data models and simplify the querying into the database. The project aims to create an ETL pipeline using the JSON logs to save into the Postgres database inorder to make it east for querying and performing analysis on song plays.

Prerequsites

  • Postgres
  • Python 3+
  • Anaconda / Jupyter Notebook

How to Execute

  1. Run create_tables.py
  2. Run etl.py
  3. Test and Validate the script using test.ipynb Jupyter Notebook

Database Design & Schema

In order to simplify and increase query execution we'd have to develope a database on a star schema. We'd have a Fact table songplays and dimension
tables users, artists, songs, time.

Songplays

Column DataType Constraint
songplay_id Serial NOT NULL , PRIMARY
start_time VARCHAR NOT NULL
user_id INT NOT NULL
level VARCHAR
song_id VARCHAR
artist_id VARCHAR
session_id INT
location VARCHAR
user_agent VARCHAR

Dimesion Tables:

users

Column DataType Constraint
user_id INT NOT NULL, PRIMARY
first_name VARCHAR
last_name VARCHAR
gender VARCHAR
duration VARCHAR

Songs

Column DataType Constraint
song_id VARCHAR NOT NULL, PRIMARY
title VARCHAR
artist_id VARCHAR NOT NULL
year INT
duration FLOAT

Artists

Column DataType Constraint
artist_id VARCHAR NOT NULL, PRIMARY
name VARCHAR
location VARCHAR
latitude FLOAT
longitude FLOAT

Time

Column DataType Constraint
start_time TIMESTAMP NOT NULL, PRIMARY
hour INT
day INT
week INT
month INT
year INT
weekday INT

As we can see from the tables, we can get most of the data we need for analysis using the songplays table and if we need further data we can retrieve it using minimum number of joins.

data-modeling-with-postgres's People

Contributors

sazack avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.