imdb-movie-power_bi-dashboard-'s Introduction

IMDb Movie Dataset Analysis

Objective

The main objective of this project is to clean and analyze the IMDb movie dataset to gain insight into the performance and characteristics of popular movies. The analysis covers ratings, revenue, genres, and the performance of directors and actors.

Workflow

Data Cleaning

  1. Load the Dataset: Read the IMDb movie dataset using pandas.
  2. Remove Unnecessary Columns: Remove the 'Description' column as it is not required for the analysis.
  3. Handle Missing Values:
    • Revenue (Millions): Fill missing values with the column's 30th percentile.
    • Metascore: Fill missing values with the column's 45th percentile.
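
The percentile fill can be illustrated on a toy column. This is a minimal sketch: the values are invented for the example and are not taken from the IMDb file; only the column name mirrors the dataset.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values); the last entry is missing.
toy = pd.DataFrame({"Revenue (Millions)": [10.0, 20.0, 30.0, 40.0, np.nan]})

# nanpercentile ignores NaN entries when computing the percentile.
fill_value = np.nanpercentile(toy["Revenue (Millions)"], 30)

# Assign the filled column back rather than filling in place on a column selection.
toy["Revenue (Millions)"] = toy["Revenue (Millions)"].fillna(fill_value)

print(fill_value)
print(toy)
```

Filling with a low percentile rather than the mean keeps a few very high-grossing titles from inflating the imputed values, although every missing entry still receives the same number.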

Data Analysis

  1. Data Visualization:

    • Created a heatmap to visualize the distribution of missing values in the dataset using seaborn.
    • Produced an alternative view of the same missing values with the missingno library.
  2. Insights and Analysis:

    • Analyzed the top movies based on the number of votes received.
    • Calculated average metrics such as Metascore, number of votes, duration, and revenue.
    • Explored the distribution of movies across different genres.
    • Identified the top directors based on the revenue generated by their movies.
    • Listed the most popular actors.
    • Examined the total revenue generated by movies in different years.
    • Compared the average Metascores (critical ratings) of different genres.
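
Several of the aggregations listed above can be sketched in pandas. The rows below are hypothetical, and the column names are assumed to match the CSV headers; the project itself builds these views in Power BI, so this is only an approximate equivalent.

```python
import pandas as pd

# Hypothetical rows for illustration; column names are assumed to match the CSV.
movies = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Director": ["X", "X", "Y", "Z"],
    "Year": [2015, 2016, 2016, 2015],
    "Runtime (Minutes)": [100, 120, 90, 110],
    "Votes": [500, 1500, 1000, 200],
    "Revenue (Millions)": [50.0, 150.0, 80.0, 20.0],
    "Metascore": [60.0, 70.0, 55.0, 45.0],
})

# Top movies by number of votes.
top_by_votes = movies.nlargest(2, "Votes")[["Title", "Votes"]]

# Averages of the headline metrics.
metric_columns = ["Metascore", "Votes", "Runtime (Minutes)", "Revenue (Millions)"]
averages = movies[metric_columns].mean()

# Total revenue per director, highest first.
revenue_by_director = (
    movies.groupby("Director")["Revenue (Millions)"]
    .sum()
    .sort_values(ascending=False)
)

# Total revenue per release year.
revenue_by_year = movies.groupby("Year")["Revenue (Millions)"].sum()

print(top_by_votes)
print(averages)
print(revenue_by_director)
print(revenue_by_year)
```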

Dashboard Creation

A dashboard has been created in Power BI to visualize and analyze the IMDb movie dataset.

Key Insights

  1. Top Movies Based on Number of Votes:

    • The top movies by number of votes received are "The Dark Knight," "Inception," "The Dark Knight Rises," "Interstellar," and "The Avengers." Each has received over 1 million votes on IMDb, indicating widespread popularity.
  2. Average Metrics:

    • The dashboard displays the average Metascore (58.86), average number of votes (169.81K), average duration (113.17 minutes), and average revenue (299.43 million USD) across the movies included in the analysis.
  3. Genre Performance:

    • The data shows the distribution of movies across different genres, with Action (29.3%), Drama (19.5%), and Comedy (17.5%) being the most prominent genres represented.
  4. Director Records:

    • The dashboard highlights the top directors based on the revenue generated by their movies, with J.J. Abrams, David Yates, and Christopher Nolan topping the list.
  5. Top Actors:

    • The dashboard lists the most frequently appearing actors, with first names such as Michael, James, Jason, Jennifer, and Robert occurring most often.
  6. Revenue by Year:

    • The data includes information on the total revenue generated by movies in different years, with peaks observed in 2008, 2012, and 2016.
  7. Top Genres Based on Metascores:

    • The dashboard compares the average Metascores (critical ratings) of different genres, providing insights into the critically acclaimed genres.
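
The genre and actor figures depend on columns that hold several comma-separated values per movie. How the dashboard splits them is not stated here, so the following is only one possible approach, shown on hypothetical rows: it counts every listed genre and every listed actor, and the column names `Genre` and `Actors` are assumed to match the CSV.

```python
import pandas as pd

# Hypothetical rows; 'Genre' and 'Actors' are assumed to be comma-separated strings.
movies = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Genre": ["Action,Adventure", "Drama", "Action,Drama"],
    "Actors": ["Ann Lee, Bob Ray", "Ann Lee, Cy Fox", "Bob Ray, Ann Lee"],
    "Metascore": [60.0, 80.0, 70.0],
})

# One row per (movie, genre) pair.
by_genre = movies.assign(Genre=movies["Genre"].str.split(",")).explode(
    "Genre", ignore_index=True
)
by_genre["Genre"] = by_genre["Genre"].str.strip()

# Share of each genre among all (movie, genre) pairs, in percent.
genre_share = by_genre["Genre"].value_counts(normalize=True) * 100

# Average Metascore per genre, highest first.
metascore_by_genre = (
    by_genre.groupby("Genre")["Metascore"].mean().sort_values(ascending=False)
)

# One entry per (movie, actor) pair, then count appearances per actor.
actors = movies["Actors"].str.split(",").explode().str.strip()
actor_counts = actors.value_counts()

print(genre_share)
print(metascore_by_genre)
print(actor_counts)
```

Counting only the first listed genre per movie instead would give different shares, so the percentages depend on which convention is used.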

Tools Used

  • Python

    • pandas
    • numpy
    • matplotlib
    • seaborn
    • missingno
  • Power BI for dashboard creation

Files

  • IMDB-Movie-Data.csv: Original dataset
  • Imdb cleaned.csv: Cleaned dataset after handling missing values

Data Cleaning Code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno

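# Load the dataset
df = pd.read_csv("IMDB-Movie-Data.csv")

# Remove the 'Description' column, which the analysis does not use
df = df.drop(columns=['Description'])

# Visualize missing values before imputation; once the columns are filled
# below, there is nothing left for these plots to show
plt.figure(figsize=(10, 10))
sns.heatmap(df.isnull(), cbar=False, yticklabels=False, cmap='Blues')
plt.show()

# Alternative visualization using missingno
msno.matrix(df)
plt.show()

# Fill missing 'Revenue (Millions)' values with the 30th percentile of the
# observed values. Assigning the result back avoids in-place fillna on a
# column selection, which newer pandas versions warn about.
p30 = np.nanpercentile(df['Revenue (Millions)'], 30)
df['Revenue (Millions)'] = df['Revenue (Millions)'].fillna(p30)

# Fill missing 'Metascore' values with the 45th percentile
p45 = np.nanpercentile(df['Metascore'], 45)
df['Metascore'] = df['Metascore'].fillna(p45)

# Save the cleaned dataset without writing the DataFrame index as an extra column
df.to_csv('Imdb cleaned.csv', index=False)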

imdb-movie-power_bi-dashboard-'s People

Contributors

itsmesethus
