IMDb Movie Dataset Analysis

Objective

The main objective of this project is to clean and analyze the IMDb movie dataset to gain insights into the performance and characteristics of popular movies. The analysis covers ratings, revenue, genres, and the performance of directors and actors.

Workflow

Data Cleaning

Load the Dataset: Read the IMDb movie dataset using pandas.
Remove Unnecessary Columns: Remove the 'Description' column as it is not required for the analysis.
Handle Missing Values:
- Revenue (Millions): Fill missing values with the column's 30th percentile.
- Metascore: Fill missing values with the column's 45th percentile.
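As a small worked illustration of the percentile fill described above, the sketch below applies the same idea to a toy column. The six values are made up for the example and are not taken from the dataset; only the column name and the 30th percentile come from this README.

```python
import numpy as np
import pandas as pd

# Toy column with two gaps; the values are illustrative, not from the dataset.
revenue = pd.Series([10.0, 20.0, np.nan, 40.0, 50.0, np.nan],
                    name="Revenue (Millions)")

# nanpercentile ignores NaN, so the percentile is computed from the
# four observed values only.
p30 = np.nanpercentile(revenue, 30)

# Assign the filled result to a new Series instead of filling in place.
filled = revenue.fillna(p30)

print(p30)
print(filled.tolist())
```

With the default linear interpolation, the fill value lands between the two lowest observed values (roughly 19 here), so every gap receives the same below-median constant.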

Data Analysis

Data Visualization:
- Created a seaborn heatmap to visualize where missing values occur in the dataset.
- Plotted the same information with the missingno library as an alternative view.
Insights and Analysis:
- Analyzed the top movies based on the number of votes received.
- Calculated average metrics such as Metascore, number of votes, duration, and revenue.
- Explored the distribution of movies across different genres.
- Identified the top directors based on the revenue generated by their movies.
- Listed the most popular actors.
- Examined the total revenue generated by movies in different years.
- Compared the average Metascores (critical ratings) of different genres.
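The analysis steps listed above were carried out in the Power BI dashboard, but they could also be approximated in pandas. The following is a minimal sketch under stated assumptions: the sample rows are hypothetical, and the column names 'Title', 'Genre', 'Director', 'Year', and 'Votes' are assumed rather than confirmed by this README (only 'Revenue (Millions)' and 'Metascore' are). It also assumes genres are stored as comma-separated strings.

```python
import pandas as pd

# Hypothetical rows standing in for the cleaned dataset. Column names other
# than 'Revenue (Millions)' and 'Metascore' are assumptions about the CSV.
df = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Genre": ["Action,Sci-Fi", "Drama", "Action,Drama", "Comedy"],
    "Director": ["Dir X", "Dir Y", "Dir X", "Dir Z"],
    "Year": [2015, 2015, 2016, 2016],
    "Votes": [900, 300, 1200, 100],
    "Revenue (Millions)": [200.0, 50.0, 300.0, 20.0],
    "Metascore": [70.0, 80.0, 60.0, 50.0],
})

# Top movies by number of votes.
top_by_votes = df.nlargest(2, "Votes")["Title"].tolist()

# Total revenue per director (highest first) and per release year.
revenue_by_director = (
    df.groupby("Director")["Revenue (Millions)"].sum().sort_values(ascending=False)
)
revenue_by_year = df.groupby("Year")["Revenue (Millions)"].sum()

# One row per (movie, genre), so a multi-genre title counts toward each genre.
genres = df.assign(Genre=df["Genre"].str.split(",")).explode("Genre")
genre_share = genres["Genre"].value_counts(normalize=True)
metascore_by_genre = genres.groupby("Genre")["Metascore"].mean()

print(top_by_votes)
print(revenue_by_director)
print(metascore_by_genre)
```

Counting a movie once per listed genre is one possible convention; the dashboard may instead use only the first genre, which would give different shares.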

Dashboard Creation

A dashboard has been created in Power BI to visualize and analyze the IMDb movie dataset.

Key Insights

Top Movies by Number of Votes:
- The movies with the most votes are "The Dark Knight," "Inception," "The Dark Knight Rises," "Interstellar," and "The Avengers." Each has received over 1 million votes on IMDb, indicating widespread popularity.
Average Metrics:
- The dashboard displays the average Metascore (58.86), average number of votes (169.81K), average duration (113.17 minutes), and average revenue (299.43 million USD) across the movies included in the analysis.
Genre Performance:
- The data shows the distribution of movies across different genres, with Action (29.3%), Drama (19.5%), and Comedy (17.5%) being the most prominent genres represented.
Director Records:
- The dashboard highlights the top directors based on the revenue generated by their movies, with J.J. Abrams, David Yates, and Christopher Nolan topping the list.
Top Actors:
- The dashboard lists the most frequently appearing actor names, with first names such as Michael, James, Jason, Jennifer, and Robert occurring most often.
Revenue by Year:
- The data includes information on the total revenue generated by movies in different years, with peaks observed in 2008, 2012, and 2016.
Top Genres Based on Metascores:
- The dashboard compares the average Metascores (critical ratings) of different genres, providing insights into the critically acclaimed genres.
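The Top Actors result above is reported as first names, which may mean the actor field was counted word by word. As a hedged alternative, the sketch below counts full names instead. It assumes an 'Actors' column holding comma-separated names; that column name, the format, and the sample rows are all hypothetical and not confirmed by this README.

```python
import pandas as pd

# Hypothetical sample; the 'Actors' column name and the comma-separated
# format are assumptions.
actors = pd.Series([
    "Actor One, Actor Two",
    "Actor Two, Actor Three",
    "Actor Two,Actor One",
], name="Actors")

# Split on commas, give each name its own row, and trim stray whitespace.
names = actors.str.split(",").explode().str.strip()

# Count appearances per full name, most frequent first.
actor_counts = names.value_counts()

print(actor_counts)
```

Counting full names keeps different actors who share a first name apart, at the cost of being sensitive to spelling variations in the source data.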

Tools Used

Python
- pandas
- numpy
- matplotlib
- seaborn
- missingno
Power BI for dashboard creation

Files

IMDB-Movie-Data.csv: Original dataset
Imdb cleaned.csv: Cleaned dataset after handling missing values

Data Cleaning Code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno

# Load the dataset
df = pd.read_csv("IMDB-Movie-Data.csv")

# Remove the 'Description' column
df.drop(columns=['Description'], inplace=True)

# Visualize missing values before they are filled
plt.figure(figsize=(10, 10))
sns.heatmap(df.isnull(), cbar=False, cmap='Blues')
plt.show()

# Alternative visualization using missingno
msno.matrix(df)
plt.show()

# Fill missing 'Revenue (Millions)' values with the 30th percentile
p30 = np.nanpercentile(df['Revenue (Millions)'], 30)
df['Revenue (Millions)'] = df['Revenue (Millions)'].fillna(p30)

# Fill missing 'Metascore' values with the 45th percentile
p45 = np.nanpercentile(df['Metascore'], 45)
df['Metascore'] = df['Metascore'].fillna(p45)

# Save the cleaned dataset without the pandas index column
df.to_csv('Imdb cleaned.csv', index=False)
