Heroes Of Pymoli Data Analysis
import pandas as pd
path = 'Resources/purchase_data.json'
heroes = pd.read_json(path)
Player Count
# Total number of unique players (based on SN)
total_players = len(heroes['SN'].unique())
# Create DF
total_players_df = pd.DataFrame({'Total number of players': [total_players]})
total_players_df
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
Total number of players | |
---|---|
0 | 573 |
Purchasing Analysis (Total)
# Number of unique items bought, different from item name
number_unique_items = len(heroes['Item ID'].unique())
# Average price
av_purch_price = round(heroes['Price'].mean(), 2)
# Number of purchases is equal to rows number (I hope.. )
number_purchases = heroes['Item ID'].count()
# Total revenue - sum of all prices
total_revenue = heroes['Price'].sum()
# Create DF
purch_analysis_df = pd.DataFrame({'Number of Unique Items': [number_unique_items], 'Average Purchase Price': [av_purch_price],
'Total Number of Purchases': [number_purchases], 'Total Revenue': [total_revenue]})
purch_analysis_df['Average Purchase Price'] = purch_analysis_df['Average Purchase Price'].map("$ {:,.2f}".format)
purch_analysis_df['Total Revenue'] = purch_analysis_df['Total Revenue'].map("$ {:,.2f}".format)
purch_analysis = purch_analysis_df[['Number of Unique Items', 'Average Purchase Price', 'Total Number of Purchases',
'Total Revenue']]
purch_analysis
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
Number of Unique Items | Average Purchase Price | Total Number of Purchases | Total Revenue | |
---|---|---|---|---|
0 | 183 | $ 2.93 | 780 | $ 2,286.33 |
Gender Demographics
# Check what we have in column 'Gender'
heroes['Gender'].value_counts()
Male 633
Female 136
Other / Non-Disclosed 11
Name: Gender, dtype: int64
# Create a DF with Gender & SN only
gender_df = heroes[['SN', 'Gender']]
# Delete all duplicates in SN, exept the last one (keep='last')
clean_gender = gender_df.drop_duplicates(['SN'], keep='last')
# Change index to Gender, to use loc. function
clean_gender_ind = clean_gender.set_index('Gender')
# Use loc. to count all 3 variations
male_count = clean_gender_ind.loc['Male', 'SN'].count()
female_count = clean_gender_ind.loc['Female', 'SN'].count()
other_count = clean_gender_ind.loc['Other / Non-Disclosed', 'SN'].count()
# Count persentages
male_perc = round((male_count/total_players*100), 2)
female_perc = round((female_count/total_players*100), 2)
other_perc = round((other_count/total_players*100), 2)
# Create DF
gender_demo = {'Gender': ['Male', 'Female', 'Other / Non-Disclose'],
'Total count': [male_count, female_count, other_count],
'Percentage of Players': [male_perc, female_perc, other_perc]}
gender_demo_df = pd.DataFrame(gender_demo)
new_gender_demo = gender_demo_df.set_index('Gender')
new_gender_demo
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
Percentage of Players | Total count | |
---|---|---|
Gender | ||
Male | 81.15 | 465 |
Female | 17.45 | 100 |
Other / Non-Disclose | 1.40 | 8 |
Purchasing Analysis (Gender)
# Set Gender as index in original df
gender_ind = heroes.set_index('Gender')
# Count a number of purchases
male_purch = gender_ind.loc['Male', 'SN'].count()
female_purch = gender_ind.loc['Female', 'SN'].count()
other_purch = gender_ind.loc['Other / Non-Disclosed', 'SN'].count()
# Count average and total purchases
male_avg_price = round(gender_ind.loc['Male', 'Price'].mean(),2)
male_total = round(gender_ind.loc['Male', 'Price'].sum(), 2)
female_avg_price = round(gender_ind.loc['Female', 'Price'].mean(),2)
female_total = round(gender_ind.loc['Female', 'Price'].sum(), 2)
other_avg_price = round(gender_ind.loc['Other / Non-Disclosed', 'Price'].mean(),2)
other_total = round(gender_ind.loc['Other / Non-Disclosed', 'Price'].sum(), 2)
# Count normalized totals
male_norm_purch = round((male_total / male_count), 2)
female_norm_purch = round((female_total / female_count), 2)
other_norm_purch = round((other_total / other_count), 2)
# Create answer DF
purch_an_gender = {'Gender': ['Male', 'Female', 'Other / Non-Disclose'],
'Purchase Count': [male_purch, female_purch, other_purch],
'Average Purchase Price': [male_avg_price, female_avg_price, other_avg_price],
'Total Purchase Value': [male_total, female_total, other_total],
'Normalized Totals': [male_norm_purch, female_norm_purch, other_norm_purch]}
purch_an_gender_df = pd.DataFrame(purch_an_gender)
purch_gender_answ = purch_an_gender_df.set_index('Gender')
purch_gender_answ['Average Purchase Price']=purch_gender_answ['Average Purchase Price'].map("$ {:,.2f}".format)
purch_gender_answ['Total Purchase Value']=purch_gender_answ['Total Purchase Value'].map("$ {:,.2f}".format)
purch_gender_answ['Normalized Totals']=purch_gender_answ['Normalized Totals'].map("$ {:,.2f}".format)
purch_gender_an = purch_gender_answ[['Purchase Count', 'Average Purchase Price', 'Total Purchase Value', 'Normalized Totals']]
purch_gender_an
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
Purchase Count | Average Purchase Price | Total Purchase Value | Normalized Totals | |
---|---|---|---|---|
Gender | ||||
Male | 633 | $ 2.95 | $ 1,867.68 | $ 4.02 |
Female | 136 | $ 2.82 | $ 382.91 | $ 3.83 |
Other / Non-Disclose | 11 | $ 3.25 | $ 35.74 | $ 4.47 |
Age Demographics
# Check min and max values
print(heroes['Age'].max())
print(heroes['Age'].min())
45
7
# Create bins and labels to them
bins = [4,9,14,19,24,29,34,39,44,49]
labels = ['5 to 9', '10 to 14', '15 to 19', '20 to 24', '25 to 29', '30 to 34',
'35 to 39', '40 to 44', '45 to 50']
# Add new column with bins
heroes['Age Bins'] = pd.cut(heroes["Age"],bins,labels=labels)
# Group by a new column
age_grouped = heroes.groupby('Age Bins')
# Pick up values
age_purch_count = (age_grouped['Age'].count())
age_avg_purch_price = round((age_grouped['Price'].mean()), 2)
age_total_purch_value = (age_grouped['Price'].sum())
age_df = heroes[['SN', 'Age', 'Age Bins']]
# Delete all duplicates in SN, exept the last one (keep='last')
clean_age = age_df.drop_duplicates(['SN'], keep='last')
clean_age_grouped = clean_age.groupby('Age Bins')
age_demo_df = pd.DataFrame({'Total Count': clean_age_grouped.size(),
'Percentage of Players': round(((clean_age_grouped.size()/total_players)*100),2)})
age_demo_df
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
Percentage of Players | Total Count | |
---|---|---|
Age Bins | ||
5 to 9 | 3.32 | 19 |
10 to 14 | 4.01 | 23 |
15 to 19 | 17.45 | 100 |
20 to 24 | 45.20 | 259 |
25 to 29 | 15.18 | 87 |
30 to 34 | 8.20 | 47 |
35 to 39 | 4.71 | 27 |
40 to 44 | 1.75 | 10 |
45 to 50 | 0.17 | 1 |
Purchasing Analysis (Age)
purch_age_df = pd.DataFrame({'Purchase Count': age_purch_count,
'Average Purchase Price': age_avg_purch_price,
'Total Purchase Value': age_total_purch_value,
'Normalized Totals': round(age_total_purch_value/clean_age_grouped.size(), 2),
'Total percentage': round((age_total_purch_value/total_revenue*100),2)})
purch_age_df['Average Purchase Price'] = purch_age_df['Average Purchase Price'].map("$ {:,.2f}".format)
purch_age_df['Total Purchase Value'] = purch_age_df['Total Purchase Value'].map("$ {:,.2f}".format)
purch_age_df['Normalized Totals'] = purch_age_df['Normalized Totals'].map("$ {:,.2f}".format)
purch_age_df1 = purch_age_df[['Purchase Count', 'Average Purchase Price', 'Total Purchase Value', 'Normalized Totals',
'Total percentage']]
purch_age_df1
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
Purchase Count | Average Purchase Price | Total Purchase Value | Normalized Totals | Total percentage | |
---|---|---|---|---|---|
Age Bins | |||||
5 to 9 | 28 | $ 2.98 | $ 83.46 | $ 4.39 | 3.65 |
10 to 14 | 35 | $ 2.77 | $ 96.95 | $ 4.22 | 4.24 |
15 to 19 | 133 | $ 2.91 | $ 386.42 | $ 3.86 | 16.90 |
20 to 24 | 336 | $ 2.91 | $ 978.77 | $ 3.78 | 42.81 |
25 to 29 | 125 | $ 2.96 | $ 370.33 | $ 4.26 | 16.20 |
30 to 34 | 64 | $ 3.08 | $ 197.25 | $ 4.20 | 8.63 |
35 to 39 | 42 | $ 2.84 | $ 119.40 | $ 4.42 | 5.22 |
40 to 44 | 16 | $ 3.19 | $ 51.03 | $ 5.10 | 2.23 |
45 to 50 | 1 | $ 2.72 | $ 2.72 | $ 2.72 | 0.12 |
Top Spenders
# Extract SN and Price, group by SN, create new DF and sort it by price sum()
price_sn = heroes[['SN', 'Price']]
pr_sn_grouped = price_sn.groupby(['SN'])
price_df = pd.DataFrame(pr_sn_grouped.sum())
price_sorted = price_df.sort_values('Price', ascending=False)
spenders_price = price_sorted.head(5)
# Reset index to default to be able to merge later
spenders_price.reset_index(inplace=True)
# Create a second pd with purchase counts
price_count_df = pd.DataFrame(pr_sn_grouped.count())
price_count_df.reset_index(inplace=True)
# Merge in one
merged_price = pd.merge(spenders_price, price_count_df, on='SN')
merged_price.columns = ['SN', 'Total Purchase Value', 'Purchase Count']
merged_price['Average Purchase Price'] = round((merged_price['Total Purchase Value'] / merged_price['Purchase Count']),2)
top5answer = merged_price.set_index('SN')
top5answer['Average Purchase Price'] = top5answer['Average Purchase Price'].map("$ {:,.2f}".format)
top5answer['Total Purchase Value'] = top5answer['Total Purchase Value'].map("$ {:,.2f}".format)
top5 = top5answer[['Purchase Count', 'Average Purchase Price', 'Total Purchase Value']]
top5
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
Purchase Count | Average Purchase Price | Total Purchase Value | |
---|---|---|---|
SN | |||
Undirrala66 | 5 | $ 3.41 | $ 17.06 |
Saedue76 | 4 | $ 3.39 | $ 13.56 |
Mindimnya67 | 4 | $ 3.18 | $ 12.74 |
Haellysu29 | 3 | $ 4.24 | $ 12.73 |
Eoda93 | 3 | $ 3.86 | $ 11.58 |
Most Popular Items
# There are 183 unique Item IDs and only 179 unique Names, that's why this task may have 2 different answers,
# depends on what to use as a base - Item ID or Item Name. The correct one is to use a longer column - Item ID
# Extract necessary columns
items = heroes[['Item ID','Item Name', 'Price', 'SN']]
#items['SN'] = 1 - calls a warning, so..
# Assign values 1 to SN - better way, without warnings
items = items.assign(SN= 1)
items.columns = ['Item ID','Item Name', 'Total Purchase Value', 'Purchase Count']
# Group by Name, sort by number of purchases and reset index
items_grouped = items.groupby(['Item ID', 'Item Name'])
items_df = pd.DataFrame(items_grouped.sum())
items_sorted = items_df.sort_values('Purchase Count', ascending=False)
popular_items = items_sorted.head(5)
popular_items = popular_items.assign(Item_Price=(popular_items['Total Purchase Value'] / popular_items['Purchase Count']))
popular_items['Total Purchase Value'] = popular_items['Total Purchase Value'].map("$ {:,.2f}".format)
popular_items['Item_Price'] = popular_items['Item_Price'].map("$ {:,.2f}".format)
popular_items_answ = popular_items[['Purchase Count', 'Item_Price', 'Total Purchase Value']]
popular_items_answ
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
Purchase Count | Item_Price | Total Purchase Value | ||
---|---|---|---|---|
Item ID | Item Name | |||
39 | Betrayal, Whisper of Grieving Widows | 11 | $ 2.35 | $ 25.85 |
84 | Arcane Gem | 11 | $ 2.23 | $ 24.53 |
31 | Trickster | 9 | $ 2.07 | $ 18.63 |
175 | Woeful Adamantite Claymore | 9 | $ 1.24 | $ 11.16 |
13 | Serenity | 9 | $ 1.49 | $ 13.41 |
# **Most Popular Items** Incorrect answer:
# Based on Item Name, results are not correct
# Leave it here just in case:
# Extract necessary columns
#items2 = heroes[['Item Name', 'Price', 'SN']]
#items['SN'] = 1 - calls a warning, so..
# Assign values 1 to SN - better way without warnings
#items2 = items2.assign(SN= 1)
#items2.columns = ['Item Name', 'Total Purchase Value', 'Purchase Count']
# Group by Name, sort by number of purchases and reset index
#items_grouped2 = items2.groupby(['Item Name'])
#items_df2 = pd.DataFrame(items_grouped2.sum())
#items_sorted2 = items_df2.sort_values('Purchase Count', ascending=False)
#popular_items2 = items_sorted2.head(5)
#popular_items2.reset_index(inplace=True)
# Create Item ID df
#item_id2 = heroes[['Item Name', 'Item ID']]
# There are 183 unique Item IDs and only 179 unique Names, so I drop ID duplicates
#clean_id2 = item_id2.drop_duplicates(['Item ID'], keep='last')
# Merge 2 DFs
#merged_items2 = pd.merge(popular_items2, clean_id2, on='Item Name')
# Add total value
#merged_items2['Item Price'] = round((merged_items2['Total Purchase Value'] / merged_items2['Purchase Count']),2)
#popular_items_answer2 = merged_items2.set_index('Item ID')
#popular_items_answer2
Most Profitable Items
# Use df from previous step
profit_sorted = items_df.sort_values('Total Purchase Value', ascending=False)
profit_answer = profit_sorted.head(5)
profit_answer = profit_answer.assign(Item_Price=(profit_answer['Total Purchase Value'] / profit_answer['Purchase Count']))
#profit_answer['Item Price'] = profit_answer['Total Purchase Value'] / profit_answer['Purchase Count'] - calls warning
profit_answer['Total Purchase Value'] = profit_answer['Total Purchase Value'].map("$ {:,.2f}".format)
profit_answer['Item_Price'] = profit_answer['Item_Price'].map("$ {:,.2f}".format)
profit_it_answ = profit_answer[['Purchase Count', 'Item_Price', 'Total Purchase Value']]
profit_it_answ
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
Purchase Count | Item_Price | Total Purchase Value | ||
---|---|---|---|---|
Item ID | Item Name | |||
34 | Retribution Axe | 9 | $ 4.14 | $ 37.26 |
115 | Spectral Diamond Doomblade | 7 | $ 4.25 | $ 29.75 |
32 | Orenmir | 6 | $ 4.95 | $ 29.70 |
103 | Singed Scalpel | 6 | $ 4.87 | $ 29.22 |
107 | Splitter, Foe Of Subtlety | 8 | $ 3.61 | $ 28.88 |
Observed Trends
-
Men compose 81% of all gamers. They make 82% of all purchases and spend just a little bit more than women.
-
Almost half of all gamers are 20 - 24 years old. Teenagers from 15 to 19 years represent 17% of gamers, and on the third place are people from 25 to 29 years with 15%. Their purchase activity has approximately the same numbers: 20 - 24 years - 43% of all purchases (despite of the fact, that their average purchase is smaller than the other age groups purchases), 15 - 19 and 25 29 make 17% and 16% of purchases respectively. That means that in average each gamer spends approximately the same amount of money.
-
The most popular items are less expensive then the average price, that indicates the tendency to buy cheaper items. Most of the profits are made by the items from the top of a price range. (May check the percentage of profit made by expensive items)