The heroes_of_pymoli_pandas from sonyasha

Heroes Of Pymoli Data Analysis

import pandas as pd

path = 'Resources/purchase_data.json'
heroes = pd.read_json(path)

Player Count

# Total number of unique players (based on SN)
total_players = len(heroes['SN'].unique())
# Create DF
total_players_df = pd.DataFrame({'Total number of players': [total_players]})
total_players_df

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	Total number of players
0	573

Purchasing Analysis (Total)

# Number of unique items bought, different from item name
number_unique_items = len(heroes['Item ID'].unique())
# Average price
av_purch_price = round(heroes['Price'].mean(), 2)
# Number of purchases is equal to rows number (I hope.. )
number_purchases = heroes['Item ID'].count()
# Total revenue - sum of all prices
total_revenue = heroes['Price'].sum()
# Create DF
purch_analysis_df = pd.DataFrame({'Number of Unique Items': [number_unique_items], 'Average Purchase Price': [av_purch_price],
                                 'Total Number of Purchases': [number_purchases], 'Total Revenue': [total_revenue]})
purch_analysis_df['Average Purchase Price'] = purch_analysis_df['Average Purchase Price'].map("$ {:,.2f}".format)
purch_analysis_df['Total Revenue'] = purch_analysis_df['Total Revenue'].map("$ {:,.2f}".format)
purch_analysis = purch_analysis_df[['Number of Unique Items', 'Average Purchase Price', 'Total Number of Purchases',
                                   'Total Revenue']]
purch_analysis

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	Number of Unique Items	Average Purchase Price	Total Number of Purchases	Total Revenue
0	183	$ 2.93	780	$ 2,286.33

Gender Demographics

# Check what we have in column 'Gender'
heroes['Gender'].value_counts()

Male                     633
Female                   136
Other / Non-Disclosed     11
Name: Gender, dtype: int64

# Create a DF with Gender & SN only
gender_df = heroes[['SN', 'Gender']]
# Delete all duplicates in SN, exept the last one (keep='last')
clean_gender = gender_df.drop_duplicates(['SN'], keep='last')
# Change index to Gender, to use loc. function
clean_gender_ind = clean_gender.set_index('Gender')
# Use loc. to count all 3 variations
male_count = clean_gender_ind.loc['Male', 'SN'].count()
female_count = clean_gender_ind.loc['Female', 'SN'].count()
other_count = clean_gender_ind.loc['Other / Non-Disclosed', 'SN'].count()
# Count persentages
male_perc = round((male_count/total_players*100), 2)
female_perc = round((female_count/total_players*100), 2)
other_perc = round((other_count/total_players*100), 2)
# Create DF
gender_demo = {'Gender': ['Male', 'Female', 'Other / Non-Disclose'],
              'Total count': [male_count, female_count, other_count],
              'Percentage of Players': [male_perc, female_perc, other_perc]}
gender_demo_df = pd.DataFrame(gender_demo)
new_gender_demo = gender_demo_df.set_index('Gender')
new_gender_demo

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	Percentage of Players	Total count
Gender
Male	81.15	465
Female	17.45	100
Other / Non-Disclose	1.40	8

Purchasing Analysis (Gender)

# Set Gender as index in original df
gender_ind = heroes.set_index('Gender')

# Count a number of purchases
male_purch = gender_ind.loc['Male', 'SN'].count()
female_purch = gender_ind.loc['Female', 'SN'].count()
other_purch = gender_ind.loc['Other / Non-Disclosed', 'SN'].count()

# Count average and total purchases
male_avg_price = round(gender_ind.loc['Male', 'Price'].mean(),2)
male_total = round(gender_ind.loc['Male', 'Price'].sum(), 2)
female_avg_price = round(gender_ind.loc['Female', 'Price'].mean(),2)
female_total = round(gender_ind.loc['Female', 'Price'].sum(), 2)
other_avg_price = round(gender_ind.loc['Other / Non-Disclosed', 'Price'].mean(),2)
other_total = round(gender_ind.loc['Other / Non-Disclosed', 'Price'].sum(), 2)

# Count normalized totals
male_norm_purch = round((male_total / male_count), 2)
female_norm_purch = round((female_total / female_count), 2)
other_norm_purch = round((other_total / other_count), 2)

# Create answer DF
purch_an_gender = {'Gender': ['Male', 'Female', 'Other / Non-Disclose'],
                   'Purchase Count': [male_purch, female_purch, other_purch],
                   'Average Purchase Price': [male_avg_price, female_avg_price, other_avg_price], 
                  'Total Purchase Value': [male_total, female_total, other_total], 
                  'Normalized Totals': [male_norm_purch, female_norm_purch, other_norm_purch]}
purch_an_gender_df = pd.DataFrame(purch_an_gender)
purch_gender_answ = purch_an_gender_df.set_index('Gender')
purch_gender_answ['Average Purchase Price']=purch_gender_answ['Average Purchase Price'].map("$ {:,.2f}".format)
purch_gender_answ['Total Purchase Value']=purch_gender_answ['Total Purchase Value'].map("$ {:,.2f}".format)
purch_gender_answ['Normalized Totals']=purch_gender_answ['Normalized Totals'].map("$ {:,.2f}".format)
purch_gender_an = purch_gender_answ[['Purchase Count', 'Average Purchase Price', 'Total Purchase Value', 'Normalized Totals']]
purch_gender_an

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	Purchase Count	Average Purchase Price	Total Purchase Value	Normalized Totals
Gender
Male	633	$ 2.95	$ 1,867.68	$ 4.02
Female	136	$ 2.82	$ 382.91	$ 3.83
Other / Non-Disclose	11	$ 3.25	$ 35.74	$ 4.47

Age Demographics

# Check min and max values
print(heroes['Age'].max())
print(heroes['Age'].min())

45
7

# Create bins and labels to them
bins = [4,9,14,19,24,29,34,39,44,49]
labels = ['5 to 9', '10 to 14', '15 to 19', '20 to 24', '25 to 29', '30 to 34',
          '35 to 39', '40 to 44', '45 to 50']
# Add new column with bins
heroes['Age Bins'] = pd.cut(heroes["Age"],bins,labels=labels)
# Group by a new column
age_grouped = heroes.groupby('Age Bins')
# Pick up values
age_purch_count = (age_grouped['Age'].count())
age_avg_purch_price = round((age_grouped['Price'].mean()), 2)
age_total_purch_value = (age_grouped['Price'].sum())
age_df = heroes[['SN', 'Age', 'Age Bins']]
# Delete all duplicates in SN, exept the last one (keep='last')
clean_age = age_df.drop_duplicates(['SN'], keep='last')
clean_age_grouped = clean_age.groupby('Age Bins')
age_demo_df = pd.DataFrame({'Total Count': clean_age_grouped.size(),
                      'Percentage of Players': round(((clean_age_grouped.size()/total_players)*100),2)})
age_demo_df

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	Percentage of Players	Total Count
Age Bins
5 to 9	3.32	19
10 to 14	4.01	23
15 to 19	17.45	100
20 to 24	45.20	259
25 to 29	15.18	87
30 to 34	8.20	47
35 to 39	4.71	27
40 to 44	1.75	10
45 to 50	0.17	1

Purchasing Analysis (Age)

purch_age_df = pd.DataFrame({'Purchase Count': age_purch_count,
                           'Average Purchase Price': age_avg_purch_price,
                            'Total Purchase Value': age_total_purch_value,
                            'Normalized Totals': round(age_total_purch_value/clean_age_grouped.size(), 2),
                            'Total percentage': round((age_total_purch_value/total_revenue*100),2)})
purch_age_df['Average Purchase Price'] = purch_age_df['Average Purchase Price'].map("$ {:,.2f}".format)
purch_age_df['Total Purchase Value'] = purch_age_df['Total Purchase Value'].map("$ {:,.2f}".format)
purch_age_df['Normalized Totals'] = purch_age_df['Normalized Totals'].map("$ {:,.2f}".format)
purch_age_df1 = purch_age_df[['Purchase Count', 'Average Purchase Price', 'Total Purchase Value', 'Normalized Totals',
                             'Total percentage']]
purch_age_df1

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	Purchase Count	Average Purchase Price	Total Purchase Value	Normalized Totals	Total percentage
Age Bins
5 to 9	28	$ 2.98	$ 83.46	$ 4.39	3.65
10 to 14	35	$ 2.77	$ 96.95	$ 4.22	4.24
15 to 19	133	$ 2.91	$ 386.42	$ 3.86	16.90
20 to 24	336	$ 2.91	$ 978.77	$ 3.78	42.81
25 to 29	125	$ 2.96	$ 370.33	$ 4.26	16.20
30 to 34	64	$ 3.08	$ 197.25	$ 4.20	8.63
35 to 39	42	$ 2.84	$ 119.40	$ 4.42	5.22
40 to 44	16	$ 3.19	$ 51.03	$ 5.10	2.23
45 to 50	1	$ 2.72	$ 2.72	$ 2.72	0.12

Top Spenders

# Extract SN and Price, group by SN, create new DF and sort it by price sum()
price_sn = heroes[['SN', 'Price']]
pr_sn_grouped = price_sn.groupby(['SN'])
price_df = pd.DataFrame(pr_sn_grouped.sum())
price_sorted = price_df.sort_values('Price', ascending=False)
spenders_price = price_sorted.head(5)
# Reset index to default to be able to merge later
spenders_price.reset_index(inplace=True)

# Create a second pd with purchase counts
price_count_df = pd.DataFrame(pr_sn_grouped.count())
price_count_df.reset_index(inplace=True)

# Merge in one
merged_price = pd.merge(spenders_price, price_count_df, on='SN')
merged_price.columns = ['SN', 'Total Purchase Value', 'Purchase Count']
merged_price['Average Purchase Price'] = round((merged_price['Total Purchase Value'] / merged_price['Purchase Count']),2)
top5answer = merged_price.set_index('SN')
top5answer['Average Purchase Price'] = top5answer['Average Purchase Price'].map("$ {:,.2f}".format)
top5answer['Total Purchase Value'] = top5answer['Total Purchase Value'].map("$ {:,.2f}".format)
top5 = top5answer[['Purchase Count', 'Average Purchase Price', 'Total Purchase Value']]
top5

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	Purchase Count	Average Purchase Price	Total Purchase Value
SN
Undirrala66	5	$ 3.41	$ 17.06
Saedue76	4	$ 3.39	$ 13.56
Mindimnya67	4	$ 3.18	$ 12.74
Haellysu29	3	$ 4.24	$ 12.73
Eoda93	3	$ 3.86	$ 11.58

Most Popular Items

# There are 183 unique Item IDs and only 179 unique Names, that's why this task may have 2 different answers,
# depends on what to use as a base - Item ID or Item Name. The correct one is to use a longer column - Item ID

# Extract necessary columns
items = heroes[['Item ID','Item Name', 'Price', 'SN']]
#items['SN'] = 1 - calls a warning, so..
# Assign values 1 to SN - better way, without warnings
items = items.assign(SN= 1)
items.columns = ['Item ID','Item Name', 'Total Purchase Value', 'Purchase Count']
# Group by Name, sort by number of purchases and reset index
items_grouped = items.groupby(['Item ID', 'Item Name'])
items_df = pd.DataFrame(items_grouped.sum())
items_sorted = items_df.sort_values('Purchase Count', ascending=False)
popular_items = items_sorted.head(5)
popular_items = popular_items.assign(Item_Price=(popular_items['Total Purchase Value'] / popular_items['Purchase Count']))
popular_items['Total Purchase Value'] = popular_items['Total Purchase Value'].map("$ {:,.2f}".format)
popular_items['Item_Price'] = popular_items['Item_Price'].map("$ {:,.2f}".format)
popular_items_answ = popular_items[['Purchase Count', 'Item_Price', 'Total Purchase Value']]
popular_items_answ

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

		Purchase Count	Item_Price	Total Purchase Value
Item ID	Item Name
39	Betrayal, Whisper of Grieving Widows	11	$ 2.35	$ 25.85
84	Arcane Gem	11	$ 2.23	$ 24.53
31	Trickster	9	$ 2.07	$ 18.63
175	Woeful Adamantite Claymore	9	$ 1.24	$ 11.16
13	Serenity	9	$ 1.49	$ 13.41

# **Most Popular Items** Incorrect answer: 
# Based on Item Name, results are not correct

# Leave it here just in case:
# Extract necessary columns
#items2 = heroes[['Item Name', 'Price', 'SN']]
#items['SN'] = 1 - calls a warning, so..
# Assign values 1 to SN - better way without warnings
#items2 = items2.assign(SN= 1)
#items2.columns = ['Item Name', 'Total Purchase Value', 'Purchase Count']
# Group by Name, sort by number of purchases and reset index
#items_grouped2 = items2.groupby(['Item Name'])
#items_df2 = pd.DataFrame(items_grouped2.sum())
#items_sorted2 = items_df2.sort_values('Purchase Count', ascending=False)
#popular_items2 = items_sorted2.head(5)
#popular_items2.reset_index(inplace=True)
# Create Item ID df
#item_id2 = heroes[['Item Name', 'Item ID']]
# There are 183 unique Item IDs and only 179 unique Names, so I drop ID duplicates
#clean_id2 = item_id2.drop_duplicates(['Item ID'], keep='last')
# Merge 2 DFs
#merged_items2 = pd.merge(popular_items2, clean_id2, on='Item Name')
# Add total value
#merged_items2['Item Price'] = round((merged_items2['Total Purchase Value'] / merged_items2['Purchase Count']),2)
#popular_items_answer2 = merged_items2.set_index('Item ID')
#popular_items_answer2

Most Profitable Items

# Use df from previous step
profit_sorted = items_df.sort_values('Total Purchase Value', ascending=False)
profit_answer = profit_sorted.head(5)

profit_answer = profit_answer.assign(Item_Price=(profit_answer['Total Purchase Value'] / profit_answer['Purchase Count']))
#profit_answer['Item Price'] = profit_answer['Total Purchase Value'] / profit_answer['Purchase Count'] - calls warning
profit_answer['Total Purchase Value'] = profit_answer['Total Purchase Value'].map("$ {:,.2f}".format)
profit_answer['Item_Price'] = profit_answer['Item_Price'].map("$ {:,.2f}".format)
profit_it_answ = profit_answer[['Purchase Count', 'Item_Price', 'Total Purchase Value']]
profit_it_answ

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

		Purchase Count	Item_Price	Total Purchase Value
Item ID	Item Name
34	Retribution Axe	9	$ 4.14	$ 37.26
115	Spectral Diamond Doomblade	7	$ 4.25	$ 29.75
32	Orenmir	6	$ 4.95	$ 29.70
103	Singed Scalpel	6	$ 4.87	$ 29.22
107	Splitter, Foe Of Subtlety	8	$ 3.61	$ 28.88

Observed Trends

Men compose 81% of all gamers. They make 82% of all purchases and spend just a little bit more than women.
Almost half of all gamers are 20 - 24 years old. Teenagers from 15 to 19 years represent 17% of gamers, and on the third place are people from 25 to 29 years with 15%. Their purchase activity has approximately the same numbers: 20 - 24 years - 43% of all purchases (despite of the fact, that their average purchase is smaller than the other age groups purchases), 15 - 19 and 25 29 make 17% and 16% of purchases respectively. That means that in average each gamer spends approximately the same amount of money.
The most popular items are less expensive then the average price, that indicates the tendency to buy cheaper items. Most of the profits are made by the items from the top of a price range. (May check the percentage of profit made by expensive items)

sonyasha / heroes_of_pymoli_pandas Goto Github PK

heroes_of_pymoli_pandas's Introduction

heroes_of_pymoli_pandas's People

Contributors

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent