Code Monkey home page Code Monkey logo

heroes_of_pymoli_pandas's Introduction

Heroes Of Pymoli Data Analysis

import pandas as pd
path = 'Resources/purchase_data.json'
heroes = pd.read_json(path)

Player Count

# Total number of unique players (based on SN)
total_players = len(heroes['SN'].unique())
# Create DF
total_players_df = pd.DataFrame({'Total number of players': [total_players]})
total_players_df
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Total number of players
0 573

Purchasing Analysis (Total)

# Number of unique items bought, different from item name
number_unique_items = len(heroes['Item ID'].unique())
# Average price
av_purch_price = round(heroes['Price'].mean(), 2)
# Number of purchases is equal to rows number (I hope.. )
number_purchases = heroes['Item ID'].count()
# Total revenue - sum of all prices
total_revenue = heroes['Price'].sum()
# Create DF
purch_analysis_df = pd.DataFrame({'Number of Unique Items': [number_unique_items], 'Average Purchase Price': [av_purch_price],
                                 'Total Number of Purchases': [number_purchases], 'Total Revenue': [total_revenue]})
purch_analysis_df['Average Purchase Price'] = purch_analysis_df['Average Purchase Price'].map("$ {:,.2f}".format)
purch_analysis_df['Total Revenue'] = purch_analysis_df['Total Revenue'].map("$ {:,.2f}".format)
purch_analysis = purch_analysis_df[['Number of Unique Items', 'Average Purchase Price', 'Total Number of Purchases',
                                   'Total Revenue']]
purch_analysis
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Number of Unique Items Average Purchase Price Total Number of Purchases Total Revenue
0 183 $ 2.93 780 $ 2,286.33

Gender Demographics

# Check what we have in column 'Gender'
heroes['Gender'].value_counts()
Male                     633
Female                   136
Other / Non-Disclosed     11
Name: Gender, dtype: int64
# Create a DF with Gender & SN only
gender_df = heroes[['SN', 'Gender']]
# Delete all duplicates in SN, exept the last one (keep='last')
clean_gender = gender_df.drop_duplicates(['SN'], keep='last')
# Change index to Gender, to use loc. function
clean_gender_ind = clean_gender.set_index('Gender')
# Use loc. to count all 3 variations
male_count = clean_gender_ind.loc['Male', 'SN'].count()
female_count = clean_gender_ind.loc['Female', 'SN'].count()
other_count = clean_gender_ind.loc['Other / Non-Disclosed', 'SN'].count()
# Count persentages
male_perc = round((male_count/total_players*100), 2)
female_perc = round((female_count/total_players*100), 2)
other_perc = round((other_count/total_players*100), 2)
# Create DF
gender_demo = {'Gender': ['Male', 'Female', 'Other / Non-Disclose'],
              'Total count': [male_count, female_count, other_count],
              'Percentage of Players': [male_perc, female_perc, other_perc]}
gender_demo_df = pd.DataFrame(gender_demo)
new_gender_demo = gender_demo_df.set_index('Gender')
new_gender_demo
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Percentage of Players Total count
Gender
Male 81.15 465
Female 17.45 100
Other / Non-Disclose 1.40 8

Purchasing Analysis (Gender)

# Set Gender as index in original df
gender_ind = heroes.set_index('Gender')

# Count a number of purchases
male_purch = gender_ind.loc['Male', 'SN'].count()
female_purch = gender_ind.loc['Female', 'SN'].count()
other_purch = gender_ind.loc['Other / Non-Disclosed', 'SN'].count()

# Count average and total purchases
male_avg_price = round(gender_ind.loc['Male', 'Price'].mean(),2)
male_total = round(gender_ind.loc['Male', 'Price'].sum(), 2)
female_avg_price = round(gender_ind.loc['Female', 'Price'].mean(),2)
female_total = round(gender_ind.loc['Female', 'Price'].sum(), 2)
other_avg_price = round(gender_ind.loc['Other / Non-Disclosed', 'Price'].mean(),2)
other_total = round(gender_ind.loc['Other / Non-Disclosed', 'Price'].sum(), 2)

# Count normalized totals
male_norm_purch = round((male_total / male_count), 2)
female_norm_purch = round((female_total / female_count), 2)
other_norm_purch = round((other_total / other_count), 2)

# Create answer DF
purch_an_gender = {'Gender': ['Male', 'Female', 'Other / Non-Disclose'],
                   'Purchase Count': [male_purch, female_purch, other_purch],
                   'Average Purchase Price': [male_avg_price, female_avg_price, other_avg_price], 
                  'Total Purchase Value': [male_total, female_total, other_total], 
                  'Normalized Totals': [male_norm_purch, female_norm_purch, other_norm_purch]}
purch_an_gender_df = pd.DataFrame(purch_an_gender)
purch_gender_answ = purch_an_gender_df.set_index('Gender')
purch_gender_answ['Average Purchase Price']=purch_gender_answ['Average Purchase Price'].map("$ {:,.2f}".format)
purch_gender_answ['Total Purchase Value']=purch_gender_answ['Total Purchase Value'].map("$ {:,.2f}".format)
purch_gender_answ['Normalized Totals']=purch_gender_answ['Normalized Totals'].map("$ {:,.2f}".format)
purch_gender_an = purch_gender_answ[['Purchase Count', 'Average Purchase Price', 'Total Purchase Value', 'Normalized Totals']]
purch_gender_an
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Purchase Count Average Purchase Price Total Purchase Value Normalized Totals
Gender
Male 633 $ 2.95 $ 1,867.68 $ 4.02
Female 136 $ 2.82 $ 382.91 $ 3.83
Other / Non-Disclose 11 $ 3.25 $ 35.74 $ 4.47

Age Demographics

# Check min and max values
print(heroes['Age'].max())
print(heroes['Age'].min())
45
7
# Create bins and labels to them
bins = [4,9,14,19,24,29,34,39,44,49]
labels = ['5 to 9', '10 to 14', '15 to 19', '20 to 24', '25 to 29', '30 to 34',
          '35 to 39', '40 to 44', '45 to 50']
# Add new column with bins
heroes['Age Bins'] = pd.cut(heroes["Age"],bins,labels=labels)
# Group by a new column
age_grouped = heroes.groupby('Age Bins')
# Pick up values
age_purch_count = (age_grouped['Age'].count())
age_avg_purch_price = round((age_grouped['Price'].mean()), 2)
age_total_purch_value = (age_grouped['Price'].sum())
age_df = heroes[['SN', 'Age', 'Age Bins']]
# Delete all duplicates in SN, exept the last one (keep='last')
clean_age = age_df.drop_duplicates(['SN'], keep='last')
clean_age_grouped = clean_age.groupby('Age Bins')
age_demo_df = pd.DataFrame({'Total Count': clean_age_grouped.size(),
                      'Percentage of Players': round(((clean_age_grouped.size()/total_players)*100),2)})
age_demo_df
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Percentage of Players Total Count
Age Bins
5 to 9 3.32 19
10 to 14 4.01 23
15 to 19 17.45 100
20 to 24 45.20 259
25 to 29 15.18 87
30 to 34 8.20 47
35 to 39 4.71 27
40 to 44 1.75 10
45 to 50 0.17 1

Purchasing Analysis (Age)

purch_age_df = pd.DataFrame({'Purchase Count': age_purch_count,
                           'Average Purchase Price': age_avg_purch_price,
                            'Total Purchase Value': age_total_purch_value,
                            'Normalized Totals': round(age_total_purch_value/clean_age_grouped.size(), 2),
                            'Total percentage': round((age_total_purch_value/total_revenue*100),2)})
purch_age_df['Average Purchase Price'] = purch_age_df['Average Purchase Price'].map("$ {:,.2f}".format)
purch_age_df['Total Purchase Value'] = purch_age_df['Total Purchase Value'].map("$ {:,.2f}".format)
purch_age_df['Normalized Totals'] = purch_age_df['Normalized Totals'].map("$ {:,.2f}".format)
purch_age_df1 = purch_age_df[['Purchase Count', 'Average Purchase Price', 'Total Purchase Value', 'Normalized Totals',
                             'Total percentage']]
purch_age_df1
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Purchase Count Average Purchase Price Total Purchase Value Normalized Totals Total percentage
Age Bins
5 to 9 28 $ 2.98 $ 83.46 $ 4.39 3.65
10 to 14 35 $ 2.77 $ 96.95 $ 4.22 4.24
15 to 19 133 $ 2.91 $ 386.42 $ 3.86 16.90
20 to 24 336 $ 2.91 $ 978.77 $ 3.78 42.81
25 to 29 125 $ 2.96 $ 370.33 $ 4.26 16.20
30 to 34 64 $ 3.08 $ 197.25 $ 4.20 8.63
35 to 39 42 $ 2.84 $ 119.40 $ 4.42 5.22
40 to 44 16 $ 3.19 $ 51.03 $ 5.10 2.23
45 to 50 1 $ 2.72 $ 2.72 $ 2.72 0.12

Top Spenders

# Extract SN and Price, group by SN, create new DF and sort it by price sum()
price_sn = heroes[['SN', 'Price']]
pr_sn_grouped = price_sn.groupby(['SN'])
price_df = pd.DataFrame(pr_sn_grouped.sum())
price_sorted = price_df.sort_values('Price', ascending=False)
spenders_price = price_sorted.head(5)
# Reset index to default to be able to merge later
spenders_price.reset_index(inplace=True)

# Create a second pd with purchase counts
price_count_df = pd.DataFrame(pr_sn_grouped.count())
price_count_df.reset_index(inplace=True)

# Merge in one
merged_price = pd.merge(spenders_price, price_count_df, on='SN')
merged_price.columns = ['SN', 'Total Purchase Value', 'Purchase Count']
merged_price['Average Purchase Price'] = round((merged_price['Total Purchase Value'] / merged_price['Purchase Count']),2)
top5answer = merged_price.set_index('SN')
top5answer['Average Purchase Price'] = top5answer['Average Purchase Price'].map("$ {:,.2f}".format)
top5answer['Total Purchase Value'] = top5answer['Total Purchase Value'].map("$ {:,.2f}".format)
top5 = top5answer[['Purchase Count', 'Average Purchase Price', 'Total Purchase Value']]
top5
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Purchase Count Average Purchase Price Total Purchase Value
SN
Undirrala66 5 $ 3.41 $ 17.06
Saedue76 4 $ 3.39 $ 13.56
Mindimnya67 4 $ 3.18 $ 12.74
Haellysu29 3 $ 4.24 $ 12.73
Eoda93 3 $ 3.86 $ 11.58

Most Popular Items

# There are 183 unique Item IDs and only 179 unique Names, that's why this task may have 2 different answers,
# depends on what to use as a base - Item ID or Item Name. The correct one is to use a longer column - Item ID

# Extract necessary columns
items = heroes[['Item ID','Item Name', 'Price', 'SN']]
#items['SN'] = 1 - calls a warning, so..
# Assign values 1 to SN - better way, without warnings
items = items.assign(SN= 1)
items.columns = ['Item ID','Item Name', 'Total Purchase Value', 'Purchase Count']
# Group by Name, sort by number of purchases and reset index
items_grouped = items.groupby(['Item ID', 'Item Name'])
items_df = pd.DataFrame(items_grouped.sum())
items_sorted = items_df.sort_values('Purchase Count', ascending=False)
popular_items = items_sorted.head(5)
popular_items = popular_items.assign(Item_Price=(popular_items['Total Purchase Value'] / popular_items['Purchase Count']))
popular_items['Total Purchase Value'] = popular_items['Total Purchase Value'].map("$ {:,.2f}".format)
popular_items['Item_Price'] = popular_items['Item_Price'].map("$ {:,.2f}".format)
popular_items_answ = popular_items[['Purchase Count', 'Item_Price', 'Total Purchase Value']]
popular_items_answ
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Purchase Count Item_Price Total Purchase Value
Item ID Item Name
39 Betrayal, Whisper of Grieving Widows 11 $ 2.35 $ 25.85
84 Arcane Gem 11 $ 2.23 $ 24.53
31 Trickster 9 $ 2.07 $ 18.63
175 Woeful Adamantite Claymore 9 $ 1.24 $ 11.16
13 Serenity 9 $ 1.49 $ 13.41
# **Most Popular Items** Incorrect answer: 
# Based on Item Name, results are not correct

# Leave it here just in case:
# Extract necessary columns
#items2 = heroes[['Item Name', 'Price', 'SN']]
#items['SN'] = 1 - calls a warning, so..
# Assign values 1 to SN - better way without warnings
#items2 = items2.assign(SN= 1)
#items2.columns = ['Item Name', 'Total Purchase Value', 'Purchase Count']
# Group by Name, sort by number of purchases and reset index
#items_grouped2 = items2.groupby(['Item Name'])
#items_df2 = pd.DataFrame(items_grouped2.sum())
#items_sorted2 = items_df2.sort_values('Purchase Count', ascending=False)
#popular_items2 = items_sorted2.head(5)
#popular_items2.reset_index(inplace=True)
# Create Item ID df
#item_id2 = heroes[['Item Name', 'Item ID']]
# There are 183 unique Item IDs and only 179 unique Names, so I drop ID duplicates
#clean_id2 = item_id2.drop_duplicates(['Item ID'], keep='last')
# Merge 2 DFs
#merged_items2 = pd.merge(popular_items2, clean_id2, on='Item Name')
# Add total value
#merged_items2['Item Price'] = round((merged_items2['Total Purchase Value'] / merged_items2['Purchase Count']),2)
#popular_items_answer2 = merged_items2.set_index('Item ID')
#popular_items_answer2

Most Profitable Items

# Use df from previous step
profit_sorted = items_df.sort_values('Total Purchase Value', ascending=False)
profit_answer = profit_sorted.head(5)

profit_answer = profit_answer.assign(Item_Price=(profit_answer['Total Purchase Value'] / profit_answer['Purchase Count']))
#profit_answer['Item Price'] = profit_answer['Total Purchase Value'] / profit_answer['Purchase Count'] - calls warning
profit_answer['Total Purchase Value'] = profit_answer['Total Purchase Value'].map("$ {:,.2f}".format)
profit_answer['Item_Price'] = profit_answer['Item_Price'].map("$ {:,.2f}".format)
profit_it_answ = profit_answer[['Purchase Count', 'Item_Price', 'Total Purchase Value']]
profit_it_answ
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Purchase Count Item_Price Total Purchase Value
Item ID Item Name
34 Retribution Axe 9 $ 4.14 $ 37.26
115 Spectral Diamond Doomblade 7 $ 4.25 $ 29.75
32 Orenmir 6 $ 4.95 $ 29.70
103 Singed Scalpel 6 $ 4.87 $ 29.22
107 Splitter, Foe Of Subtlety 8 $ 3.61 $ 28.88

Observed Trends

  1. Men compose 81% of all gamers. They make 82% of all purchases and spend just a little bit more than women.

  2. Almost half of all gamers are 20 - 24 years old. Teenagers from 15 to 19 years represent 17% of gamers, and on the third place are people from 25 to 29 years with 15%. Their purchase activity has approximately the same numbers: 20 - 24 years - 43% of all purchases (despite of the fact, that their average purchase is smaller than the other age groups purchases), 15 - 19 and 25 29 make 17% and 16% of purchases respectively. That means that in average each gamer spends approximately the same amount of money.

  3. The most popular items are less expensive then the average price, that indicates the tendency to buy cheaper items. Most of the profits are made by the items from the top of a price range. (May check the percentage of profit made by expensive items)

heroes_of_pymoli_pandas's People

Contributors

sonyasha avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.