kevin-titi / game_analytics_d14_retention_prediction

A predictive model for day-14 player retention/churn after game installation. It uses in-game metrics, user behavior, and engagement patterns to identify players at risk of churning, and captures 65% of all retained players within the top 6% of the population when players are ranked by predicted retention likelihood.

Jupyter Notebook 100.00%
churn-prediction data-analytics exploratory-data-analysis game-analytics gradient-boosting logistic-regression python random-forest retention retention-analysis

game_analytics_d14_retention_prediction's Introduction

Game Analytics: Day 14 Player Retention/Churn Prediction

(There is no dataset provided. I only uploaded the code I used as well as the result. Please look at the attached PDF for a full presentation.)

We analyzed data from the day of installation (day 0) through day 7 to forecast a player's likelihood of still playing the game on day 14. Our predictive classification model uses factors such as user behavior, in-game metrics (for example, win rate), and engagement patterns to identify players with high potential for long-term retention.

Predicting which players will keep playing our game with machine learning models

The data was divided into training data, which the model learns from, and validation/testing data, which is used to evaluate the model on unseen data. We experimented with three machine learning algorithms: Logistic Regression, Random Forest, and Gradient Boosting. We compared all three and selected the best one as our final model.

Slide5

Using day 7 player data might be too late as we continue to lose player base

As demonstrated in the plot, the longer we wait and collect data on players, the better the model can predict whether a player will quit the game. However, we noticed that most players quit right after first playing the game, so the player base keeps shrinking while we wait for data. Moving forward, the optimal day for data collection needs to be evaluated, but for now we used day 7 data because it produced the best-performing models.

Gradient Boosting was chosen as our final model. On day 7 testing data, with players ranked by predicted retention likelihood, the top 6% of the total population contained 65% of all players who were actually retained. The model performed similarly to Random Forest but used simpler measures, making it easier to interpret.
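The "65% of retention within the top 6%" figure is a capture rate: the share of all truly retained players that falls inside the highest-ranked slice of the population. The sketch below shows one way such a number could be computed. The function name `capture_rate` and the toy scores and labels are made up for illustration and are not taken from the project's notebook or data.

```python
def capture_rate(scores, retained, fraction):
    """Share of all retained players found in the top `fraction` of players,
    ranked by predicted retention likelihood (highest first).

    Hypothetical helper for illustration; not the notebook's own code.
    """
    if len(scores) != len(retained):
        raise ValueError("scores and retained must have the same length")
    total_retained = sum(retained)
    if total_retained == 0:
        return 0.0
    # Number of players in the top slice (at least one).
    n_top = max(1, round(fraction * len(scores)))
    # Indices of players sorted from most to least likely to be retained.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    captured = sum(retained[i] for i in order[:n_top])
    return captured / total_retained


# Toy data (invented): 20 players, 3 of whom were actually retained.
scores = [0.40, 0.35, 0.30, 0.22, 0.20, 0.18, 0.15, 0.12, 0.10, 0.09,
          0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.02, 0.01, 0.01]
retained = [1, 1, 0, 1] + [0] * 16

# Top 10% = 2 players, which contain 2 of the 3 retained players.
rate = capture_rate(scores, retained, fraction=0.10)
```

In this toy case the top 10% of players captures two thirds of all retained players. The README's own figure corresponds to `fraction=0.06` on the day 7 testing data.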

Slide8

Slide16

Slide15

How our classifying models work in detail

We defined day 14 retention as true if a player was still playing the game on day 14. Our goal was to classify 6% of all players as retained, matching the average share of retained players in our data. We evaluated our models using the F1 score, which is the harmonic mean of Precision and Recall and weights the two equally.
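For reference, the F1 score can be computed directly from the counts of true positives, false positives, and false negatives. This is a minimal sketch with invented toy labels, and the helper name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, f1) for binary labels, where 1 = retained."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged as retained
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # flagged as retained but churned
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # retained but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy labels (invented): 1 = still playing on day 14, 0 = churned.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Because F1 is a harmonic mean, it stays low unless both Precision and Recall are reasonably high, which makes it more informative than plain accuracy on imbalanced data like this.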

Slide7

First, our models assigned a retention likelihood to every player in our data. Next, we ranked players by that likelihood, from the most to the least likely to keep playing our game. Finally, we took the top 6 percent of the ranking to find the likelihood cutoff (22.6%) and classified players with a likelihood above 22.6% as retained.
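The ranking-and-cutoff procedure can be sketched as follows. This assumes the model's output is a plain list of probabilities. The helper names and the synthetic scores are hypothetical, and the sketch treats a score equal to the cutoff as retained, which may differ slightly from how the notebook handles that boundary.

```python
def top_fraction_cutoff(scores, fraction):
    """Return the likelihood of the last player inside the top `fraction`
    of the ranking (highest likelihood first)."""
    if not scores:
        raise ValueError("scores must not be empty")
    ranked = sorted(scores, reverse=True)
    k = max(1, round(fraction * len(ranked)))  # number of players to flag
    return ranked[k - 1]


def classify_retained(scores, cutoff):
    """Label a player 1 (retained) if the likelihood reaches the cutoff, else 0."""
    return [1 if s >= cutoff else 0 for s in scores]


# Synthetic likelihoods (invented): 50 players, all below 0.5.
scores = [i / 200 for i in range(50)]

cutoff = top_fraction_cutoff(scores, fraction=0.06)   # top 6% = 3 players
labels = classify_retained(scores, cutoff)

# With a fixed 50% cutoff, nobody in this toy data would be flagged as retained.
flagged_at_half = sum(1 for s in scores if s > 0.5)
```

Here the percentile-based cutoff flags exactly three players, while a fixed 50% cutoff flags none.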

By not using a 50% cutoff, our models lost some accuracy, because the lower cutoff classifies more players as retained than a 50% cutoff would. However, with data from day 0 to day 3, no player had more than a 50% likelihood of being retained (our data is imbalanced, so the model simply placed everyone below 50%). The cutoff had to be lowered for such a model to classify any player as retained. This also improved our model performance, raising Recall at the cost of some Precision.

Slide6
