
jspano95 / retail-customer-classification-modelling


Classification ML models for predicting customer outcomes (namely, whether they're likely to opt into email / catalog marketing) depending on customer demographics (age, proximity to store, gender, customer loyalty duration) as well as sales and shopping frequencies by department

Jupyter Notebook 100.00%
classification-models xgboost-classification logistic-regression-algorithm decision-tree-classifier random-forest-classifier knearest-neighbor-classification support-vector-classifier voting-classifier customer-segmentation feature-engineering

retail-customer-classification-modelling's Introduction

Retail-Customer-Classification-Modelling

Classification models for predicting customer outcomes in an imbalanced classification setting, with outcomes dependent on customer demographics (age, post_code, gender) as well as shopping frequencies and average spends by department.

PURPOSE OF PROJECT:

  • The overarching purpose of this project is to determine why some loyalty customers have chosen to opt into email marketing while others have opted out. The data shows that customers who opt in have higher average spends, so it is important to determine whether customers who already spend more choose to opt in, or whether opting in results in a higher average spend. To investigate this, I construct data features which capture customer demographics and use several classification models to predict outcomes from those features. A desired takeaway is to understand which features underpin each group, and to inform decisions on how to influence customers to opt into email marketing.

THE DATA:

  • The original data comprises 52K entries of sales data across 12+ departments and several hundred unique customers, each with individual characteristics: post code, age, gender, shopping frequency and average spend across different departments. The data spans 2019 to early 2021 (roughly 2 years).

PROJECT FLOW:

After initial data exploration and data cleaning, I create a variety of features for the classification models:

  • Customer Duration (Time between first and last transactions)
  • One-Hot Encode: post code data by customer, customer gender, and department sales frequencies by customer
  • Standardise these variables (e.g. post code), together with the age variable, without centring on the mean, to preserve the sparse-matrix nature of the data
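
The feature steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`customer_id`, `date`, `post_code`, `gender`, `age`, `department`) and the toy data are assumptions, and department frequencies are represented here as simple per-customer transaction counts.

```python
import pandas as pd
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical transaction-level data; all column names are assumptions.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3],
    "date": pd.to_datetime(
        ["2019-01-05", "2020-06-01", "2019-03-10", "2019-03-20", "2020-11-11"]
    ),
    "post_code": ["3000", "3000", "3051", "3051", "3000"],
    "gender": ["F", "F", "M", "M", "F"],
    "age": [34, 34, 52, 52, 27],
    "department": ["grocery", "home", "grocery", "grocery", "home"],
})

# Customer duration: days between first and last transactions.
customers = transactions.groupby("customer_id").agg(
    first_purchase=("date", "min"),
    last_purchase=("date", "max"),
    post_code=("post_code", "first"),
    gender=("gender", "first"),
    age=("age", "first"),
)
customers["duration_days"] = (
    customers["last_purchase"] - customers["first_purchase"]
).dt.days

# Per-customer transaction counts by department (one column per department).
dept_counts = pd.crosstab(
    transactions["customer_id"], transactions["department"]
).reindex(customers.index, fill_value=0)

# One-hot encode the categorical columns; the encoder returns a sparse matrix.
encoder = OneHotEncoder(handle_unknown="ignore")
X_cat = encoder.fit_transform(customers[["post_code", "gender"]])

# Stack the numeric columns next to the one-hot columns, keeping sparsity.
numeric = pd.concat([customers[["age", "duration_days"]], dept_counts], axis=1)
X_num = sparse.csr_matrix(numeric.to_numpy(dtype=float))
X = sparse.hstack([X_cat, X_num], format="csr")

# with_mean=False scales by standard deviation only, so zeros stay zeros.
X_scaled = StandardScaler(with_mean=False).fit_transform(X)
```

Centring would turn every zero in the one-hot columns into a non-zero value, which is why the scaler is run with `with_mean=False` on the sparse input.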

A severe class imbalance was present between customers who opted into email marketing (majority class) and those who didn't (minority class). To remedy this, I upsampled the minority class with replacement to balance the two classes.

  • I also tried downsampling the majority class in a separate iteration and found upsampling the minority class to be superior
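
One common way to upsample a minority class with replacement is `sklearn.utils.resample`; whether the notebook uses this helper is an assumption. The sketch below uses a hypothetical training frame with an `opted_in` label column. Resampling is normally applied to the training split only, so that duplicated rows cannot appear in the test data.

```python
import pandas as pd
from sklearn.utils import resample

# Hypothetical training frame; opted_in == 1 is the majority class.
train = pd.DataFrame({
    "age": [25, 31, 47, 52, 38, 29, 60, 44, 35, 41],
    "opted_in": [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
})

majority = train[train["opted_in"] == 1]
minority = train[train["opted_in"] == 0]

# Draw minority rows with replacement until both classes are the same size.
minority_upsampled = resample(
    minority,
    replace=True,
    n_samples=len(majority),
    random_state=42,
)

# Recombine and shuffle so the classes are not grouped together.
balanced = (
    pd.concat([majority, minority_upsampled])
    .sample(frac=1, random_state=42)
    .reset_index(drop=True)
)

class_counts = balanced["opted_in"].value_counts()
```

Downsampling the majority class follows the same pattern with `replace=False` and `n_samples=len(minority)` applied to `majority` instead.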

Before tuning, I run the following models and compare their mean accuracy across 5 cross-validation folds on the training data:

  • Naive Bayes (baseline measurement)
  • Logistic Regression
  • Decision Tree Classifier
  • Random Forest Classifier
  • K-nearest neighbors
  • Support Vector Classifier
  • XGBoost Classifier
  • Soft & Hard Voting Classifiers
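
A comparison of this kind might look like the sketch below, which uses scikit-learn's `cross_val_score` with `cv=5`. The synthetic data from `make_classification`, the default hyperparameters, and the choice of `GaussianNB` as the Naive Bayes variant are all assumptions for illustration. XGBoost is left out to keep the example free of extra dependencies; `xgboost.XGBClassifier` could be added to the dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the (already balanced) training data.
X_train, y_train = make_classification(
    n_samples=300, n_features=10, random_state=0
)

# Untuned base models; probability=True lets SVC take part in soft voting.
base_models = {
    "naive_bayes": GaussianNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "knn": KNeighborsClassifier(),
    "svc": SVC(probability=True, random_state=0),
}

# Voting ensembles built from the same base models.
models = dict(base_models)
models["soft_voting"] = VotingClassifier(
    estimators=list(base_models.items()), voting="soft"
)
models["hard_voting"] = VotingClassifier(
    estimators=list(base_models.items()), voting="hard"
)

# Mean accuracy over 5 cross-validation folds for each model.
scores = {
    name: cross_val_score(
        model, X_train, y_train, cv=5, scoring="accuracy"
    ).mean()
    for name, model in models.items()
}

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:20s} {score:.3f}")
```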

The final result is the tuned support vector classifier as the clear winner, with 90% accuracy on the training data and ~96% on the test data.
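
The tuning method is not described here; one plausible approach is a cross-validated grid search, sketched below with `GridSearchCV`. The parameter grid, the synthetic data, and the 75/25 split are hypothetical and are not taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data with a held-out test split.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical search space; the grid actually used is not stated.
param_grid = {
    "C": [0.1, 1, 10],
    "kernel": ["linear", "rbf"],
    "gamma": ["scale", "auto"],
}

# 5-fold cross-validated search over the grid, scored by accuracy.
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

cv_accuracy = search.best_score_               # mean CV accuracy of the best setting
test_accuracy = search.score(X_test, y_test)   # accuracy on the held-out split

print(search.best_params_)
print(f"cv accuracy:   {cv_accuracy:.3f}")
print(f"test accuracy: {test_accuracy:.3f}")
```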
