This project analyses a churn dataset to identify customers who are likely to churn. We begin by performing exploratory data analysis (EDA) on the dataset, then train several models and tune their hyperparameters, before concluding with our findings.
- Obtained the dataset from Kaggle.
- Performed EDA & identified several features potentially leading to churn.
- Trained three models: regularised logistic regression, linear SVM, and XGBoost.
- Achieved 91% recall in identifying customers who churned.
Quick Links:
- Read project online (recommended for viewing)
Alternatively, these files are also available to view or download in the repo.
Some snapshots from the project can be found below:
The problem analysed here was to identify customers with higher churn probabilities. After preprocessing and undersampling the data, we compared three models: logistic regression, a linear SVM, and an XGBoost classifier. Using recall as our metric, the XGBoost model performed best, achieving a recall of 0.91. This means that, of all customers who actually churned, the model correctly identifies approximately 91% of them. This would be helpful to a company looking to launch a marketing campaign to retain its customers.
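To make the two ideas in this summary concrete, here is a minimal sketch of random undersampling and of how recall is computed. It is not the project's actual code: the function names `undersample` and `recall`, the toy labels, and the assumption that churners are labelled `1` are all illustrative, and only the Python standard library is used.

```python
import random
from collections import Counter


def undersample(rows, labels, seed=0):
    """Randomly drop rows so every class ends up as large as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        # Sample without replacement from each class.
        for row in rng.sample(group, n_min):
            balanced.append((row, label))
    rng.shuffle(balanced)
    return [r for r, _ in balanced], [l for _, l in balanced]


def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN): the share of actual positives that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy imbalanced data (assumed labels: 1 = churned, 0 = stayed).
rows = list(range(12))
labels = [1] * 3 + [0] * 9
bal_rows, bal_labels = undersample(rows, labels)
print(Counter(bal_labels))  # both classes now have 3 rows

# Toy predictions: 3 of the 4 actual churners are caught.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
print(recall(y_true, y_pred))  # 0.75
```

Recall ignores false positives, so it suits a retention campaign where missing a churner is assumed to cost more than contacting a customer who would have stayed. In practice, undersampling is usually applied to the training split only, so that the reported recall reflects the original class balance.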