Comparison of Probabilistic Classifiers

IBM HR Analytics Employee Attrition & Performance

Jiayu Qi, Nov 5, 2018

Abstract

Classification is a data mining technique used to predict group labels for the data points in a given dataset. For binary classification, techniques such as k-nearest neighbor, support vector machine and decision tree produce non-probabilistic results, i.e. a hard yes or no. Naive Bayes classification, on the other hand, applies Bayes' theorem under the assumption of class-conditional independence and provides a probability for each class. This paper compares the probabilistic outputs of different classification techniques: by converting non-probabilistic classifiers into probabilistic ones, we can evaluate every classifier on sensitivity, specificity, accuracy, AUC and decision threshold. Specifically, we conduct our research on the IBM employee attrition dataset, which is a binary classification problem. The class distribution is imbalanced, and we apply and compare several preprocessing methods, namely oversampling, undersampling, normalization, feature extraction and feature selection, across Naive Bayes, KNN and SVM. After preprocessing, all three classifiers predicted better than they did on the unpreprocessed data. The results indicate that a support vector machine combined with oversampling and normalization achieves the best classification performance. Applying these models has the potential to help reduce employee attrition.
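
The evaluation described above can be sketched in plain Python. This is a minimal illustration, not the author's code. It assumes each classifier outputs a score P(Attrition = Yes) for every test case, with label 1 meaning "Yes". The function names are hypothetical. AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, which equals the area under the ROC curve. The threshold is chosen by maximizing Youden's J statistic (sensitivity + specificity - 1), which is one common criterion among several and is an assumption here.

```python
# Threshold-based evaluation of a probabilistic classifier (illustrative sketch).
# Assumption: label 1 = "Attrition = Yes", and p_pos[i] is the predicted P(y=1).
from typing import Dict, Sequence, Tuple


def confusion_at_threshold(y_true: Sequence[int], p_pos: Sequence[float],
                           threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), predicting 1 whenever the score is >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_pos):
        predicted = 1 if p >= threshold else 0
        if predicted == 1:
            if y == 1:
                tp += 1
            else:
                fp += 1
        else:
            if y == 0:
                tn += 1
            else:
                fn += 1
    return tp, fp, tn, fn


def metrics_at_threshold(y_true: Sequence[int], p_pos: Sequence[float],
                         threshold: float = 0.5) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy at one decision threshold."""
    tp, fp, tn, fn = confusion_at_threshold(y_true, p_pos, threshold)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # true negative rate
        "accuracy": (tp + tn) / len(y_true),
    }


def auc(y_true: Sequence[int], p_pos: Sequence[float]) -> float:
    """P(random positive outscores random negative), ties counted as 1/2."""
    pos = [p for y, p in zip(y_true, p_pos) if y == 1]
    neg = [p for y, p in zip(y_true, p_pos) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def best_threshold(y_true: Sequence[int], p_pos: Sequence[float]) -> float:
    """Try each observed score as a threshold; keep the one maximizing Youden's J."""
    best_t, best_j = 0.5, float("-inf")
    for t in sorted(set(p_pos)):
        m = metrics_at_threshold(y_true, p_pos, t)
        j = m["sensitivity"] + m["specificity"] - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t
```

For example, with labels [1, 0, 1, 0, 0, 1] and scores [0.9, 0.2, 0.6, 0.7, 0.1, 0.4], the default 0.5 threshold gives a sensitivity, specificity and accuracy of 2/3 each, the AUC is 7/9, and best_threshold returns 0.4 (sensitivity 1, specificity 2/3). Because the AUC does not depend on any threshold, it is a natural way to compare classifiers whose scores are calibrated differently.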

Introduction

In machine learning, a probabilistic classifier is one that, given an input observation, predicts a probability distribution over a set of classes rather than only the single most likely class for that observation. Non-probabilistic classifiers, by contrast, output a hard label and do not quantify the uncertainty of that assignment. In this research project, we convert non-probabilistic classifiers, namely k-nearest neighbor (KNN) and support vector machines (SVM), into probabilistic ones and compare them with a natively probabilistic classifier, Naive Bayes, in order to find the classifier that best quantifies the uncertainty of assigning a case to a given class. We are also interested in how preprocessing steps affect the overall performance of the classifiers. The dataset we use is IBM HR Analytics Employee Attrition & Performance. In human resources, attrition refers to the gradual loss of employees over time. The goal is to find the best probabilistic classifier for predicting the attrition of valuable employees.
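
To make the distinction concrete, here is a minimal stdlib-only sketch (not the author's implementation) of how a probability can be read off each kind of model. It assumes numeric feature vectors and Euclidean distance. KNN's majority vote is turned into class probabilities by taking the fraction of the k nearest neighbors in each class. Naive Bayes is assumed to be the Gaussian variant, which applies Bayes' theorem under class-conditional independence. The function names and toy data are hypothetical. SVM is not shown; a common route for it is Platt scaling, which fits a sigmoid to the SVM decision values (scikit-learn's `SVC(probability=True)` does this).

```python
# Probabilities from KNN (vote fractions) and Gaussian Naive Bayes (Bayes' theorem).
# Assumptions: numeric features, Euclidean distance, Gaussian class-conditional densities.
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def knn_proba(train_X: List[Vector], train_y: List[int], x: Vector,
              k: int = 3) -> Dict[int, float]:
    """Share of the k nearest training points (Euclidean distance) in each class."""
    nearest = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))[:k]
    votes = Counter(label for _, label in nearest)
    return {c: votes[c] / k for c in sorted(set(train_y))}


def fit_gaussian_nb(train_X: List[Vector], train_y: List[int]
                    ) -> Dict[int, Tuple[float, List[Tuple[float, float]]]]:
    """Per class: prior P(c) and a (mean, variance) pair for every feature."""
    model = {}
    for c in sorted(set(train_y)):
        rows = [xi for xi, yi in zip(train_X, train_y) if yi == c]
        stats = []
        for j in range(len(rows[0])):
            col = [r[j] for r in rows]
            mu = sum(col) / len(col)
            var = sum((v - mu) ** 2 for v in col) / len(col) + 1e-9  # floor avoids zero variance
            stats.append((mu, var))
        model[c] = (len(rows) / len(train_y), stats)
    return model


def nb_proba(model: Dict[int, Tuple[float, List[Tuple[float, float]]]],
             x: Vector) -> Dict[int, float]:
    """P(c | x) is proportional to P(c) * prod_j N(x_j; mu_cj, var_cj), normalized over classes."""
    log_post = {}
    for c, (prior, stats) in model.items():
        lp = math.log(prior)
        for xj, (mu, var) in zip(x, stats):
            lp += -0.5 * math.log(2 * math.pi * var) - (xj - mu) ** 2 / (2 * var)
        log_post[c] = lp
    top = max(log_post.values())  # subtract the max before exp() for numerical stability
    unnorm = {c: math.exp(v - top) for c, v in log_post.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}
```

With the toy 1-D training set [[1.0], [2.0], [3.0], [10.0], [11.0]] labeled [0, 0, 0, 1, 1], `knn_proba(..., [7.5], k=3)` gives class 0 a probability of 1/3 and class 1 a probability of 2/3, because the three nearest points are 10.0, 11.0 and 3.0. Naive Bayes also favors class 1 for the same point, but it returns a continuous probability rather than a multiple of 1/k.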

