handle_imabalnce_class's Introduction

Working with highly imbalanced datasets in machine learning projects.

Basic Information:

This project was part of a skill test for a recent job interview for a "Machine Learning Engineer" position. I had to complete the project in 48 hours, which included writing a 10-page report in LaTeX. The dataset has three classes and is highly imbalanced. The primary objective of the project was to handle the data imbalance issue. In the following subsections, I describe three techniques I used to overcome the data imbalance problem.

Datasets

There are three labels [1, 2, 3] in the training data, which makes this a multi-class problem. The training dataset has 17 features and 38,829 data points, whereas the test dataset has 16 features (without the label) and 16,641 data points. The training dataset is highly imbalanced: the majority of the data belongs to class-1 (95 percent), whereas class-2 and class-3 account for 3.0 percent and 0.87 percent, respectively. Since the datasets contain no null values and are already scaled, I did no further preprocessing. For internal reasons, I am not sharing the datasets, only the detailed results and techniques. The following figure shows the data imbalance.
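
As a minimal sketch of how per-class shares like the ones quoted above could be computed, the snippet below uses NumPy on a synthetic label array. The real datasets are not shared, so `y_demo`, its 950/40/10 split, and the helper name `class_shares` are illustrative assumptions, not the project's data or code.

```python
import numpy as np


def class_shares(labels):
    """Return {label: percentage of samples} for a 1-D array of class labels."""
    values, counts = np.unique(labels, return_counts=True)
    shares = 100.0 * counts / counts.sum()
    return {int(v): float(s) for v, s in zip(values, shares)}


# Synthetic stand-in labels (NOT the real data): 950 / 40 / 10 samples
# for classes 1, 2 and 3.
y_demo = np.repeat([1, 2, 3], [950, 40, 10])

shares = class_shares(y_demo)
for label, pct in shares.items():
    print(f"class-{label}: {pct:.2f}%")
```

With the real training labels in place of `y_demo`, the same helper would give the distribution behind the imbalance figure.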

Code and Libraries

I used Python 3.0. The following Python libraries are also required:

  • JupyterLab
  • NumPy
  • Pandas
  • matplotlib
  • scikit-learn
  • seaborn

Contributors

Sabber Ahamed

License

MIT
