Ensemble-Learning-Comparison-on-Diabetes-Classification

Comparison of ensemble learning methods on diabetes disease classification with various datasets

About The Project

  • This project compares various ensemble learning techniques for the classification of diabetes disease. Ensemble methods combine multiple machine learning models to improve predictive performance and robustness.
  • In this project, we explore and compare the effectiveness of popular ensemble algorithms, such as Random Forest, AdaBoost, and Gradient Boosting, for classifying diabetes on three different datasets.
  • Key Features:
    • Implementation of different ensemble methods for classification
    • Evaluation and comparison of model performance using metrics like accuracy, precision, recall, and F1-score
    • Jupyter notebooks with detailed explanations and visualizations
    • Datasets used for experimentation
    • Code for preprocessing, model training, and evaluation
  • The results of this project were published in JMASIF (Jurnal Masyarakat Informatika) under the title "Perbandingan Metode Ensemble Learning pada Klasifikasi Penyakit Diabetes" ("Comparison of Ensemble Learning Methods on Diabetes Disease Classification").

Technology Used

  • Python
  • Pandas
  • Matplotlib
  • Seaborn
  • Scikit-learn
  • xgboost
  • lightgbm
  • catboost

Objectives / Problems

Diabetes is a medical condition characterized by elevated blood sugar levels. According to the World Health Organization (WHO), the number of people with diabetes rose from 108 million in 1980 to 422 million in 2014. Machine learning offers methods such as ensemble learning for diabetes classification. This study compares three ensemble learning techniques (Bagging, Boosting, and Stacking) on three datasets: Pima Indians Diabetes, Frankfurt Hospital Diabetes, and Sylhet Hospital Diabetes.

Dataset Used

  1. Pima Indians Diabetes Database by UCI Machine Learning
  2. Frankfurt Hospital Diabetes Dataset by John
  3. Sylhet Hospital Diabetes Dataset by Ishan Dutta

Workflow

  • Data Preprocessing
    • MinMaxScaler for each dataset (rescales each feature to the range 0 to 1)
  • Data Exploration
  • Feature Engineering
  • Data Splitting
    • 80% Training data
    • 20% Testing data
  • Model Building
  • Model Training & Testing
  • Model Evaluation
    • Accuracy
    • Precision
    • Recall
    • F1-score
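
The workflow above can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the project's notebook code: the synthetic data from `make_classification` stands in for a real dataset, and the stratified split, the random seeds, and the choice of `RandomForestClassifier` are assumptions made for the example. The sketch also fits the scaler on the training split only, which is a common way to avoid leaking test-set statistics; the project itself may apply the scaler at a different point.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for a diabetes dataset (assumption: 8 numeric
# features and a binary label; the real data is not used here).
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, random_state=42
)

# Data splitting: 80% training data, 20% testing data.
# Stratifying on the label is an assumption of this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Preprocessing: rescale every feature to the range 0 to 1.
# The scaler learns min/max from the training split only.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Model building, training, and testing (any classifier could be used here).
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)

# Model evaluation with the four metrics listed in the workflow.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
for name, value in metrics.items():
    print(f"{name:<10} {value:.3f}")
```

Because the test split is transformed with the training split's minimum and maximum, some scaled test values can fall slightly outside the 0 to 1 range; that is expected with this design.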

Algorithms / Methods

Diagrams of the three techniques:
Bagging (images/1.%20Bagging.jpg), Boosting (images/2.%20Boosting.jpg), Stacking (images/3.%20Stacking.jpg)

  • Bagging
    • Bagging
    • Random Forest
    • Extra Trees
  • Boosting
    • Adaptive Boosting (AdaBoost)
    • Gradient Boosting
    • Extreme Gradient Boosting (XGBoost)
    • Light Gradient Boosting (LightGBM)
    • Cat Boosting (CatBoost)
  • Stacking
    • Stacked Generalization
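
As a rough illustration of how the three families listed above can be set up and compared, the sketch below uses only the estimators that ship with scikit-learn. It is not the project's code: the synthetic data, the hyperparameters, and the base learners chosen for the stacking model are all assumptions made for the example, and the printed accuracies say nothing about the real datasets.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption); replace with a real dataset.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    # Bagging family: many trees trained on resampled data, votes combined.
    "Bagging": BaggingClassifier(n_estimators=50, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Extra Trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
    # Boosting family: learners added sequentially to correct earlier errors.
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    # Stacking: a meta-learner combines out-of-fold predictions of base models.
    # The base learners and meta-learner below are illustrative choices.
    "Stacking": StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
            ("gb", GradientBoostingClassifier(random_state=0)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    ),
}

# Train every model on the same split and record its test accuracy.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

# Print the models from highest to lowest accuracy.
for name, acc in sorted(accuracies.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<18} {acc:.3f}")
```

XGBoost, LightGBM, and CatBoost are left out so the sketch depends on scikit-learn alone. Their scikit-learn-compatible classifiers (`xgboost.XGBClassifier`, `lightgbm.LGBMClassifier`, `catboost.CatBoostClassifier`) could be added to the `models` dictionary in the same way when those packages are installed.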

Performance (Accuracy)

Dataset 1 (Pima Indians Diabetes Database)

images/Grafik%20Akurasi_Dataset%201.png

Dataset 2 (Frankfurt Hospital Diabetes Dataset)

images/Grafik%20Akurasi_Dataset%202.png

Dataset 3 (Sylhet Hospital Diabetes Dataset)

images/Grafik%20Akurasi_Dataset%202.png

Conclusion

  • Overall, the Boosting methods give the best accuracy on all three datasets. Among them, Light Gradient Boosting (LightGBM) gives the best results on two of the three datasets (Dataset 2 and Dataset 3).

Publications

  • L. M. Cendani and A. Wibowo, "Perbandingan Metode Ensemble Learning pada Klasifikasi Penyakit Diabetes," Jurnal Masyarakat Informatika, vol. 13, no. 1, pp. 33-44, May 2022. https://doi.org/10.14710/jmasif.13.1.42912

Contributors

  • linggarm

License

This project is licensed under the MIT License. See the LICENSE file for details.
