
data_science-supervised_machine_learning_classification_mushrooms's Introduction

Supervised Machine Learning - Classification

Goal

To explore a practical application of machine learning by predicting whether mushrooms are poisonous, paying attention to the trade-off between accuracy and safety.

Overview

We want to keep Catalonian mushroom foragers safe from poisonous mushrooms, so our aim is to completely eliminate Type II errors (false negatives, that is, poisonous mushrooms classified as edible).
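To make the two error types concrete, here is a minimal sketch that counts them from true and predicted labels. It assumes "poisonous" is treated as the positive class; the function name `count_errors` and the toy labels are illustrative and not taken from the notebook.

```python
def count_errors(y_true, y_pred, positive="poisonous"):
    """Count Type I (false positive) and Type II (false negative) errors."""
    # Type I: an edible mushroom flagged as poisonous (safe, but wasteful).
    type_1 = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    # Type II: a poisonous mushroom passed as edible (the dangerous case).
    type_2 = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return {"type_1": type_1, "type_2": type_2}


# Toy labels, invented for illustration only.
y_true = ["poisonous", "edible", "poisonous", "edible", "edible"]
y_pred = ["poisonous", "poisonous", "edible", "edible", "edible"]

errors = count_errors(y_true, y_pred)
print(errors)  # {'type_1': 1, 'type_2': 1}
```

Under this convention, the project's goal amounts to driving `type_2` to zero, even if `type_1` grows as a result.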

Context

In general, the aim of fine-tuning algorithms is to push accuracy as close to 100% as possible. This time, however, the emphasis is on error types and the delicate balance between accuracy and safety.

  1. Are there any ML algorithms that by default err on the side of caution?
  2. Can we achieve zero hospital cases by adjusting thresholds and exploring ROC curves?
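The second question can be illustrated with a small sketch of threshold adjustment. It assumes the model outputs a probability for the poisonous class and that labels are encoded as 1 = poisonous, 0 = edible; the helper names and the toy scores are hypothetical.

```python
def safest_threshold(y_true, scores):
    """Largest threshold that still flags every poisonous sample.

    A sample is flagged as poisonous when score >= threshold, so the
    lowest score among the truly poisonous samples is the answer.
    """
    poisonous_scores = [s for t, s in zip(y_true, scores) if t == 1]
    if not poisonous_scores:
        raise ValueError("need at least one poisonous sample")
    return min(poisonous_scores)


def evaluate(y_true, scores, threshold):
    """Return false negatives, false positives and accuracy at a threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return {"fn": fn, "fp": fp, "accuracy": accuracy}


# Toy data, invented for illustration: 1 = poisonous, 0 = edible.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0]
scores = [0.95, 0.80, 0.30, 0.40, 0.10, 0.05, 0.20, 0.60, 0.35]

default = evaluate(y_true, scores, 0.5)       # one poisonous sample is missed
threshold = safest_threshold(y_true, scores)  # 0.30
safe = evaluate(y_true, scores, threshold)    # no misses, two false alarms
print(default, threshold, safe)
```

On this toy data, lowering the threshold from 0.5 to 0.30 removes the single false negative at the cost of two false positives and lower accuracy. A threshold chosen this way only guarantees zero false negatives on the data it was chosen from, so it still has to be checked on data the model has not seen.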

Task:

  • Import mushroom database
  • Explore and analyze features
  • Experiment with several ML models
  • Experiment with thresholds while keeping an eye on accuracy
  • Explore ROC curve
  • Test our algorithm on data it has never seen before
  • Rinse and repeat
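The steps above can be sketched end to end with scikit-learn. This is only an outline under stated assumptions: the data is a synthetic stand-in (the column values and the labelling rule are invented), RandomForest is just one of the candidate models, and the actual notebook may be organized differently.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the mushroom table: three categorical columns.
rng = np.random.default_rng(0)
n = 400
odor = rng.choice(["none", "foul", "almond"], size=n)
cap = rng.choice(["flat", "convex", "bell"], size=n)
gill = rng.choice(["broad", "narrow"], size=n)
X = np.column_stack([odor, cap, gill])

# Hypothetical labelling rule (1 = poisonous) with 10% label noise.
y = ((odor == "foul") | ((gill == "narrow") & (cap == "bell"))).astype(int)
flip = rng.random(n) < 0.1
y = np.where(flip, 1 - y, y)

# Hold out data the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Preprocessor (one-hot encoding) and model in a single pipeline.
model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(X_train, y_train)

# Probability of the poisonous class, then the ROC curve on held-out data.
proba = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, proba)

# First ROC point at which every poisonous test sample is caught.
idx = int(np.argmax(tpr >= 1.0))
print(f"threshold={thresholds[idx]:.2f} fpr={fpr[idx]:.2f} tpr={tpr[idx]:.2f}")
```

The false positive rate at that point shows how many edible mushrooms would have to be discarded to catch every poisonous one in the held-out set.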

Deliverables

The Google Colab notebook for trying out different ML algorithms is found here, with a supporting Medium article that outlines my thinking process and practical takeaways in more detail here.

Skills & Tools

  1. Data Reading & Cleaning
  2. Data Splitting
  3. Building a Preprocessor
  4. LazyPredict & Modelling
  5. Error Analysis
  6. Thresholds and ROC Curve analysis

Note to the Reader about my choice of models to try:

My aim after running LazyPredict was to experiment with algorithms built on different mathematical foundations. RandomForest is an ensemble of decision trees, Label Propagation is a semi-supervised, graph-based model, LGBM is a gradient boosting method, KNN classifies each sample by the labels of its nearest neighbors, and SVC looks for the hyperplane that separates the classes with the largest margin. By exploring methods that rest on different foundations, I wanted to see whether any of them is more or less prone to a particular error type.
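A minimal sketch of such a comparison, assuming scikit-learn and a synthetic dataset from `make_classification` in place of the mushroom data (class 1 stands in for "poisonous"). Only three of the models are shown; LGBM is left out because it lives in the separate `lightgbm` package, and Label Propagation is omitted for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic binary data; class 1 plays the role of "poisonous".
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "RandomForest": RandomForestClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "SVC": SVC(),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # With labels=[0, 1], ravel() yields tn, fp, fn, tp in that order.
    tn, fp, fn, tp = confusion_matrix(
        y_test, model.predict(X_test), labels=[0, 1]
    ).ravel()
    results[name] = {"false_negatives": int(fn), "false_positives": int(fp)}

for name, counts in results.items():
    print(name, counts)
```

Comparing the `false_negatives` column across models at their default settings is one way to see whether a model family leans toward the dangerous error type before any threshold tuning.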
