lohithunnam / diabetes-prediction-mlops

This comprehensive MLOps project focuses on Diabetes Prediction and utilizes a range of tools, including Prefect, MLflow, FastAPI, and Streamlit.

Diabetes Predictor: Early Detection for Healthier Lives

Introduction 🚀

Welcome to our Diabetes Prediction project! 📈 This project uses machine learning to predict diabetes, a common health condition. Predicting diabetes early can help people manage it more effectively. This guide walks you through how the project works, so you can understand it, use it, and even contribute to it. Together, we can make a positive impact on healthcare and well-being!

Requirements 📄

  • Python 3.8 or later
  • Required Python libraries: evidently, fastapi, imblearn, joblib, mlflow, numpy, pandas, prefect, scikit-learn, streamlit, uvicorn, xgboost
  • The Diabetes Prediction dataset is available here.

Getting started

  • Navigate to the directory where you want to clone the repository: use the cd (change directory) command, for example:
cd /path/to/your/desired/directory
  • Clone the repository: to clone the "Diabetes-prediction-MLOps" repository, run the following command:
git clone https://github.com/LohithUnnam/Diabetes-prediction-MLOps.git

Project Workflow: Building a Diabetes Prediction Pipeline

Our project, "Diabetes Predictor: Early Detection for Healthier Lives," is dedicated to providing accurate diabetes predictions through a meticulously designed pipeline.

Reliability in Predictions

In this project, we've built an end-to-end workflow that covers data ingestion, data preprocessing, model development, model evaluation, and experiment tracking. Prefect orchestrates these steps as tasks within a single flow.

Experiment Tracking

Experiment tracking is the final step in our diabetes prediction workflow. It logs important information about the trained machine learning model and its evaluation results. We use MLflow for experiment tracking, which covers logging hyperparameters, metrics, and artifacts.

Key Workflow Steps

  • Data Ingestion (ingest_data): Collect and structure data into a DataFrame for analysis.

  • Data Preprocessing (clean_data): Clean and prepare the data for modeling.

  • Model Development (train_model): Train the machine learning model with autologging.

  • Model Evaluation (evaluation): Assess the model's performance and log key metrics.

  • Experiment Tracking (track_experiment): Log hyperparameters, metrics, and artifacts using MLflow for transparency and model management.

This streamlined workflow ensures efficient diabetes prediction and transparent model monitoring for proactive healthcare.
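To make the data flow between these five steps concrete, here is a minimal, dependency-free sketch. It is not the project's code: the toy records, the single-feature threshold "model", the artifact file name, and the dict standing in for an MLflow run are all hypothetical. In the project itself, the steps are orchestrated by Prefect and tracked with MLflow.

```python
"""Dependency-free sketch of the five workflow steps (hypothetical data and model)."""
from statistics import mean


def ingest_data():
    # Hypothetical toy records: one feature plus the 0/1 diabetes label
    return [
        {"blood_glucose_level": 90, "diabetes": 0},
        {"blood_glucose_level": 100, "diabetes": 0},
        {"blood_glucose_level": None, "diabetes": 0},
        {"blood_glucose_level": 200, "diabetes": 1},
        {"blood_glucose_level": 220, "diabetes": 1},
    ]


def clean_data(rows):
    # Drop rows whose feature value is missing
    return [r for r in rows if r["blood_glucose_level"] is not None]


def train_model(rows):
    # Toy "model": a threshold halfway between the two class means
    neg = mean(r["blood_glucose_level"] for r in rows if r["diabetes"] == 0)
    pos = mean(r["blood_glucose_level"] for r in rows if r["diabetes"] == 1)
    return {"threshold": (neg + pos) / 2}


def evaluation(model, rows):
    # Predict 1 when the feature reaches the threshold, then score accuracy
    preds = [int(r["blood_glucose_level"] >= model["threshold"]) for r in rows]
    correct = sum(p == r["diabetes"] for p, r in zip(preds, rows))
    return {"accuracy": correct / len(rows)}


def track_experiment(model, metrics):
    # Stand-in for an MLflow run record: params, metrics, artifact names
    return {
        "params": dict(model),
        "metrics": dict(metrics),
        "artifacts": ["model.pkl"],  # hypothetical artifact file name
    }


def diabetes_pipeline():
    # Chain the steps in the same order as the workflow above
    rows = clean_data(ingest_data())
    model = train_model(rows)
    metrics = evaluation(model, rows)
    return track_experiment(model, metrics)


if __name__ == "__main__":
    print(diabetes_pipeline())
```

Handing the same structure to Prefect typically means decorating each step with `@task` and `diabetes_pipeline` with `@flow`; the function boundaries stay the same.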

To run this workflow, use the following command:

python execute_flow.py

To track your flow runs in the Prefect server UI, start the server with:

prefect server start

After starting the server and opening the link it prints, you can view all your flow runs, both successful and failed, along with the execution time of each task and of the entire workflow.

Model Deployment with FastAPI

In our project, we deploy the diabetes prediction model with FastAPI, a modern, high-performance Python web framework for building APIs. This deployment lets healthcare professionals and individuals query the model for diabetes risk assessments.

Key Components:

FastAPI Setup: We've created a FastAPI application, defining routes and handling incoming requests for predictions.

Pre-trained Models: Our application loads pre-trained machine learning models, specifically Random Forest Classifier (rfc) and Decision Tree Classifier (dtc), from saved model files.

Data Input: Input data representing various health and lifestyle attributes is validated and structured with the Pydantic library.

Model Selection: Users can specify the model they want to use for prediction, ensuring flexibility in choosing the appropriate model.

Data Preprocessing: Input data is preprocessed, with specific columns scaled using a pre-trained StandardScaler to align with the model's requirements.

Prediction: The selected model is used to predict diabetes risk based on the provided input data.
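The request-handling steps above (validate, select a model, scale, predict) can be sketched with the standard library alone. This is an illustration under assumptions, not the project's implementation: the field names, the scaler statistics, and the two stub models are hypothetical, and a dataclass stands in for the Pydantic model. The scaling step applies the same transform a fitted scikit-learn StandardScaler performs, z = (x - mean) / scale, but only to the selected columns.

```python
"""Sketch of validate -> select model -> scale -> predict (all values hypothetical)."""
from dataclasses import dataclass, asdict

# Hypothetical numbers standing in for a fitted StandardScaler's mean_ and scale_
SCALER_MEAN = {"age": 40.0, "bmi": 27.0, "blood_glucose_level": 140.0}
SCALER_SCALE = {"age": 20.0, "bmi": 6.0, "blood_glucose_level": 40.0}


@dataclass
class PatientData:
    # Hypothetical subset of the input attributes
    age: float
    bmi: float
    blood_glucose_level: float
    hypertension: int

    def __post_init__(self):
        # Minimal range check, standing in for Pydantic validation
        if self.age < 0 or self.bmi <= 0:
            raise ValueError("age must be >= 0 and bmi must be > 0")


def scale_columns(features):
    # z = (x - mean) / scale for scaled columns; other columns pass through
    return {
        name: (value - SCALER_MEAN[name]) / SCALER_SCALE[name] if name in SCALER_MEAN else value
        for name, value in features.items()
    }


def stub_rfc(features):
    # Placeholder for the pre-trained Random Forest model
    return int(features["blood_glucose_level"] > 1.0)


def stub_dtc(features):
    # Placeholder for the pre-trained Decision Tree model
    return int(features["bmi"] > 1.0 and features["hypertension"] == 1)


MODELS = {"rfc": stub_rfc, "dtc": stub_dtc}


def predict(data, model_name):
    # Reject unknown model names before doing any work
    if model_name not in MODELS:
        raise ValueError(f"unknown model {model_name!r}; choose from {sorted(MODELS)}")
    features = scale_columns(asdict(data))
    return MODELS[model_name](features)


if __name__ == "__main__":
    patient = PatientData(age=60, bmi=36, blood_glucose_level=200, hypertension=1)
    print(predict(patient, "rfc"), predict(patient, "dtc"))
```

Keeping the models in a dict keyed by name makes the model selection a single lookup and gives one obvious place to reject unsupported choices.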

API Endpoints:

/predict/: Receives POST requests containing the input data and the model selection, and returns a binary prediction of whether the individual has diabetes.

/: A simple welcome endpoint, providing a friendly greeting to users.
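A client could call the /predict/ endpoint with a JSON POST request. The sketch below uses only the Python standard library and builds the request without sending it. The address (uvicorn's default, 127.0.0.1:8000), the field names, and passing the model choice as a "model" field in the JSON body are assumptions, not the project's documented request schema.

```python
"""Build a POST request for /predict/ (address, fields, and 'model' key are assumptions)."""
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumed local uvicorn address


def build_predict_request(features, model="rfc"):
    # Merge the feature dict with the (assumed) model selector field
    payload = dict(features, model=model)
    return urllib.request.Request(
        f"{BASE_URL}/predict/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    # Only call this while the API server is running
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_predict_request({"age": 60, "bmi": 36.0, "blood_glucose_level": 200})
    print(req.get_method(), req.full_url)
```

Separating request construction from sending keeps the payload easy to inspect; call send(req) only once the API is running.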

This FastAPI deployment makes it easy to integrate the model with other systems and provides a user-friendly interface for diabetes risk assessments. Users can query the model directly and use its predictions to support informed healthcare decisions.

FastAPI Interface

Model Monitoring with Streamlit and Evidently

In our project, we've developed a Streamlit application that incorporates Evidently for monitoring the performance of our diabetes prediction model. This interactive application empowers healthcare professionals and individuals to gain insights into the model's behavior, assess its accuracy, and make informed decisions.

Key Components:

Streamlit Application: We've created a Streamlit web application, providing a user-friendly interface for monitoring our diabetes prediction model.

Data Loading: Our application loads pre-processed data from a CSV file, preparing it for analysis.

Model Selection: Users can choose from a set of models, including Random Forest, Logistic Regression, and Support Vector Machine (SVM).

Model Training: The selected model is trained on a subset of the available data, ensuring that the model aligns with the current dataset.

Probabilistic Predictions: Instead of making binary predictions, the application produces probabilistic predictions using the predict_proba method. These probabilities indicate the likelihood of an individual having diabetes.

Column Mapping: We configure Evidently's ColumnMapping to define the target column, the prediction column (probability), and the feature columns. This setup allows for precise monitoring.

Classification Report: We leverage Evidently's classification metrics preset to generate a detailed classification report. This report provides a comprehensive assessment of the model's performance, including accuracy, precision, recall, and more.

Interactive User Interface: Users can view the classification report within the Streamlit application, providing a dynamic and engaging experience for model monitoring.
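As a rough illustration of what the probabilistic predictions and the classification report contain, the sketch below extracts P(diabetes) from predict_proba-style rows and computes accuracy, precision, recall, and F1 at a 0.5 threshold. The probabilities and labels are made up, and the 0.5 threshold is an assumption. Evidently's classification preset computes these metrics (and more) itself, so this only shows what the numbers mean.

```python
"""From predict_proba-style rows to classification metrics (made-up data)."""


def positive_class_probs(proba_rows):
    # scikit-learn orders predict_proba columns by classes_; for 0/1 labels,
    # column 1 is the probability of class 1 (diabetes)
    return [row[1] for row in proba_rows]


def classification_metrics(y_true, probs, threshold=0.5):
    # Turn probabilities into 0/1 predictions, then count confusion-matrix cells
    y_pred = [int(p >= threshold) for p in probs]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    proba = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.7, 0.3]]
    y_true = [0, 0, 1, 1]
    print(classification_metrics(y_true, positive_class_probs(proba)))
```

Keeping the probability column (rather than only the 0/1 label) is what lets a monitoring report vary the threshold or plot probability-based curves later.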

User Experience:

Users can choose a model type (Random Forest, Logistic Regression, or SVM) in the application's sidebar.

The selected model is trained on a sample dataset to produce probabilistic predictions.

Evidently's classification report is generated, displaying key model performance metrics.

Users can interact with the report to gain insights into the model's strengths and weaknesses, ultimately making data-driven decisions.

This Streamlit application, combined with Evidently's monitoring capabilities, enhances the transparency and accountability of our diabetes prediction model. Users can effectively assess the model's accuracy and reliability, supporting proactive healthcare and well-being.

This is what our Streamlit application looks like.