
This project aims to predict flight arrival delays using various machine learning algorithms. It involves EDA, feature engineering, and model tuning with XGBoost, LightGBM, CatBoost, SVM, Lasso, Ridge, Decision Tree, and Random Forest Regressors. The goal is to identify the best model for accurate predictions.


Flight Arrival Delay Prediction Project

Overview

This project aims to predict flight arrival delays using various regression algorithms. The goal is to apply multiple machine learning models and identify the best-performing model for this task. The models used in this project include:

  • CatBoost Regressor
  • XGBoost Regressor
  • LightGBM Regressor
  • Support Vector Machine (SVM) Regressor
  • Lasso Regressor
  • Ridge Regressor
  • Decision Tree Regressor
  • Random Forest Regressor

Introduction

Predicting flight arrival delays is a critical task for airlines and passengers. This project explores the application of several advanced machine learning algorithms to create an accurate prediction model for flight arrival delays. By comparing different models, I aim to find the best approach for this problem.

Project Structure

flight_arrival_delay_prediction/
├── data/
│   ├── train.csv
│   ├── test.csv
├── notebooks/
│   ├── EDA_and_feature_engineering.ipynb
│   ├── Delays_Modeling_popular_models.ipynb
│   ├── delays_modeling.ipynb
│   ├── lightgbm_parameter_tuning.ipynb
│   ├── cateboost_parameter_tuning.ipynb
│   ├── xgboost_parameter_tuning_and_voting_regressor.ipynb
│   ├── best_models_voting_regressor.ipynb
│   ├── best_single_model_lightgbm.ipynb
├── README.md
├── requirements.txt

Dependencies

To install the required dependencies, use the following command:

pip install -r requirements.txt

The main libraries used in this project include:

pandas
numpy
catboost
xgboost
lightgbm
scikit-learn
matplotlib
seaborn

Data

The data folder contains the training and test datasets (train.csv and test.csv). The data preprocessing steps are outlined in the EDA_and_feature_engineering.ipynb notebook.

Preprocessing

Data preprocessing involves cleaning the dataset, handling missing values, engineering features, and scaling them. These steps prepare the data for model training and are also detailed in the EDA_and_feature_engineering.ipynb notebook.

Model Training

The model training process, including hyperparameter tuning, is documented in the Delays_Modeling_popular_models.ipynb notebook. I use cross-validation to ensure robust performance estimation. The training includes:

  • Splitting the data into training and validation sets.
  • Training models using CatBoost, XGBoost, LightGBM, SVM, Lasso, Ridge, Decision Tree, and Random Forest regressors.
  • Hyperparameter tuning using grid search and random search techniques.
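
The steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: synthetic data from make_regression stands in for train.csv, a single Ridge model stands in for the full model list, and the alpha grid is hypothetical.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the flight data (assumption: the real
# features and target come from train.csv).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# Hold out a validation set that the search never sees.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Keep scaling inside the pipeline so it is refit on each CV fold.
pipeline = make_pipeline(StandardScaler(), Ridge())

# Hypothetical grid; the values tuned in the notebooks are not given.
param_grid = {"ridge__alpha": [0.1, 1.0, 10.0]}

search = GridSearchCV(
    pipeline, param_grid, cv=5, scoring="neg_mean_absolute_error"
)
search.fit(X_train, y_train)

best_alpha = search.best_params_["ridge__alpha"]
# score() uses the configured scorer, which is negated MAE here.
val_mae = -search.score(X_val, y_val)
print(f"best alpha: {best_alpha}, validation MAE: {val_mae:.2f}")
```

For random search, scikit-learn's RandomizedSearchCV offers a similar interface and takes param_distributions in place of param_grid.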

Evaluation

The model evaluation, including cross-validation and Mean Absolute Percentage Error (MAPE) calculation, is detailed in the xgboost_parameter_tuning_and_voting_regressor.ipynb notebook. I compare the performance of different models to identify the best one.
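
For reference, MAPE averages the absolute error of each prediction relative to the actual value. The function below is a small plain-Python sketch of that definition. The name mape and the sample values are illustrative only and are not taken from the notebooks.

```python
def mape(y_true, y_pred):
    """Return the Mean Absolute Percentage Error, expressed in percent."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 0:
            # The ratio is undefined for a zero actual value.
            raise ValueError("MAPE is undefined when an actual value is zero")
        total += abs((actual - predicted) / actual)
    return 100.0 * total / len(y_true)


# Illustrative values only, not project data.
actual_delays = [100.0, 200.0, 50.0]
predicted_delays = [110.0, 180.0, 50.0]
score = mape(actual_delays, predicted_delays)
print(f"MAPE: {score:.2f}%")
```

Note that scikit-learn's mean_absolute_percentage_error returns a fraction, not a percentage, so its result has to be multiplied by 100 to compare with a figure quoted in percent. The README does not say how zero-valued targets are handled.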

Results

After fine-tuning every model and training it on the training data, I found that the best single model is the LightGBM regressor. The best Voting Regressor combines the LightGBM, CatBoost, and XGBoost regressors and reaches a MAPE of 6.46%.
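
A voting regressor of this kind can be sketched with scikit-learn's VotingRegressor. The example below is an assumption-laden illustration: it uses synthetic data, and scikit-learn estimators stand in for the tuned LightGBM, CatBoost, and XGBoost models so that it runs without those libraries. It does not reproduce the 6.46% result.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    VotingRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the flight data (assumption).
X, y = make_regression(n_samples=200, n_features=6, noise=5.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Stand-in base models; the project combines tuned LightGBM,
# CatBoost, and XGBoost regressors instead.
estimators = [
    ("gbr", GradientBoostingRegressor(n_estimators=50, random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("ridge", Ridge(alpha=1.0)),
]

ensemble = VotingRegressor(estimators=estimators)
ensemble.fit(X_train, y_train)
ensemble_pred = ensemble.predict(X_val)

# Without weights, the ensemble prediction is the plain mean of the
# fitted base models' predictions.
manual_mean = np.mean(
    [est.predict(X_val) for est in ensemble.estimators_], axis=0
)
print(np.allclose(ensemble_pred, manual_mean))
```

VotingRegressor also accepts a weights argument for favoring stronger base models. Whether the notebooks use weights is not stated in the README.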

Usage

To preprocess the data, open and run the EDA_and_feature_engineering.ipynb notebook.

To train the models, open and run the Delays_Modeling_popular_models.ipynb notebook.

To evaluate the models and compare their performance, open and run the xgboost_parameter_tuning_and_voting_regressor.ipynb notebook.

Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your changes.
