This repository contains the code for my undergraduate thesis titled "Enhancing Battery State of Health Estimation using Machine Learning Techniques".
The project aims to estimate the State of Health of batteries using ensemble models. The project uses two datasets, Nasa and Calce, and four different regressors: Random Forest, XGBoost, LightGBM, and Catboost.
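As a rough illustration of how the four regressors could be set up side by side, the sketch below builds one instance of each library's regressor class. The fixed seed, the dictionary layout, and the choice to skip libraries that are not installed are assumptions made for this example and are not taken from the notebooks.

```python
from importlib import import_module

# (label, module path, class name, constructor arguments).
# The seed value 42 is an arbitrary choice for this sketch.
CANDIDATES = [
    ("Random Forest", "sklearn.ensemble", "RandomForestRegressor", {"random_state": 42}),
    ("XGBoost", "xgboost", "XGBRegressor", {"random_state": 42}),
    ("LightGBM", "lightgbm", "LGBMRegressor", {"random_state": 42}),
    ("CatBoost", "catboost", "CatBoostRegressor", {"random_seed": 42, "verbose": 0}),
]


def build_regressors():
    """Return a dict of label -> unfitted regressor for every installed library."""
    models = {}
    for label, module_path, class_name, kwargs in CANDIDATES:
        try:
            module = import_module(module_path)
        except (ImportError, OSError):
            # Library missing or failed to load; leave it out of the comparison.
            continue
        models[label] = getattr(module, class_name)(**kwargs)
    return models


models = build_regressors()
print(sorted(models))
```

Each entry exposes the usual `fit` and `predict` methods, so the same training loop can be reused for all four models.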
The Nasa and Calce folders under Data & Outputs contain the datasets used in this project. Each folder contains processed CSV files and raw data files.
The project follows these steps:
- Exploratory Data Analysis (EDA): Understanding the data by visualizing it.
- Feature Engineering & Preprocessing: Creating features and preprocessing the data for later use.
- Feature Selection: Selecting the most relevant features for the predictive model.
- Modeling: Training predictive models to estimate battery State of Health.
- Hyperparameter Tuning: Optimizing the model hyperparameters for better performance.
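The steps above can be sketched end to end on synthetic data. This is only a minimal illustration, not the thesis pipeline: the synthetic degradation curve, the feature columns, the use of `SelectKBest` for feature selection, the `GridSearchCV` grid, and the random train/test split are all assumptions chosen to keep the example short.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
n_cycles = 300
cycle = np.arange(n_cycles)

# Synthetic features (hypothetical): cycle index, two pure-noise columns,
# and a voltage-like signal that drifts down as the cell ages.
X = np.column_stack([
    cycle,
    rng.normal(size=n_cycles),
    rng.normal(size=n_cycles),
    3.6 - 0.001 * cycle + rng.normal(scale=0.01, size=n_cycles),
])
# Synthetic target: State of Health fading roughly linearly with cycling.
soh = 1.0 - 0.0008 * cycle + rng.normal(scale=0.005, size=n_cycles)

X_train, X_test, y_train, y_test = train_test_split(
    X, soh, test_size=0.2, random_state=0
)

# Feature selection and modeling chained in one estimator.
pipeline = Pipeline([
    ("select", SelectKBest(score_func=f_regression, k=2)),
    ("model", RandomForestRegressor(random_state=0)),
])

# Hyperparameter tuning over a deliberately small grid.
search = GridSearchCV(
    pipeline,
    param_grid={"model__n_estimators": [50, 100], "model__max_depth": [3, None]},
    cv=3,
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)

mae = mean_absolute_error(y_test, search.predict(X_test))
print("best params:", search.best_params_)
print("test MAE:", round(mae, 4))
```

Keeping feature selection inside the pipeline means it is refit on each cross-validation fold, which avoids leaking information from the validation folds into the selected features.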
- Scripts/: Contains the main code for the project.
- Data & Outputs/: Stores both the processed and original datasets, and outputs.
The Scripts folder contains Jupyter notebooks for different stages of the project:
- EDA_FE_Preprocessing.ipynb: Exploratory Data Analysis and Feature Engineering
- Feature_Selection.ipynb: Feature Selection
- Hyperparameter_Optimization.ipynb: Hyperparameter Tuning
- Performance_Evaluation.ipynb: Model Performance Evaluation
- Plot_Predictions.ipynb: Plotting Predictions
To run the project, follow these steps:
- Clone the repository.
- Install the required dependencies.
- Run the .ipynb files in the Scripts/ directory in the following order: (i) EDA_FE_Preprocessing.ipynb, (ii) Feature_Selection.ipynb, (iii) Hyperparameter_Optimization.ipynb, (iv) Performance_Evaluation.ipynb, (v) Plot_Predictions.ipynb.
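If you prefer to run the notebooks from a script rather than opening them one by one, something like the sketch below may work. It assumes that `jupyter nbconvert` is installed, that it is run from the repository root, and that the notebook file names match the listing of the Scripts folder; the helper names `build_commands` and `run_all` are made up for this example.

```python
import subprocess
from pathlib import Path

# Execution order as described in the steps above.
NOTEBOOK_ORDER = [
    "EDA_FE_Preprocessing.ipynb",
    "Feature_Selection.ipynb",
    "Hyperparameter_Optimization.ipynb",
    "Performance_Evaluation.ipynb",
    "Plot_Predictions.ipynb",
]


def build_commands(scripts_dir="Scripts"):
    """Return one nbconvert command per notebook, in execution order."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace",
         str(Path(scripts_dir) / name)]
        for name in NOTEBOOK_ORDER
    ]


def run_all(scripts_dir="Scripts"):
    """Execute the notebooks in order, stopping at the first failure."""
    for command in build_commands(scripts_dir):
        subprocess.run(command, check=True)


# Print the commands so they can be inspected before anything is executed.
for command in build_commands():
    print(" ".join(command))
```

Calling `run_all()` executes each notebook in place; because of `check=True`, a failing notebook stops the sequence so later stages do not run on stale outputs.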
For any queries, please reach out at [email protected].