Code repository for the paper "A hybrid machine learning/deep learning COVID-19 severity predictive model from CT images and clinical data".
Due to the European data protection law (GDPR, European Union, 2016-05-04), the patient dataset is not included in this repository. Data can be made available by the corresponding author on reasonable request, subject to ethical committee approval. Our predictive model is composed of the following steps:
(In notebook CT_Covid.ipynb)
- CT image preprocessing
- Lung extraction
- CNN classifier training (evaluated on the validation set)
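As an illustration of the CT preprocessing step above, here is a minimal sketch of intensity windowing. It assumes the volumes are stored in Hounsfield units and uses a hypothetical lung window of [-1000, 400] HU; the function name and window bounds are assumptions made for this example and may differ from what CT_Covid.ipynb actually does.

```python
import numpy as np


def window_and_normalize(volume_hu, hu_min=-1000.0, hu_max=400.0):
    """Clip a CT volume (Hounsfield units) to [hu_min, hu_max] and rescale to [0, 1].

    The default window is an assumption for this sketch, not a value taken
    from the repository.
    """
    clipped = np.clip(np.asarray(volume_hu, dtype=np.float32), hu_min, hu_max)
    return (clipped - hu_min) / (hu_max - hu_min)


# Toy 2x2 "slice": values outside the window are saturated to 0 or 1.
toy_slice = np.array([[-2000.0, -1000.0], [-300.0, 1000.0]])
normalized = window_and_normalize(toy_slice)
```

Windowing before training keeps the CNN input in a fixed range and discards intensities that carry no information about lung tissue.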
(In notebook Feature_extraction_from_CT.ipynb)
- Feature extraction
- Principal Component Analysis of the extracted features
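The PCA step above can be sketched with plain NumPy. This is an illustrative sketch only: the function names, the number of components, and the random stand-in for the extracted features are assumptions, and the notebook may use a library implementation instead.

```python
import numpy as np


def pca_fit(features, n_components):
    """Fit PCA via SVD; return the feature mean, the components, and explained-variance ratios."""
    X = np.asarray(features, dtype=np.float64)
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    ratio = (S ** 2) / (S ** 2).sum()
    return mean, Vt[:n_components], ratio[:n_components]


def pca_transform(features, mean, components):
    """Project features onto the fitted components."""
    return (np.asarray(features, dtype=np.float64) - mean) @ components.T


# Random stand-in for CNN-extracted features: 20 patients x 5 features (hypothetical sizes).
rng = np.random.default_rng(0)
train_features = rng.normal(size=(20, 5))
mean, components, ratio = pca_fit(train_features, n_components=2)
train_scores = pca_transform(train_features, mean, components)
```

To avoid leakage, the mean and components would normally be fitted on the training set only and then reused unchanged for the validation and test sets through `pca_transform`.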
(In notebook Hybrid CatBoost Covid-19.ipynb)
- Preliminary CatBoost classifier trained on clinical data and image-extracted features
- Bayesian hyperparameter optimization with Optuna of the preliminary CatBoost classifier (on the training set)
- Feature selection with a voting procedure based on BorutaSHAP (Boruta with Shapley values), using the preliminary CatBoost classifier (on the training set)
- CatBoost classifier on the reduced feature space
- Bayesian hyperparameter optimization with Optuna to create a shortlist of CatBoost models (on the training set)
- Choice of the best performing CatBoost model with overfitting detector (evaluated on the validation set)
- Retraining of the best CatBoost model on the combined training and validation sets, followed by evaluation on the test set
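The voting step of the feature selection listed above can be illustrated with a small sketch. It assumes that BorutaSHAP is run several times on the training set and that a feature is kept when it is accepted in a strict majority of runs; the voting rule, the function name, and the feature names are hypothetical and are not taken from the notebook.

```python
from collections import Counter


def vote_features(accepted_per_run, min_fraction=0.5):
    """Return the features accepted in more than `min_fraction` of the runs, sorted by name."""
    if not accepted_per_run:
        return []
    n_runs = len(accepted_per_run)
    # Count each feature at most once per run.
    votes = Counter(name for run in accepted_per_run for name in set(run))
    return sorted(name for name, count in votes.items() if count / n_runs > min_fraction)


# Hypothetical outcomes of three selection runs (feature names are placeholders).
runs = [
    {"age", "crp", "pc_1"},
    {"age", "pc_1", "ldh"},
    {"age", "crp", "pc_2"},
]
selected = vote_features(runs)
print(selected)  # ['age', 'crp', 'pc_1']
```

Aggregating several runs in this way makes the selected feature set less sensitive to the randomness of any single run.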