This is the final project from the course "Advanced Machine Learning" at the University of Heidelberg in the summer semester 2021.
In this project, we use deep learning to segment COVID-19 CT scans. The project aims to reproduce the paper "COVID-19 lung CT image segmentation using deep learning methods: U-Net versus SegNet" by Adnan Saood and Iyad Hatem. In addition, we performed further experiments to investigate the potential of deep learning for COVID-19 segmentation.
This README mainly explains how to run the code in our repository. The theoretical background, the methods and the results are explained in more detail in a separate report. If you are interested in the theoretical background and our results, please contact one of the team members.
In principle, the code runs on both GPU and CPU. However, we strongly recommend training on a GPU, as training on a CPU takes very long.
Kim-Celine Kahl
Miguel Heidegger
Sophia Matthis
- Clone the repository into your preferred directory:
git clone https://github.com/kck278/CovidCTSegmentation.git
- Install the required packages. We auto-generated a requirements.txt file from the conda environment that we used to run our code. You can install all required packages via
conda install --file requirements.txt
or
pip install -r requirements.txt
- Download the dataset. We used the COVID-19 CT segmentation dataset. The paper we tried to reproduce used only the first dataset on its website, which can be found in the section "Download data". However, we also implemented the option to extend the dataset with "Segmentation dataset nr. 2".
The preprocessed data is also included in this repository, so if you cloned the whole repository, you might not need to execute this step. If you want to run the preprocessing steps yourself, you need to download the original NIfTI files from the website mentioned above and place the training images as tr_scan.nii.gz in data/images/nifti/scan, the training masks as tr_mask.nii.gz in data/images/nifti/mask and the lung masks as tr_lung_mask.nii.gz in data/images/nifti/lung_mask.
For the extended dataset, do the same, except that the downloaded NIfTI files must go into a subdirectory named extended, e.g. data/images/nifti/mask/extended for the training masks.
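The placement rules above can be scripted. The following is a minimal sketch, not part of our repository: the helper names target_path and place_downloads are hypothetical, and keeping the downloaded file name for the extended dataset is an assumption, since only the subdirectory is prescribed for it.

```python
import shutil
from pathlib import Path

# Target file names for the first dataset, as described above
TARGET_NAMES = {
    "scan": "tr_scan.nii.gz",
    "mask": "tr_mask.nii.gz",
    "lung_mask": "tr_lung_mask.nii.gz",
}


def target_path(kind: str, src_name: str, extended: bool = False,
                root: str = "data/images/nifti") -> Path:
    """Return where a downloaded NIfTI file should be placed (hypothetical helper)."""
    folder = Path(root) / kind
    if extended:
        # Assumption: extended files keep their downloaded name inside "extended"
        return folder / "extended" / src_name
    return folder / TARGET_NAMES[kind]


def place_downloads(sources: dict, extended: bool = False,
                    root: str = "data/images/nifti") -> list:
    """Copy files; `sources` maps 'scan'/'mask'/'lung_mask' to downloaded paths."""
    placed = []
    for kind, src in sources.items():
        dst = target_path(kind, Path(src).name, extended, root)
        dst.parent.mkdir(parents=True, exist_ok=True)  # create folders if missing
        shutil.copyfile(src, dst)
        placed.append(dst)
    return placed
```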
The preprocessed data is already pushed to this repository, so you should normally not need to preprocess the dataset. If you want to apply the preprocessing steps anyway (e.g. to reproduce our results or to use other data), run data/convert_nifti.py. It extracts single slices from the compressed NIfTI files and converts them to PNG. The function calls at the bottom of the file show how we used the code; you will most likely need to modify them for your use case. If you have trouble with this, feel free to contact one of the contributors of this project.
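To illustrate what slice extraction involves, here is a minimal sketch; it is not the code in data/convert_nifti.py. It assumes volumes of shape (H, W, N) with slices along the last axis, per-slice min-max scaling of CT intensities to 0..255 (our assumption, the actual normalization may differ) and unchanged label values for masks. The helper names are hypothetical; nibabel and Pillow are imported only inside convert_nifti_to_png, so the NumPy part runs without them. Depending on the data, slices may additionally need to be rotated or flipped.

```python
import os

import numpy as np


def ct_slice_to_uint8(slice_2d: np.ndarray) -> np.ndarray:
    """Min-max scale one CT slice to 0..255 (assumed normalization)."""
    lo, hi = float(slice_2d.min()), float(slice_2d.max())
    if hi == lo:  # constant slice: avoid division by zero
        return np.zeros(slice_2d.shape, dtype=np.uint8)
    scaled = (slice_2d - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)


def volume_to_slices(volume: np.ndarray, is_mask: bool = False) -> list:
    """Split an (H, W, N) volume into N 2D uint8 slices along the last axis."""
    slices = []
    for i in range(volume.shape[-1]):
        s = volume[..., i]
        # Masks keep their class labels; images are rescaled for PNG export
        slices.append(s.astype(np.uint8) if is_mask else ct_slice_to_uint8(s))
    return slices


def convert_nifti_to_png(nifti_path: str, out_dir: str, prefix: str,
                         is_mask: bool = False) -> None:
    """Load a .nii.gz volume and write one PNG per slice (hypothetical helper)."""
    import nibabel as nib
    from PIL import Image

    volume = np.asarray(nib.load(nifti_path).get_fdata())
    os.makedirs(out_dir, exist_ok=True)
    for i, s in enumerate(volume_to_slices(volume, is_mask)):
        Image.fromarray(s).save(os.path.join(out_dir, f"{prefix}_{i}.png"))
```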
To train a model, run trainer.py. The training parameters can be set via the following command-line arguments:
-m : neural network that is going to be trained. Choices: "UNet", "SegNet", "UNetMonai" or "SegNetOriginal", default: "UNet"
-c : number of classes for segmentation. Binary segmentation mode (2) or multi-class segmentation mode (4), int, default 2
-b : batch size, int, default 2
-e : epochs for training, int, default 160
-l : learning rate, float, default 1e-4
-r : resolution of input images, either 256 or 512, int, default 256
-ext : use extended dataset, bool, default False
Example command for training:
python trainer.py -m "UNet" -c 2 -b 2 -l 1e-4
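For reference, the documented options map onto an argparse parser roughly as follows. This is a hypothetical re-creation of the interface listed above, not the code in trainer.py; in particular, how trainer.py parses the boolean -ext flag is not stated here, so the str2bool converter is an assumption (plain type=bool would treat any non-empty string, including "False", as True).

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse 'true'/'false'-style strings (assumed behaviour for -ext)."""
    v = value.lower()
    if v in {"true", "1", "yes"}:
        return True
    if v in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Sketch of the trainer.py options")
    p.add_argument("-m", default="UNet",
                   choices=["UNet", "SegNet", "UNetMonai", "SegNetOriginal"])
    p.add_argument("-c", type=int, default=2, choices=[2, 4])  # binary / multi-class
    p.add_argument("-b", type=int, default=2)        # batch size
    p.add_argument("-e", type=int, default=160)      # epochs
    p.add_argument("-l", type=float, default=1e-4)   # learning rate
    p.add_argument("-r", type=int, default=256, choices=[256, 512])  # resolution
    p.add_argument("-ext", type=str2bool, default=False)  # extended dataset
    return p
```

For example, build_parser().parse_args(["-m", "SegNet", "-c", "4"]) yields m="SegNet", c=4 and the documented defaults for everything else.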
Running 5-fold cross-validation works analogously to a normal training run, but with the file trainer_5_fold_cross_validation.py:
python trainer_5_fold_cross_validation.py -m "UNet" -c 2 -b 2 -l 1e-4
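The idea behind 5-fold cross-validation can be sketched in a few lines: the samples are shuffled once, split into five disjoint folds, and each fold serves as the validation set exactly once while the other four are used for training. This is a minimal NumPy sketch with a hypothetical helper name; how trainer_5_fold_cross_validation.py actually assigns slices to folds is not described here.

```python
import numpy as np


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> list:
    """Return k (train_idx, val_idx) pairs with disjoint validation folds."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)   # shuffle sample indices once
    folds = np.array_split(perm, k)     # k nearly equal-sized folds
    splits = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        splits.append((train, val))
    return splits
```

Splitting at slice level is the simplest choice; if neighbouring slices of the same patient can land in different folds, a patient-wise grouping would give a stricter estimate.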
To predict segmentations and visualize them, run eval.py. The network, the number of classes, the model version, a specific image (if image_name is None, all images are predicted) and the resolution are chosen via the arguments of the method call:
eval_img(model_name='UNet', num_classes=4, version_name='version_59', image_name=None, resolution=256)
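One common way to make a predicted mask visible is to blend class colours over the grayscale CT slice. The sketch below is illustrative only and is not taken from eval.py; the palette and the label order (1 ground-glass, 2 consolidation, 3 pleural effusion) are assumptions.

```python
import numpy as np

# Hypothetical colour per class index: background, ground-glass,
# consolidation, pleural effusion (assumed label order)
PALETTE = np.array([[0, 0, 0], [255, 0, 0], [0, 255, 0], [0, 0, 255]],
                   dtype=np.uint8)


def overlay_mask(gray: np.ndarray, mask: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend class colours onto a uint8 grayscale slice; background stays gray."""
    rgb = np.stack([gray] * 3, axis=-1).astype(np.float32)
    color = PALETTE[mask].astype(np.float32)  # (H, W, 3) colour per pixel
    fg = mask > 0                             # only colour predicted lesions
    rgb[fg] = (1 - alpha) * rgb[fg] + alpha * color[fg]
    return np.round(rgb).astype(np.uint8)
```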
The data for the nnU-Net experiments can be found in the nnUNet folder. The raw data for the experiments is in nnUNet_raw_data, and some trained models can be found in nnUNet_trained_models. To understand the structure of the nnUNet folder, have a look at the GitHub repository of nnU-Net, where this structure is explained in detail.
Our main experiments are Task503_CovidMulticlass and Task504_CovidBinary. Unfortunately, due to GitHub's file size limits, we could not upload the trained models for these tasks, so you need to train them again.
If you want to train with 1000 epochs (the default nnU-Net configuration), you can simply pip install nnU-Net as described on its GitHub page. If you also want to train the models for 160 epochs or evaluate a model that was trained for 160 epochs, you need to clone the repository following step 2.ii of the nnU-Net installation instructions. Next, create a file "nnUNetTrainerOwn.py" with the following code:
from nnunet.training.network_training.nnUNetTrainerV2 import nnUNetTrainerV2


class nnUNetTrainerOwn(nnUNetTrainerV2):
    """Identical to nnUNetTrainerV2, but trains for 160 instead of 1000 epochs."""

    def __init__(self, plans_file, fold, output_folder=None, dataset_directory=None, batch_dice=True, stage=None,
                 unpack_data=True, deterministic=True, fp16=False):
        super().__init__(plans_file, fold, output_folder, dataset_directory, batch_dice, stage, unpack_data,
                         deterministic, fp16)
        # Override the epoch budget set by nnUNetTrainerV2
        self.max_num_epochs = 160
Store this file in nnunet/training/network_training inside the cloned nnU-Net repository.
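You can then train and run models as described in the nnU-Net documentation (which includes setting the nnU-Net environment variables and running nnUNet_plan_and_preprocess for the task first). For example, to train a model, run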
nnUNet_train 2d nnUNetTrainerV2 Task504_CovidBinary -f all
To predict segmentations with the trained model, run
nnUNet_predict -i INPUT_FOLDER -o OUTPUT_FOLDER -t Task504_CovidBinary -m 2d -f all -tr nnUNetTrainerV2
To run the multi-class experiment, replace Task504_CovidBinary with Task503_CovidMulticlass in these commands. To use nnUNetTrainerOwn for the shorter training, replace nnUNetTrainerV2 with nnUNetTrainerOwn.
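The substitutions above can also be scripted. This sketch (helper names hypothetical) builds the same nnUNet_train and nnUNet_predict commands and runs them with the nnU-Net v1 environment variables nnUNet_raw_data_base, nnUNet_preprocessed and RESULTS_FOLDER pointing into this repository's nnUNet folder. The nnUNet_preprocessed location is our assumption, and the layout expected under RESULTS_FOLDER is explained in the nnU-Net documentation.

```python
import os
import subprocess
from pathlib import Path


def nnunet_env(root: str = "nnUNet") -> dict:
    """nnU-Net v1 path variables, assuming the repository's nnUNet folder layout."""
    base = Path(root).resolve()
    return {
        "nnUNet_raw_data_base": str(base),  # folder containing nnUNet_raw_data/
        "nnUNet_preprocessed": str(base / "nnUNet_preprocessed"),  # assumed location
        "RESULTS_FOLDER": str(base / "nnUNet_trained_models"),
    }


def train_cmd(task: str, trainer: str = "nnUNetTrainerV2") -> list:
    return ["nnUNet_train", "2d", trainer, task, "-f", "all"]


def predict_cmd(input_folder: str, output_folder: str, task: str,
                trainer: str = "nnUNetTrainerV2") -> list:
    return ["nnUNet_predict", "-i", input_folder, "-o", output_folder,
            "-t", task, "-m", "2d", "-f", "all", "-tr", trainer]


def run(cmd: list, root: str = "nnUNet") -> None:
    """Run an nnU-Net CLI command with the path variables set."""
    subprocess.run(cmd, env={**os.environ, **nnunet_env(root)}, check=True)
```

For example, run(train_cmd("Task503_CovidMulticlass", "nnUNetTrainerOwn")) would start the shorter multi-class training.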
If you want to evaluate the performance of the nnU-Net, run nnUNet_evaluation. It reads in the segmentation masks produced by the previous nnUNet_predict command as well as the ground truth segmentations. You need to adapt the paths in
y_hat, y = get_y(
prediction_folder="data/images/prediction/nnU-Net/binary_long",
ground_truth_folder="data/images/prediction/nnU-Net/true_binary"
)
Set prediction_folder to the OUTPUT_FOLDER of the nnUNet_predict command, and set ground_truth_folder to the location of the test-set ground truth if you stored it elsewhere (in this repository, the ground truth is stored in "data/images/prediction/nnU-Net/true_binary" for the binary test images and in "data/images/prediction/nnU-Net/true_multi_class" for the multi-class test images). Also set the num_classes parameter (the third argument) of calculate_metrics to 2 for binary or 4 for multi-class evaluation.
metrics = calculate_metrics(y_hat, y, 2)
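Which metrics calculate_metrics reports is not listed here. As an illustration of how num_classes enters such an evaluation, this sketch (hypothetical helper, not our implementation) computes per-class Dice and IoU from predicted and ground-truth label maps; classes absent from both yield NaN.

```python
import numpy as np


def per_class_dice_iou(y_hat, y, num_classes: int):
    """Per-class Dice = 2|P∩T|/(|P|+|T|) and IoU = |P∩T|/|P∪T| over all pixels."""
    y_hat = np.asarray(y_hat).ravel()
    y = np.asarray(y).ravel()
    dice, iou = [], []
    for c in range(num_classes):
        pred_c, true_c = y_hat == c, y == c
        inter = np.logical_and(pred_c, true_c).sum()
        p, t = pred_c.sum(), true_c.sum()
        union = p + t - inter
        dice.append(float(2 * inter / (p + t)) if p + t > 0 else float("nan"))
        iou.append(float(inter / union) if union > 0 else float("nan"))
    return dice, iou
```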
To visualize the feature maps of SegNet, run feature_vis.py. To change the settings, edit the arguments of the method call in that file. Currently, it selects the model with the best accuracy in binary segmentation mode.
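Feature-map visualizations usually tile the channels of one layer's activation into a grid of normalized images. The sketch below shows only that tiling step on an array of shape (C, H, W); capturing the activations from the network (e.g. with a PyTorch forward hook) is assumed, and this is not the code in feature_vis.py.

```python
import math

import numpy as np


def tile_feature_maps(fmaps: np.ndarray, ncols: int = 8) -> np.ndarray:
    """Arrange C feature maps of size HxW into one grid, each scaled to [0, 1]."""
    c, h, w = fmaps.shape
    nrows = math.ceil(c / ncols)
    grid = np.zeros((nrows * h, ncols * w), dtype=np.float32)
    for i in range(c):
        fm = fmaps[i].astype(np.float32)
        lo, hi = fm.min(), fm.max()
        # Normalize each channel separately; constant channels become black
        fm = (fm - lo) / (hi - lo) if hi > lo else np.zeros_like(fm)
        r, col = divmod(i, ncols)
        grid[r * h:(r + 1) * h, col * w:(col + 1) * w] = fm
    return grid
```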
If you are interested in the distribution of the infected tissue, have a look at the PNG files in data/heatmap. To reproduce these heatmaps, run heatmap.py.
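A simple way to build such a heatmap is to average binarized infection masks over all slices, so each pixel holds the fraction of slices in which it is labeled as infected. This is a minimal sketch under that assumption (hypothetical helper name); heatmap.py may compute it differently.

```python
import numpy as np


def infection_heatmap(masks) -> np.ndarray:
    """Fraction of masks in which each pixel is infected (any label > 0)."""
    stack = np.stack([np.asarray(m) > 0 for m in masks]).astype(np.float32)
    return stack.mean(axis=0)  # values in [0, 1], same H x W as the masks
```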