
corona_ct_classification

INFORMS 2020 QSR Data Challenge on "CT Scan Diagnosis for COVID-19"

We were selected as one of the finalists for this challenge and won the runner-up award!

This repository contains our PyTorch code for COVID-19 CT image classification on the INFORMS 2020 QSR Data Challenge dataset. We use an ensemble model consisting of DenseNet-121 and a Residual Attention model. We first hold out 15% of the data as a validation set, which is not used in training, and we select the model with the highest validation accuracy. DenseNet-121 is pretrained on ImageNet, and the Residual Attention model is pretrained on CIFAR-10. During training, we fine-tune the two pretrained models separately, each end to end. We then extract features from the second-to-last layer of each model, concatenate them, and train a further classifier on the concatenated features using the whole training dataset. For this final classifier we use an SVM with random Gaussian kernels.
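The README does not spell out how the "random Gaussian kernels" are implemented. One common reading is random Fourier features (Rahimi and Recht), which map each feature vector so that a plain dot product approximates a Gaussian (RBF) kernel, followed by a linear SVM. The pure-Python sketch below assumes that reading. All names (`RandomFourierFeatures`, `train_linear_svm`, the toy feature lists) are hypothetical, and the SVM is trained with the Pegasos sub-gradient method, which is not necessarily the solver the repository uses. In practice, scikit-learn's `RBFSampler` followed by `LinearSVC` is the usual library route.

```python
import math
import random


class RandomFourierFeatures:
    """Map x to z(x) so that z(x) . z(y) approximates exp(-gamma * ||x - y||^2)."""

    def __init__(self, n_features, n_components=300, gamma=1.0, seed=0):
        rng = random.Random(seed)
        std = math.sqrt(2.0 * gamma)  # frequencies w ~ N(0, 2 * gamma * I)
        self.weights = [[rng.gauss(0.0, std) for _ in range(n_features)]
                        for _ in range(n_components)]
        self.offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_components)]
        self.scale = math.sqrt(2.0 / n_components)

    def transform(self, x):
        # z_k(x) = sqrt(2 / D) * cos(w_k . x + b_k)
        return [self.scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(self.weights, self.offsets)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(xs, ys, lam=1e-3, epochs=20, seed=0):
    """Pegasos SGD for a linear SVM (hinge loss + L2 penalty), labels in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    order = list(range(len(xs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = ys[i] * dot(w, xs[i]) < 1.0  # margin check uses the old w
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage step
            if violated:  # hinge-loss sub-gradient step
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, xs[i])]
    return w


def predict(w, z):
    return 1 if dot(w, z) >= 0.0 else -1


# Toy demo: 1-D "penultimate features" from two hypothetical models, one row per image.
feats_model_a = [[0.0], [0.3], [0.1], [0.2], [4.0], [4.3], [4.1], [4.2]]
feats_model_b = [[0.0], [0.1], [0.3], [0.2], [4.0], [4.1], [4.3], [4.2]]
labels = [-1, -1, -1, -1, 1, 1, 1, 1]
combined = [a + b for a, b in zip(feats_model_a, feats_model_b)]  # feature concatenation

rff = RandomFourierFeatures(n_features=2, n_components=300, gamma=0.5, seed=0)
mapped = [rff.transform(x) for x in combined]
w = train_linear_svm(mapped, labels)
predictions = [predict(w, z) for z in mapped]
```

With real data, each row of `combined` would be the DenseNet-121 features followed by the Residual Attention features for one image, and `gamma` and `lam` would be tuned on the validation split.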

Dependencies

  • Python 3, scikit-learn, PyTorch (please refer to requirement.txt)

Dataset

The data for this Data Challenge is selected from an open-source data set on COVID-19 CT images. The raw data have been divided into two subsets: training and test sets. The training dataset is provided to participants to develop their models. The training dataset consists of 251 COVID-19 and 292 non-COVID-19 CT images. In addition to the images, meta-information (e.g., patient information, severity, image caption) is provided in a spreadsheet. The details of the original dataset can be found in Zhao et al. (2020).

Curated Dataset

We extended this work by building a large lung CT scan dataset for COVID-19, curated from 7 public datasets. The dataset and its description are available at the following links:

  • https://www.kaggle.com/maedemaftouni/large-covid19-ct-slice-dataset
  • https://github.com/maftouni/Curated_Covid_CT.git

How to run

The training data is stored in data/training. To use your own data, replace the contents of data/training. This folder contains two sub-folders: one with COVID images and one with non-COVID images. Put the test data in data/test.
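As a minimal sketch of how a folder layout like data/training can be read and split, the code below treats every sub-folder as one class and holds out 15% of each class for validation. The README does not name the sub-folders or say whether the 15% split is stratified by class; stratification, the accepted image suffixes, and the helper names `list_labeled_images` and `stratified_split` are all assumptions for illustration, not the repository's data_prep.py.

```python
import random
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # assumption: adjust to your image format


def list_labeled_images(root):
    """Return (path, class_name) pairs, using each sub-folder of `root` as a class."""
    samples = []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, class_dir.name))
    return samples


def stratified_split(samples, val_fraction=0.15, seed=0):
    """Hold out `val_fraction` of each class as a validation set."""
    rng = random.Random(seed)
    by_class = {}
    for path, label in samples:
        by_class.setdefault(label, []).append((path, label))
    train, val = [], []
    for label in sorted(by_class):
        items = by_class[label]
        rng.shuffle(items)
        n_val = round(len(items) * val_fraction)
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val
```

For example, `train, val = stratified_split(list_labeled_images("data/training"))`. With the 251 COVID and 292 non-COVID images described above, this would hold out 38 and 44 images respectively.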

Results may differ slightly from ours, because the same random seed can produce different results on different devices.
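The README does not show how the training scripts set their seeds. A common pattern, sketched below under the assumption that the scripts use Python's `random`, NumPy, and PyTorch, is to seed all three in one place and ask cuDNN for deterministic kernels. The helper name `seed_everything` is hypothetical, and NumPy and PyTorch are only seeded if they are installed.

```python
import random


def seed_everything(seed=0):
    """Seed the random number generators a PyTorch training script typically uses.

    Even with all of this, results can still differ across GPUs, drivers,
    and library versions.
    """
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call even without a GPU
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass
```

Deterministic cuDNN settings can slow training somewhat, which is the usual trade-off for run-to-run repeatability on a single machine.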

data_prep.py

to train the DenseNet121 model:

python Model_densenet121.py

to train the Residual Attention model:

python Model_residual_attention.py

to train the ensemble model:

python Model_Ensemble.py

Network Structure


Sample outputs

Sample classification results


Broadly, attention can be viewed as a tool for focusing on the most informative parts of an image.
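In the Residual Attention Network of Wang et al. (2017), each attention module combines trunk features T(x) with a soft mask M(x) in (0, 1) as H(x) = (1 + M(x)) * T(x). Where the mask is near 0 the features pass through unchanged, and where it is near 1 they are amplified up to twice. The sketch below shows only that element-wise rule on plain lists; in the real network T and M are convolutional feature maps, and we have not verified that this repository's Residual Attention model uses exactly this form. The function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function, mapping mask logits into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def attention_residual(trunk, mask_logits):
    """Element-wise attention residual learning: H = (1 + M) * T, with M = sigmoid(logits)."""
    return [(1.0 + sigmoid(m)) * t for t, m in zip(trunk, mask_logits)]
```

The "1 +" term is the design choice that matters: a plain product M * T would let a near-zero mask erase features entirely, whereas here the mask can only emphasize regions on top of an identity path.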

Evaluation

Here we evaluate the performance of our best model on the training data.

Confusion Matrix

                   Predicted COVID   Predicted Non-COVID
Actual COVID             247                  4
Actual Non-COVID           2                290

Accuracy

Accuracy: 98.9% (537 of 543 training images classified correctly)
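The accuracy follows directly from the confusion matrix: (247 + 290) / 543. The short sketch below recomputes it and, treating COVID as the positive class, also derives sensitivity, specificity, and precision from the same four counts. These extra metrics are our own arithmetic on the matrix above (so they are training-set figures too), not numbers reported elsewhere by the authors; the helper name `confusion_metrics` is hypothetical.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Metrics for a 2x2 confusion matrix with COVID as the positive class."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # recall on COVID images
        "specificity": tn / (tn + fp),  # recall on non-COVID images
        "precision": tp / (tp + fp),    # share of "COVID" predictions that are correct
    }


# Rows of the matrix are the true classes, columns the predictions.
metrics = confusion_metrics(tp=247, fn=4, fp=2, tn=290)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
# accuracy: 98.9%
# sensitivity: 98.4%
# specificity: 99.3%
# precision: 99.2%
```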

Versioning

Version 1.0

Authors

Maede Maftouni, Andrew Chung Chee Law, Yangze Zhou, Bo Shen

Acknowledgments
