
covid-chestxray-dataset's Introduction

🛑 Note: please do not claim diagnostic performance of a model without a clinical study! This is not a Kaggle competition dataset.

COVID-19 image data collection (🎬 video about the project)

Project Summary: To build a public open dataset of chest X-ray and CT images of patients who are positive for or suspected of having COVID-19 or other viral and bacterial pneumonias (MERS, SARS, and ARDS). Data will be collected from public sources as well as through indirect collection from hospitals and physicians. All images and data will be released publicly in this GitHub repo.

This project is approved by the University of Montreal's Ethics Committee #CERSES-20-058-D

View current images and metadata

Current stats of PA, AP, and AP Supine views. Labels 0=No or 1=Yes. Data loader is here

COVID19_Dataset num_samples=244 views=['PA', 'AP']
{'ARDS': {0.0: 229, 1.0: 15},
 'Bacterial Pneumonia': {0.0: 227, 1.0: 17},
 'COVID-19': {0.0: 52, 1.0: 192},
 'Chlamydophila': {0.0: 243, 1.0: 1},
 'Fungal Pneumonia': {0.0: 231, 1.0: 13},
 'Klebsiella': {0.0: 243, 1.0: 1},
 'Legionella': {0.0: 242, 1.0: 2},
 'MERS': {0.0: 244},
 'No Finding': {0.0: 242, 1.0: 2},
 'Pneumocystis': {0.0: 231, 1.0: 13},
 'Pneumonia': {0.0: 2, 1.0: 242},
 'SARS': {0.0: 228, 1.0: 16},
 'Streptococcus': {0.0: 231, 1.0: 13},
 'Viral Pneumonia': {0.0: 36, 1.0: 208}}

COVID19_Dataset num_samples=38 views=['AP Supine']
{'ARDS': {0.0: 37, 1.0: 1},
 'Bacterial Pneumonia': {0.0: 38},
 'COVID-19': {0.0: 4, 1.0: 34},
 'Chlamydophila': {0.0: 38},
 'Fungal Pneumonia': {0.0: 38},
 'Klebsiella': {0.0: 38},
 'Legionella': {0.0: 38},
 'MERS': {0.0: 38},
 'No Finding': {0.0: 38},
 'Pneumocystis': {0.0: 38},
 'Pneumonia': {0.0: 4, 1.0: 34},
 'SARS': {0.0: 38},
 'Streptococcus': {0.0: 38},
 'Viral Pneumonia': {0.0: 4, 1.0: 34}}
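
The counts above show how unevenly the labels are distributed. As a minimal sketch (not part of the project's data loader), the positive fraction per label can be computed from a dictionary copied from a subset of the PA/AP output. The helper name `positive_fraction` is hypothetical and introduced here only for illustration.

```python
# Subset of the PA/AP label counts printed above (0.0 = No, 1.0 = Yes).
pa_ap_counts = {
    "ARDS": {0.0: 229, 1.0: 15},
    "COVID-19": {0.0: 52, 1.0: 192},
    "MERS": {0.0: 244},
    "No Finding": {0.0: 242, 1.0: 2},
    "Pneumonia": {0.0: 2, 1.0: 242},
}


def positive_fraction(counts):
    """Share of samples labelled 1.0; a missing 1.0 key means no positives."""
    total = sum(counts.values())
    return counts.get(1.0, 0) / total if total else 0.0


# Hypothetical helper applied to every label, then printed from most to
# least frequent.
fractions = {label: positive_fraction(c) for label, c in pa_ap_counts.items()}
for label, frac in sorted(fractions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:12s} {frac:.3f}")
```

In this subset every label's counts sum to 244, matching `num_samples` for the PA/AP views, so the fractions are directly comparable across labels.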

Lung Bounding Boxes contributed by Andrew Gough at ImageAnnotation.AI, Inc.

Contribute

  • Submit data directly to the project (coming soon!)

  • We can extract images from publications. Help identify publications which are not already included using a GitHub issue (DOIs we have are listed in the metadata file). There is a searchable database of COVID-19 papers here, and a non-searchable one (requires download) here.

  • Submit data to these sites (we can scrape the data from them):

  • Provide bounding box/masks for the detection of problematic regions in images already collected.

  • See SCHEMA.md for more information on the metadata schema.

Formats: For chest X-rays, dcm, jpg, or png are preferred. For CT, nifti (in gzip format) is preferred, but dcm files are also accepted. Please contact us with any questions.
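
As a rough illustration of the format preferences above, a contributor could check file names before submitting. This is only a sketch: the `PREFERRED` table, the modality keys `"xray"` and `"ct"`, the `.nii.gz` extension for gzipped nifti, and the function `is_preferred_format` are assumptions made for this example, not something the project provides.

```python
# Hypothetical mapping of modality to the extensions listed as preferred
# (or accepted) in the Formats note.
PREFERRED = {
    "xray": (".dcm", ".jpg", ".png"),
    "ct": (".nii.gz", ".dcm"),
}


def is_preferred_format(filename, modality):
    """Return True if the file name ends with an extension listed for the modality."""
    # str.endswith accepts a tuple of suffixes.
    return filename.lower().endswith(PREFERRED[modality])


print(is_preferred_format("patient01_PA.PNG", "xray"))
print(is_preferred_format("scan_volume.nii.gz", "ct"))
print(is_preferred_format("scan_volume.nii", "ct"))
```

The check looks only at the file name, so it says nothing about whether the file contents are valid.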

Background

In the context of the COVID-19 pandemic, we want to improve prognostic predictions to triage and manage patient care. Data is the first step to developing any diagnostic/prognostic tool. While there exist large public datasets of more typical chest X-rays from the NIH [Wang 2017], Spain [Bustos 2019], Stanford [Irvin 2019], MIT [Johnson 2019] and Indiana University [Demner-Fushman 2016], there is no collection of COVID-19 chest X-rays or CT scans designed to be used for computational analysis.

The 2019 novel coronavirus (COVID-19) presents several unique features [Fang 2020] and [Ai 2020]. While the diagnosis is confirmed using polymerase chain reaction (PCR), infected patients with pneumonia may present on chest X-ray and computed tomography (CT) images with a pattern that is only moderately characteristic for the human eye [Ng 2020]. In late January, a Chinese team published a paper detailing the clinical and paraclinical features of COVID-19. They reported that patients present abnormalities in chest CT images with most having bilateral involvement [Huang 2020]. Bilateral multiple lobular and subsegmental areas of consolidation constitute the typical findings in chest CT images of intensive care unit (ICU) patients on admission [Huang 2020]. In comparison, non-ICU patients show bilateral ground-glass opacity and subsegmental areas of consolidation in their chest CT images [Huang 2020]. In these patients, later chest CT images display bilateral ground-glass opacity with resolved consolidation [Huang 2020].

Goal

Our goal is to use these images to develop AI-based approaches to predict and understand the infection. Our group will work to release these models using our open source Chester AI Radiology Assistant platform.

The tasks below take a chest X-ray or CT image (X-ray preferred) as input and predict:

  • Healthy vs Pneumonia (prototype already implemented in Chester with ~74% AUC; validation study here)

  • Bacterial vs Viral vs COVID-19 Pneumonia (not relevant enough for the clinical workflows)

  • Prognostic/severity predictions (survival, need for intubation, need for supplemental oxygen)
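
For readers unfamiliar with the metric, an AUC (area under the ROC curve) such as the ~74% quoted for the Healthy vs Pneumonia prototype can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes AUC that way on made-up labels and scores; it is not the evaluation code used for Chester, and `auc_from_scores` is a hypothetical helper.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example data: 1 = pneumonia, 0 = healthy.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
auc = auc_from_scores(labels, scores)
print(f"AUC = {auc:.3f}")  # 7 of 9 pairs are ranked correctly -> 0.778
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; for larger evaluations a rank-based implementation would be the usual choice.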

Expected outcomes

Tool impact: This would give physicians an edge and allow them to act with more confidence while they wait for the analysis of a radiologist by having a digital second opinion confirm their assessment of a patient's condition. Also, these tools can provide quantitative scores to consider and use in studies.

Data impact: Image data linked with clinically relevant attributes in a public dataset that is designed for ML will enable parallel development of these tools and rapid local validation of models. Furthermore, this data can be used for completely different tasks.

Contact

PI: Joseph Paul Cohen. Postdoctoral Fellow, Mila, University of Montreal

Initial results

Citation

Paper available here

Joseph Paul Cohen and Paul Morrison and Lan Dao
COVID-19 image data collection, arXiv:2003.11597, 2020
https://github.com/ieee8023/covid-chestxray-dataset
@article{cohen2020covid,
  title={COVID-19 image data collection},
  author={Joseph Paul Cohen and Paul Morrison and Lan Dao},
  journal={arXiv 2003.11597},
  url={https://github.com/ieee8023/covid-chestxray-dataset},
  year={2020}
}

