MedMNIST

Jiancheng Yang, Rui Shi, Bingbing Ni, Bilian Ke

We present MedMNIST, a collection of 10 pre-processed open medical image datasets. MedMNIST is standardized for classification tasks on lightweight 28 × 28 images and requires no background knowledge. Covering the primary data modalities in medical image analysis, it is diverse in data scale (from 100 to 100,000) and tasks (binary/multi-class, ordinal regression and multi-label). MedMNIST can be used for educational purposes, rapid prototyping, multi-modal machine learning or AutoML in medical image analysis. Moreover, the MedMNIST Classification Decathlon is designed to benchmark AutoML algorithms on all 10 datasets.

For more details, please refer to our paper:

MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis (arXiv)

Key Features

  • Educational: Our multi-modal data, drawn from multiple open medical image datasets with Creative Commons (CC) Licenses, is easy to use for educational purposes.
  • Standardized: Data is pre-processed into the same format, so users need no background knowledge.
  • Diverse: The multi-modal datasets cover diverse data scales (from 100 to 100,000) and tasks (binary/multi-class, ordinal regression and multi-label).
  • Lightweight: The small size of 28 × 28 is friendly for rapid prototyping and for experimenting with multi-modal machine learning and AutoML algorithms.

Please note that this dataset is NOT intended for clinical use.

Code Structure

  • medmnist/:
    • dataset.py: PyTorch datasets and dataloaders of MedMNIST.
    • models.py: ResNet-18 and ResNet-50 models.
    • evaluator.py: Standardized evaluation functions.
    • info.py: Dataset information dict for each subset of MedMNIST.
  • train.py: The training and evaluation script to reproduce the baseline results in the paper.
  • getting_started.ipynb: Explore the MedMNIST dataset in a Jupyter notebook. It is ONLY intended for quick exploration, i.e., it does not provide full training and evaluation functionality (please refer to train.py instead).
  • setup.py: The script to install medmnist as a module.

Requirements

The code requires only a common Python environment for machine learning. Basically, it was tested with

  • Python 3 (Anaconda 3.6.3 specifically)
  • PyTorch==1.3.1
  • numpy==1.18.5, pandas==0.25.3, scikit-learn==0.22.2, tqdm

Higher (or lower) versions should also work, perhaps with minor modifications.

Dataset

You can download the dataset(s) for free from the following sources:

The dataset contains ten subsets, and each subset (e.g., pathmnist.npz) comprises train_images, train_labels, val_images, val_labels, test_images and test_labels.
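
As a minimal sketch of how one subset file might be inspected, the snippet below loads an .npz archive with NumPy and checks that the six arrays named above are present. The helper names (`load_subset`, `make_placeholder_subset`) are hypothetical and not part of the medmnist package. The placeholder file holds random grayscale 28 × 28 images with labels of shape (N, 1), which is an assumption made only so the example runs without a download. Real subsets may differ in channel count and label shape.

```python
import os
import tempfile

import numpy as np

# The six array names listed above; every subset file is assumed to expose them.
SPLIT_KEYS = [
    "train_images", "train_labels",
    "val_images", "val_labels",
    "test_images", "test_labels",
]


def load_subset(npz_path):
    """Load one subset file and return a dict mapping array name -> ndarray."""
    with np.load(npz_path) as data:
        missing = [key for key in SPLIT_KEYS if key not in data.files]
        if missing:
            raise KeyError(f"{npz_path} is missing arrays: {missing}")
        return {key: data[key] for key in SPLIT_KEYS}


def make_placeholder_subset(npz_path, n_train=8, n_val=2, n_test=4, seed=0):
    """Write a tiny random stand-in file (hypothetical data, not MedMNIST)."""
    rng = np.random.default_rng(seed)
    arrays = {}
    for split, n in (("train", n_train), ("val", n_val), ("test", n_test)):
        arrays[f"{split}_images"] = rng.integers(
            0, 256, size=(n, 28, 28), dtype=np.uint8
        )
        arrays[f"{split}_labels"] = rng.integers(
            0, 2, size=(n, 1), dtype=np.uint8
        )
    np.savez(npz_path, **arrays)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "placeholder.npz")
        make_placeholder_subset(path)
        # Print the name, shape and dtype of each array in the file.
        for name, array in load_subset(path).items():
            print(name, array.shape, array.dtype)
```

To inspect a real subset, pass the path of a downloaded file (e.g., pathmnist.npz) to `load_subset` instead of the placeholder.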

How to run the experiments

  • Download the dataset manually or automatically (by setting download=True in dataset.py).

  • [optional] Install medmnist as a module with the command python setup.py install.

  • Run the demo script train.py in a terminal.

    First, change directory to where train.py is located. Then, use the command python train.py --data_name xxxmnist --input_root input --output_root output --num_epoch 100 --download True to run the experiments, where xxxmnist is a subset of MedMNIST (e.g., pathmnist), input is the path to the data files, output is the folder in which to save the results, num_epoch is the number of training epochs, and download is a boolean indicating whether to download the dataset.

    For instance, to run PathMNIST:

    python train.py --data_name pathmnist --input_root <path/to/input/folder> --output_root <path/to/output/folder> --num_epoch 100 --download True
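
The flags in the command above could be parsed along the following lines. This is a hypothetical sketch, not the parser that train.py actually uses: the default values and the `str_to_bool` helper are assumptions introduced for illustration. The helper is there because a flag passed as `--download True` reaches argparse as a string, and Python's built-in `bool("False")` evaluates to `True`.

```python
import argparse


def str_to_bool(value):
    """Parse 'True'/'False' style strings into a real boolean."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Hypothetical parser mirroring the flags in the command above."""
    parser = argparse.ArgumentParser(description="Sketch of a train.py-style CLI")
    parser.add_argument("--data_name", required=True,
                        help="subset name, e.g. pathmnist")
    # The defaults below are assumptions, not values taken from train.py.
    parser.add_argument("--input_root", default="input",
                        help="folder holding the data files")
    parser.add_argument("--output_root", default="output",
                        help="folder in which to save the results")
    parser.add_argument("--num_epoch", type=int, default=100,
                        help="number of training epochs")
    parser.add_argument("--download", type=str_to_bool, default=False,
                        help="whether to download the dataset")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the sketch runs without a shell.
    args = build_parser().parse_args(
        ["--data_name", "pathmnist", "--num_epoch", "100", "--download", "True"]
    )
    print(args)
```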
    

Citation

If you find this project useful, please cite our paper as:

  Jiancheng Yang, Rui Shi, Bingbing Ni. "MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis," arXiv preprint arXiv:2010.14925, 2020.

or using bibtex:

 @article{medmnist,
 title={MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis},
 author={Yang, Jiancheng and Shi, Rui and Ni, Bingbing},
 journal={arXiv preprint arXiv:2010.14925},
 year={2020}
 }

LICENSE

The code is released under the Apache-2.0 License.

The datasets are, in general, under Creative Commons (CC) Licenses; please refer to the project page for details.
