
ssl_micro_ultrasound's Introduction

Self-Supervised Learning with Limited Labeled Data for Prostate Cancer Detection in High Frequency Ultrasound

Code for the paper "Self-Supervised Learning with Limited Labeled Data for Prostate Cancer Detection in High Frequency Ultrasound".

Abstract

Deep learning-based analysis of high-frequency, high-resolution micro-ultrasound data shows great promise for prostate cancer detection. Previous approaches to analysis of ultrasound data largely follow a supervised learning paradigm. Ground truth labels for ultrasound images used for training deep networks often include coarse annotations generated from the histopathological analysis of tissue samples obtained via biopsy. This creates inherent limitations on the availability and quality of labeled data, posing major challenges to the success of supervised learning methods. On the other hand, unlabeled prostate ultrasound data are more abundant. In this work, we successfully apply self-supervised representation learning to micro-ultrasound data. Using ultrasound data from 1028 biopsy cores of 391 subjects obtained in two clinical centres, we demonstrate that feature representations learnt with this method can be used to classify cancer from non-cancer tissue, obtaining an AUROC score of 91% on an independent test set. To the best of our knowledge, this is the first successful end-to-end self-supervised learning approach for prostate cancer detection using ultrasound data. Our method outperforms baseline supervised learning approaches, generalizes well between different data centers, and scales well in performance as more unlabeled data are added, making it a promising approach for future research using large volumes of unlabeled data.

The basic codebase is implemented in Python 3.8.13 and is provided in the experiments folder. The package versions used for development are as follows:

ROC Curves

ROC curves of patch-wise predictions for the UVA and CRCEO datasets.
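The area under an ROC curve (AUROC) can be read as the probability that a randomly chosen cancer patch receives a higher score than a randomly chosen non-cancer patch. The sketch below computes it directly from that pairwise definition. The function name and the toy labels and scores are illustrative only and are not taken from this repository, which may compute the metric with a library routine instead.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy patch-wise predictions (1 = cancer, 0 = non-cancer), made up for illustration
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of patches, so it is only meant to show the definition; a rank-based implementation would be preferable for real evaluation.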

Package Installation

Install all requirements with the following command:

pip install -r requirements.txt

Note

The code uses PyTorch Lightning to reduce boilerplate code, Hydra for configuration management, and wandb for logging. It has been tested on a single GPU.

Usage

The easiest way to read the code is as follows. train.py is the main entry point. It uses Hydra to load the main configuration file, configs/train.yaml. This file composes the configurations of the datamodule, model, callbacks, trainer, and logger, each stored as a separate config group in configs/{ConfigGroup}. The corresponding classes are then instantiated in training_pipeline.py, using the class path stored under _target_ in each group's configuration. Different experiments can be run by overriding individual config groups, e.g. python train.py datamodule=data.yaml model=model.yaml callbacks=callback.yaml, or by selecting a full set of overrides stored in configs/experiments with python train.py experiment=experiment_name, which automatically overrides all the config groups.
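To make the role of _target_ concrete, here is a minimal standard-library sketch of the idea: a config entry names a class by its dotted import path, and the remaining keys become constructor arguments. This is a simplified stand-in for Hydra's instantiation mechanism, not the code in training_pipeline.py; the helper name and the example config are hypothetical, and nested configs, partials, and positional arguments are ignored.

```python
import importlib
from typing import Any, Dict


def instantiate_from_target(config: Dict[str, Any]) -> Any:
    """Build an object from a config dict holding a dotted `_target_` path."""
    # Everything except `_target_` is passed to the constructor as a keyword argument
    kwargs = {key: value for key, value in config.items() if key != "_target_"}

    # Split "package.module.ClassName" into the module path and the attribute name
    module_path, _, attr_name = config["_target_"].rpartition(".")
    module = importlib.import_module(module_path)
    target = getattr(module, attr_name)
    return target(**kwargs)


# Hypothetical config group; a real one would point at a project class
# such as a datamodule or model rather than a standard-library type.
example_config = {
    "_target_": "fractions.Fraction",
    "numerator": 3,
    "denominator": 4,
}
obj = instantiate_from_target(example_config)
```

Swapping the value of _target_ in a config file is then enough to change which class is built, which is what makes group-level overrides on the command line possible.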

Data

The data used in this work is private. If you have access to the data, please add your username and password to the .env file:

SERVER_USERNAME=yourusername
SERVER_PASSWORD=yourpassword

The data will be downloaded to your local machine. To specify the location of the data, please add the following line to the .env file:

DATA = /path/to/data
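For reference, the following sketch shows how KEY=VALUE lines like the ones above can be parsed, including the optional spaces around the equals sign. It is an illustration using only the standard library; how this repository actually loads the .env file is not shown here, and the function name is hypothetical.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines into a dict, skipping blanks and comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "="
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Same placeholder values as in the example lines above
example_env = """
SERVER_USERNAME=yourusername
SERVER_PASSWORD=yourpassword
DATA = /path/to/data
"""
settings = parse_env_text(example_env)
```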

Training

To run supervised training, run the following command:

python train.py experiment=exact_supervised.yaml

To pretrain the model (with VICReg), run the following command:

python train.py experiment=exact_vicreg.yaml
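As background, VICReg trains on two augmented views of the same input and combines three terms: an invariance term pulling the two embeddings together, a variance term keeping each embedding dimension's standard deviation above a threshold, and a covariance term decorrelating the dimensions. The pure-Python sketch below follows that formulation as described in the VICReg paper. It is not this repository's implementation: the function names are hypothetical, the default weights (25, 25, 1) are the ones commonly quoted for VICReg and may differ from the experiment configs, and a real training loop would use PyTorch tensors.

```python
import math


def _covariance_matrix(batch):
    """Unbiased covariance matrix of a batch given as a list of rows."""
    n, d = len(batch), len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in batch]
    return [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]


def vicreg_loss(z_a, z_b, sim_weight=25.0, var_weight=25.0, cov_weight=1.0,
                gamma=1.0, eps=1e-4):
    """VICReg-style loss for two batches of embeddings (needs n >= 2 rows)."""
    n, d = len(z_a), len(z_a[0])

    # Invariance: mean squared distance between paired embeddings
    invariance = sum(
        sum((a - b) ** 2 for a, b in zip(row_a, row_b))
        for row_a, row_b in zip(z_a, z_b)
    ) / n

    variance = 0.0
    covariance = 0.0
    for batch in (z_a, z_b):
        cov = _covariance_matrix(batch)
        # Variance: hinge on the per-dimension standard deviation
        variance += sum(
            max(0.0, gamma - math.sqrt(cov[j][j] + eps)) for j in range(d)
        ) / d
        # Covariance: squared off-diagonal entries, scaled by the dimension
        covariance += sum(
            cov[a][b] ** 2 for a in range(d) for b in range(d) if a != b
        ) / d

    return sim_weight * invariance + var_weight * variance + cov_weight * covariance


# Identical, decorrelated views with enough spread give zero loss
decorrelated = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
loss_decorrelated = vicreg_loss(decorrelated, decorrelated)

# Identical but perfectly anti-correlated dimensions are penalised by the covariance term
correlated = [[1.0, -1.0], [-1.0, 1.0]]
loss_correlated = vicreg_loss(correlated, correlated)
```

In the second toy batch the two dimensions carry the same information with opposite sign, so only the covariance term is non-zero.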

To finetune the model, run the following command:

python train.py experiment=exact_finetune.yaml

Software & Hardware Specs

Experiments were run on a standard desktop with a single NVIDIA TITAN X GPU (24 GB GPU RAM) and an Intel(R) Core(TM) i9-9900X CPU @ 3.50GHz processor, running Ubuntu 22.04, Python 3.9 and PyTorch 1.13. With this configuration, each experiment took about 4 hours for stage 1 and 2 hours for stage 2 of our method. Although the total size of the dataset was 100 GB, we used memory mapping and only selected patches within the needle region, hence the CPU RAM footprint was kept under 8 GB.
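The memory-mapping idea can be sketched with the standard library alone: the file is mapped into the address space, and only the bytes of the requested patch are actually read. The binary layout used here (fixed-size patches of little-endian float32 samples), the helper names, and the demo file are assumptions made for illustration and do not describe the repository's data format.

```python
import mmap
import os
import struct
import tempfile


def read_patch(path, patch_index, samples_per_patch=16):
    """Read one patch of float32 samples without loading the whole file."""
    patch_bytes = 4 * samples_per_patch  # 4 bytes per float32 sample
    offset = patch_index * patch_bytes
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Slicing copies only this patch's bytes out of the mapping
            chunk = mapped[offset:offset + patch_bytes]
    return struct.unpack(f"<{samples_per_patch}f", chunk)


def write_demo_file(num_patches=4, samples_per_patch=16):
    """Write a small demo file in which patch i is filled with the value i."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        for i in range(num_patches):
            values = [float(i)] * samples_per_patch
            f.write(struct.pack(f"<{samples_per_patch}f", *values))
    return path


demo_path = write_demo_file()
patch = read_patch(demo_path, patch_index=2)
os.remove(demo_path)
```

With array libraries the same pattern is usually expressed through a memory-mapped array type, which allows indexing patches inside a region of interest while the operating system pages in only the parts that are touched.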

Citation

If you find this code useful, please consider citing our paper:

Paul Wilson*, Mahdi Gilany*, Amoon Jamzad, Fahimeh Fooladgar, Minh To, Brian Wodlinger, Purang Abolmaesumi, Parvin Mousavi. (2022). Self-Supervised Learning with Limited Labeled Data for Prostate Cancer Detection in High Frequency Ultrasound. arXiv preprint arXiv:2211.00527.

* indicates equal contribution

@article{wilson2022self,
  title={Self-Supervised Learning with Limited Labeled Data for Prostate Cancer Detection in High Frequency Ultrasound},
  author={Wilson*, Paul FR and Gilany*, Mahdi and Jamzad, Amoon and Fooladgar, Fahimeh and To, Minh Nguyen Nhat and Wodlinger, Brian and Abolmaesumi, Purang and Mousavi, Parvin},
  journal={arXiv preprint arXiv:2211.00527},
  year={2022}
}
