CAMANet

[IJBHI 2023] This is the official implementation of CAMANet: Class Activation Map Guided Attention Network for Radiology Report Generation accepted to IEEE Journal of Biomedical and Health Informatics (J-BHI), 2023.

Abstract

Radiology report generation (RRG) has gained increasing research attention because of its huge potential to mitigate medical resource shortages and aid radiologists in disease decision making. Recent advancements in RRG are largely driven by improving a model's capabilities in encoding single-modal feature representations, while few studies explicitly explore the cross-modal alignment between image regions and words. Radiologists typically focus first on abnormal image regions before composing the corresponding text descriptions, so cross-modal alignment is of great importance for learning an RRG model that is aware of abnormalities in the image. Motivated by this, we propose a Class Activation Map guided Attention Network (CAMANet), which explicitly promotes cross-modal alignment by employing aggregated class activation maps to supervise cross-modal attention learning, and simultaneously enriches the discriminative information. CAMANet contains three complementary modules: a Visual Discriminative Map Generation module to generate the importance/contribution of each visual token; a Visual Discriminative Map Assisted Encoder to learn the discriminative representation and enrich the discriminative information; and a Visual Textual Attention Consistency module to ensure the attention consistency between the visual and textual tokens and thereby achieve cross-modal alignment. Experimental results demonstrate that CAMANet outperforms previous SOTA methods on two commonly used RRG benchmarks.
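To make the idea of "aggregated class activation maps supervising attention" more concrete, here is a minimal pure-Python sketch. It is not the implementation in this repository. It uses the standard class activation map definition (a per-token weighted sum of feature channels), and the aggregation rule (sum over positive classes, then min-max normalization) and the mean-squared-error consistency loss are assumptions chosen for illustration. All function names are hypothetical.

```python
# Illustrative sketch only: NOT the repository's implementation.

def class_activation_map(features, class_weights):
    """CAM for one class: weighted sum over feature channels for each visual token.

    features: list of tokens, each a list of channel activations.
    class_weights: classifier weights for one class, one weight per channel.
    """
    return [sum(w * f for w, f in zip(class_weights, token)) for token in features]


def aggregate_cams(features, weights, positive_classes):
    """Sum the CAMs of the positive classes and min-max normalize to [0, 1].

    The aggregation rule (sum, then min-max) is an assumption for illustration.
    """
    total = [0.0] * len(features)
    for c in positive_classes:
        cam = class_activation_map(features, weights[c])
        total = [t + v for t, v in zip(total, cam)]
    lo, hi = min(total), max(total)
    if hi == lo:
        # Flat map: no token is more important than another.
        return [0.0] * len(features)
    return [(t - lo) / (hi - lo) for t in total]


def attention_consistency_loss(attention, target_map):
    """Mean squared error between an attention map and the target map (assumed loss)."""
    return sum((a - t) ** 2 for a, t in zip(attention, target_map)) / len(target_map)


# Toy example: 3 visual tokens, 2 channels, 2 classes (both treated as positive).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 2.0]]
importance = aggregate_cams(features, weights, positive_classes=[0, 1])
loss = attention_consistency_loss([0.0, 0.5, 1.0], importance)
```

In this toy case the per-token importance comes out as `[0.0, 0.5, 1.0]`, so an attention map with the same values gives a loss of zero. Attention that concentrates on other tokens would be penalized.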

Citations

If you use or extend our work, please cite our paper.

@article{wang2024camanet,
  title={CAMANet: class activation map guided attention network for radiology report generation},
  author={Wang, Jun and Bhalerao, Abhir and Yin, Terry and See, Simon and He, Yulan},
  journal={IEEE Journal of Biomedical and Health Informatics},
  year={2024},
  publisher={IEEE}
}

Prerequisites

The following packages are required to run the scripts:

  • Python >= 3.6
  • PyTorch = 1.6
  • Torchvision
  • Pycocoevalcap

You can create the environment via conda:

conda env create --name [env_name] --file env.yml
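After creating the environment, a quick sanity check can confirm that the listed packages are importable. The sketch below is a hypothetical helper, not part of the repository. It assumes the usual import names `torch`, `torchvision` and `pycocoevalcap`, and it only checks that each module can be found, not which version is installed.

```python
import importlib.util
import sys


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def python_is_supported(minimum=(3, 6)):
    """Check the interpreter version against the minimum listed in the prerequisites."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    # Assumed import names for the packages listed above.
    required = ["torch", "torchvision", "pycocoevalcap"]
    print("Python version OK:", python_is_supported())
    print("Missing modules:", missing_modules(required))
```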

Download Trained Models

You can download the trained models here.

Datasets

We use two datasets (IU X-Ray and MIMIC-CXR) in our paper.

For IU X-Ray, you can download the dataset from here.

For MIMIC-CXR, you can download the dataset from here.

After downloading the datasets, put them in the directory data.

Pseudo Label Generation

You can generate the pseudo labels for each dataset with the automatic labeler CheXbert.

We also provide the generated labels in the files directory.
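As a rough illustration of how labeler output could be turned into binary pseudo labels, the sketch below parses a small in-memory CSV. It assumes the CheXpert-style convention in which each condition column holds 1 (positive), 0 (negative), -1 (uncertain) or blank (not mentioned), and it treats only explicit positives as 1. That mapping, the helper names and the three example conditions are assumptions for illustration and may differ from how the labels in the files directory were produced.

```python
import csv
import io


def is_positive(value):
    """True only for an explicit positive mention (1); blank, 0 and -1 map to False."""
    value = (value or "").strip()
    if not value:
        return False
    return float(value) == 1.0


def to_binary_labels(row, conditions):
    """Map one labeler row (a dict) to a 0/1 vector ordered like `conditions`."""
    return [1 if is_positive(row.get(name)) else 0 for name in conditions]


# Illustrative subset of condition columns; a real labeler output has more.
conditions = ["Cardiomegaly", "Edema", "No Finding"]

# Toy stand-in for a labeled-reports CSV (assumed layout).
example_csv = io.StringIO(
    "Report Impression,Cardiomegaly,Edema,No Finding\n"
    "Heart size is enlarged.,1.0,,\n"
    "No acute disease.,0.0,,1.0\n"
    "Possible edema.,,-1.0,\n"
)

pseudo_labels = [to_binary_labels(row, conditions) for row in csv.DictReader(example_csv)]
```

Treating uncertain mentions as negative is only one possible policy. Mapping them to positive instead would be a one-line change in `is_positive`.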

Our experiments were run on an RTX A6000 card.

Train on IU X-Ray

Run bash run_iu.sh to train a model on the IU X-Ray data.

Train on MIMIC-CXR

Run bash run_mimic.sh to train a model on the MIMIC-CXR data.

Test on MIMIC-CXR

Run bash test_mimic.sh to test a trained model on the MIMIC-CXR data.

Acknowledgment

Our project builds on code from the following repos. Thanks to their authors for sharing their work.
