medicalgpt's Introduction

Medical Report Generation (& VQA) using a VLM (XrayGPT-Based).

About XrayGPT

XrayGPT is a state-of-the-art model for chest radiology report generation using large medical vision-language models. Built on top of BLIP-2 and MedCLIP, XrayGPT aligns a frozen visual encoder with a frozen large language model (LLM), Vicuna, using a linear projection layer. This repository extends XrayGPT for general-purpose medical report generation and Visual Question Answering (VQA).
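To make the alignment step concrete, here is a toy sketch of a linear projection that maps visual tokens into the width a language model expects. It is illustrative only: the dimensions, the random weights, and the function names make_linear and project are assumptions and are not taken from the XrayGPT code, which would operate on GPU tensors rather than Python lists.

```python
import random


def make_linear(in_dim, out_dim, seed=0):
    """Create a toy weight matrix (out_dim x in_dim) and a zero bias."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project(visual_tokens, weight, bias):
    """Map each visual token (length in_dim) to the LLM width (length out_dim)."""
    return [
        [sum(w * x for w, x in zip(row, token)) + b for row, b in zip(weight, bias)]
        for token in visual_tokens
    ]


# Hypothetical sizes: 4 visual tokens of width 8, projected to an LLM width of 16.
VISUAL_DIM, LLM_DIM = 8, 16
weight, bias = make_linear(VISUAL_DIM, LLM_DIM)
visual_tokens = [[float(i + j) for j in range(VISUAL_DIM)] for i in range(4)]
llm_inputs = project(visual_tokens, weight, bias)
print(len(llm_inputs), len(llm_inputs[0]))
```

In a setup like the one described above, the encoder and the LLM stay frozen, so the projection's weight and bias would be the only parameters updated during alignment.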

Using This Repository

Installation

Several libraries in the original codebase are inconsistent or incompatible with one another, so this repository uses a new environment designed to run in a Runpod container. The environment is based on Python 3.10, PyTorch 2.0.0, and CUDA 11.8.

Runpod Website

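Use the Runpod template pytorch:2.1.0-py3.10-cuda11.8.0 and run the following commands to install the required libraries. The template ships with PyTorch 2.1.0; the commands reinstall PyTorch 2.0.0 to match the environment described above.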

apt-get update -y && apt-get install zip unzip vim -y
python -m pip install --upgrade pip
pip install gdown
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r hard_requirements.txt --no-deps
pip install pydantic==1.10.7
pip install hyperframe==5.2.0
pip install gradio==3.23.0
pip install safetensors==0.4.3
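After installation, it can help to confirm that the pinned versions are the ones actually installed. The following is a minimal sketch using only the standard library; the helper names installed_version and find_mismatches are hypothetical, and the pins are copied from the commands above.

```python
from importlib import metadata

# Pins copied from the install commands above.
PINNED = {
    "torch": "2.0.0",
    "torchvision": "0.15.1",
    "torchaudio": "2.0.1",
    "pydantic": "1.10.7",
    "hyperframe": "5.2.0",
    "gradio": "3.23.0",
    "safetensors": "0.4.3",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        # Wheels from the cu118 index carry a local suffix such as "+cu118".
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```

If the script prints nothing, every pinned package is present at the expected version.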

Setup

Below is a brief overview of the steps for fine-tuning the trained XrayGPT model. Instructions for training XrayGPT from scratch are provided in the original repository.

1. Prepare the Datasets for Training

Publicly available datasets for medical report generation focus mostly on chest X-ray reports, typically drawn from MIMIC-CXR or OpenI. These datasets are valuable but cover few imaging modalities. To support multi-modality report generation and Visual Question Answering (VQA), we built a combined dataset from two sources: OpenI and ROCO.

OpenI Dataset: OpenI is a well-known resource provided by the Indiana University School of Medicine, comprising chest X-ray images paired with corresponding radiology reports.

  • Kaggle Download: Link
  • Description: Radiology reports and chest X-ray images
  • Samples: 4,000
  • Usage: Report generation (Chest X-ray)

ROCO Dataset: ROCO (Radiology Objects in COntext) is a multimodal medical image dataset enriched with descriptive captions, offering a broader spectrum of medical imaging scenarios.

  • Description: Multimodal images with detailed descriptive captions
  • Samples: 8,000 (validation split)
  • Usage: Enables the model to generalize across various medical imaging modalities beyond chest X-rays.

We processed the OpenI and ROCO datasets with the OpenAI API and combined them into a single training dataset. The combined data is stored in the dataset folder.
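The combining step could be sketched as follows. This is not the authors' script: the OpenAI API processing is not shown, the function name merge_sources is hypothetical, the inputs are assumed to be JPEG files paired with already-processed captions, and the {"annotations": [{"image_id", "caption"}]} layout of filter_cap.json is an assumption.

```python
import json
import shutil
from pathlib import Path


def merge_sources(sources, out_dir):
    """Copy images into out_dir/image as 1.jpg, 2.jpg, ... and write filter_cap.json.

    `sources` is an iterable of lists of (image_path, caption) pairs, one list
    per source dataset. The annotation layout written here is an assumption.
    """
    out_dir = Path(out_dir)
    image_dir = out_dir / "image"
    image_dir.mkdir(parents=True, exist_ok=True)

    annotations = []
    next_id = 1
    for pairs in sources:
        for image_path, caption in pairs:
            # Renumber images consecutively across all sources.
            shutil.copyfile(image_path, image_dir / f"{next_id}.jpg")
            annotations.append({"image_id": str(next_id), "caption": caption.strip()})
            next_id += 1

    with open(out_dir / "filter_cap.json", "w", encoding="utf-8") as f:
        json.dump({"annotations": annotations}, f, indent=2)
    return annotations
```

Numbering the images consecutively across both sources avoids filename collisions between datasets.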

The final structure of the dataset folder is as follows:

dataset
├── image
│   ├── 1.jpg
│   ├── 2.jpg
│   ├── 3.jpg
│   └── ...
└── filter_cap.json
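Before training, a quick consistency check of this folder can catch missing files early. The sketch below assumes that filter_cap.json has the form {"annotations": [{"image_id": ..., "caption": ...}]} and that each image is stored as <image_id>.jpg; both are assumptions about the file format, and check_dataset is a hypothetical helper.

```python
import json
from pathlib import Path


def check_dataset(root):
    """Return (missing_images, unreferenced_images) for a dataset folder.

    Assumes filter_cap.json holds {"annotations": [{"image_id", "caption"}]}
    and that images are named <image_id>.jpg inside the image folder.
    """
    root = Path(root)
    with open(root / "filter_cap.json", encoding="utf-8") as f:
        annotations = json.load(f)["annotations"]

    expected = {f"{ann['image_id']}.jpg" for ann in annotations}
    present = {p.name for p in (root / "image").glob("*.jpg")}
    # Images named in the annotations but absent on disk, and the reverse.
    return sorted(expected - present), sorted(present - expected)


if __name__ == "__main__":
    missing, extra = check_dataset("dataset")
    print(f"{len(missing)} missing image(s), {len(extra)} unreferenced image(s)")
```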

2. Prepare the Pretrained Vicuna Weights

Download the fine-tuned Vicuna-7B weights from the link given for the original XrayGPT. Place all files in a single folder with a structure similar to the following:

vicuna_weights
├── config.json
├── generation_config.json
├── pytorch_model.bin.index.json
├── pytorch_model-00001-of-00003.bin
...   
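A sharded checkpoint in this layout lists every shard file in the weight_map of pytorch_model.bin.index.json, so an incomplete download can be detected without loading the model. A minimal sketch, where missing_shards is a hypothetical helper and vicuna_weights is the folder name used above:

```python
import json
from pathlib import Path


def missing_shards(weights_dir):
    """Return the shard files named in the index that are absent from weights_dir."""
    root = Path(weights_dir)
    with open(root / "pytorch_model.bin.index.json", encoding="utf-8") as f:
        index = json.load(f)
    # weight_map maps each parameter name to the shard file that stores it.
    shards = set(index["weight_map"].values())
    return sorted(name for name in shards if not (root / name).is_file())


if __name__ == "__main__":
    missing = missing_shards("vicuna_weights")
    print("all shards present" if not missing else f"missing: {missing}")
```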

3. Download the MiniGPT-4 Checkpoint

Download the MiniGPT-4 checkpoint of the trained XrayGPT model.

Model Training

We fine-tune a pretrained XrayGPT model on the dataset created above. That model was originally trained on the MIMIC-CXR and OpenI datasets in a two-stage process.

Run the following command:

python3 train.py --cfg-path train_configs/xraygpt_openi_finetune.yaml

Launching the Demo

Download the pretrained XrayGPT checkpoint from the link and set its path in eval_configs/xraygpt_eval.yaml.
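If you prefer to script this edit, the sketch below rewrites the checkpoint path in the YAML text. It assumes the config stores the path under a key named ckpt, which is an assumption about the file and should be checked against your copy; set_ckpt_path is a hypothetical helper, and the example path is a placeholder.

```python
import re
from pathlib import Path


def set_ckpt_path(yaml_text, ckpt_path):
    """Replace the value of the first `ckpt:` key, keeping its indentation."""
    pattern = re.compile(r"^([ \t]*ckpt:[ \t]*).*$", re.MULTILINE)
    # A function replacement avoids backslash escapes being interpreted in the path.
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}'{ckpt_path}'", yaml_text, count=1
    )
    if count == 0:
        raise ValueError("no `ckpt:` key found in the config text")
    return new_text


if __name__ == "__main__":
    config = Path("eval_configs/xraygpt_eval.yaml")
    # Placeholder path: point this at the checkpoint you downloaded.
    updated = set_ckpt_path(config.read_text(encoding="utf-8"), "/path/to/checkpoint.pth")
    config.write_text(updated, encoding="utf-8")
```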

Run the following command to launch the demo:

python demo.py --cfg-path eval_configs/xraygpt_eval.yaml --gpu-id 0

Citation

If you use this work, please cite the following original XrayGPT paper:

@article{Omkar2023XrayGPT,
    title={XrayGPT: Chest Radiographs Summarization using Large Medical Vision-Language Models},
    author={Omkar Thawkar and Abdelrahman Shaker and Sahal Shaji Mullappilly and Hisham Cholakkal and Rao Muhammad Anwer and Salman Khan and Jorma Laaksonen and Fahad Shahbaz Khan},
    journal={arXiv preprint arXiv:2306.07971},
    year={2023}
}

medicalgpt's People

Contributors

omkarthawakar, amshaker, abdur75648, aaekay
