Code Monkey home page Code Monkey logo

pmc-vqa's Introduction

PMC-VQA

The official codes for PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering

PWC

PWC

We propose a generative-based model for medical visual understanding by aligning visual information from a pre-trained vision encoder with a large language model, and establish a scalable pipeline to construct a large-scale medical visual question-answering dataset, named PMC-VQA, which contains 227k VQA pairs of 149k images that cover various modalities or diseases.

The dataset is available at Huggingface

The model checkpoints are available at MedVInT-TE and MedVInT-TD. The previous checkpoint of MedVInT-TD was mistakenly uploaded. We have rectified the issue and updated the model's checkpoint on July 31. Now, you can access the correct and improved version of the model.

Usage

1. Create Environment

Please refer to https://github.com/chaoyi-wu/PMC-LLaMA

2. Prepare Dataset

Download from Huggingface and save into ./PMC-VQA

3. Model Checkpoints

Download the pre-trained MedVInT-TE, and save into ./src/MedVInT_TE/Results directly.

Download the pre-trained MedVInT-TD, and save into ./src/MedVInT_TD/Results directly.

See MedVInT_TE and MedVInT_TD for the details of training MedVInT_TE and MedVInT_TD.

Acknowledgement

CLIP -- https://github.com/openai/CLIP

PMC-CLIP -- https://github.com/WeixiongLin/PMC-CLIP

PMC-LLaMA -- https://github.com/zphang/minimal-llama

LLaMA: Open and Efficient Foundation Language Models -- https://arxiv.org/abs/2302.13971

We thank the authors for their open-sourced code and encourage users to cite their works when applicable.

Contribution

Please raise an issue if you need help, any contributions are welcomed.

Citation

If you use this code or use our pre-trained weights for your research, please cite our paper

@article{zhang2023pmcvqa,
      title={PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering}, 
      author={Xiaoman Zhang and Chaoyi Wu and Ziheng Zhao and Weixiong Lin and Ya Zhang and Yanfeng Wang and Weidi Xie},
      year={2023},
      journal={arXiv preprint arXiv:2305.10415},
}

pmc-vqa's People

Contributors

xiaoman-zhang avatar chaoyi-wu avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.