
MedCLIP

MedCLIP is a deep-learning model for medical image captioning, based on OpenAI's CLIP architecture.

Usage

Run main.ipynb on a Colab instance. Pretrained weights are provided, so you don't need to train the model yourself.

Introduction

At its core, CLIP is a contrastive learning process that maps images and text into a shared embedding space.

Through encodings and transformations, CLIP learns relationships between natural language and images. The underlying model allows either captioning an image from a set of known captions, or retrieving an image from a given caption. With appropriate encoders, the CLIP model can be optimised for domain-specific applications. Our hope with MedCLIP is to assist radiologists in producing diagnoses.

Explanation

CLIP works by encoding an image and a related caption into tensors. The model then optimises the last layers of the (transfer-learnable) encoders to make the image and text encodings of matching pairs as similar as possible. (1. Contrastive pretraining)
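The pretraining setup can be sketched in a few lines of numpy. This is a toy illustration, not the actual training code: the feature sizes, random features and projection weights are hypothetical stand-ins for the real ResNet50/ClinicalBERT outputs and learned heads.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for a batch of 4 image/caption encoder outputs
# (in MedCLIP these would come from ResNet50 and ClinicalBERT).
image_features = rng.normal(size=(4, 512))
text_features = rng.normal(size=(4, 512))

# Projection heads map both modalities into one shared space;
# these weights are what contrastive pretraining optimises.
W_img = rng.normal(size=(512, 256))
W_txt = rng.normal(size=(512, 256))

def project(features, weights):
    z = features @ weights
    return z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalise

img_emb = project(image_features, W_img)
txt_emb = project(text_features, W_txt)

# Pairwise cosine similarities between every image and every caption;
# training pushes the diagonal (the matching pairs) towards 1.
similarity = img_emb @ txt_emb.T
print(similarity.shape)  # (4, 4)
```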


After the model is successfully trained, we can query it with new inputs. (2. Zero-shot prediction)

  1. Take an input

  2. Encode with the custom trained encoders

  3. Find a match (image or text) from the known data set.

    1. Go through each entry of the data set

    2. Check similarity with the current input

    3. Output the pairs with the highest similarity

  4. [Optionally] Measure the similarity between the real caption and the guessed one.
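The lookup in steps 1–3 amounts to a nearest-neighbour search over the known embeddings. A minimal sketch, using invented 2-D embeddings (real CLIP embeddings are much larger):

```python
import numpy as np

def best_caption(query_emb, caption_embs):
    """Return the index of the most similar known caption.

    Embeddings are assumed L2-normalised, so the dot product
    equals cosine similarity.
    """
    scores = caption_embs @ query_emb
    return int(np.argmax(scores)), scores

# Toy normalised caption embeddings from the known data set.
captions = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [0.6, 0.8]])

# Encoded query image, normalised to unit length.
query = np.array([0.1, 0.995])
query = query / np.linalg.norm(query)

idx, scores = best_caption(query, captions)
print(idx)  # 1 -- the second caption is the closest match
```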

Loss

In the ideal case, an image and its caption have identical encoded representations. This similarity can be measured by taking the softmax over the matrix of dot products between the encoded inputs; a perfect encoding yields the identity matrix.
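As a rough sketch (not the exact training code), a CLIP-style contrastive loss pushes the softmax of that similarity matrix towards the identity. The temperature value below is an assumed hyperparameter:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    # Similarity matrix between every image and every caption in the batch.
    logits = (img_emb @ txt_emb.T) / temperature
    n = logits.shape[0]
    idx = np.arange(n)  # row i should match column i (identity pattern)
    # Symmetric cross-entropy: image->text over rows, text->image over columns.
    loss_i = -np.log(softmax(logits, axis=1)[idx, idx])
    loss_t = -np.log(softmax(logits, axis=0)[idx, idx])
    return float((loss_i + loss_t).mean() / 2)

# A perfect encoding: image and caption embeddings coincide, so the
# softmax of the similarity matrix approaches the identity matrix
# and the loss approaches zero.
perfect = np.eye(4)
print(clip_style_loss(perfect, perfect))
```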


Tools Used

The model was trained using a curated MedPix dataset that focuses on Magnetic Resonance, Computed Tomography and X-Ray scans. ClinicalBERT was used to encode the text and ResNet50 was used for the images.

Similarity between captions was measured using ROUGE, BLEU, METEOR and CIDEr.
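To show what this kind of caption similarity measures, here is a minimal from-scratch ROUGE-1 F1 sketch based on unigram overlap (real evaluations would use a metrics library, and BLEU/METEOR/CIDEr weigh n-grams differently; the captions below are invented examples):

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """F1 over unigram overlap between a reference and a candidate caption."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped common word counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

real = "axial mri shows a left frontal lesion"
guess = "mri shows a lesion in the left lobe"
print(round(rouge1_f1(real, guess), 2))  # 0.67
```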

Future Work

  • Add new datasets; the more data the model sees, the better the captioning performance (a bigger space from which to choose a caption/image).

Some relevant datasets:

  • IU Chest X-Ray

  • ChestX-Ray 14

  • PEIR gross

  • BCIDR

  • CheXpert

  • MIMIC-CXR

  • PadChest

  • ICLEF caption

  • Generate new captions instead of just looking them up. This will vastly improve accuracy.

Members and Acknowledgements


Implemented a medical image captioning deep-learning model using the CLIP architecture, ResNet50 and ClinicalBERT. Our implementation obtained a 61% ROUGE similarity score on the MedPix dataset.
