Visual Classification via Description from Large Language Models

Sachit Menon, Carl Vondrick

ICLR 2023, Notable Top 5% (Oral)

Approach

The standard vision-and-language model compares image embeddings (white dot) to word embeddings of the category name (colorful dots) in order to perform classification, as illustrated in (a). We instead query large language models to automatically build descriptors, and perform recognition by comparing to the category descriptors, as shown in (b).

Usage

First install the dependencies.

Either manually:

conda install pytorch torchvision -c pytorch
conda install matplotlib torchmetrics -c conda-forge
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/modestyachts/ImageNetV2_pytorch

Or using the provided .yml file.

conda env create -f classbydesc.yml

To reproduce accuracy results from the paper: edit the directories to match your local machine in load.py and set hparams['dataset'] accordingly. Then simply run python main.py.

All hyperparameters can be modified in load.py.

To generate example decisions and explanations as well as contrast from the CLIP decision, use the show_from_indices function in load.py after having run main.py. Details forthcoming.

Example displaying the predictions that differ between baseline CLIP and our method:

show_from_indices(torch.where(descr_predictions != clip_predictions)[0], images, labels, descr_predictions, clip_predictions, image_description_similarity=image_description_similarity, image_labels_similarity=image_labels_similarity)

Example outputs:

Generating Your Own Descriptors

See generate_descriptors.py. If you have a list of classes, you can pass it to the obtain_descriptors_and_save function to save a json of descriptors. (You will need to add your OpenAI API token.)

Citation

@article{menon2022visual,
  author    = {Menon, Sachit and Vondrick, Carl},
  title     = {Visual Classification via Description from Large Language Models},
  journal   = {ICLR},
  year      = {2023},
}

wenyongli / classify_by_description_release Goto Github PK

classify_by_description_release's Introduction

Visual Classification via Description from Large Language Models

Sachit Menon, Carl Vondrick

ICLR 2023, Notable Top 5% (Oral)

Approach

Usage

Generating Your Own Descriptors

Citation

classify_by_description_release's People

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent