markin-wang / clevit

[IJCAI 2023] CLE-ViT: Contrastive Learning Encoded Transformer for Ultra-Fine-Grained Visual Categorization.

License: MIT License

Python 92.95% C++ 1.18% Cuda 3.08% Shell 2.79%

fine-grained-classification fine-grained-visual-categorization image-classification

clevit's Introduction

CLE-ViT: Contrastive Learning Encoded Transformer for Ultra-Fine-Grained Visual Categorization

Official PyTorch implementation of CLE-ViT: Contrastive Learning Encoded Transformer for Ultra-Fine-Grained Visual Categorization (IJCAI 2023).

If you use the code in this repo for your work, please cite the following bib entries:

Abstract

Ultra-fine-grained visual classification (ultra-FGVC) aims to classify sub-grained categories of fine-grained objects. This inevitably requires discriminative representation learning within a limited training set. Exploring intrinsic features from the object itself, e.g., predicting the rotation of a given image, has demonstrated great progress towards learning discriminative representations. Yet none of these works consider explicit supervision for learning mutual information at the instance level. To this end, this paper introduces CLE-ViT, a novel contrastive learning encoded transformer, to address this fundamental problem in ultra-FGVC. The core design is a self-supervised module that performs self-shuffling and masking and then distinguishes these altered images from other images. This drives the model to learn an optimized feature space that has a large inter-class distance while remaining tolerant to intra-class variations. By incorporating this self-supervised module, the network acquires more knowledge from the intrinsic structure of the input data, which improves its generalization ability without requiring extra manual annotations. CLE-ViT achieves strong performance on 7 publicly available datasets, demonstrating its effectiveness in the ultra-FGVC task.
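
The self-shuffling and masking step mentioned in the abstract can be illustrated with a toy sketch. This is not the code from this repo: the function name `shuffle_and_mask`, the parameters `patch_size` and `mask_ratio`, and the choice to shuffle every patch and zero out a fixed fraction of them are all assumptions made for illustration, and the image is a plain nested list rather than a tensor.

```python
import random


def shuffle_and_mask(image, patch_size, mask_ratio=0.25, rng=None):
    """Return an altered copy of `image` (an H x W nested list).

    Hypothetical illustration only: patches are shuffled, then a fraction
    of them is replaced with zeros. The input image is left unchanged.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    rows, cols = height // patch_size, width // patch_size

    # Cut the image into square patches, stored in row-major order.
    patches = [
        [row[c * patch_size:(c + 1) * patch_size]
         for row in image[r * patch_size:(r + 1) * patch_size]]
        for r in range(rows)
        for c in range(cols)
    ]

    # Self-shuffling: permute the patch positions.
    rng.shuffle(patches)

    # Masking: zero out a random subset of the shuffled patches.
    n_masked = int(len(patches) * mask_ratio)
    for idx in rng.sample(range(len(patches)), n_masked):
        patches[idx] = [[0] * patch_size for _ in range(patch_size)]

    # Stitch the patches back into an image of the original size.
    out = []
    for r in range(rows):
        for y in range(patch_size):
            line = []
            for c in range(cols):
                line.extend(patches[r * cols + c][y])
            out.append(line)
    return out


# A 4 x 4 toy "image" with distinct non-zero pixel values.
image = [[r * 4 + c + 1 for c in range(4)] for r in range(4)]
altered = shuffle_and_mask(image, patch_size=2, rng=random.Random(0))
for row in altered:
    print(row)
```

In a contrastive setup, the altered image and its original would typically be treated as a positive pair, while other images in the batch act as negatives. How the pairs and the loss are actually built in this repo should be checked in main.py.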

Create Environment

Please use the command below to create the environment for CLE-ViT.

  $ conda env create -f env.yaml

Download Pre-trained Swin Transformer Models

Get the models from these links: Swin-B, Swin-S...

  $ wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window7_224_22k.pth

Dataset

You can download the datasets from the links below:

Run the Experiments

Use the scripts in the scripts directory to train the model, e.g., to train on the SoybeanGene dataset:

  $ sh scripts/run_gene.sh

Download Trained Models

Trained models: BaiDuNetDisk

Password: r5zr

Acknowledgment

Our project references the code in the following repos. Thanks for their work and sharing.

clevit's Issues

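Problem with positive sample pair construction

Hello author,
While reproducing the code I ran into some problems and hope you can help. The positive sample pair construction in main.py looks inconsistent: are the positive and negative samples the same image? Also, data_loader_train does yield data on the dataloader side, but by the time training starts the image data is empty while the label data is still there.
An error is then raised.

If I train with the data_loader_val dataset instead, it still does not run to completion because of a dimension problem later on, but both the labels and the contents of img are present. Specifically, during normal validation the data has shape [B,3,448,448], but when the val data is used for training, after passing through this piece of code

the shape becomes [B*3,448,448]. I look forward to your reply. Thank you!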

Error loading pre-trained model

Hello, when I run sh scripts/run_gene.sh, the following error occurs while the pre-trained model is being loaded:

  Traceback (most recent call last):
    File "/home/image/anaconda3/envs/plant_diseases/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
      return f(*args, **kwargs)
    File "/home/image/plant_disease/CLEViT/main.py", line 94, in main
      load_pretrained(config, model_without_ddp, logger)
    File "/home/image/plant_disease/CLEViT/utils.py", line 206, in load_pretrained
      relative_position_bias_table_current = model.state_dict()[k]
  KeyError: 'layers.2.blocks.6.attn.relative_position_bias_table'
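
The KeyError above means the checkpoint contains a key that the constructed model's state dict does not have. A mismatch between the downloaded checkpoint and the model configuration is one possible cause, though the thread does not confirm it. A quick way to check is to compare the two key sets before loading. The sketch below is a hypothetical helper (`compare_state_dict_keys` is not part of this repo) and uses plain dicts as stand-ins for real state dicts.

```python
def compare_state_dict_keys(checkpoint_state, model_state):
    """Group parameter names by where they appear.

    Both arguments are mappings from parameter name to value, such as
    the result of model.state_dict() in PyTorch.
    """
    ckpt_keys = set(checkpoint_state)
    model_keys = set(model_state)
    return {
        "loadable": sorted(ckpt_keys & model_keys),
        "only_in_checkpoint": sorted(ckpt_keys - model_keys),
        "only_in_model": sorted(model_keys - ckpt_keys),
    }


# Toy stand-ins for real state dicts; the values are not used here.
checkpoint_state = {
    "layers.2.blocks.5.attn.relative_position_bias_table": None,
    "layers.2.blocks.6.attn.relative_position_bias_table": None,
}
model_state = {
    "layers.2.blocks.5.attn.relative_position_bias_table": None,
    "head.weight": None,
}

report = compare_state_dict_keys(checkpoint_state, model_state)
print(report["only_in_checkpoint"])
# -> ['layers.2.blocks.6.attn.relative_position_bias_table']
```

With a real checkpoint, the weights may be nested under an entry such as "model" in the loaded file. That layout is an assumption here and should be checked against the file you downloaded.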

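How do I train on my own dataset?

Thank you for your work. I am a beginner. How should I run the contrastive learning pre-training, and how can I apply it to my own fine-grained visual classification dataset? Is it done by changing the is_pretrain parameter in main.py?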

Questions about code reproduction

Hi author, we ran into the following problem while reproducing the results:

scripts/run_gene.sh: line 2: logs/gene/gene_sb7_fullb_np2_lr2e-3_mul5_bs12_448_ep200_linear_ee2: No such file or directory
tail: cannot open 'logs/gene/gene_sb7_fullb_np2_lr2e-3_mul5_bs12_448_ep200_linear_ee2' for reading: No such file or directory
tail: no files remaining
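
The first message above is what a shell prints when output is redirected to a file whose parent directory does not exist, so a missing logs/gene directory is a likely cause. The script itself is not shown here, so this is an assumption. A minimal sketch that creates the directory before the script runs (`ensure_log_dir` is a hypothetical helper, not part of this repo):

```python
from pathlib import Path


def ensure_log_dir(log_file):
    """Create the parent directory of `log_file` if needed and return it."""
    log_dir = Path(log_file).parent
    # parents=True creates intermediate directories such as "logs";
    # exist_ok=True makes repeated calls harmless.
    log_dir.mkdir(parents=True, exist_ok=True)
    return log_dir


# Example call, run from the repository root before scripts/run_gene.sh:
# ensure_log_dir("logs/gene/gene_sb7_fullb_np2_lr2e-3_mul5_bs12_448_ep200_linear_ee2")
```

Running mkdir -p logs/gene in the shell before the script has the same effect.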

Where Paper?

: )

How to visualize the attention heat map?

Hi, I appreciate your work very much. Can you open source the attention visualization part of the code?
