A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence

A Tale of Two Features explores the complementary nature of Stable Diffusion (SD) and DINOv2 features for zero-shot semantic correspondence. The results demonstrate that a simple fusion of the two features leads to state-of-the-art performance on the SPair-71k, PF-Pascal, and TSS datasets.

This repository is the official implementation of the paper:

A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence
Junyi Zhang, Charles Herrmann, Junhwa Hur, Luisa F. Polanía, Varun Jampani, Deqing Sun, Ming-Hsuan Yang
arXiv preprint, 2023.

Visual Results

[Figures: Dense Correspondence; Object Swapping; Object Swapping (with refinement process)]

Environment Setup

To install the required dependencies, use the following commands:

conda create -n sd-dino python=3.9
conda activate sd-dino
conda install pytorch=1.13.1 torchvision=0.14.1 pytorch-cuda=11.6 -c pytorch -c nvidia
conda install -c "nvidia/label/cuda-11.6.1" libcusolver-dev
git clone git@github.com:Junyi42/sd-dino.git
cd sd-dino
pip install -e .

(Optional) You may also want to install xformers for a more efficient transformer implementation:

pip install xformers==0.0.16

Get Started

Prepare the data

Scripts for downloading the datasets are provided in the data folder. To download a specific dataset, use the corresponding command:

  • SPair-71k:
bash data/prepare_spair.sh
  • PF-Pascal:
bash data/prepare_pfpascal.sh
  • TSS:
bash data/prepare_tss.sh

Evaluate the PCK Results of SPair-71k

Run the pck_spair_pascal.py script:

python pck_spair_pascal.py --SAMPLE 20

--SAMPLE sets the number of pairs sampled per category (default: 20). Set it to 0 to evaluate on all pairs, which is the setting used in the paper.
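
For orientation, PCK (percentage of correct keypoints) counts a predicted keypoint as correct when it lies within alpha * max(h, w) of its ground-truth location, where h and w are the height and width of a reference box (the object bounding box or the whole image, depending on the benchmark convention). The sketch below is only a minimal illustration of the metric, not the code in pck_spair_pascal.py; the function name pck and its arguments are hypothetical.

```python
import math

def pck(pred, gt, size, alpha=0.1):
    """Fraction of predicted keypoints within alpha * max(size) of ground truth.

    pred, gt: lists of (x, y) keypoints in the target image.
    size: (height, width) of the reference box (assumed: object bounding box).
    """
    if not gt:
        return 0.0
    threshold = alpha * max(size)
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return correct / len(gt)

# Toy example: threshold = 0.1 * max(100, 200) = 20 pixels.
pred = [(10, 10), (50, 50), (120, 80)]
gt = [(12, 11), (90, 50), (120, 95)]
print(pck(pred, gt, size=(100, 200)))  # two of the three keypoints are within 20 px
```

Averaging this value over the sampled pairs of a category gives a per-category score, so a larger sample makes the estimate less noisy at the cost of a longer run.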

Additional important parameters in pck_spair_pascal.py include:

  • --NOT_FUSE: if set to True, only use the SD feature.
  • --ONLY_DINO: if set to True, only use the DINO feature.
  • --DRAW_DENSE: if set to True, draw the dense correspondence map.
  • --DRAW_SWAP: if set to True, draw the object swapping result.
  • --DRAW_GIF: if set to True, draw the object swapping result as a gif.
  • --TOTAL_SAVE_RESULT: number of samples for which qualitative results are saved; set to 0 to disable saving and speed up the evaluation.

Please refer to the pck_spair_pascal.py file for more details. You may find samples of qualitative results in the results_spair folder.
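
The three feature modes selected by --NOT_FUSE and --ONLY_DINO can be pictured as follows. This is a minimal sketch, assuming that fusion means L2-normalizing each per-patch descriptor and concatenating the two, followed by nearest-neighbor matching on cosine similarity. The function names and the toy 2-D descriptors are hypothetical and are not taken from the repository.

```python
import math

def l2_normalize(vec):
    """Scale a descriptor to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def fuse(sd_feat, dino_feat, mode="fuse"):
    """Build one per-patch descriptor. mode is 'fuse', 'sd' or 'dino'."""
    if mode == "sd":
        return l2_normalize(sd_feat)
    if mode == "dino":
        return l2_normalize(dino_feat)
    # Assumed fusion: normalize each part so neither dominates, then concatenate.
    return l2_normalize(sd_feat) + l2_normalize(dino_feat)

def nearest_neighbor(query, candidates):
    """Index of the candidate with the highest cosine similarity to the query."""
    q = l2_normalize(query)
    sims = [sum(a * b for a, b in zip(q, l2_normalize(c))) for c in candidates]
    return max(range(len(sims)), key=sims.__getitem__)

# Toy data: one source patch and two target patches, each with a
# hypothetical 2-D SD descriptor and a hypothetical 2-D DINO descriptor.
src = fuse([1.0, 0.0], [0.0, 2.0])
targets = [
    fuse([0.0, 3.0], [1.0, 0.0]),  # differs from the source in both features
    fuse([2.0, 0.1], [0.1, 1.0]),  # close to the source in both features
]
print(nearest_neighbor(src, targets))
```

Normalizing before concatenation keeps the two descriptors on a comparable scale, so the match is not decided by whichever feature happens to have the larger magnitude.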

Evaluate the PCK Results of PF-Pascal

Run the pck_spair_pascal.py script with the --PASCAL flag:

python pck_spair_pascal.py --PASCAL

You may find samples of qualitative results in the results_pascal folder.

Evaluate the PCK Results of TSS

Run the pck_tss.py script:

python pck_tss.py

You may find samples of qualitative results in the results_tss folder.

Demo

PCA / K-means Visualization of the Features

To extract the fused features of an input image pair and visualize the correspondence, see the notebook demo_vis_features.ipynb.
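
As a rough idea of what a K-means visualization does, each patch descriptor is assigned to one of k clusters, and the cluster index is then rendered as a color. The following is a minimal pure-Python sketch under stated assumptions: the descriptors are toy 2-D vectors, the centers are initialized from the first k points, and none of the names come from demo_vis_features.ipynb.

```python
def kmeans(points, k, iters=10):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centers = [list(p) for p in points[:k]]  # deterministic init (assumption)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Toy "patch descriptors": two well-separated groups.
feats = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2)]
labels = kmeans(feats, k=2)
print(labels)
```

Clustering the descriptors of both images jointly means that patches of corresponding parts tend to share a label, which is what makes the colored maps comparable across the pair.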

Quick Try on the Object Swapping

To swap the objects in an input image pair, see the notebook demo_swap.ipynb.

Refine the Result

TODO

Citation

If you find our work useful, please cite:

@article{zhang2023tale,
  title={{A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence}},
  author={Zhang, Junyi and Herrmann, Charles and Hur, Junhwa and Cabrera, Luisa Polania and Jampani, Varun and Sun, Deqing and Yang, Ming-Hsuan},
  journal={arXiv preprint arXiv:2305.15347},
  year={2023}
}

Acknowledgement

Our code is largely based on the following open-source projects: ODISE, dino-vit-features (official implementation), dino-vit-features (Kamal Gupta's implementation), DenseMatching, and ncnet. Our heartfelt gratitude goes to the developers of these resources!
