Alpha-CLIP

This repository is the official implementation of Alpha-CLIP.

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun*, Ye Fang*, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang

*Equal Contribution

Demo of Alpha-CLIP with Stable Diffusion: Hugging Face Spaces, OpenXLab

Demo Alpha-CLIP with LLaVA: coming soon

📜 News

[2023/12/7] The paper and project page are released!

💡 Highlights

  • 🔥 Zero-shot ImageNet classification accuracy improves by 3.93% when a foreground alpha-map is provided.
  • 🔥 Plug-and-play region focus for any work that uses the CLIP vision encoder.
  • 🔥 A strong visual encoder that serves as a versatile tool when a foreground mask is available.

👨‍💻 Todo

  • Training and evaluation code for Alpha-CLIP
  • Web demo and local demo of Alpha-CLIP with LLaVA
  • Web demo and local demo of Alpha-CLIP with Stable Diffusion
  • Usage example notebook of Alpha-CLIP
  • Checkpoints of Alpha-CLIP

🛠️ Usage

Installation

Our model is based on CLIP. Please first prepare the environment for CLIP, then install Alpha-CLIP directly:

pip install -e .

Then install loralib:

pip install loralib

How to use

Download the model from the model-zoo and place it under checkpoints.

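import alpha_clip

# load() returns the model together with its image preprocessing transform, as in CLIP
model, preprocess = alpha_clip.load("ViT-B/16", alpha_vision_ckpt_pth="checkpoints/clip_b16_grit1m_fultune_8xe.pth", device="cpu")
# image: preprocessed image batch; alpha: normalized alpha-map (see below)
image_features = model.visual(image, alpha)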
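
Because the loader is given a local checkpoint path, it can help to confirm that the file is where the snippet expects it before loading. A minimal sketch, assuming the checkpoints directory and file name used above; find_checkpoint is a hypothetical helper for illustration and is not part of the alpha_clip package:

```python
from pathlib import Path


def find_checkpoint(filename, root="checkpoints"):
    """Return the path to a downloaded checkpoint, or raise if it is missing.

    Hypothetical helper for illustration; not part of the alpha_clip package.
    """
    path = Path(root) / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; download it from the model-zoo and place it under {root}/"
        )
    return path
```

The returned path, for example find_checkpoint("clip_b16_grit1m_fultune_8xe.pth"), could then be passed as a string to alpha_vision_ckpt_pth.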

When binary_mask takes the values 0 and 1, alpha needs to be normalized via transforms:

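import numpy as np
from torchvision import transforms

mask_transform = transforms.Compose([
    transforms.ToTensor(),  # uint8 [0, 255] -> float [0, 1]
    transforms.Resize((224, 224)),
    transforms.Normalize(0.5, 0.26)
])
# cast to uint8 so that ToTensor rescales the mask to [0, 1]
alpha = mask_transform((binary_mask * 255).astype(np.uint8))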
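
To see what this transform does to the mask values, the arithmetic can be worked through by hand: ToTensor rescales 8-bit (uint8) input from [0, 255] to [0, 1], and Normalize(0.5, 0.26) then computes (x - 0.5) / 0.26. A minimal pure-Python sketch of that arithmetic for a single pixel, ignoring the resize step; the function name normalized_alpha is hypothetical and only mirrors the transform above:

```python
def normalized_alpha(pixel, mean=0.5, std=0.26):
    """Mimic ToTensor followed by Normalize(mean, std) for one 8-bit pixel."""
    scaled = pixel / 255.0        # ToTensor: [0, 255] -> [0.0, 1.0]
    return (scaled - mean) / std  # Normalize: subtract mean, divide by std


background = normalized_alpha(0)    # mask value 0 (outside the region)
foreground = normalized_alpha(255)  # mask value 1 * 255 (inside the region)
print(round(background, 3), round(foreground, 3))
```

Under these assumptions, foreground pixels end up at roughly 1.92 and background pixels at roughly -1.92, so the alpha-map passed to the model is symmetric around zero.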

Usage examples are available:

  • Visualization of attention map: notebook
  • Alpha-CLIP used in BLIP-Diffusion: notebook
  • Alpha-CLIP used in SD_ImageVar: demo

⭐ Demos

โค๏ธ Acknowledgments

  • CLIP: The codebase we built upon. Thanks for their wonderful work.
  • LAVIS: The amazing open-sourced multimodality learning codebase, where we test Alpha-CLIP in BLIP-2 and BLIP-Diffusion.
  • Point-E: Wonderful point-cloud generation model, where we test Alpha-CLIP on the 3D generation task.
  • LLaVA: Wonderful MLLM that uses CLIP as its visual backbone, where we test the effectiveness of Alpha-CLIP.

✒️ Citation

If you find our work helpful for your research, please consider giving it a star ⭐ and a citation 📝

@misc{sun2023alphaclip,
      title={Alpha-CLIP: A CLIP Model Focusing on Wherever You Want}, 
      author={Zeyi Sun and Ye Fang and Tong Wu and Pan Zhang and Yuhang Zang and Shu Kong and Yuanjun Xiong and Dahua Lin and Jiaqi Wang},
      year={2023},
      eprint={2312.03818},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

Usage and License Notices: The data and checkpoints are intended and licensed for research use only. They are also restricted to uses that follow the license agreement of CLIP. The dataset is CC BY NC 4.0 (allowing only non-commercial use), and models trained using the dataset should not be used outside of research purposes.
