vitron's Introduction

VITRON: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

Hao Fei$^{1,2}$, Shengqiong Wu$^{1,2}$, Hanwang Zhang$^{1,3}$, Tat-Seng Chua$^{2}$, Shuicheng Yan$^{1}$

$^{1}$ Skywork AI, Singapore ▶ $^{2}$ National University of Singapore ▶ $^{3}$ Nanyang Technological University

📰 News

  • [2024.04.04] 👀👀👀 Our Vitron is available now! Welcome to watch 👀 this repository for the latest updates.

😮 Highlights

Existing vision LLMs still face challenges such as superficial instance-level understanding, a lack of unified support for both images and videos, and insufficient coverage of vision tasks. To fill these gaps, we present Vitron, a universal pixel-level vision LLM designed for comprehensive understanding (perceiving and reasoning), generating, segmenting (grounding and tracking), and editing (inpainting) of both static images and dynamic videos.

🛠️ Requirements and Installation

  • Python >= 3.8
  • PyTorch == 2.1.0
  • CUDA Version >= 11.8
  • Install required packages:
git clone https://github.com/SkyworkAI/Vitron
cd Vitron
conda create -n vitron python=3.10 -y
conda activate vitron
pip install --upgrade pip 
pip install -e .
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
pip install decord opencv-python git+https://github.com/facebookresearch/pytorchvideo.git@28fe037d212663c6a24f373b94cc5d478c8c1a1d
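
After installation, the version constraints listed above can be checked from Python. The following is a minimal sketch and not part of the Vitron repository: the helper names `parse_version`, `check_requirements`, and `check_current_environment` are hypothetical, and the checks only mirror the three constraints stated in this section (Python >= 3.8, PyTorch == 2.1.0, CUDA >= 11.8).

```python
import platform
import re


def parse_version(text):
    """Turn '2.1.0+cu118' or '11.8' into a tuple of ints, e.g. (2, 1, 0)."""
    # Drop any local build suffix such as '+cu118' before reading the digits.
    core = text.split("+", 1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", core))


def check_requirements(python_version, torch_version, cuda_version):
    """Return a list of problems; an empty list means every constraint holds."""
    problems = []
    if parse_version(python_version) < (3, 8):
        problems.append(f"Python {python_version} is older than 3.8")
    if parse_version(torch_version)[:3] != (2, 1, 0):
        problems.append(f"PyTorch {torch_version} is not 2.1.0")
    if cuda_version is None:
        # torch.version.cuda is None for CPU-only builds of PyTorch.
        problems.append("PyTorch reports no CUDA build")
    elif parse_version(cuda_version) < (11, 8):
        problems.append(f"CUDA {cuda_version} is older than 11.8")
    return problems


def check_current_environment():
    """Check the running interpreter and the installed PyTorch build."""
    import torch  # imported lazily so the helpers above work without PyTorch

    return check_requirements(
        platform.python_version(), torch.__version__, torch.version.cuda
    )
```

Calling `check_current_environment()` inside the activated `vitron` environment returns an empty list when all three constraints are met, and a list of messages otherwise.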
🔥🔥🔥 Installation or Running Fails? 🔥🔥🔥
  1. Running ffmpeg fails with Unknown encoder 'x264':

    • Try reinstalling ffmpeg:
    conda uninstall ffmpeg
    conda install -c conda-forge ffmpeg   # `-c conda-forge` cannot be omitted
    
  2. If detectron2 fails to install, try this command:

    python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
    

    or refer to this website.

  3. Error in gradio. Because gradio>=4.0.0 introduced major changes, make sure to install gradio with the same version as in requirements.txt.

  4. Error with deepspeed. When you fine-tune your model, this error may occur:

    FAILED: cpu_adam.so
    /usr/bin/ld: cannot find -lcurand
    

    This error is caused by wrong symbolic links created when installing deepspeed. Try the following commands to fix it:

    cd ~/miniconda3/envs/vitron/lib
    ls -al libcurand*  # check the links
    rm libcurand.so   # remove the wrong links
    ln -s libcurand.so.10.3.5.119 libcurand.so  # build new links
    

    Then verify the fix:

    python  # start the interpreter, then enter the two lines below
    from deepspeed.ops.op_builder import CPUAdamBuilder
    ds_opt_adam = CPUAdamBuilder().load()  # if this loads without error, deepspeed is installed correctly
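
The manual relinking in item 4 names one specific file, libcurand.so.10.3.5.119, and the version suffix may differ in other environments. The sketch below repeats the same `rm` and `ln -s` steps while discovering the versioned file instead of hard-coding it. It is an illustration under assumptions, not a tool shipped with Vitron or deepspeed: the function name `relink_libcurand` is hypothetical, and when several versioned files exist it simply takes the last one in sorted name order.

```python
from pathlib import Path


def relink_libcurand(lib_dir):
    """Point libcurand.so at a versioned library file found in lib_dir.

    Returns the name of the file the link now targets.
    """
    lib_dir = Path(lib_dir)
    link = lib_dir / "libcurand.so"
    # Real library files look like libcurand.so.10.3.5.119; skip symlinks
    # such as libcurand.so.10 so the new link targets an actual file.
    candidates = sorted(
        p for p in lib_dir.glob("libcurand.so.*")
        if p.is_file() and not p.is_symlink()
    )
    if not candidates:
        raise FileNotFoundError(f"no versioned libcurand.so.* in {lib_dir}")
    target = candidates[-1]  # last name in sorted order (an assumption)
    # Remove the old link; is_symlink() is true even for a dangling link.
    if link.is_symlink() or link.exists():
        link.unlink()
    # A relative target mirrors `ln -s libcurand.so.<version> libcurand.so`.
    link.symlink_to(target.name)
    return target.name
```

For the environment used in this README, the call would be `relink_libcurand(Path.home() / "miniconda3/envs/vitron/lib")`.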
    

👍 Deploying Gradio Demo

  • First, prepare the checkpoint; then you can run the demo locally via:
python app.py

🙌 Related Projects

You may refer to the related work that serves as the foundation for our framework and code repository: Vicuna, SEEM, i2vgenxl, StableVideo, and Zeroscope. We also partially draw inspiration from Video-LLaVA and LanguageBind. Thanks for their wonderful work.

🔒 License

  • The majority of this project is released under the Apache 2.0 license as found in the LICENSE file.
  • The service is a research preview intended for non-commercial use only, subject to the model License of LLaMA, Terms of Use of the data generated by OpenAI, and Privacy Practices of ShareGPT. Please contact us if you find any potential violation.

✏️ Citation

If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝.

@article{hao2024vitron,
  title={Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing},
  author={Hao Fei and Shengqiong Wu and Hanwang Zhang and Tat-Seng Chua and Shuicheng Yan},
  journal={CoRR},
  year={2024}
}

vitron's People

Contributors

chocowu, scofield7419

vitron's Issues

Problem downloading the model files

echo "prepared checkpoints for GLIGEN"
mkdir gligen
cd gligen
git clone https://huggingface.co/gligen/demo_ckpts_legacy
git clone https://huggingface.co/gligen/gligen-generation-text-box
git clone https://huggingface.co/gligen/gligen-generation-text-image-box
git clone https://huggingface.co/gligen/gligen-inpainting-text-box


cd ..
echo "prepared checkpoints for i2vgen-xl"
git clone https://huggingface.co/ali-vilab/i2vgen-xl



echo "prepared checkpoints for LanguageBind"
mkdir LanguageBind
cd LanguageBind
git clone https://huggingface.co/LanguageBind/LanguageBind_Video_merge
git clone https://huggingface.co/LanguageBind/LanguageBind_Image
git clone https://huggingface.co/LanguageBind/LanguageBind_Video

cd ..
echo "prepared checkpoints for OpenCLIP"
mkdir openai
cd openai
git clone https://huggingface.co/openai/clip-vit-large-patch14
git clone https://huggingface.co/openai/clip-vit-base-patch32

cd ..
echo "prepared checkpoints for stablevideo"
mkdir stablevideo
cd stablevideo

cd ..
echo "prepared checkpoints for Vitron-base"
mkdir Vitron-base
cd Vitron-base

cd ..
echo "prepared checkpoints for Vitron-lora"
mkdir Vitron-lora
cd Vitron-lora

cd ..
echo "prepared checkpoints for SEEM"
mkdir seem
cd seem

cd ..
echo "prepared checkpoints for Zeroscope"
mkdir zeroscope
cd zeroscope
git clone https://huggingface.co/cerspense/zeroscope_v2_576w

This is the modified script, but the stablevideo, Vitron-base, Vitron-lora, and seem files are still missing. Could the authors please add them when they have time?
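
To see at a glance which checkpoint folders the script above leaves unfilled, a small check like the following can help. This is only a sketch: the folder names are copied from the script, while the function name `missing_checkpoints` and the idea of passing a single root folder are assumptions made for illustration, not something the repository defines.

```python
from pathlib import Path

# Directory names taken from the download script above.
EXPECTED_DIRS = [
    "gligen", "i2vgen-xl", "LanguageBind", "openai", "stablevideo",
    "Vitron-base", "Vitron-lora", "seem", "zeroscope",
]


def missing_checkpoints(root, expected=EXPECTED_DIRS):
    """Return the expected directories that are absent or empty under root."""
    root = Path(root)
    missing = []
    for name in expected:
        folder = root / name
        # An empty folder counts as missing: `mkdir` alone downloads nothing.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing
```

Run from the directory where the script was executed, `missing_checkpoints(".")` lists the folders that exist only as empty directories or not at all.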

Image Editing

Thank you for your excellent work. In Table 2 ('Summary of backend modules in VITRON') of the paper, the image editing model used is GLIGEN. I want to know how you implemented style changing, moving, and color changing shown in Figure 1 using GLIGEN.

Problem with the pyproject.toml file

I have fixed it; the corrected version is as follows:

[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "vitron"
version = "1.0.0"
description = "A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing"
readme = "README.md"
requires-python = ">=3.8"
classifiers = [
    "Programming Language :: Python :: 3",
    "License :: OSI Approved :: Apache Software License",
]
dependencies = [
    "torch==2.1.0", "torchvision==0.15.2",
    "transformers==4.31.0", "tokenizers>=0.12.1,<0.14", "sentencepiece==0.2.0",
    "accelerate==0.28.0", "peft==0.10.0", "bitsandbytes==0.41.0",
    "pydantic<2,>=1", "markdown2[all]", "numpy", "scikit-learn==1.2.2",
    "requests", "httpx==0.24.0", "uvicorn", "fastapi",
    "einops==0.7.0", "timm==0.9.16",
    "tensorboardX==2.6.2.2", "gradio==3.42.0", "gradio_client==0.5.0",
    "ffmpeg==1.4",
    "flash-attn==2.5.6",
#    "detectron2==0.6",
]

[project.optional-dependencies]
train = ["deepspeed==0.12.6", "ninja", "wandb"]

[project.urls]
"Homepage" = "https://github.com/SkyworkAI/Vitron"
"Bug Tracker" = "https://github.com/SkyworkAI/Vitron/issues"

[tool.setuptools.packages.find]
exclude = ["assets*", "benchmark*", "docs", "dist*", "playground*", "scripts*", "tests*"]

[tool.wheel]
exclude = ["assets*", "benchmark*", "docs", "dist*", "playground*", "scripts*", "tests*"]

"detectron2==0.6" must be commented out; pip reports the error below, so install it with the command that follows:

No matching distribution found for detectron2==0.6

pip install git+https://github.com/facebookresearch/detectron2.git

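There is one more issue I am not sure about: I changed ffmpeg==6.1.1 to ffmpeg==1.4. I do not know whether this has any effect, but I have not found any problems so far: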

ERROR: Could not find a version that satisfies the requirement ffmpeg==6.1.1 (from vitron[train]) (from versions: 1.1.0, 1.2.0, 1.2.1, 1.3, 1.4)
ERROR: No matching distribution found for ffmpeg==6.1.1
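
One possible reading, not confirmed by the authors: 6.1.1 looks like a release number of the FFmpeg command-line tool rather than a version of the PyPI package named ffmpeg, whose available versions stop at 1.4 in the error above. Under that assumption, the sketch below reports the version of the ffmpeg executable found on PATH, which is the program the conda command in the installation notes installs. The function names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_ffmpeg_version(banner):
    """Extract '6.1.1' from a banner like 'ffmpeg version 6.1.1 Copyright ...'."""
    match = re.search(r"ffmpeg version (\S+)", banner)
    return match.group(1) if match else None


def installed_ffmpeg_version():
    """Return the version of the ffmpeg executable on PATH, or None if absent."""
    if shutil.which("ffmpeg") is None:
        return None
    # `ffmpeg -version` prints its banner to standard output.
    result = subprocess.run(
        ["ffmpeg", "-version"], capture_output=True, text=True, check=False
    )
    return parse_ffmpeg_version(result.stdout)
```

If `installed_ffmpeg_version()` returns `None`, no ffmpeg executable is visible in the current environment, regardless of which Python package version pip installed.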
