vitron's Issues

Problem downloading the model files

echo "prepared checkpoints for GLIGEN"
mkdir gligen
cd gligen
git clone https://huggingface.co/gligen/demo_ckpts_legacy
git clone https://huggingface.co/gligen/gligen-generation-text-box
git clone https://huggingface.co/gligen/gligen-generation-text-image-box
git clone https://huggingface.co/gligen/gligen-inpainting-text-box


cd ..
echo "prepared checkpoints for i2vgen-xl"
git clone https://huggingface.co/ali-vilab/i2vgen-xl



echo "prepared checkpoints for LanguageBind"
mkdir LanguageBind
cd LanguageBind
git clone https://huggingface.co/LanguageBind/LanguageBind_Video_merge
git clone https://huggingface.co/LanguageBind/LanguageBind_Image
git clone https://huggingface.co/LanguageBind/LanguageBind_Video

cd ..
echo "prepared checkpoints for OpenCLIP"
mkdir openai
cd openai
git clone https://huggingface.co/openai/clip-vit-large-patch14
git clone https://huggingface.co/openai/clip-vit-base-patch32

cd ..
echo "prepared checkpoints for stablevideo"
mkdir stablevideo
cd stablevideo

cd ..
echo "prepared checkpoints for Vitron-base"
mkdir Vitron-base
cd Vitron-base

cd ..
echo "prepared checkpoints for Vitron-lora"
mkdir Vitron-lora
cd Vitron-lora

cd ..
echo "prepared checkpoints for SEEM"
mkdir seem
cd seem

cd ..
echo "prepared checkpoints for Zeroscope"
mkdir zeroscope
cd zeroscope
git clone https://huggingface.co/cerspense/zeroscope_v2_576w

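This is my modified version of the download script, but the files for stablevideo, Vitron-base, Vitron-lora, and seem are still missing. Could the author please add them when time permits?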
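To confirm which checkpoint directories are still empty after running the script, a small helper like the one below may be useful. It is only a sketch: the function name `list_empty_dirs` is hypothetical and not part of the Vitron repository, and it assumes the script was run from the directory passed as the argument.

```shell
# Print the name of every immediate subdirectory of "$1" that has no entries.
# Hypothetical helper, not part of the Vitron repository.
list_empty_dirs() {
    for dir in "$1"/*/; do
        # Skip the unexpanded glob pattern when there are no subdirectories.
        [ -d "$dir" ] || continue
        # `ls -A` prints nothing for an empty directory.
        if [ -z "$(ls -A "$dir")" ]; then
            basename "$dir"
        fi
    done
}
```

Calling `list_empty_dirs .` from the checkpoint root should list the directories for which the script contains no `git clone` line.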

Problem with the pyproject.toml file

I have already modified it; the corrected version is as follows:

[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "vitron"
version = "1.0.0"
description = "A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing"
readme = "README.md"
requires-python = ">=3.8"
classifiers = [
    "Programming Language :: Python :: 3",
    "License :: OSI Approved :: Apache Software License",
]
dependencies = [
    "torch==2.1.0", "torchvision==0.15.2",
    "transformers==4.31.0", "tokenizers>=0.12.1,<0.14", "sentencepiece==0.2.0",
    "accelerate==0.28.0", "peft==0.10.0", "bitsandbytes==0.41.0",
    "pydantic<2,>=1", "markdown2[all]", "numpy", "scikit-learn==1.2.2",
    "requests", "httpx==0.24.0", "uvicorn", "fastapi",
    "einops==0.7.0", "timm==0.9.16",
    "tensorboardX==2.6.2.2", "gradio==3.42.0", "gradio_client==0.5.0",
    "ffmpeg==1.4",
    "flash-attn==2.5.6",
#    "detectron2==0.6",
]

[project.optional-dependencies]
train = ["deepspeed==0.12.6", "ninja", "wandb"]

[project.urls]
"Homepage" = "https://github.com/SkyworkAI/Vitron"
"Bug Tracker" = "https://github.com/SkyworkAI/Vitron/issues"

[tool.setuptools.packages.find]
exclude = ["assets*", "benchmark*", "docs", "dist*", "playground*", "scripts*", "tests*"]

[tool.wheel]
exclude = ["assets*", "benchmark*", "docs", "dist*", "playground*", "scripts*", "tests*"]

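"detectron2==0.6" needs to be commented out, because pip fails with the error below. Install it from source with the command that follows instead: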

No matching distribution found for detectron2==0.6

pip install git+https://github.com/facebookresearch/detectron2.git

There is one more thing I am not sure about: I changed ffmpeg==6.1.1 to ffmpeg==1.4. Does this have any side effects? I have not found any problems so far:

ERROR: Could not find a version that satisfies the requirement ffmpeg==6.1.1 (from vitron[train]) (from versions: 1.1.0, 1.2.0, 1.2.1, 1.3, 1.4)
ERROR: No matching distribution found for ffmpeg==6.1.1
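One possible explanation, offered as an assumption and not confirmed by the authors: the `ffmpeg` distribution on PyPI appears to be a small Python package and not the FFmpeg program itself, whose releases carry version numbers such as 6.1.1. If the project actually relies on the FFmpeg executable, it would have to be installed separately (for example through the system package manager or conda). The sketch below checks whether the executable is on PATH; `have_cmd` and `check_ffmpeg` are hypothetical helper names.

```shell
# Return success if the named command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether the FFmpeg executable (not the PyPI package) is installed.
check_ffmpeg() {
    if have_cmd ffmpeg; then
        # The first line of `ffmpeg -version` contains the release number.
        ffmpeg -version | head -n 1
    else
        echo "ffmpeg binary not found on PATH"
    fi
}
```

Running `check_ffmpeg` after `pip install` shows whether the executable is present independently of the pinned Python package.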

Image Editing

Thank you for your excellent work. In Table 2 ('Summary of backend modules in VITRON') of the paper, the image editing model used is GLIGEN. I want to know how you implemented style changing, moving, and color changing shown in Figure 1 using GLIGEN.
