
CatVTON's Introduction

🐈 CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models

CatVTON is a simple and efficient virtual try-on diffusion model with 1) Lightweight Network (899.06M parameters in total), 2) Parameter-Efficient Training (49.57M trainable parameters), and 3) Simplified Inference (< 8G VRAM for 1024×768 resolution).

Updates

  • 2024/08/13: We localize DensePose & SCHP to avoid certain environment issues.
  • 2024/08/10: Our 🤗 HuggingFace Space is available now! Thanks for the grant from ZeroGPU
  • 2024/08/09: Evaluation code is provided to calculate metrics 📚.
  • 2024/07/27: We provide code and workflow for deploying CatVTON on ComfyUI 💥.
  • 2024/07/24: Our Paper on ArXiv is available 🥳!
  • 2024/07/22: Our App Code is released; deploy and enjoy CatVTON on your machine 🎉!
  • 2024/07/21: Our Inference Code and Weights 🤗 are released.
  • 2024/07/11: Our Online Demo is released 😁.

Installation

Create a conda environment & install requirements

conda create -n catvton python=3.9.0
conda activate catvton
cd CatVTON-main  # or your path to CatVTON project dir
pip install -r requirements.txt
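
To verify the environment afterwards, a quick sanity check (a minimal sketch, assuming the CUDA build of PyTorch and diffusers come from requirements.txt):

python -c "import torch, diffusers; print(torch.__version__, torch.cuda.is_available())"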

Deployment

ComfyUI Workflow

We have modified the main code to enable easy deployment of CatVTON on ComfyUI. Because the code structure is incompatible with the main repository, this part is published in the Releases; it includes the code to be placed under ComfyUI's custom_nodes directory and our workflow JSON files.

To deploy CatVTON to your ComfyUI, follow these steps:

  1. Install all the requirements for both CatVTON and ComfyUI; refer to the Installation Guide for CatVTON and the Installation Guide for ComfyUI.
  2. Download ComfyUI-CatVTON.zip and unzip it into the custom_nodes folder under your ComfyUI project (cloned from ComfyUI).
  3. Run ComfyUI.
  4. Download catvton_workflow.json, drag it into your ComfyUI webpage, and enjoy 😆!

For problems under Windows, please refer to issue #8.

When you run the CatVTON workflow for the first time, the weight files are downloaded automatically, which usually takes tens of minutes.
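
If you prefer to fetch the weights ahead of time instead of waiting at first run, huggingface_hub can pre-populate the cache (a sketch; the repo id below is an assumption — verify it against the model card the code actually references):

from huggingface_hub import snapshot_download

# Pre-download the checkpoints into the local HuggingFace cache.
# NOTE: the repo id is an assumption; check the code / model card.
snapshot_download(repo_id="zhengchong/CatVTON")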

Gradio App

To deploy the Gradio App for CatVTON on your machine, run the following command, and checkpoints will be automatically downloaded from HuggingFace.

CUDA_VISIBLE_DEVICES=0 python app.py \
--output_dir="resource/demo/output" \
--mixed_precision="bf16" \
--allow_tf32 

When using bf16 precision, generating results with a resolution of 1024x768 only requires about 8G VRAM.

Inference

1. Data Preparation

Before inference, you need to download the VITON-HD or DressCode dataset. Once the datasets are downloaded, the folder structure should look like this:

├── VITON-HD
│   ├── test_pairs_unpaired.txt
│   ├── test
│   │   ├── image
│   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
│   │   ├── cloth
│   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
│   │   ├── agnostic-mask
│   │   │   ├── [000006_00_mask.png | 000008_00_mask.png | ...]
...
├── DressCode
│   ├── test_pairs_paired.txt
│   ├── test_pairs_unpaired.txt
│   ├── [dresses | lower_body | upper_body]
│   │   ├── test_pairs_paired.txt
│   │   ├── test_pairs_unpaired.txt
│   │   ├── images
│   │   │   ├── [013563_0.jpg | 013563_1.jpg | 013564_0.jpg | 013564_1.jpg | ...]
│   │   ├── agnostic_masks
│   │   │   ├── [013563_0.png | 013564_0.png | ...]
...
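
Before running inference, you can sanity-check the layout with a few lines of Python (a minimal sketch; the root path is a placeholder):

import os

# Verify the files and folders that the VITON-HD test split should contain.
root = "path/to/VITON-HD"  # placeholder
for rel in ["test_pairs_unpaired.txt", "test/image", "test/cloth", "test/agnostic-mask"]:
    path = os.path.join(root, rel)
    print("OK  " if os.path.exists(path) else "MISSING", path)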

For the DressCode dataset, we provide a script to preprocess the agnostic masks; run the following command:

CUDA_VISIBLE_DEVICES=0 python preprocess_agnostic_mask.py \
--data_root_path <your_path_to_DressCode> 

2. Inference on VITON-HD/DressCode

To run inference on the DressCode or VITON-HD dataset, run the following command; checkpoints will be downloaded automatically from HuggingFace.

CUDA_VISIBLE_DEVICES=0 python inference.py \
--dataset [dresscode | vitonhd] \
--data_root_path <path> \
--output_dir <path> \
--dataloader_num_workers 8 \
--batch_size 8 \
--seed 555 \
--mixed_precision [no | fp16 | bf16] \
--allow_tf32 \
--repaint \
--eval_pair  
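
For example, a paired bf16 run on VITON-HD could look like this (paths are placeholders):

CUDA_VISIBLE_DEVICES=0 python inference.py \
--dataset vitonhd \
--data_root_path ./data/zalando-hd-resized \
--output_dir ./output/vitonhd \
--dataloader_num_workers 8 \
--batch_size 8 \
--seed 555 \
--mixed_precision bf16 \
--allow_tf32 \
--repaint \
--eval_pair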

3. Calculate Metrics

After obtaining the inference results, calculate the metrics using the following command:

CUDA_VISIBLE_DEVICES=0 python eval.py \
--gt_folder <your_path_to_gt_image_folder> \
--pred_folder <your_path_to_predicted_image_folder> \
--paired \
--batch_size=16 \
--num_workers=16 
  • --gt_folder and --pred_folder should be folders that contain only images.
  • To evaluate the results in a paired setting, use --paired; for an unpaired setting, simply omit it.
  • --batch_size and --num_workers should be adjusted based on your machine.
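
To spot-check paired results outside eval.py, a minimal SSIM sketch over two such folders (an illustration using scikit-image, not the repo's eval.py implementation; it assumes matching filenames in both folders):

import os
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity

gt_dir, pred_dir = "path/to/gt", "path/to/pred"  # placeholders
scores = []
for name in sorted(os.listdir(gt_dir)):
    gt = np.asarray(Image.open(os.path.join(gt_dir, name)).convert("RGB"))
    pred_img = Image.open(os.path.join(pred_dir, name)).convert("RGB").resize(gt.shape[1::-1])
    scores.append(structural_similarity(gt, np.asarray(pred_img), channel_axis=2))
print("mean SSIM:", float(np.mean(scores)))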

Acknowledgement

Our code is built on Diffusers. We adopt Stable Diffusion v1.5 inpainting as the base model. We use SCHP and DensePose to automatically generate masks in our Gradio App and ComfyUI workflow. Thanks to all the contributors!

License

All the materials, including code, checkpoints, and demo, are made available under the Creative Commons BY-NC-SA 4.0 license. You are free to copy, redistribute, remix, transform, and build upon the project for non-commercial purposes, as long as you give appropriate credit and distribute your contributions under the same license.

Citation

@misc{chong2024catvtonconcatenationneedvirtual,
 title={CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models}, 
 author={Zheng Chong and Xiao Dong and Haoxiang Li and Shiyue Zhang and Wenqing Zhang and Xujie Zhang and Hanqing Zhao and Xiaodan Liang},
 year={2024},
 eprint={2407.15886},
 archivePrefix={arXiv},
 primaryClass={cs.CV},
 url={https://arxiv.org/abs/2407.15886}, 
}

CatVTON's People

Contributors

eltociear, zheng-chong


CatVTON's Issues

Dependencies issues

Hi, it would be great to try out such a project. However, the requirements.txt is a bit messed up: lots of broken or missing dependencies.
For example, the densepose module is nowhere to be found as a pip package, nor is detectron2 (this one I installed from the git repo).
Can you please do a clean check on your requirements.txt and maybe update the README with an installation section?

Thanks.

VITON-HD inference requires a file that does not exist

Hello and thank you for your work!

I've tried to run an inference on VITON-HD that I've downloaded locally. And while inference I see this error:

Traceback (most recent call last):
  File "/home/cuda/CatVTON/inference.py", line 325, in <module>
    main()
  File "/home/cuda/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/cuda/CatVTON/inference.py", line 269, in main
    dataset = VITONHDTestDataset(args)
  File "/home/cuda/CatVTON/inference.py", line 18, in __init__
    self.data = self.load_data()
  File "/home/cuda/CatVTON/inference.py", line 39, in load_data
    assert os.path.exists(pair_txt:=os.path.join(self.args.data_root_path, 'test_pairs_unpaired.txt')), f"File {pair_txt} does not exist."
AssertionError: File /home/cuda/zalando-hd-resized/test_pairs_unpaired.txt does not exist.

I've downloaded it again and rechecked the file structure, but there is no test_pairs_unpaired.txt file in the VITON-HD dataset. How can I avoid this problem?
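
One possible workaround (an assumption, not a confirmed answer from this thread): the stock VITON-HD download ships test_pairs.txt, whose person/garment pairs are already mismatched, so copying it to the filename the assertion expects may be enough. Verify the file's format first:

import shutil

# Hypothetical workaround: reuse the stock pair list under the expected name.
root = "/home/cuda/zalando-hd-resized"  # path from the traceback above
shutil.copy(f"{root}/test_pairs.txt", f"{root}/test_pairs_unpaired.txt")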

Details about training setting

Good work on the design of such a simple VTON pipeline.

I have tried to train CatVTON on the VITON-HD dataset, but the result is a little blurry, as shown below (38k iterations, batch size 8×32, 512×384 resolution input, only attention parameters trained).
(blurry result image)

I'm wondering whether there is any specific setting or trick in the loss computation, for example how the loss is computed (i.e., on the latents of the person image only, or on the concatenated latents).

I also noticed that the training loss is relatively small at the beginning of training. Is this normal?

Epoch 0, step 0, step_loss: 0.06322, data_time: 2.104, time: 4.421
Epoch 0, step 1, step_loss: 0.04681, data_time: 0.058, time: 2.126
Epoch 0, step 2, step_loss: 0.06814, data_time: 0.058, time: 2.124
Epoch 0, step 3, step_loss: 0.03120, data_time: 0.064, time: 2.139
Epoch 0, step 4, step_loss: 0.02966, data_time: 0.059, time: 2.132
Epoch 0, step 5, step_loss: 0.03977, data_time: 0.059, time: 2.132
Epoch 0, step 6, step_loss: 0.05645, data_time: 0.059, time: 2.133
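
For context, a generic Diffusers-style epsilon-prediction objective looks like the sketch below (an illustration, not CatVTON's confirmed loss; whether it is computed on the person half or the full concatenated latents is exactly the open question above). Since CatVTON is initialized from Stable Diffusion inpainting weights, the UNet already predicts noise reasonably well, so a small loss from step 0 is expected:

import torch
import torch.nn.functional as F
from diffusers import DDPMScheduler

# Generic epsilon-prediction loss sketch with stand-in tensors.
scheduler = DDPMScheduler(num_train_timesteps=1000)
latents = torch.randn(4, 4, 64, 48)               # stand-in VAE latents
noise = torch.randn_like(latents)
t = torch.randint(0, 1000, (latents.shape[0],))
noisy_latents = scheduler.add_noise(latents, noise, t)

eps_pred = noise + 0.2 * torch.randn_like(noise)  # stand-in UNet output
loss = F.mse_loss(eps_pred, noise)                # small when prediction is close
print(loss.item())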

Error occurred when I tried to change the input width and height!

Hi! Thanks for your great work!

When I don't give any specific width and height values, it works perfectly.
But when I try to change the width and height, the code runs successfully without any warnings or errors, yet the result I get is just a black, empty image (shown below).
I think the resolutions were correctly changed to the given values.

Thanks!

(image: black output)

Anomalous training results at different guidance scales

From left to right, the images are the garment, followed by generations with cloth guidance scales of 1.0, 1.5, 2.0, and 2.5.
(images: ddpm_result, ddim_result_c1)

As the guidance scale increases, the garment details become progressively more controllable, but the repainted region turns darker and darker. At a guidance scale where the overall lighting of the image looks normal, the garment details can no longer be controlled. Did you encounter this during training, or do you know what might be causing it?

SCHP unable to run on CPU only environment

Thanks for the great work.
I am encountering a problem when running the script in a CPU-only environment (Colab with no GPU). Below are the error details:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-22-144066cfdfe5> in <cell line: 5>()
      3
      4 from utils import resize_and_crop
----> 5 from model.cloth_masker import AutoMasker as AM

/content/CatVTON/model/cloth_masker.py in <module>
      7 import torch
      8
----> 9 from model.SCHP import SCHP  # type: ignore
     10 from model.DensePose import DensePose  # type: ignore

/content/CatVTON/model/SCHP/__init__.py in <module>
----> 1 from model.SCHP import networks
      2 from model.SCHP.utils.transforms import get_affine_transform, transform_logits
      3
      4 from collections import OrderedDict
      5 import torch

/content/CatVTON/model/SCHP/networks/__init__.py in <module>
      1 from __future__ import absolute_import
      2
----> 3 from model.SCHP.networks.AugmentCE2P import resnet101
      4
      5 __factory = {

/content/CatVTON/model/SCHP/networks/AugmentCE2P.py in <module>
     19 # Note here we adopt the InplaceABNSync implementation from https://github.com/mapillary/inplace_abn
     20 # By default, the InplaceABNSync module contains a BatchNorm Layer and a LeakyReLu layer
---> 21 from model.SCHP.modules import InPlaceABNSync
     22
     23 BatchNorm2d = functools.partial(InPlaceABNSync, activation='none')

/content/CatVTON/model/SCHP/modules/__init__.py in <module>
----> 1 from .bn import ABN, InPlaceABN, InPlaceABNSync
      2 from .functions import ACT_RELU, ACT_LEAKY_RELU, ACT_ELU, ACT_NONE
      3 from .misc import GlobalAvgPool2d, SingleGPU
      4 from .residual import IdentityResidualBlock
      5 from .dense import DenseModule

/content/CatVTON/model/SCHP/modules/bn.py in <module>
      8     from Queue import Queue
      9
---> 10 from .functions import *
     11
     12

/content/CatVTON/model/SCHP/modules/functions.py in <module>
      8
      9 _src_path = path.join(path.dirname(path.abspath(__file__)), "src")
---> 10 _backend = load(name="inplace_abn",
     11                 extra_cflags=["-O3"],
     12                 sources=[path.join(_src_path, f) for f in [

/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py in load(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, is_standalone, keep_intermediates)
   1307         ...     verbose=True)
   1308     """
-> 1309     return _jit_compile(
   1310         name,
   1311         [sources] if isinstance(sources, str) else sources,

/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py in _jit_compile(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, is_standalone, keep_intermediates)
   1743         return _get_exec_path(name, build_directory)
   1744
-> 1745     return _import_module_from_library(name, build_directory, is_python_module)
   1746
   1747

/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py in _import_module_from_library(module_name, path, is_python_module)
   2141         spec = importlib.util.spec_from_file_location(module_name, filepath)
   2142         assert spec is not None
-> 2143         module = importlib.util.module_from_spec(spec)
   2144         assert isinstance(spec.loader, importlib.abc.Loader)
   2145         spec.loader.exec_module(module)

ImportError: /tmp/inplace_abn/inplace_abn.so: cannot open shared object file: No such file or directory

I believe the issue arises because some dependencies of SCHP require CUDA and are only available in a CUDA environment.
By the way, I have set export TORCH_EXTENSIONS_DIR=/tmp to overcome another issue, so you might see import errors from /tmp.
Do you have a solution to run SCHP in a CPU-only environment?
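
A possible CPU-only workaround (an assumption, not an official fix): InPlaceABN fuses BatchNorm and LeakyReLU in a CUDA extension, so on CPU it can be approximated with stock PyTorch modules, at the cost of exact numerical parity with the released weights:

import torch.nn as nn

# Hypothetical drop-in replacement for InPlaceABNSync on CPU.
class ABNFallback(nn.Module):
    def __init__(self, num_features, activation="leaky_relu", activation_param=0.01, **kwargs):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features)
        self.act = nn.LeakyReLU(activation_param) if activation == "leaky_relu" else nn.Identity()

    def forward(self, x):
        return self.act(self.bn(x))

# In model/SCHP/networks/AugmentCE2P.py, one could then replace
#   from model.SCHP.modules import InPlaceABNSync
# with InPlaceABNSync = ABNFallback (activation names follow inplace_abn's API).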

Love your work!

Thanks for sharing this work! Just let you know that I really love the simplicity and effectiveness of this model! Cheers!

xformers is not compatible with macOS

Hey, I'm just wondering how to fix the compatibility issue on macOS. I can't install the requirements file because xformers is not compatible with macOS.

VITON-HD results

Thank you for your great work on CatVTON!

I tested the VITON-HD model and generated 512x384 images.
I resized the ground truth to 512x384 and measured SSIM and FID, and found SSIM=0.856 and FID=8.63. This does not match the metrics in the paper.
So, were the metrics in the paper obtained using a "mix model" rather than just the model trained on VITON-HD?

Request for Training Code

Hello! This is great work. Hats off to you and your team. I would love to re-implement the results with training on my personal machine. I was wondering if there are plans to release the training code?

Comfyui无法加载节点

When loading the graph, the following node types were not found:
LoadAutoMasker
CatVTON
AutoMasker
LoadCatVTONPipeline

agnostic masks

Very good work! If I want to test my own models, how can I make agnostic masks?
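
For reference, the Gradio app builds masks automatically with the repo's AutoMasker (SCHP + DensePose). A hypothetical usage sketch follows; the constructor arguments, cloth-type string, and return key below are assumptions, so check app.py for the actual API:

from PIL import Image
from model.cloth_masker import AutoMasker  # module path taken from the repo's tracebacks

# Hypothetical arguments: checkpoint paths and the "mask" key are assumptions.
automasker = AutoMasker(
    densepose_ckpt="path/to/DensePose",
    schp_ckpt="path/to/SCHP",
    device="cuda",
)
person = Image.open("person.jpg")
mask = automasker(person, "upper")["mask"]
mask.save("agnostic_mask.png")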

Host the demo on Huggingface Spaces ZeroGPU

Hi @Zheng-Chong, congratulations on the CatVTON release! It would be great to have the demo up on Huggingface Spaces. We provide GPU grants for interesting projects and paper implementations, and would be happy to support CatVTON with ZeroGPU (A100s) sponsorship!

You might need to modify the current gradio code for ZeroGPU Spaces usage, actually. To understand this better, please refer to the usage section of the organization: https://huggingface.co/zero-gpu-explorers.

We also have a step-by-step guide for using the gradio sdk on Spaces: https://huggingface.co/docs/hub/en/spaces-sdks-gradio.

Applying for grants on Spaces is fairly easy using the Settings tab of your Space. For more information on how to apply for GPU grants on Spaces, please visit: https://huggingface.co/docs/hub/en/spaces-gpus#community-gpu-grants.

Missing SCHP.py

[Prompt Server] web root: D:\AI\ComfyUI\web_custom_versions\Comfy-Org_ComfyUI_frontend\1.2.27
Skip D:\AI\ComfyUI\custom_nodes\CatVTON module for custom nodes due to the lack of NODE_CLASS_MAPPINGS.
Adding D:\AI\ComfyUI\custom_nodes to sys.path
Could not find efficiency nodes

ModuleNotFoundError: No module named 'cv2'

(Catvton) C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable>C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\run_nvidia_gpu.bat

(Catvton) C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable>.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build
[START] Security scan
[DONE] Security scan

ComfyUI-Manager: installing dependencies done.

** ComfyUI startup time: 2024-08-02 18:17:57.433249
** Platform: Windows
** Python version: 3.11.9 (tags/v3.11.9:de54cf5, Apr 2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)]
** Python executable: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\python_embeded\python.exe
** ComfyUI Path: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI
** Log path: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\comfyui.log

Prestartup times for custom nodes:
1.1 seconds: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager

Total VRAM 8192 MB, total RAM 32632 MB
pytorch version: 2.3.1+cu121
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 2070 SUPER : cudaMallocAsync
Using pytorch cross attention
[Prompt Server] web root: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\web
Traceback (most recent call last):
  File "C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\nodes.py", line 1941, in load_custom_node
    module_spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-CatVTON\__init__.py", line 3, in <module>
    from .model.cloth_masker import AutoMasker as AM
  File "C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-CatVTON\model\cloth_masker.py", line 5, in <module>
    import cv2
ModuleNotFoundError: No module named 'cv2'

Cannot import C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-CatVTON module for custom nodes: No module named 'cv2'

Loading: ComfyUI-Manager (V2.48.4)

ComfyUI Revision: 2445 [369f459b] | Released on '2024-08-01'

Import times for custom nodes:
0.0 seconds: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\websocket_image_save.py
0.0 seconds (IMPORT FAILED): C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-CatVTON
0.3 seconds: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager

Starting server

There is a problem with the cv2 import, but I can import cv2 correctly in my virtual environment. Could you help me with this issue?
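
A likely cause (an assumption based on the log above): the portable build runs its own embedded interpreter, so packages installed into a conda env are invisible to it. Installing OpenCV into the embedded Python should fix the import:

.\python_embeded\python.exe -m pip install opencv-python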

Local deployment tutorial for Windows via Gradio!

Many thanks to [Zheng-Chong] for this work; the project's results are excellent. During local deployment on Windows via Gradio I ran into quite a few problems, including the one in #12, but after consulting resources and repeated attempts I finally deployed it successfully.
I have made a tutorial and hope it helps others. Thanks again to [Zheng-Chong] for the work and the open-source spirit. Kudos!!

Windows local deployment tutorial:
https://www.bilibili.com/video/BV173YueAEdi/?vd_source=6c8b8679b818b05d24c65f49a65eb994

limitation of CatVTON & training code request

Dear authors,

It is great that you have made diffusion-based VTON models much simpler and more lightweight; using only self-attention is quite intuitive. I noticed that your model can mostly preserve the structure of the garment, but for some examples it cannot really model simple textures, and it can also change the color of the garment quite drastically. I think these limitations mostly come from the lack of training samples in the input space. Therefore, it would be quite useful if you could share the training code so that this limitation of CatVTON can be addressed.

(screenshots of failure cases)

Evaluation

Thanks for open-sourcing this work! I have a concern about the quantitative results reported in the paper. I used the vitonhd-16k-512 checkpoint to evaluate on VITON-HD, but the results did not match those reported in the paper: specifically, I got LPIPS=0.1019, SSIM=0.8649, FID=13.5417 (unpaired), and KID=6.748 (unpaired), which falls short of the numbers the paper reports.
