
CatVTON's Introduction

🐈 CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models

CatVTON is a simple and efficient virtual try-on diffusion model with 1) Lightweight Network (899.06M parameters in total), 2) Parameter-Efficient Training (49.57M trainable parameters), and 3) Simplified Inference (< 8G VRAM for 1024×768 resolution).

Updates

  • 2024/08/13: We localize DensePose & SCHP to avoid certain environment issues.
  • 2024/08/10: Our 🤗 HuggingFace Space is available now! Thanks for the grant from ZeroGPU
  • 2024/08/09: Evaluation code is provided to calculate metrics 📚.
  • 2024/07/27: We provide code and workflow for deploying CatVTON on ComfyUI 💥.
  • 2024/07/24: Our Paper on ArXiv is available 🥳!
  • 2024/07/22: Our App Code is released; deploy and enjoy CatVTON on your machine 🎉!
  • 2024/07/21: Our Inference Code and Weights 🤗 are released.
  • 2024/07/11: Our Online Demo is released 😁.

Installation

Create a conda environment & install requirements

conda create -n catvton python=3.9.0
conda activate catvton
cd CatVTON-main  # or your path to CatVTON project dir
pip install -r requirements.txt
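
To verify the environment afterwards, a quick sanity check (a minimal sketch, assuming the CUDA build of PyTorch and diffusers come from requirements.txt):

python -c "import torch, diffusers; print(torch.__version__, torch.cuda.is_available())"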

Deployment

ComfyUI Workflow

We have modified the main code to enable easy deployment of CatVTON on ComfyUI. Because the code structure is incompatible with the main repository, this part is published in the Releases; it includes the code to be placed under ComfyUI's custom_nodes directory and our workflow JSON files.

To deploy CatVTON to your ComfyUI, follow these steps:

  1. Install all the requirements for both CatVTON and ComfyUI; refer to the Installation Guide for CatVTON and the Installation Guide for ComfyUI.
  2. Download ComfyUI-CatVTON.zip and unzip it into the custom_nodes folder under your ComfyUI project (cloned from ComfyUI).
  3. Run ComfyUI.
  4. Download catvton_workflow.json, drag it into your ComfyUI webpage, and enjoy 😆!

For problems under Windows, please refer to issue #8.

When you run the CatVTON workflow for the first time, the weight files are downloaded automatically, which usually takes tens of minutes.
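
If you prefer to fetch the weights ahead of time instead of waiting at first run, huggingface_hub can pre-populate the cache (a sketch; the repo id below is an assumption — verify it against the model card the code actually references):

from huggingface_hub import snapshot_download

# Pre-download the checkpoints into the local HuggingFace cache.
# NOTE: the repo id is an assumption; check the code / model card.
snapshot_download(repo_id="zhengchong/CatVTON")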

Gradio App

To deploy the Gradio App for CatVTON on your machine, run the following command, and checkpoints will be automatically downloaded from HuggingFace.

CUDA_VISIBLE_DEVICES=0 python app.py \
--output_dir="resource/demo/output" \
--mixed_precision="bf16" \
--allow_tf32 

When using bf16 precision, generating results with a resolution of 1024x768 only requires about 8G VRAM.

Inference

1. Data Preparation

Before inference, you need to download the VITON-HD or DressCode dataset. Once the datasets are downloaded, the folder structure should look like this:

├── VITON-HD
│   ├── test_pairs_unpaired.txt
│   ├── test
│   │   ├── image
│   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
│   │   ├── cloth
│   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
│   │   ├── agnostic-mask
│   │   │   ├── [000006_00_mask.png | 000008_00_mask.png | ...]
...
├── DressCode
│   ├── test_pairs_paired.txt
│   ├── test_pairs_unpaired.txt
│   ├── [dresses | lower_body | upper_body]
│   │   ├── test_pairs_paired.txt
│   │   ├── test_pairs_unpaired.txt
│   │   ├── images
│   │   │   ├── [013563_0.jpg | 013563_1.jpg | 013564_0.jpg | 013564_1.jpg | ...]
│   │   ├── agnostic_masks
│   │   │   ├── [013563_0.png | 013564_0.png | ...]
...
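
Before running inference, you can sanity-check the layout with a few lines of Python (a minimal sketch; the root path is a placeholder):

import os

# Verify the files and folders that the VITON-HD test split should contain.
root = "path/to/VITON-HD"  # placeholder
for rel in ["test_pairs_unpaired.txt", "test/image", "test/cloth", "test/agnostic-mask"]:
    path = os.path.join(root, rel)
    print("OK  " if os.path.exists(path) else "MISSING", path)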

For the DressCode dataset, we provide a script to preprocess the agnostic masks; run the following command:

CUDA_VISIBLE_DEVICES=0 python preprocess_agnostic_mask.py \
--data_root_path <your_path_to_DressCode> 

2. Inference on VITON-HD/DressCode

To run inference on the DressCode or VITON-HD dataset, run the following command; checkpoints will be downloaded automatically from HuggingFace.

CUDA_VISIBLE_DEVICES=0 python inference.py \
--dataset [dresscode | vitonhd] \
--data_root_path <path> \
--output_dir <path> \
--dataloader_num_workers 8 \
--batch_size 8 \
--seed 555 \
--mixed_precision [no | fp16 | bf16] \
--allow_tf32 \
--repaint \
--eval_pair  
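
For example, a paired bf16 run on VITON-HD could look like this (paths are placeholders):

CUDA_VISIBLE_DEVICES=0 python inference.py \
--dataset vitonhd \
--data_root_path ./data/zalando-hd-resized \
--output_dir ./output/vitonhd \
--dataloader_num_workers 8 \
--batch_size 8 \
--seed 555 \
--mixed_precision bf16 \
--allow_tf32 \
--repaint \
--eval_pair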

3. Calculate Metrics

After obtaining the inference results, calculate the metrics using the following command:

CUDA_VISIBLE_DEVICES=0 python eval.py \
--gt_folder <your_path_to_gt_image_folder> \
--pred_folder <your_path_to_predicted_image_folder> \
--paired \
--batch_size=16 \
--num_workers=16 
  • --gt_folder and --pred_folder should be folders that contain only images.
  • To evaluate the results in a paired setting, use --paired; for an unpaired setting, simply omit it.
  • --batch_size and --num_workers should be adjusted based on your machine.
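
To spot-check paired results outside eval.py, a minimal SSIM sketch over two such folders (an illustration using scikit-image, not the repo's eval.py implementation; it assumes matching filenames in both folders):

import os
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity

gt_dir, pred_dir = "path/to/gt", "path/to/pred"  # placeholders
scores = []
for name in sorted(os.listdir(gt_dir)):
    gt = np.asarray(Image.open(os.path.join(gt_dir, name)).convert("RGB"))
    pred_img = Image.open(os.path.join(pred_dir, name)).convert("RGB").resize(gt.shape[1::-1])
    scores.append(structural_similarity(gt, np.asarray(pred_img), channel_axis=2))
print("mean SSIM:", float(np.mean(scores)))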

Acknowledgement

Our code is built on Diffusers. We adopt Stable Diffusion v1.5 inpainting as the base model. We use SCHP and DensePose to automatically generate masks in our Gradio App and ComfyUI workflow. Thanks to all the contributors!

License

All the materials, including code, checkpoints, and demo, are made available under the Creative Commons BY-NC-SA 4.0 license. You are free to copy, redistribute, remix, transform, and build upon the project for non-commercial purposes, as long as you give appropriate credit and distribute your contributions under the same license.

Citation

@misc{chong2024catvtonconcatenationneedvirtual,
 title={CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models}, 
 author={Zheng Chong and Xiao Dong and Haoxiang Li and Shiyue Zhang and Wenqing Zhang and Xujie Zhang and Hanqing Zhao and Xiaodan Liang},
 year={2024},
 eprint={2407.15886},
 archivePrefix={arXiv},
 primaryClass={cs.CV},
 url={https://arxiv.org/abs/2407.15886}, 
}

CatVTON's People

Contributors

eltociear, zheng-chong


CatVTON's Issues

Dependencies issues

Hi, it would be great to try out such a project. However, the requirements.txt is a bit messed up: lots of broken or missing dependencies.
For example, the densepose module is nowhere to be found as a pip package, nor is detectron2 (this one I installed from the git repo).
Can you please do a clean check on your requirements.txt and maybe update the README with an installation section?

Thanks.

VITON-HD inference requires a file that does not exist

Hello and thank you for your work!

I've tried to run an inference on VITON-HD that I've downloaded locally. And while inference I see this error:

Traceback (most recent call last):
  File "/home/cuda/CatVTON/inference.py", line 325, in <module>
    main()
  File "/home/cuda/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/cuda/CatVTON/inference.py", line 269, in main
    dataset = VITONHDTestDataset(args)
  File "/home/cuda/CatVTON/inference.py", line 18, in __init__
    self.data = self.load_data()
  File "/home/cuda/CatVTON/inference.py", line 39, in load_data
    assert os.path.exists(pair_txt:=os.path.join(self.args.data_root_path, 'test_pairs_unpaired.txt')), f"File {pair_txt} does not exist."
AssertionError: File /home/cuda/zalando-hd-resized/test_pairs_unpaired.txt does not exist.

I've downloaded it again and rechecked the file structure, but there is no test_pairs_unpaired.txt file in the VITON-HD dataset. How can I avoid this problem?
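
One possible workaround (an assumption, not a confirmed answer from this thread): the stock VITON-HD download ships test_pairs.txt, whose person/garment pairs are already mismatched, so copying it to the filename the assertion expects may be enough. Verify the file's format first:

import shutil

# Hypothetical workaround: reuse the stock pair list under the expected name.
root = "/home/cuda/zalando-hd-resized"  # path from the traceback above
shutil.copy(f"{root}/test_pairs.txt", f"{root}/test_pairs_unpaired.txt")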

Details about training setting

Good work on the design of such a simple VTON pipeline.

I have tried to train CatVTON on the VITON-HD dataset, but the result is a little blurry, as shown below (38k iterations, batch size 8×32, 512×384 resolution input, only attention parameters trained).
(blurry result image)

I'm wondering whether there is any specific setting or trick in the loss computation, for example how the loss is computed (i.e., on the latents of the person image only, or on the concatenated latents).

I also noticed that the training loss is relatively small at the beginning of training. Is this normal?

Epoch 0, step 0, step_loss: 0.06322, data_time: 2.104, time: 4.421
Epoch 0, step 1, step_loss: 0.04681, data_time: 0.058, time: 2.126
Epoch 0, step 2, step_loss: 0.06814, data_time: 0.058, time: 2.124
Epoch 0, step 3, step_loss: 0.03120, data_time: 0.064, time: 2.139
Epoch 0, step 4, step_loss: 0.02966, data_time: 0.059, time: 2.132
Epoch 0, step 5, step_loss: 0.03977, data_time: 0.059, time: 2.132
Epoch 0, step 6, step_loss: 0.05645, data_time: 0.059, time: 2.133
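
For context, a generic Diffusers-style epsilon-prediction objective looks like the sketch below (an illustration, not CatVTON's confirmed loss; whether it is computed on the person half or the full concatenated latents is exactly the open question above). Since CatVTON is initialized from Stable Diffusion inpainting weights, the UNet already predicts noise reasonably well, so a small loss from step 0 is expected:

import torch
import torch.nn.functional as F
from diffusers import DDPMScheduler

# Generic epsilon-prediction loss sketch with stand-in tensors.
scheduler = DDPMScheduler(num_train_timesteps=1000)
latents = torch.randn(4, 4, 64, 48)               # stand-in VAE latents
noise = torch.randn_like(latents)
t = torch.randint(0, 1000, (latents.shape[0],))
noisy_latents = scheduler.add_noise(latents, noise, t)

eps_pred = noise + 0.2 * torch.randn_like(noise)  # stand-in UNet output
loss = F.mse_loss(eps_pred, noise)                # small when prediction is close
print(loss.item())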

Error occurred when I tried to change the input width and height!

Hi! Thanks for your great work!

When I don't give any specific width and height values, it works perfectly.
But when I try to change the width and height, the code runs successfully without any warnings or errors, yet the result I get is just a black, empty image (shown below).
I think the resolutions were correctly changed to the given values.

Thanks!

(image: black output)

Anomalous training results at different guidance scales

From left to right, the images are the garment, followed by generations with cloth guidance scales of 1.0, 1.5, 2.0, and 2.5.
(images: ddpm_result, ddim_result_c1)

As the guidance scale increases, the garment details become progressively more controllable, but the repainted region turns darker and darker. At a guidance scale where the overall lighting of the image looks normal, the garment details can no longer be controlled. Did you encounter this during training, or do you know what might be causing it?

SCHP unable to run on CPU only environment

Thanks for the great work.
I am encountering a problem when running the script in a CPU-only environment (Colab with no GPU). Below are the error details:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-22-144066cfdfe5> in <cell line: 5>()
      3
      4 from utils import resize_and_crop
----> 5 from model.cloth_masker import AutoMasker as AM

/content/CatVTON/model/cloth_masker.py in <module>
      7 import torch
      8
----> 9 from model.SCHP import SCHP  # type: ignore
     10 from model.DensePose import DensePose  # type: ignore

/content/CatVTON/model/SCHP/__init__.py in <module>
----> 1 from model.SCHP import networks
      2 from model.SCHP.utils.transforms import get_affine_transform, transform_logits
      3
      4 from collections import OrderedDict
      5 import torch

/content/CatVTON/model/SCHP/networks/__init__.py in <module>
      1 from __future__ import absolute_import
      2
----> 3 from model.SCHP.networks.AugmentCE2P import resnet101
      4
      5 __factory = {

/content/CatVTON/model/SCHP/networks/AugmentCE2P.py in <module>
     19 # Note here we adopt the InplaceABNSync implementation from https://github.com/mapillary/inplace_abn
     20 # By default, the InplaceABNSync module contains a BatchNorm Layer and a LeakyReLu layer
---> 21 from model.SCHP.modules import InPlaceABNSync
     22
     23 BatchNorm2d = functools.partial(InPlaceABNSync, activation='none')

/content/CatVTON/model/SCHP/modules/__init__.py in <module>
----> 1 from .bn import ABN, InPlaceABN, InPlaceABNSync
      2 from .functions import ACT_RELU, ACT_LEAKY_RELU, ACT_ELU, ACT_NONE
      3 from .misc import GlobalAvgPool2d, SingleGPU
      4 from .residual import IdentityResidualBlock
      5 from .dense import DenseModule

/content/CatVTON/model/SCHP/modules/bn.py in <module>
      8     from Queue import Queue
      9
---> 10 from .functions import *
     11
     12

/content/CatVTON/model/SCHP/modules/functions.py in <module>
      8
      9 _src_path = path.join(path.dirname(path.abspath(__file__)), "src")
---> 10 _backend = load(name="inplace_abn",
     11                 extra_cflags=["-O3"],
     12                 sources=[path.join(_src_path, f) for f in [

/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py in load(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, is_standalone, keep_intermediates)
   1307         ...     verbose=True)
   1308     """
-> 1309     return _jit_compile(
   1310         name,
   1311         [sources] if isinstance(sources, str) else sources,

/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py in _jit_compile(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, is_standalone, keep_intermediates)
   1743         return _get_exec_path(name, build_directory)
   1744
-> 1745     return _import_module_from_library(name, build_directory, is_python_module)
   1746
   1747

/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py in _import_module_from_library(module_name, path, is_python_module)
   2141         spec = importlib.util.spec_from_file_location(module_name, filepath)
   2142         assert spec is not None
-> 2143         module = importlib.util.module_from_spec(spec)
   2144         assert isinstance(spec.loader, importlib.abc.Loader)
   2145         spec.loader.exec_module(module)

ImportError: /tmp/inplace_abn/inplace_abn.so: cannot open shared object file: No such file or directory

I believe the issue arises because some dependencies of SCHP require CUDA and are only available in a CUDA environment.
By the way, I have set export TORCH_EXTENSIONS_DIR=/tmp to overcome another issue, so you might see import errors from /tmp.
Do you have a solution to run SCHP in a CPU-only environment?
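
A possible CPU-only workaround (an assumption, not an official fix): InPlaceABN fuses BatchNorm and LeakyReLU in a CUDA extension, so on CPU it can be approximated with stock PyTorch modules, at the cost of exact numerical parity with the released weights:

import torch.nn as nn

# Hypothetical drop-in replacement for InPlaceABNSync on CPU.
class ABNFallback(nn.Module):
    def __init__(self, num_features, activation="leaky_relu", activation_param=0.01, **kwargs):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features)
        self.act = nn.LeakyReLU(activation_param) if activation == "leaky_relu" else nn.Identity()

    def forward(self, x):
        return self.act(self.bn(x))

# In model/SCHP/networks/AugmentCE2P.py, one could then replace
#   from model.SCHP.modules import InPlaceABNSync
# with InPlaceABNSync = ABNFallback (activation names follow inplace_abn's API).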

Love your work!

Thanks for sharing this work! Just let you know that I really love the simplicity and effectiveness of this model! Cheers!

xformers is not compatible with macOS

Hey, I'm just wondering how to fix the compatibility issue on macOS. I can't install the requirements file because xformers is not compatible with macOS.

VITON-HD results

Thank you for your great work on CatVTON!

I tested the VITON-HD model and generated 512x384 images.
I resized the ground truth to 512x384 and measured SSIM and FID, and found SSIM=0.856 and FID=8.63. This does not match the metrics in the paper.
So, were the metrics in the paper obtained using a "mix model" rather than just the model trained on VITON-HD?

Request for Training Code

Hello! This is great work. Hats off to you and your team. I would love to re-implement the results with training on my personal machine. I was wondering if there are plans to release the training code?

Comfyui无法加载节点

When loading the graph, the following node types were not found:
LoadAutoMasker
CatVTON
AutoMasker
LoadCatVTONPipeline

agnostic masks

Very good work! If I want to test my own models, how can I make agnostic masks?
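
For reference, the Gradio app builds masks automatically with the repo's AutoMasker (SCHP + DensePose). A hypothetical usage sketch follows; the constructor arguments, cloth-type string, and return key below are assumptions, so check app.py for the actual API:

from PIL import Image
from model.cloth_masker import AutoMasker  # module path taken from the repo's tracebacks

# Hypothetical arguments: checkpoint paths and the "mask" key are assumptions.
automasker = AutoMasker(
    densepose_ckpt="path/to/DensePose",
    schp_ckpt="path/to/SCHP",
    device="cuda",
)
person = Image.open("person.jpg")
mask = automasker(person, "upper")["mask"]
mask.save("agnostic_mask.png")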

Host the demo on Huggingface Spaces ZeroGPU

Hi @Zheng-Chong, congratulations on the CatVTON release! It would be great to have the demo up on Huggingface Spaces. We provide GPU grants for interesting projects and paper implementations, and would be happy to support CatVTON with ZeroGPU (A100s) sponsorship!

You might need to modify the current gradio code for ZeroGPU Spaces usage, actually. To understand this better, please refer to the usage section of the organization: https://huggingface.co/zero-gpu-explorers.

We also have a step-by-step guide for using the gradio sdk on Spaces: https://huggingface.co/docs/hub/en/spaces-sdks-gradio.

Applying for grants on Spaces is fairly easy using the Settings tab of your Space. For more information on how to apply for GPU grants on Spaces, please visit: https://huggingface.co/docs/hub/en/spaces-gpus#community-gpu-grants.

Missing SCHP.py

[Prompt Server] web root: D:\AI\ComfyUI\web_custom_versions\Comfy-Org_ComfyUI_frontend\1.2.27
Skip D:\AI\ComfyUI\custom_nodes\CatVTON module for custom nodes due to the lack of NODE_CLASS_MAPPINGS.
Adding D:\AI\ComfyUI\custom_nodes to sys.path
Could not find efficiency nodes

ModuleNotFoundError: No module named 'cv2'

(Catvton) C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable>C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\run_nvidia_gpu.bat

(Catvton) C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable>.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build
[START] Security scan
[DONE] Security scan

ComfyUI-Manager: installing dependencies done.

** ComfyUI startup time: 2024-08-02 18:17:57.433249
** Platform: Windows
** Python version: 3.11.9 (tags/v3.11.9:de54cf5, Apr 2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)]
** Python executable: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\python_embeded\python.exe
** ComfyUI Path: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI
** Log path: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\comfyui.log

Prestartup times for custom nodes:
1.1 seconds: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager

Total VRAM 8192 MB, total RAM 32632 MB
pytorch version: 2.3.1+cu121
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 2070 SUPER : cudaMallocAsync
Using pytorch cross attention
[Prompt Server] web root: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\web
Traceback (most recent call last):
  File "C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\nodes.py", line 1941, in load_custom_node
    module_spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-CatVTON\__init__.py", line 3, in <module>
    from .model.cloth_masker import AutoMasker as AM
  File "C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-CatVTON\model\cloth_masker.py", line 5, in <module>
    import cv2
ModuleNotFoundError: No module named 'cv2'

Cannot import C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-CatVTON module for custom nodes: No module named 'cv2'

Loading: ComfyUI-Manager (V2.48.4)

ComfyUI Revision: 2445 [369f459b] | Released on '2024-08-01'

Import times for custom nodes:
0.0 seconds: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\websocket_image_save.py
0.0 seconds (IMPORT FAILED): C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-CatVTON
0.3 seconds: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager

Starting server

There is a problem with the cv2 import, but I can import cv2 correctly in my virtual environment. Could you help me with this issue?
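
A likely cause (an assumption based on the log above): the portable build runs its own embedded interpreter, so packages installed into a conda env are invisible to it. Installing OpenCV into the embedded Python should fix the import:

.\python_embeded\python.exe -m pip install opencv-python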

Local deployment tutorial for Windows via Gradio!

Many thanks to [Zheng-Chong] for this work; the project's results are excellent. During local deployment on Windows via Gradio I ran into quite a few problems, including the one in #12, but after consulting resources and repeated attempts I finally deployed it successfully.
I have made a tutorial and hope it helps others. Thanks again to [Zheng-Chong] for the work and the open-source spirit. Kudos!!

Windows local deployment tutorial:
https://www.bilibili.com/video/BV173YueAEdi/?vd_source=6c8b8679b818b05d24c65f49a65eb994

limitation of CatVTON & training code request

Dear authors,

It is great that you have made diffusion-based VTON models much simpler and more lightweight; using only self-attention is quite intuitive. I noticed that your model can mostly preserve the structure of the garment, but for some examples it cannot really model simple textures, and it can also change the color of the garment quite drastically. I think these limitations mostly come from the lack of training samples in the input space. Therefore, it would be quite useful if you could share the training code so that this limitation of CatVTON can be addressed.

(screenshots of failure cases)

Evaluation

Thanks for open-sourcing this work! I have a concern about the quantitative results reported in the paper. I used the vitonhd-16k-512 checkpoint to evaluate on VITON-HD, but the results did not match those reported in the paper: specifically, I got LPIPS=0.1019, SSIM=0.8649, FID=13.5417 (unpaired), and KID=6.748 (unpaired), which falls short of the numbers the paper reports.
