
pasd's Introduction

Pixel-Aware Stable Diffusion for Realistic Image Super-Resolution and Personalized Stylization (ECCV2024)

Paper

Tao Yang1, Rongyuan Wu2, Peiran Ren3, Xuansong Xie3, Lei Zhang2
1ByteDance Inc.
2Department of Computing, The Hong Kong Polytechnic University
3DAMO Academy, Alibaba Group

News

(2024-7-12) I am training a new PASD based on SDXL and will release it soon. Stay tuned!

(2024-7-1) Accepted by ECCV2024. A new version of our paper will be updated soon.

(2024-3-18) Please try our colorization model via python test_pasd.py --pasd_model_path runs/pasd_color/checkpoint-180000 --control_type grayscale --high_level_info caption --use_pasd_light. You should use the noise scheduler provided in runs/pasd_color/scheduler, which has been updated to ensure zero-terminal SNR and avoid leaking the residual signal of the RGB image during training. Please read the updated paper for more details.

(2024-3-18) We have updated the paper. The weights and datasets are now available on Huggingface.

(2024-1-16) You may also want to check out our new works, SeeSR and Phantom.

(2023-10-20) Added an additional noise level via --added_noise_level; the SR results now strike a great balance between "extremely detailed" and "over-smoothed". Very interesting! You can freely control the detail level of the SR output.

(2023-10-18) Completely solved the issues by initializing the latents with the input LR images. Interestingly, the SR results also became much more stable.
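
As a rough illustration of the idea (my sketch with diffusers primitives and dummy data, not the repo's exact code; model id and step count are placeholders):

import torch
from diffusers import AutoencoderKL, DDPMScheduler

# Load the SD1.5 VAE and scheduler (model id is illustrative).
vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae")
scheduler = DDPMScheduler.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
scheduler.set_timesteps(20)

lr_image = torch.rand(1, 3, 512, 512) * 2 - 1  # pre-upscaled LR input in [-1, 1] (dummy)
with torch.no_grad():
    latents = vae.encode(lr_image).latent_dist.sample() * vae.config.scaling_factor
noise = torch.randn_like(latents)
# Start denoising from the noised LR latent instead of pure Gaussian noise.
latents = scheduler.add_noise(latents, noise, scheduler.timesteps[:1])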

(2023-10-11) Colab demo is now available. Credits to Masahide Okada.

(2023-10-09) Add training dataset.

(2023-09-28) Added tiled latent to allow upscaling ultra-high-resolution images. Please carefully set --latent_tiled_size as well as --decoder_tiled_size when upscaling large images.
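
For example (an illustrative invocation; the tile sizes below are placeholders to tune to your VRAM, not recommended values):

python test_pasd.py --upscale 4 --latent_tiled_size 320 --decoder_tiled_size 224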

(2023-09-12) Add Gradio demo.

(2023-09-11) Upload pre-trained models.

(2023-09-07) Upload source codes.

Our model can perform various tasks. We hope you enjoy it.

Realistic Image SR

Old photo restoration

Personalized Stylization

Colorization

Usage

  • Clone this repository:

git clone https://github.com/yangxy/PASD.git
cd PASD

  • Train PASD:

bash ./train_pasd.sh

If you want to train pasd_light, add --use_pasd_light.

  • Test PASD.

Download our pre-trained models pasd | pasd_rrdb | pasd_light | pasd_light_rrdb, and put them into runs/.

python test_pasd.py # --use_pasd_light --use_personalized_model

Please read the arguments in test_pasd.py carefully. We adopt the tiled VAE method proposed in multidiffusion-upscaler-for-automatic1111 to save GPU memory.
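
The idea, in a concept-level sketch (a simplification for intuition only; the real implementation also overlaps and blends tiles to hide seams):

import torch
from diffusers import AutoencoderKL

def decode_tiled(vae: AutoencoderKL, latents: torch.Tensor, tile: int = 64) -> torch.Tensor:
    """Decode (B, 4, H, W) latents tile by tile so peak VRAM stays bounded."""
    b, _, h, w = latents.shape
    out = torch.zeros(b, 3, h * 8, w * 8)  # SD's VAE upsamples by 8x
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            part = latents[:, :, y:y + tile, x:x + tile]
            with torch.no_grad():
                dec = vae.decode(part / vae.config.scaling_factor).sample
            out[:, :, y * 8:y * 8 + dec.shape[2], x * 8:x * 8 + dec.shape[3]] = dec
    return out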

Please try --use_personalized_model for personalized stylization, old photo restoration, and real-world SR. Set --conditioning_scale to adjust the stylization strength.

We use personalized models including majicMIX realistic (for SR and restoration), ToonYou (for stylization) and modern disney style (UNet only, for stylization). You can download more from the community and put them into checkpoints/personalized_models.

If the default settings do not yield good results, try a different --pasd_model_path, --seed, --prompt, --upscale, or --high_level_info for better performance.
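
For example (illustrative values only; the prompt is the default positive prompt that appears in the test logs below):

python test_pasd.py --pasd_model_path runs/pasd/checkpoint-100000 --seed 36 --upscale 4 --high_level_info caption --prompt "clean, high-resolution, 8k"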

  • Gradio Demo
python gradio_pasd.py

Main idea

Citation

If our work is useful for your research, please consider citing:

@inproceedings{yang2023pasd,
    title={Pixel-Aware Stable Diffusion for Realistic Image Super-Resolution and Personalized Stylization},
    author={Yang, Tao and Wu, Rongyuan and Ren, Peiran and Xie, Xuansong and Zhang, Lei},
    booktitle={The European Conference on Computer Vision (ECCV)},
    year={2024}
}

Acknowledgments

Our project is based on diffusers.

Contact

If you have any questions or suggestions about this paper, feel free to reach me at [email protected].

pasd's People

Contributors

atry, yangxy


pasd's Issues

When reproducing, the CPU memory explodes

Hi, thanks for your meaningful work. However, when I reproduce PASD following the instructions in the README, I notice that CPU memory keeps increasing. My machine then crashed after 60k iterations, with CPU memory usage reaching ~1T. Any ideas about this situation? Thanks for any suggestions.
BTW, I employ the recommended WebDataset + torch.DataLoader.

License

Hi,
Thanks so much for making this amazing model! Would you consider making it open source by adding an OSI-approved license (e.g., MIT/Apache 2.0/ISC)?
Thank you!

Data load error, configuration file not found

Dear authors,
I have read about your PASD work; however, when I used the DIV2K dataset for training, the configuration file params_realesrgan.yml could not be found, even though I checked and there was no problem with the file name or path.

How to train PASD on DIV2K_train_HR?

I want to train PASD on the DIV2K_train_HR dataset, but I don't know how to change the code in webdatasets.py. Do I only need to change wds_urls to my dataset path? I have already packed my dataset into a .tar file.
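
Not an official answer, but a minimal sketch of packing an HR folder into a WebDataset tar and reading it back (the file keys/extensions here are my assumptions; match whatever webdatasets.py actually expects):

import glob
import os
import tarfile

import webdataset as wds

# Pack: one file per sample; the basename (minus extension) becomes the sample key.
with tarfile.open("datasets/DIV2K_train_HR.tar", "w") as tar:
    for path in sorted(glob.glob("DIV2K_train_HR/*.png")):
        tar.add(path, arcname=os.path.basename(path))

# Read: point wds_urls at the tar; decode to PIL and pull the "png" field.
dataset = wds.WebDataset("datasets/DIV2K_train_HR.tar").decode("pil").to_tuple("png")
image, = next(iter(dataset))
print(image.size)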

Is there any installation guidance?

Hello! Is there any package installation guidance? I tried pip install -r requirements.txt; however, with the latest diffusers package some functions cannot be found. Is there a specific diffusers version used in the experiments of the paper? Thanks!
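
No official answer is recorded in this mirror, but as a hedged starting point (the exact pin below is my guess, not a documented requirement), an older diffusers release predates the renamed imports reported in the issues further down:

pip install -r requirements.txt
pip install diffusers==0.21.4  # version is a guess; defer to the repo's requirements.txt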

How to generalize the personalize style photo

PASD is a very nice job!
I was trying to make a personalized-style (ToonYou) photo; the original photo is 000020x2.png in the samples folder. However, I tried toonyou.safetensors from Civitai with pasd_rrdb/checkpoint-100000, --use_personalized_model, and other settings at default, and I could not generate a beautiful photo like the one your paper shows. Can you tell me what parameters you used for the personalized model?
Thank you!

How to try personalized stylization in ModelScope pipeline?

I tried it, but it fails.

import cv2
import torch
from modelscope.outputs import OutputKeys
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

input_location = 'http://public-vigen-video.oss-cn-shanghai.aliyuncs.com/robin/results/output_test_pasd/0fbc3855c7cfdc95.png'
prompt = ''
output_image_path = 'result.png'

# Pipeline inputs (renamed from `input` to avoid shadowing the Python built-in).
inputs = {
    'image': input_location,
    'prompt': prompt,
    'upscale': 2,
    'fidelity_scale_fg': 1.0,  # foreground fidelity scale
    'fidelity_scale_bg': 1.0,  # background fidelity scale
    'use_personalized_model': True,
    'personalized_model_path': 'toonyou_beta3.safetensors'
}
pasd = pipeline(Tasks.image_super_resolution_pasd, model='damo/PASD_image_super_resolutions')
output = pasd(inputs)[OutputKeys.OUTPUT_IMG]
cv2.imwrite(output_image_path, output)
print('pipeline: the output image path is {}'.format(output_image_path))

Training problem

I want to train PASD on the DIV2K_train_HR dataset, but when I change wds_urls to wds_urls = "datasets/DIV2K_train_HR.tar", an error occurs.
I only changed line 256 of webdatasets.py.

Image Colorization

Thank you for sharing your work. I want to know how to train a colorization model on my own dataset.

Difference between article and code in how the PACA is derived

  1. At first glance, in the code the input to PACA comes after the zero-conv module, but in the article's "Figure 2: Architecture of the proposed pixel-aware stable diffusion (PASD) network", the bottom-right yellow diagram shows the PACA input split off right before the zero conv. Which one is correct?
  2. In the same diagram, I don't see the UNet's noisy input being added to the ControlNet features, but the code does this. Which one is right?
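
For readers hitting the same question, here is a paper-level sketch of PACA (my paraphrase of the idea in Figure 2, not the repo's module; the class name is mine, and it deliberately leaves out the zero-conv ordering under dispute): queries come from the UNet features x, keys/values from the ControlNet features y, followed by a residual add.

import torch
import torch.nn as nn

class PACASketch(nn.Module):
    """Pixel-aware cross-attention: every UNet pixel attends to ControlNet features."""
    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.to_q(x.flatten(2).transpose(1, 2))  # (B, HW, C) from UNet features
        k = self.to_k(y.flatten(2).transpose(1, 2))  # (B, HW, C) from ControlNet features
        v = self.to_v(y.flatten(2).transpose(1, 2))
        attn = (q @ k.transpose(1, 2) * self.scale).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + out  # residual connection

paca = PACASketch(dim=64)
out = paca(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))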

Failure case?

The output for this input image (div2k0801) is the attached output0801. I wonder whether my settings are incorrect or whether the image degradation is too severe for super-resolution.

Testing Datasets

Can you please post the details of the test dataset? I see there are no instructions specifically for the DIV2K validation set. I used the weights you posted to test on the DIV2K validation set released by StableSR, and found that the results were not the same.

too many values to unpack (expected 2)

I updated to the latest changes and now get:

$ python test_pasd.py
C:\PASD\pipelines\pipeline_pasd.py:41: FutureWarning: Importing `DiffusionPipeline` or `ImagePipelineOutput` from diffusers.pipeline_utils is deprecated. Please import from diffusers.pipelines.pipeline_utils instead.
  from diffusers.pipeline_utils import DiffusionPipeline
C:\Python310\lib\site-packages\torchvision\transforms\functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
  warnings.warn(
clean, high-resolution, 8k
  0%|          | 0/20 [00:00<?, ?it/s]
too many values to unpack (expected 2)

cannot fit 'int' into an index-sized integer; no images are generated during testing

During testing, no images are generated in the output folder.


python test_pasd.py --use_personalized_model
/home/root1/anaconda3/envs/pasd/lib/python3.10/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
/home/root1/anaconda3/envs/pasd/lib/python3.10/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
/home/root1/anaconda3/envs/pasd/lib/python3.10/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
/home/root1/anaconda3/envs/pasd/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
config.json: 4.52kB [00:00, 15.6MB/s]
INFO:root:Loaded coca_ViT-L-14 model config.
INFO:root:Loading pretrained coca_ViT-L-14 weights (mscoco_finetuned_laion2B-s13B-b90k).
a dog sitting in the grass with its tongue hanging out . clean, high-resolution, 8k
cannot fit 'int' into an index-sized integer

train on custom dataset

Hi, thanks for this excellent work! I would like to try it on my own dataset (e.g., hr_512, lr_128). Could you please tell me which dataloader I should use and how to point it at my own dataset path?

Thanks

The replication results of the experiment cannot be aligned

Thanks for your impressive work on PASD. I have encountered some issues while reproducing your Real-ISR experiment results and would appreciate your assistance.

After training the model for 500k steps (as instructed in this repo), I tested the model weights from different steps (50k, 100k, 200k) on the benchmark dataset. However, none of the inference results I replicated are as impressive as yours: they look blurrier and noisier and lack sharpness. (FYI, I have attached several comparative sample images from the DRealSR x4 dataset, 128px -> 512px: panasonic_145, sony_82.)

After checking the code, I believe it is clean. Perhaps there is a discrepancy in the degradation model configuration, i.e., the Real-ESRGAN parameters? Or is there some magic training trick?

I would greatly appreciate any ideas or assistance you can provide! Thank you again!

Is colorization model released or not?

Dear authors,
I have read about your PASD work; it mentions that PASD also supports the colorization task. May I ask whether the currently released PASD/PASD-light/PASD-RRDB weights include colorization?
If not, will the colorization model be released? Many thanks.

Colab doesn't add details

The Colab notebook does upscale but doesn't add details to the image. I have checked the same settings in the Colab and in the demo space; the demo space does an excellent job of adding details.
In the Colab I get a warning in the last cell, then it keeps working and gives me an upscale:

2024-01-26 10:42:39.581017: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-26 10:42:39.581065: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-26 10:42:39.582556: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-26 10:42:39.590674: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-01-26 10:42:41.313011: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/local/lib/python3.10/dist-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be removed in 0.17. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
warnings.warn(

Why is inference slow on an A100?

When I run inference on a 1024×1024 image: [Tiled VAE]: the input size is tiny and unnecessary to tile. [Tiled VAE]: Done in 12.476s, max VRAM alloc 5802.607 MB.
But I want to use more GPU and run faster than this; how can I solve it?
Also, during inference some models are offloaded from the GPU.

ImportError: cannot import name 'PositionNet'

C:\PASD-main>python gradio_pasd.py
C:\PASD-main\pipelines\pipeline_pasd.py:42: FutureWarning: Importing `DiffusionPipeline` or `ImagePipelineOutput` from diffusers.pipeline_utils is deprecated. Please import from diffusers.pipelines.pipeline_utils instead.
  from diffusers.pipeline_utils import DiffusionPipeline
C:\Python310\lib\site-packages\torchvision\models\_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
C:\Python310\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=None`.
  warnings.warn(msg)
Traceback (most recent call last):
  File "C:\PASD-main\gradio_pasd.py", line 29, in <module>
    from models.pasd.unet_2d_condition import UNet2DConditionModel
  File "C:\PASD-main\models\pasd\unet_2d_condition.py", line 27, in <module>
    from diffusers.models.embeddings import (
ImportError: cannot import name 'PositionNet' from 'diffusers.models.embeddings' (C:\Python310\lib\site-packages\diffusers\models\embeddings.py)

Test Problem

I want to test pasd_light using python test_pasd.py --use_pasd_light, but I encountered an error.
Do you have any suggestions? Looking forward to your reply!

cusolver error: CUSOLVER_STATUS_EXECUTION_FAILED, when calling `cusolverDnSgetrf( handle, m, n, dA, ldda, static_cast<float*>(dataPtr.get()), ipiv, info)`. This error may appear if the input matrix contains NaN.

I tested according to readme.md and got an error.

"cusolver error: CUSOLVER_STATUS_EXECUTION_FAILED, when calling cusolverDnSgetrf( handle, m, n, dA, ldda, static_cast<float*>(dataPtr.get()), ipiv, info). This error may appear if the input matrix contains NaN."

There is a problem with "latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs, return_dict=False)[0]" in line 1151, pipelines/pipeline_pasd.py.

I hope you can give me some guidance, thank you very much!

The model for colorization

Hi, authors,

Congrats on the nice work.

I wonder what is the model/config for colorization?

Thx a lot

About Datasets

May I ask which datasets you used? In the paper you mention DIV2K and Flickr2K, but I see there are also datasets such as FFHQ in the GitHub repo you published.

Test problem

I downloaded the pre-trained pasd_light model and put it into runs/, and I downloaded the SD1.5 checkpoint v1-5-pruned-emaonly.ckpt and put it into checkpoints/stable-diffusion-v1-5. When I run python test_pasd.py --use_pasd_light, I get this error:
OSError: Error no file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory checkpoints/stable-diffusion-v1-5.
Am I missing something that needs to be downloaded?
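
A hedged guess at the cause (not an official reply): test_pasd.py appears to expect the diffusers-format SD1.5 directory layout (unet/, vae/, text_encoder/, ...), not a single .ckpt file. One way to fetch it (repo id as of that time; any diffusers-format SD1.5 copy should work):

from diffusers import StableDiffusionPipeline

# Download the diffusers-format SD1.5 weights and save them where PASD looks for them.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.save_pretrained("checkpoints/stable-diffusion-v1-5")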

Training error

Dear authors,
I have read about your PASD work and used the DIV2K dataset for training. Due to GPU limitations (48GB of memory), I reduced the epochs, batch size, and steps; the training progress bar did not move, yet in the end the model weights were saved.

Training Dataset URL failed

Very nice job! When I wanted to train the model, I found that the training dataset URL is broken. Could you please update the training dataset URL? Thank you very much.

ImportError: cannot import name 'is_compiled_module' from 'diffusers.utils'

python test_pasd.py
Traceback (most recent call last):
  File "C:\PASD-main\test_pasd.py", line 22, in <module>
    from pipelines.pipeline_pasd import StableDiffusionControlNetPipeline
  File "C:\PASD-main\pipelines\pipeline_pasd.py", line 32, in <module>
    from diffusers.utils import (
ImportError: cannot import name 'is_compiled_module' from 'diffusers.utils' (C:\Python310\lib\site-packages\diffusers\utils\__init__.py)
$ python test_pasd.py
Traceback (most recent call last):
  File "C:\PASD-main\test_pasd.py", line 22, in <module>
    from pipelines.pipeline_pasd import StableDiffusionControlNetPipeline
  File "C:\PASD-main\pipelines\pipeline_pasd.py", line 32, in <module>
    from diffusers.utils import (
ImportError: cannot import name 'randn_tensor' from 'diffusers.utils' (C:\Python310\lib\site-packages\diffusers\utils\__init__.py)
 python test_pasd.py
C:\PASD-main\pipelines\pipeline_pasd.py:45: FutureWarning: Importing `DiffusionPipeline` or `ImagePipelineOutput` from diffusers.pipeline_utils is deprecated. Please import from diffusers.pipelines.pipeline_utils instead.
  from diffusers.pipeline_utils import DiffusionPipeline
C:\Python310\lib\site-packages\torchvision\transforms\functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
  warnings.warn(
Traceback (most recent call last):
  File "C:\PASD-main\test_pasd.py", line 267, in <module>
    main(args)
  File "C:\PASD-main\test_pasd.py", line 167, in main
    pipeline = load_pasd_pipeline(args, accelerator, enable_xformers_memory_efficient_attention)
  File "C:\PASD-main\test_pasd.py", line 40, in load_pasd_pipeline
    from models.pasd.controlnet import ControlNetModel
  File "C:\PASD-main\models\pasd\controlnet.py", line 27, in <module>
    from basicsr.archs.rrdbnet_arch import RRDB
ModuleNotFoundError: No module named 'basicsr.archs.rrdbnet_arch'
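
A hedged note on these three tracebacks (not an official fix): the first two point at a diffusers release newer than the code targets (see the pin suggested under "Is there any installation guidance?" above), and the last at a missing package:

pip install basicsr  # provides basicsr.archs.rrdbnet_arch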

Noise on output

Usually in darker areas there is some noise like this (attached: pasd-noise).

Is there a way to reduce that noise, or add something to the code to denoise it? Thank you 🙏

Very nice work! I want to ask which part of the model is responsible for degradation removal?

Thanks for your work; it is really interesting! However, while reading your code I couldn't figure out which part of the model is responsible for degradation removal.
At line 927 of train_pasd.py, you calculate F.l1_loss(pixel_values.float(), controlnet_cond_mid.float(), reduction="mean").
Does this mean controlnet_cond_mid is the denoised image for the diffusion model? I'm not sure I understood your idea correctly.
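
A hedged, self-contained illustration of the quoted objective (dummy tensors; the names follow the question, while the wiring and weighting are my assumptions): the ControlNet branch emits an intermediate image prediction controlnet_cond_mid, and the L1 loss pulls it toward the clean pixels pixel_values, which is plausibly where degradation removal is learned.

import torch
import torch.nn.functional as F

pixel_values = torch.rand(1, 3, 512, 512)         # clean HR target (dummy)
controlnet_cond_mid = torch.rand(1, 3, 512, 512)  # intermediate prediction from the ControlNet branch (dummy)

# Auxiliary pixel-space loss quoted from train_pasd.py line 927:
aux_loss = F.l1_loss(pixel_values.float(), controlnet_cond_mid.float(), reduction="mean")
print(aux_loss.item())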
