
pasd's Introduction

Pixel-Aware Stable Diffusion for Realistic Image Super-Resolution and Personalized Stylization (ECCV2024)

Paper

Tao Yang1, Rongyuan Wu2, Peiran Ren3, Xuansong Xie3, Lei Zhang2
1ByteDance Inc.
2Department of Computing, The Hong Kong Polytechnic University
3DAMO Academy, Alibaba Group

News

(2024-7-12) I am training a new PASD based on SDXL and will release it soon. Stay tuned!

(2024-7-1) Accepted by ECCV2024. A new version of our paper will be updated soon.

(2024-3-18) Please try our colorization model via python test_pasd.py --pasd_model_path runs/pasd_color/checkpoint-180000 --control_type grayscale --high_level_info caption --use_pasd_light. You should use the noise scheduler provided in runs/pasd_color/scheduler, which has been updated to ensure zero-terminal SNR and avoid leaking the residual signal of the RGB image during training. Please read the updated paper for more details.

(2024-3-18) We have updated the paper. The weights and datasets are now available on Huggingface.

(2024-1-16) You may also want to check out our new works, SeeSR and Phantom.

(2023-10-20) Added an additional noise level via --added_noise_level; the SR results now strike a great balance between "extremely detailed" and "over-smoothed". Very interesting! You can freely control the detail level of the SR output.

(2023-10-18) Completely solved the issues by initializing the latents with the input LR images. Interestingly, the SR results also became much more stable.
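
As a rough illustration of the idea (my sketch with diffusers primitives and dummy data, not the repo's exact code; model id and step count are placeholders):

import torch
from diffusers import AutoencoderKL, DDPMScheduler

# Load the SD1.5 VAE and scheduler (model id is illustrative).
vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae")
scheduler = DDPMScheduler.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
scheduler.set_timesteps(20)

lr_image = torch.rand(1, 3, 512, 512) * 2 - 1  # pre-upscaled LR input in [-1, 1] (dummy)
with torch.no_grad():
    latents = vae.encode(lr_image).latent_dist.sample() * vae.config.scaling_factor
noise = torch.randn_like(latents)
# Start denoising from the noised LR latent instead of pure Gaussian noise.
latents = scheduler.add_noise(latents, noise, scheduler.timesteps[:1])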

(2023-10-11) Colab demo is now available. Credits to Masahide Okada.

(2023-10-09) Add training dataset.

(2023-09-28) Added tiled latent to allow upscaling ultra-high-resolution images. Please carefully set --latent_tiled_size as well as --decoder_tiled_size when upscaling large images.
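
For example (an illustrative invocation; the tile sizes below are placeholders to tune to your VRAM, not recommended values):

python test_pasd.py --upscale 4 --latent_tiled_size 320 --decoder_tiled_size 224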

(2023-09-12) Add Gradio demo.

(2023-09-11) Upload pre-trained models.

(2023-09-07) Upload source codes.

Our model can perform various tasks. We hope you enjoy it.

Realistic Image SR

Old photo restoration

Personalized Stylization

Colorization

Usage

  • Clone this repository:

git clone https://github.com/yangxy/PASD.git
cd PASD

  • Train PASD:

bash ./train_pasd.sh

If you want to train pasd_light, add --use_pasd_light.

  • Test PASD.

Download our pre-trained models pasd | pasd_rrdb | pasd_light | pasd_light_rrdb, and put them into runs/.

python test_pasd.py # --use_pasd_light --use_personalized_model

Please read the arguments in test_pasd.py carefully. We adopt the tiled VAE method proposed in multidiffusion-upscaler-for-automatic1111 to save GPU memory.
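
The idea, in a concept-level sketch (a simplification for intuition only; the real implementation also overlaps and blends tiles to hide seams):

import torch
from diffusers import AutoencoderKL

def decode_tiled(vae: AutoencoderKL, latents: torch.Tensor, tile: int = 64) -> torch.Tensor:
    """Decode (B, 4, H, W) latents tile by tile so peak VRAM stays bounded."""
    b, _, h, w = latents.shape
    out = torch.zeros(b, 3, h * 8, w * 8)  # SD's VAE upsamples by 8x
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            part = latents[:, :, y:y + tile, x:x + tile]
            with torch.no_grad():
                dec = vae.decode(part / vae.config.scaling_factor).sample
            out[:, :, y * 8:y * 8 + dec.shape[2], x * 8:x * 8 + dec.shape[3]] = dec
    return out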

Please try --use_personalized_model for personalized stylization, old photo restoration, and real-world SR. Set --conditioning_scale to adjust the stylization strength.

We use personalized models including majicMIX realistic (for SR and restoration), ToonYou (for stylization) and modern disney style (UNet only, for stylization). You can download more from the community and put them into checkpoints/personalized_models.

If the default settings do not yield good results, try a different --pasd_model_path, --seed, --prompt, --upscale, or --high_level_info for better performance.
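
For example (illustrative values only; the prompt is the default positive prompt that appears in the test logs below):

python test_pasd.py --pasd_model_path runs/pasd/checkpoint-100000 --seed 36 --upscale 4 --high_level_info caption --prompt "clean, high-resolution, 8k"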

  • Gradio Demo
python gradio_pasd.py

Main idea

Citation

If our work is useful for your research, please consider citing:

@inproceedings{yang2023pasd,
    title={Pixel-Aware Stable Diffusion for Realistic Image Super-Resolution and Personalized Stylization},
    author={Yang, Tao and Wu, Rongyuan and Ren, Peiran and Xie, Xuansong and Zhang, Lei},
    booktitle={The European Conference on Computer Vision (ECCV)},
    year={2024}
}

Acknowledgments

Our project is based on diffusers.

Contact

If you have any questions or suggestions about this paper, feel free to reach me at [email protected].

pasd's People

Contributors

atry, yangxy


pasd's Issues

When reproducing, the CPU memory explodes

Hi, thanks for your meaningful work. However, when I reproduce PASD following the instructions in the README, I notice that CPU memory keeps increasing. My machine then crashed after 60k iterations, with CPU memory usage reaching ~1T. Any ideas about this situation? Thanks for any suggestions.
BTW, I employ the recommended WebDataset + torch.DataLoader.

License

Hi,
Thanks so much for making this amazing model! Would you consider making it open source by adding an OSI-approved license (e.g., MIT/Apache 2.0/ISC)?
Thank you!

Data load error, configuration file not found

Dear authors,
I have read about your PASD work; however, when I used the DIV2K dataset for training, the configuration file params_realesrgan.yml could not be found, even though I checked and there was no problem with the file name or path.

How to train PASD on DIV2K_train_HR?

I want to train PASD on the DIV2K_train_HR dataset, but I don't know how to change the code in webdatasets.py. Do I only need to change wds_urls to my dataset path? I have already packed my dataset into a .tar file.
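
Not an official answer, but a minimal sketch of packing an HR folder into a WebDataset tar and reading it back (the file keys/extensions here are my assumptions; match whatever webdatasets.py actually expects):

import glob
import os
import tarfile

import webdataset as wds

# Pack: one file per sample; the basename (minus extension) becomes the sample key.
with tarfile.open("datasets/DIV2K_train_HR.tar", "w") as tar:
    for path in sorted(glob.glob("DIV2K_train_HR/*.png")):
        tar.add(path, arcname=os.path.basename(path))

# Read: point wds_urls at the tar; decode to PIL and pull the "png" field.
dataset = wds.WebDataset("datasets/DIV2K_train_HR.tar").decode("pil").to_tuple("png")
image, = next(iter(dataset))
print(image.size)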

Is there any installation guidance?

Hello! Is there any package installation guidance? I tried pip install -r requirements.txt; however, with the latest diffusers package some functions cannot be found. Is there a specific diffusers version used in the experiments of the paper? Thanks!
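
No official answer is recorded in this mirror, but as a hedged starting point (the exact pin below is my guess, not a documented requirement), an older diffusers release predates the renamed imports reported in the issues further down:

pip install -r requirements.txt
pip install diffusers==0.21.4  # version is a guess; defer to the repo's requirements.txt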

How to generalize the personalize style photo

PASD is a very nice job!
I was trying to make a personalized-style (ToonYou) photo; the original photo is 000020x2.png in the samples folder. However, I tried toonyou.safetensors from Civitai with pasd_rrdb/checkpoint-100000, --use_personalized_model, and other settings at default, and I could not generate a beautiful photo like the one your paper shows. Can you tell me what parameters you used for the personalized model?
Thank you!

How to try personalized stylization in ModelScope pipeline?

I tried it, but it fails.

import cv2
import torch
from modelscope.outputs import OutputKeys
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

input_location = 'http://public-vigen-video.oss-cn-shanghai.aliyuncs.com/robin/results/output_test_pasd/0fbc3855c7cfdc95.png'
prompt = ''
output_image_path = 'result.png'

# Pipeline inputs (renamed from `input` to avoid shadowing the Python built-in).
inputs = {
    'image': input_location,
    'prompt': prompt,
    'upscale': 2,
    'fidelity_scale_fg': 1.0,  # foreground fidelity scale
    'fidelity_scale_bg': 1.0,  # background fidelity scale
    'use_personalized_model': True,
    'personalized_model_path': 'toonyou_beta3.safetensors'
}
pasd = pipeline(Tasks.image_super_resolution_pasd, model='damo/PASD_image_super_resolutions')
output = pasd(inputs)[OutputKeys.OUTPUT_IMG]
cv2.imwrite(output_image_path, output)
print('pipeline: the output image path is {}'.format(output_image_path))

Training problem

I want to train PASD on the DIV2K_train_HR dataset, but when I change wds_urls to wds_urls = "datasets/DIV2K_train_HR.tar", an error occurs.
I only changed line 256 of webdatasets.py.

Image Colorization

Thank you for sharing your work. I want to know how to train a colorization model on my own dataset.

Difference between article and code in how the PACA is derived

  1. At first glance, in the code the input to PACA comes after the zero-conv module, but in the article's "Figure 2: Architecture of the proposed pixel-aware stable diffusion (PASD) network", the bottom-right yellow diagram shows the PACA input split off right before the zero conv. Which one is correct?
  2. In the same diagram, I don't see the UNet's noisy input being added to the ControlNet features, but the code does this. Which one is right?
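
For readers hitting the same question, here is a paper-level sketch of PACA (my paraphrase of the idea in Figure 2, not the repo's module; the class name is mine, and it deliberately leaves out the zero-conv ordering under dispute): queries come from the UNet features x, keys/values from the ControlNet features y, followed by a residual add.

import torch
import torch.nn as nn

class PACASketch(nn.Module):
    """Pixel-aware cross-attention: every UNet pixel attends to ControlNet features."""
    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.to_q(x.flatten(2).transpose(1, 2))  # (B, HW, C) from UNet features
        k = self.to_k(y.flatten(2).transpose(1, 2))  # (B, HW, C) from ControlNet features
        v = self.to_v(y.flatten(2).transpose(1, 2))
        attn = (q @ k.transpose(1, 2) * self.scale).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + out  # residual connection

paca = PACASketch(dim=64)
out = paca(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))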

Failure case?

The output for this input image (div2k0801) is the attached output0801. I wonder whether my settings are incorrect or whether the image degradation is too severe for super-resolution.

Testing Datasets

Can you please post the details of the test dataset? I see there are no instructions specifically for the DIV2K validation set. I used the weights you posted to test on the DIV2K validation set released by StableSR, and found that the results were not the same.

too many values to unpack (expected 2)

I updated to the latest changes and now get:

$ python test_pasd.py
C:\PASD\pipelines\pipeline_pasd.py:41: FutureWarning: Importing `DiffusionPipeline` or `ImagePipelineOutput` from diffusers.pipeline_utils is deprecated. Please import from diffusers.pipelines.pipeline_utils instead.
  from diffusers.pipeline_utils import DiffusionPipeline
C:\Python310\lib\site-packages\torchvision\transforms\functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
  warnings.warn(
clean, high-resolution, 8k
  0%|          | 0/20 [00:00<?, ?it/s]
too many values to unpack (expected 2)

cannot fit 'int' into an index-sized integer; no images are generated during testing

During testing, no images are generated in the output folder.


python test_pasd.py --use_personalized_model
/home/root1/anaconda3/envs/pasd/lib/python3.10/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
/home/root1/anaconda3/envs/pasd/lib/python3.10/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
/home/root1/anaconda3/envs/pasd/lib/python3.10/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
/home/root1/anaconda3/envs/pasd/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
config.json: 4.52kB [00:00, 15.6MB/s]
INFO:root:Loaded coca_ViT-L-14 model config.
INFO:root:Loading pretrained coca_ViT-L-14 weights (mscoco_finetuned_laion2B-s13B-b90k).
a dog sitting in the grass with its tongue hanging out . clean, high-resolution, 8k
cannot fit 'int' into an index-sized integer

train on custom dataset

Hi, thanks for this excellent work! I would like to try it on my own dataset (e.g., hr_512, lr_128). Could you please tell me which dataloader I should use and how to point it at my own dataset path?

Thanks

The replication results of the experiment cannot be aligned

Thanks for your impressive work on PASD. I have encountered some issues while reproducing your Real-ISR experiment results and would appreciate your assistance.

After training the model for 500k steps (as instructed in this repo), I tested the model weights from different steps (50k, 100k, 200k) on the benchmark dataset. However, none of the inference results I replicated are as impressive as yours: they look blurrier and noisier and lack sharpness. (FYI, I have attached several comparative sample images from the DRealSR x4 dataset, 128px -> 512px: panasonic_145, sony_82.)

After checking the code, I believe it is clean. Perhaps there is a discrepancy in the degradation model configuration, i.e., the Real-ESRGAN parameters? Or is there some magic training trick?

I would greatly appreciate any ideas or assistance you can provide! Thank you again!

Is colorization model released or not?

Dear authors,
I have read about your PASD work; it mentions that PASD also supports the colorization task. May I ask whether the currently released PASD/PASD-light/PASD-RRDB weights include colorization?
If not, will the colorization model be released? Many thanks.

Colab doesn't add details

The Colab notebook does upscale but doesn't add details to the image. I have checked the same settings in the Colab and in the demo space; the demo space does an excellent job of adding details.
In the Colab I get a warning in the last cell, then it keeps working and gives me an upscale:

2024-01-26 10:42:39.581017: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-26 10:42:39.581065: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-26 10:42:39.582556: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-26 10:42:39.590674: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-01-26 10:42:41.313011: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/local/lib/python3.10/dist-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be removed in 0.17. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
warnings.warn(

Why is inference slow on an A100?

When I run inference on a 1024×1024 image: [Tiled VAE]: the input size is tiny and unnecessary to tile. [Tiled VAE]: Done in 12.476s, max VRAM alloc 5802.607 MB.
But I want to use more GPU and run faster than this; how can I solve it?
Also, during inference some models are offloaded from the GPU.

ImportError: cannot import name 'PositionNet'

C:\PASD-main>python gradio_pasd.py
C:\PASD-main\pipelines\pipeline_pasd.py:42: FutureWarning: Importing `DiffusionPipeline` or `ImagePipelineOutput` from diffusers.pipeline_utils is deprecated. Please import from diffusers.pipelines.pipeline_utils instead.
  from diffusers.pipeline_utils import DiffusionPipeline
C:\Python310\lib\site-packages\torchvision\models\_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
C:\Python310\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=None`.
  warnings.warn(msg)
Traceback (most recent call last):
  File "C:\PASD-main\gradio_pasd.py", line 29, in <module>
    from models.pasd.unet_2d_condition import UNet2DConditionModel
  File "C:\PASD-main\models\pasd\unet_2d_condition.py", line 27, in <module>
    from diffusers.models.embeddings import (
ImportError: cannot import name 'PositionNet' from 'diffusers.models.embeddings' (C:\Python310\lib\site-packages\diffusers\models\embeddings.py)

Test Problem

I want to test pasd_light using python test_pasd.py --use_pasd_light, but I encountered an error.
Do you have any suggestions? Looking forward to your reply!

cusolver error: CUSOLVER_STATUS_EXECUTION_FAILED, when calling `cusolverDnSgetrf( handle, m, n, dA, ldda, static_cast<float*>(dataPtr.get()), ipiv, info)`. This error may appear if the input matrix contains NaN.

I tested according to readme.md and got an error.

"cusolver error: CUSOLVER_STATUS_EXECUTION_FAILED, when calling cusolverDnSgetrf( handle, m, n, dA, ldda, static_cast<float*>(dataPtr.get()), ipiv, info). This error may appear if the input matrix contains NaN."

There is a problem with "latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs, return_dict=False)[0]" in line 1151, pipelines/pipeline_pasd.py.

I hope you can give me some guidance, thank you very much!

The model for colorization

Hi, authors,

Congrats on the nice work.

I wonder what is the model/config for colorization?

Thx a lot

About Datasets

May I ask which datasets you used? In the paper you mention DIV2K and Flickr2K, but I see there are also datasets such as FFHQ in the GitHub repo you published.

Test problem

I downloaded the pre-trained pasd_light model and put it into runs/, and I downloaded the SD1.5 checkpoint v1-5-pruned-emaonly.ckpt and put it into checkpoints/stable-diffusion-v1-5. When I run python test_pasd.py --use_pasd_light, I get this error:
OSError: Error no file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory checkpoints/stable-diffusion-v1-5.
Am I missing something that needs to be downloaded?
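
A hedged guess at the cause (not an official reply): test_pasd.py appears to expect the diffusers-format SD1.5 directory layout (unet/, vae/, text_encoder/, ...), not a single .ckpt file. One way to fetch it (repo id as of that time; any diffusers-format SD1.5 copy should work):

from diffusers import StableDiffusionPipeline

# Download the diffusers-format SD1.5 weights and save them where PASD looks for them.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.save_pretrained("checkpoints/stable-diffusion-v1-5")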

Training error

Dear authors,
I have read about your PASD work and used the DIV2K dataset for training. Due to GPU limitations (48GB of memory), I reduced the epochs, batch size, and steps; the training progress bar did not move, yet in the end the model weights were saved.

Training Dataset URL failed

Very nice job! When I wanted to train the model, I found that the training dataset URL is broken. Could you please update the training dataset URL? Thank you very much.

ImportError: cannot import name 'is_compiled_module' from 'diffusers.utils'

python test_pasd.py
Traceback (most recent call last):
  File "C:\PASD-main\test_pasd.py", line 22, in <module>
    from pipelines.pipeline_pasd import StableDiffusionControlNetPipeline
  File "C:\PASD-main\pipelines\pipeline_pasd.py", line 32, in <module>
    from diffusers.utils import (
ImportError: cannot import name 'is_compiled_module' from 'diffusers.utils' (C:\Python310\lib\site-packages\diffusers\utils\__init__.py)
$ python test_pasd.py
Traceback (most recent call last):
  File "C:\PASD-main\test_pasd.py", line 22, in <module>
    from pipelines.pipeline_pasd import StableDiffusionControlNetPipeline
  File "C:\PASD-main\pipelines\pipeline_pasd.py", line 32, in <module>
    from diffusers.utils import (
ImportError: cannot import name 'randn_tensor' from 'diffusers.utils' (C:\Python310\lib\site-packages\diffusers\utils\__init__.py)
 python test_pasd.py
C:\PASD-main\pipelines\pipeline_pasd.py:45: FutureWarning: Importing `DiffusionPipeline` or `ImagePipelineOutput` from diffusers.pipeline_utils is deprecated. Please import from diffusers.pipelines.pipeline_utils instead.
  from diffusers.pipeline_utils import DiffusionPipeline
C:\Python310\lib\site-packages\torchvision\transforms\functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
  warnings.warn(
Traceback (most recent call last):
  File "C:\PASD-main\test_pasd.py", line 267, in <module>
    main(args)
  File "C:\PASD-main\test_pasd.py", line 167, in main
    pipeline = load_pasd_pipeline(args, accelerator, enable_xformers_memory_efficient_attention)
  File "C:\PASD-main\test_pasd.py", line 40, in load_pasd_pipeline
    from models.pasd.controlnet import ControlNetModel
  File "C:\PASD-main\models\pasd\controlnet.py", line 27, in <module>
    from basicsr.archs.rrdbnet_arch import RRDB
ModuleNotFoundError: No module named 'basicsr.archs.rrdbnet_arch'
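
A hedged note on these three tracebacks (not an official fix): the first two point at a diffusers release newer than the code targets (see the pin suggested under "Is there any installation guidance?" above), and the last at a missing package:

pip install basicsr  # provides basicsr.archs.rrdbnet_arch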

Noise on output

Usually in darker areas there is some noise like this (attached: pasd-noise).

Is there a way to reduce that noise, or add something to the code to denoise it? Thank you 🙏

Very nice work! I want to ask which part of the model is responsible for degradation removal?

Thanks for your work; it is really interesting! However, while reading your code I couldn't figure out which part of the model is responsible for degradation removal.
At line 927 of train_pasd.py, you calculate F.l1_loss(pixel_values.float(), controlnet_cond_mid.float(), reduction="mean").
Does this mean controlnet_cond_mid is the denoised image for the diffusion model? I'm not sure I understood your idea correctly.
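
A hedged, self-contained illustration of the quoted objective (dummy tensors; the names follow the question, while the wiring and weighting are my assumptions): the ControlNet branch emits an intermediate image prediction controlnet_cond_mid, and the L1 loss pulls it toward the clean pixels pixel_values, which is plausibly where degradation removal is learned.

import torch
import torch.nn.functional as F

pixel_values = torch.rand(1, 3, 512, 512)         # clean HR target (dummy)
controlnet_cond_mid = torch.rand(1, 3, 512, 512)  # intermediate prediction from the ControlNet branch (dummy)

# Auxiliary pixel-space loss quoted from train_pasd.py line 927:
aux_loss = F.l1_loss(pixel_values.float(), controlnet_cond_mid.float(), reduction="mean")
print(aux_loss.item())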
