
resshift's Introduction

ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting (NeurIPS 2023, Spotlight)

Zongsheng Yue, Jianyi Wang, Chen Change Loy

Conference Paper | Journal Paper | Project Page | Video


⭐ If ResShift is helpful to your images or projects, please help star this repo. Thanks! 🤗


Diffusion-based image super-resolution (SR) methods are mainly limited by the low inference speed due to the requirements of hundreds or even thousands of sampling steps. Existing acceleration sampling techniques inevitably sacrifice performance to some extent, leading to over-blurry SR results. To address this issue, we propose a novel and efficient diffusion model for SR that significantly reduces the number of diffusion steps, thereby eliminating the need for post-acceleration during inference and its associated performance deterioration. Our method constructs a Markov chain that transfers between the high-resolution image and the low-resolution image by shifting the residual between them, substantially improving the transition efficiency. Additionally, an elaborate noise schedule is developed to flexibly control the shifting speed and the noise strength during the diffusion process. Extensive experiments demonstrate that the proposed method obtains superior or at least comparable performance to current state-of-the-art methods on both synthetic and real-world datasets, even only with 15 sampling steps.
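
For intuition, here is a minimal PyTorch sketch of the shifted forward transition described above; the low-resolution input takes the place of pure noise as the other endpoint of the chain. The schedule values and kappa below are illustrative, not the released training settings.

    import torch

    def resshift_forward(x0, y0, eta_t, kappa=2.0):
        # Sample x_t ~ q(x_t | x_0, y_0): shift a fraction eta_t of the residual
        # (y_0 - x_0) onto x_0 and add Gaussian noise of strength kappa * sqrt(eta_t).
        residual = y0 - x0
        mean = x0 + eta_t * residual
        return mean + kappa * (eta_t ** 0.5) * torch.randn_like(x0)

    # Toy usage: x0 is the HR (latent) image, y0 the LR input brought to the same shape.
    x0, y0 = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
    for eta_t in torch.linspace(0.05, 1.0, 15):   # 15 steps, illustrative schedule
        x_t = resshift_forward(x0, y0, eta_t.item())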


Update

  • 2024.03.11: Update the code for the Journal paper
  • 2023.12.02: Add configurations for the x2 super-resolution task.
  • 2023.08.15: Add OpenXLab.
  • 2023.08.15: Add Gradio Demo.
  • 2023.08.14: Add bicubic (matlab resize) model.
  • 2023.08.14: Add Project Page.
  • 2023.08.02: Add Replicate demo.
  • 2023.07.31: Add Colab demo.
  • 2023.07.24: Create this repo.

Requirements

  • Python 3.10, Pytorch 2.1.2, xformers 0.0.23
  • For more details, see environment.yml.

A suitable conda environment named resshift can be created and activated with:
conda create -n resshift python=3.10
conda activate resshift
pip install -r requirements.txt

or

conda env create -f environment.yml
conda activate resshift

Applications

👉 Real-world image super-resolution

👉 Image inpainting

👉 Blind Face Restoration

Online Demo

You can try our method through the online demos linked above, or launch the Gradio demo locally with:

python app.py

Fast Testing

🐯 Real-world image super-resolution

python inference_resshift.py -i [image folder/image path] -o [result folder] --task realsr --scale 4 --version v3

🦁 Bicubic (resize by Matlab) image super-resolution

python inference_resshift.py -i [image folder/image path] -o [result folder] --task bicsr --scale 4

🐍 Natural image inpainting

python inference_resshift.py -i [image folder/image path] -o [result folder] --mask_path [mask path] --task inpaint_imagenet --scale 1

🐊 Face image inpainting

python inference_resshift.py -i [image folder/image path] -o [result folder] --mask_path [mask path] --task inpaint_face --scale 1

🐙 Blind Face Restoration

python inference_resshift.py -i [image folder/image path] -o [result folder] --task faceir --scale 1

Training

🐢 Preparing stage

  1. Download the pre-trained VQGAN model from this link and put it in the 'weights' folder.
  2. Adjust the data path in the config file.
  3. Adjust the batch size according to your GPUs (see the sketch after this list).
    • configs.train.batch: [training batch size, validation batch size]
    • configs.train.microbatch: per-GPU micro-batch; total batch size = microbatch * #GPUs * num_grad_accumulation
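
A quick sanity check of these settings (the numbers below are illustrative, not the released configuration):

    num_gpus = 8                  # --nproc_per_node passed to torchrun
    microbatch = 8                # configs.train.microbatch (per-GPU micro-batch)
    num_grad_accumulation = 1     # gradient-accumulation steps
    total_batch = microbatch * num_gpus * num_grad_accumulation
    print(total_batch)            # 64 with these numbers; should match the training batch size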

🐬 Real-world Image Super-resolution for NeurIPS

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --standalone --nproc_per_node=8 --nnodes=1 main.py --cfg_path configs/realsr_swinunet_realesrgan256.yaml --save_dir [Logging Folder] 

🐳 Real-world Image Super-resolution for Journal

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --standalone --nproc_per_node=8 --nnodes=1 main.py --cfg_path configs/realsr_swinunet_realesrgan256_journal.yaml --save_dir [Logging Folder] 

🐂 Image inpainting (Natural) for Journal

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --standalone --nproc_per_node=8 --nnodes=1 main.py --cfg_path configs/inpaint_lama256_imagenet.yaml --save_dir [Logging Folder] 

🐝 Image inpainting (Face) for Journal

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --standalone --nproc_per_node=8 --nnodes=1 main.py --cfg_path configs/inpaint_lama256_face.yaml --save_dir [Logging Folder] 

🐸 Blind face restoration for Journal

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --standalone --nproc_per_node=8 --nnodes=1 main.py --cfg_path configs/faceir_gfpgan512_lpips.yaml --save_dir [Logging Folder] 

Reproducing the results in our paper

🚗 Prepare data

  • Synthetic data for image super-resolution: Link

  • Real data for image super-resolution: RealSet65 | RealSet80

  • Synthetic data for natural image inpainting: Link

  • Synthetic data for face image inpainting: Link

  • Synthetic data for blind face restoration: Link

🚀 Image super-resolution

Reproduce the results in Table 3 of our NeurIPS paper:

python inference_resshift.py -i [image folder/image path] -o [result folder] --task realsr --scale 4 --version v1 --chop_size 64 --chop_stride 64 --bs 64

Reproduce the results in Table 4 of our NeurIPS paper:

python inference_resshift.py -i [image folder/image path] -o [result folder] --task realsr --scale 4 --version v1 --chop_size 512 --chop_stride 448

Reproduce the results in Table 2 of our Journal paper:

python inference_resshift.py -i [image folder/image path] -o [result folder] --task realsr --scale 4 --version v3 --chop_size 64 --chop_stride 64 --bs 64

Reproduce the results in Table 3 of our Journal paper:

python inference_resshift.py -i [image folder/image path] -o [result folder] --task realsr --scale 4 --version v3 --chop_size 512 --chop_stride 448

Model card:
  • version-1: Conference paper, 15 diffusion steps, trained for 300k iterations.
  • version-2: Conference paper, 15 diffusion steps, trained for 500k iterations.
  • version-3: Journal paper, 4 diffusion steps.

✈️ Image inpainting

Reproduce the results in Table 4 of our Journal paper:

python inference_resshift.py -i [image folder/image path] -o [result folder] --mask_path [mask path] --task inpaint_imagenet --scale 1 --chop_size 256 --chop_stride 256 --bs 32

Reproduce the results in Table 5 of our Journal paper:

python inference_resshift.py -i [image folder/image path] -o [result folder] --mask_path [mask path] --task inpaint_face --scale 1 --chop_size 256 --chop_stride 256 --bs 32

⛵ Blind Face Restoration

Reproduce the results in Table 6 of our Journal paper (arXiv):

python inference_resshift.py -i [image folder/image path] -o [result folder] --task faceir --scale 1 --chop_size 256 --chop_stride 256 --bs 16

License

This project is licensed under NTU S-Lab License 1.0. Redistribution and use should follow this license.

Acknowledgement

This project is based on Improved Diffusion Model, LDM, and BasicSR. We also adopt Real-ESRGAN to synthesize the training data for real-world super-resolution. Thanks for their awesome work.

Contact

If you have any questions, please feel free to contact me via [email protected].

resshift's People

Contributors

chenxwh, diveshjain-phy, eltociear, wyf0912, zsyOAOA


resshift's Issues

Questions about training configs

Hello, I'd like to ask about the diffusion.step setting during training. I see the default is 15; would a larger value yield better results? I currently set diffusion.step to 50 and also use step=50 for inference, but my trained model can hardly match the performance of your pre-trained model. Should I reduce diffusion.step during training?

Issues about Image Size

Dear author, thanks for your excellent work!
After carefully reading your paper and code, I see that training is based on the 64-256 SR task. But it seems that you apply the pretrained weights (VQGAN encoder/decoder and SwinUnetModel) from the 64-512 SR task to the 128-512 SR task. How does applying the same weights to different tasks work? Or have I misunderstood your work? Thank you!

About inference time

Thank you for the excellent work! I used 'CUDA_VISIBLE_DEVICES=gpu_id python inference_resshift.py -i [image folder/image path] -o [result folder] --scale 4 --task realsrx4 --chop_size 512' to super-resolve ImageNet-Test on a Tesla V100 GPU, and it took about 55 minutes, which is much more than the time reported in the paper. Is something wrong with the inference code? Thank you for your reply in advance.

Dataset of Blind Face Restoration

Hi,

Thanks for releasing the code for the journal version.

Could you upload the blind face restoration dataset? I can't find the download link.

The question about the noisy pixels

Thanks for your great work. I find your results are always perceptually better than StableSR's, but some output images contain noisy pixels, as in the attached examples. I wonder why this happens, and how I can fix or mitigate this artifact by adjusting the parameters or re-training the model?
[two example images attached]

Dependencies in environment.yml

I've seen and installed all sorts of projects, including Stable Diffusion, traiNNer, chaiNNer, and various upscaler tools, but in none of them have I seen so many libraries listed as dependencies.
Have you tried removing unnecessary and redundant libraries?
Or do you really need every library ever written for Python just to run/test ResShift?
It is a huge pile of Python libraries of all kinds, and they are also tightly version- and OS-specific.

Hello, a question about a problem encountered during training

During training, since I don't have the VQGAN weights, I train the diffusion model directly in pixel space. At around step 1966 the loss becomes NaN. I checked my own data and found no abnormal values. Do you know why this happens?

Have you tried training on predicting 'epsilon' instead of 'xstart'?

Very awesome work, and it inspired me a lot!!

I have a question regarding the experiment on training objectives. Have you tried training on reconstructing 'epsilon'? To me, it's not very intuitive why the model needs to output the same 'x_0' at different time steps.

I would appreciate it if you have further insights!
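
For readers following this thread, a generic sketch of the two parameterizations being contrasted (standard diffusion training objectives written for illustration, not the repository's actual loss code; model is assumed to be a network taking (x_t, t)):

    import torch.nn.functional as F

    def loss_x0(model, x_t, t, x0):
        # x0-parameterization: the network predicts the clean image at every step t
        return F.mse_loss(model(x_t, t), x0)

    def loss_eps(model, x_t, t, eps):
        # epsilon-parameterization: the network predicts the injected noise instead
        return F.mse_loss(model(x_t, t), eps)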

about faceir test results

Hi, thanks for offering such great work.
I'm facing a test problem: the result is a completely black image. Is there a configuration problem with the faceir task?

test dataset

Can you kindly provide the ImageNet-Test dataset?

May I ask the author, can this model improve the resolution of real person images?

May I ask the author: can this model improve the resolution of real-person images? I have a bunch of screenshots from real-life videos with poor quality. Can I use this model to achieve better image quality?

openxlab demo error

@zsyOAOA
Thank you for sharing your work. I am getting errors when trying the demo on OpenXLab. Is there a specific input size or something?

ResShift without the autoencoder

Hi

Thanks for the great work! I have a small query on training the model without the autoencoder. If I directly declare it to be none in the config file, i.e.,

autoencoder: None

it errors out due to the other dependencies on the autoencoder config params in the script. I also tried making only the target none and leaving the rest, but that doesn't seem to work. Could you please guide me on how to train ResShift without the autoencoder, i.e., in image space instead of latent space? Thanks a ton, and I hope to hear from you soon.

Discrepancies in CLIPIQA and MUSIQ Scores When Testing ResShift on RealSR65

Hi @zsyOAOA,

I am experiencing inconsistencies in the evaluation metrics while testing ResShift with the RealSR65 dataset. Below is a detailed description of my process and the issues encountered:

  1. Data Verification and Command Execution:

    • Confirmed the presence of the dataset in ./testdata/RealSet65.
    • Ran the ResShift inference using the following command:
      CUDA_VISIBLE_DEVICES=0 python inference_resshift.py -i testdata/RealSet65 -o result/RealSet65 --scale 4 --task realsrx4 --chop_size 512
  2. Evaluation Metrics Assessment:

    • Utilized IQA-PyTorch for computing CLIPIQA and MUSIQ metrics.
    • Obtained the following results for the RealSR65 dataset:
      CLIPIQA: 0.6418642669916153 (expected 0.6537)
      MUSIQ: 58.211212921142575 (expected 61.330)
    • Additionally, I observed these results for another subset of RealSR:
      CLIPIQA: 0.5409876523911953 (expected 0.5958)
      MUSIQ: 53.28555391311645 (expected 59.873)
  3. Issue and Inquiry:

    • Despite varying the random seed with the --seed option, the scores did not align with the reported values.
    • This discrepancy persists across different datasets and metrics, prompting me to question if a step was missed or executed incorrectly.

Questions:

  • Could there be an oversight in my testing methodology or a specific procedure I should follow?
  • Is evaluating CLIPIQA and MUSIQ on the Y channel necessary or recommended for accurate results?

I am keen to understand and rectify these discrepancies and would greatly appreciate your insights.

Thank you for your assistance.
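
For reference, a minimal sketch of computing these no-reference metrics with IQA-PyTorch; the image path is hypothetical, and pyiqa's default weights and preprocessing vary between versions, which alone can shift the scores.

    import pyiqa

    clipiqa = pyiqa.create_metric('clipiqa', device='cuda')
    musiq = pyiqa.create_metric('musiq', device='cuda')
    img_path = 'result/RealSet65/example.png'   # hypothetical output of the inference command above
    print(clipiqa(img_path).item(), musiq(img_path).item())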

How long does it take to train on imagenet?

Thank you for your awesome research.
I'm trying to train on ImageNet with your code, and it turns out that it takes at least 14 days to run 50k iterations.
Can you tell me how long it took you to train?
And is there a technique that can expedite training?

Gradio version issue

I'm not sure which Gradio version the author used. The latest version raises AttributeError: module 'gradio' has no attribute 'outputs', while switching to an older version raises AttributeError: module 'gradio' has no attribute 'Image'.
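
For context, a minimal sketch of the component-based API used by recent Gradio releases (3.x and later); the upscale function here is a placeholder rather than the repo's app.py.

    import gradio as gr

    def upscale(image):
        # Placeholder: the real demo would call the ResShift pipeline here.
        return image

    demo = gr.Interface(fn=upscale, inputs=gr.Image(type="pil"), outputs=gr.Image(type="pil"))
    demo.launch()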

NameError: name 'vqgan_dir' is not defined

$ CUDA_VISIBLE_DEVICES=0 python inference_resshift.py -i input -o output --task realsrx4 --chop_size 512
C:\Python310\lib\site-packages\torchvision\transforms\functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
  warnings.warn(
Downloading: "https://github.com/zsyOAOA/ResShift/releases/download/v1.0/resshift_realsrx4_s15.pth" to C:\ResShift-master\weights\resshift_realsrx4_s15.pth

100%|##########| 456M/456M [00:17<00:00, 27.0MB/s]
Traceback (most recent call last):
  File "C:\ResShift-master\inference_resshift.py", line 101, in <module>
    main()
  File "C:\ResShift-master\inference_resshift.py", line 87, in main
    configs, chop_stride, chop_bs = get_configs(args)
  File "C:\ResShift-master\inference_resshift.py", line 56, in get_configs
    model_dir=vqgan_dir,
NameError: name 'vqgan_dir' is not defined

How to fix this? Thanks

Res-Shift weights without VQGAN

Thanks for the good work! I assessed the quality of the VQGAN on my data and it was really poor, which also caused poor quality when I used your model. So I want to stop using any autoencoder, and I was wondering whether you have released model weights trained without an autoencoder, since in Figure 2 of the official paper you said that using an autoencoder is optional. I would really appreciate it; otherwise I would need to train the model from scratch...

How to test ResShift in RealSR x4 dataset?

The LR resolution and GT resolution are the same in the RealSRx4 [1] Dataset.

It doesn't work when I just set the "--scale" parameter from 4 to 1 in inference.py.

Maybe I should downsample the LR images first?

Looking forward to your reply!

Thank you!

[1]. Jianrui Cai, Hui Zeng, Hongwei Yong, Zisheng Cao, and Lei Zhang. Toward real-world single image super-resolution: A new benchmark and a new model. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 3086–3095, 2019.

Test error

I want to test faceir. When I run "python inference_resshift.py -i /home/xielangren/project/ResShift/testdata/eval15/low --task faceir --scale 1", I get the error:

Traceback (most recent call last):
  File "/home/xielangren/project/ResShift/inference_resshift.py", line 197, in <module>
    main()
  File "/home/xielangren/project/ResShift/inference_resshift.py", line 170, in main
    resshift_sampler = ResShiftSampler(
  File "/home/xielangren/project/ResShift/sampler.py", line 53, in __init__
    self.setup_dist()  # setup distributed training: self.num_gpus, self.rank
  File "/home/xielangren/project/ResShift/sampler.py", line 72, in setup_dist
    rank = int(os.environ['LOCAL_RANK'])
  File "/home/xielangren/miniconda3/envs/resshift/lib/python3.10/os.py", line 680, in __getitem__
    raise KeyError(key) from None
KeyError: 'LOCAL_RANK'
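
One possible workaround, offered as an assumption rather than the maintainer's fix: sampler.py reads LOCAL_RANK from the environment, which torchrun would normally set, so providing it before the sampler is constructed (e.g. near the top of inference_resshift.py) may unblock single-GPU runs.

    import os
    # Hypothetical single-GPU workaround: supply the variable torchrun would otherwise set.
    os.environ.setdefault('LOCAL_RANK', '0')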

Seek help

Hello author!
Can image denoising and enhancement be performed on the following images to enhance their clarity?
Thank you!|
image

weird pink shade images

Hi,

I trained on my dataset and get a pink-ish shade in some images, and I don't know the cause of it.
Did this happen to any of you?
Thanks

Reproducing the results on ImageNet-Val

Hi. To evaluate our setup, we are trying to reproduce the results mentioned in the paper. To do so, we followed the steps below from the README.

  1. Sample 3k images from the ImageNet-Val set using the script (https://github.com/zsyOAOA/ResShift/blob/master/scripts/prepare_testing_imagenet.py)
  2. Generate reconstruction images with inference script and models shared
    CUDA_VISIBLE_DEVICES=gpu_id python inference_resshift.py -i [image folder/image path] -o [result folder] --scale 4 --task realsrx4 --chop_size 512

The PSNR and SSIM from this set don't match the numbers reported in the paper. Can you confirm the steps and point out anything we are missing?

Google Colab demo

Hello, your work is very impressive; the quality of the results is really very high.

Could you please make an online demo in Google Colab?
Thank you.

Question about custom datasets

Hello author, thank you for your outstanding work.
I am trying to train my own dataset with your model.
In the README I noticed that I only need to modify
txt_file_path: [ '/mnt/lustre/zsyue/database/ImageNet/train/image_path_all.txt', '/mnt/lustre/zsyue/database/FFHQ/files_txt/ffhq256.txt', ]

But do these two txt files just contain the image paths directly? If they are all in one txt file, how are GT and LQ distinguished?
I would greatly appreciate it if you could help clarify this.

Compare with StableSR

Hi, thanks for the great work !

ResShift shows great performance compared with BSRGAN, RealESRGAN, SwinIR, LDM, etc. in your paper. Have you ever compared it with StableSR? The comparison would be very interesting.

CUDA out of memory

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 128.00 GiB (GPU 0; 24.00 GiB total capacity; 6.71 GiB already allocated; 12.96 GiB free; 8.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I only have 1 GPU with 24 GB; is there any way to run this?
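
Two things that may help, suggested by the error message and the existing inference flags rather than by an official recommendation: lower --chop_size (e.g. to 256) so smaller tiles are processed at a time, and set the allocator hint before CUDA memory is allocated.

    import os
    # Must be set before PyTorch allocates CUDA memory; the value is illustrative.
    os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:128'
    import torch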

2x Scale

Hello,
I am trying to convert the model to perform 2x scaling instead of 4x.
I am referring to this issue: #22, comment: #22 (comment)

The link provided in this comment is broken, could you please re-send it?

About batch size and microbatch size

May I ask what batch and microbatch in the config file mean? Is batch the total training batch size, and microbatch the batch size assigned to each GPU? In other words, if I train with 8 GPUs and batch is 64, should microbatch be set to 8? Is my understanding correct? Thanks!

Training Datasets Used Question

Dear author,

I hope this message finds you well. While engaging with your work and attempting to replicate the experiments based on the realsr_swinunet_realesrgan256.yaml configuration file provided, I came across a detail regarding the usage of datasets during training. The configuration file lists two dataset paths as follows:

      txt_file_path: [
                      '/mnt/lustre/zsyue/database/ImageNet/train/image_path_all.txt', 
                      '/mnt/lustre/zsyue/database/FFHQ/files_txt/ffhq256.txt',
                     ] 

However, in the relevant section of your paper, while the ImageNet dataset is mentioned as a training resource, there is no explicit indication that the FFHQ dataset is also included. Consequently, I would like to seek clarification regarding which datasets were actually used for the model training and experimental results reported in your paper. Could you please clarify whether the results presented in your paper were derived solely from the ImageNet dataset, or whether they also incorporated data from the FFHQ dataset? Thank you!

Asking about the inference time

Hi,

Thank you so much for your contributions!

I'd like to ask you about the inference time when using "realsr_swinunet_realesrgan256.yaml".

In particular, it takes me around 6 s to handle an image of size 500x400x3.
My GPU is an RTX 4090 with 24 GB.

The reason I ask is that I ran exactly the same setup yesterday, and it took less than 1 s.

Therefore, I'd like to ask whether the inference time (6 s) is normal, or whether there are any settings I should modify to speed up inference.

Encountered a memory allocation problem

I already added a memory-usage cap of 0.9 in the running code:

import torch

# Set the percentage of GPU memory that CUDA may occupy
torch.cuda.set_per_process_memory_fraction(0.9)  # set to 90% here

But I still get the error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 128.00 GiB (GPU 0; 23.99 GiB total capacity; 7.85 GiB already allocated; 14.25 GiB free; 21.59 GiB allowed; 8.14 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

train error

Hello, I am not able to start training and get this error.
Could you please help me?
Greetings
[error screenshot attached: Screenshot from 2023-08-21 15-39-42]

VRAM issue

Hello, can a GPU with 24GB of VRAM train this model?

Training sf:1 (deblurring)

Thank you for providing your code. I already tested the super-resolution, and it's great. Is it possible to adapt the config to do deblurring on lq:256 x gt:256, i.e., without any super-resolution? What would I have to change?
