
resshift's Introduction

ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting (NeurIPS 2023, Spotlight)

Zongsheng Yue, Jianyi Wang, Chen Change Loy

Conference Paper | Journal Paper | Project Page | Video


⭐ If ResShift is helpful to your images or projects, please help star this repo. Thanks! 🤗


Diffusion-based image super-resolution (SR) methods are mainly limited by the low inference speed due to the requirements of hundreds or even thousands of sampling steps. Existing acceleration sampling techniques inevitably sacrifice performance to some extent, leading to over-blurry SR results. To address this issue, we propose a novel and efficient diffusion model for SR that significantly reduces the number of diffusion steps, thereby eliminating the need for post-acceleration during inference and its associated performance deterioration. Our method constructs a Markov chain that transfers between the high-resolution image and the low-resolution image by shifting the residual between them, substantially improving the transition efficiency. Additionally, an elaborate noise schedule is developed to flexibly control the shifting speed and the noise strength during the diffusion process. Extensive experiments demonstrate that the proposed method obtains superior or at least comparable performance to current state-of-the-art methods on both synthetic and real-world datasets, even only with 15 sampling steps.
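
For intuition, here is a minimal PyTorch sketch of the shifted forward transition described above; the low-resolution input takes the place of pure noise as the other endpoint of the chain. The schedule values and kappa below are illustrative, not the released training settings.

    import torch

    def resshift_forward(x0, y0, eta_t, kappa=2.0):
        # Sample x_t ~ q(x_t | x_0, y_0): shift a fraction eta_t of the residual
        # (y_0 - x_0) onto x_0 and add Gaussian noise of strength kappa * sqrt(eta_t).
        residual = y0 - x0
        mean = x0 + eta_t * residual
        return mean + kappa * (eta_t ** 0.5) * torch.randn_like(x0)

    # Toy usage: x0 is the HR (latent) image, y0 the LR input brought to the same shape.
    x0, y0 = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
    for eta_t in torch.linspace(0.05, 1.0, 15):   # 15 steps, illustrative schedule
        x_t = resshift_forward(x0, y0, eta_t.item())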


Update

  • 2024.03.11: Update the code for the Journal paper
  • 2023.12.02: Add configurations for the x2 super-resolution task.
  • 2023.08.15: Add OpenXLab.
  • 2023.08.15: Add Gradio Demo.
  • 2023.08.14: Add bicubic (matlab resize) model.
  • 2023.08.14: Add Project Page.
  • 2023.08.02: Add Replicate demo.
  • 2023.07.31: Add Colab demo.
  • 2023.07.24: Create this repo.

Requirements

  • Python 3.10, Pytorch 2.1.2, xformers 0.0.23
  • For more details, see environment.yml.

A suitable conda environment named resshift can be created and activated with:
conda create -n resshift python=3.10
conda activate resshift
pip install -r requirements.txt

or

conda env create -f environment.yml
conda activate resshift

Applications

👉 Real-world image super-resolution

👉 Image inpainting

👉 Blind Face Restoration

Online Demo

You can try our method through the online demos linked above, or launch the Gradio demo locally with:

python app.py

Fast Testing

🐯 Real-world image super-resolution

python inference_resshift.py -i [image folder/image path] -o [result folder] --task realsr --scale 4 --version v3

🦁 Bicubic (resize by Matlab) image super-resolution

python inference_resshift.py -i [image folder/image path] -o [result folder] --task bicsr --scale 4

🐍 Natural image inpainting

python inference_resshift.py -i [image folder/image path] -o [result folder] --mask_path [mask path] --task inpaint_imagenet --scale 1

🐊 Face image inpainting

python inference_resshift.py -i [image folder/image path] -o [result folder] --mask_path [mask path] --task inpaint_face --scale 1

🐙 Blind Face Restoration

python inference_resshift.py -i [image folder/image path] -o [result folder] --task faceir --scale 1

Training

🐢 Preparing stage

  1. Download the pre-trained VQGAN model from this link and put it in the 'weights' folder.
  2. Adjust the data path in the config file.
  3. Adjust the batch size according to your GPUs (see the sketch after this list).
    • configs.train.batch: [training batch size, validation batch size]
    • configs.train.microbatch: per-GPU micro-batch; total batch size = microbatch * #GPUs * num_grad_accumulation
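
A quick sanity check of these settings (the numbers below are illustrative, not the released configuration):

    num_gpus = 8                  # --nproc_per_node passed to torchrun
    microbatch = 8                # configs.train.microbatch (per-GPU micro-batch)
    num_grad_accumulation = 1     # gradient-accumulation steps
    total_batch = microbatch * num_gpus * num_grad_accumulation
    print(total_batch)            # 64 with these numbers; should match the training batch size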

🐬 Real-world Image Super-resolution for NeurIPS

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --standalone --nproc_per_node=8 --nnodes=1 main.py --cfg_path configs/realsr_swinunet_realesrgan256.yaml --save_dir [Logging Folder] 

🐳 Real-world Image Super-resolution for Journal

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --standalone --nproc_per_node=8 --nnodes=1 main.py --cfg_path configs/realsr_swinunet_realesrgan256_journal.yaml --save_dir [Logging Folder] 

🐂 Image inpainting (Natural) for Journal

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --standalone --nproc_per_node=8 --nnodes=1 main.py --cfg_path configs/inpaint_lama256_imagenet.yaml --save_dir [Logging Folder] 

🐝 Image inpainting (Face) for Journal

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --standalone --nproc_per_node=8 --nnodes=1 main.py --cfg_path configs/inpaint_lama256_face.yaml --save_dir [Logging Folder] 

🐸 Blind face restoration for Journal

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --standalone --nproc_per_node=8 --nnodes=1 main.py --cfg_path configs/faceir_gfpgan512_lpips.yaml --save_dir [Logging Folder] 

Reproducing the results in our paper

🚗 Prepare data

  • Synthetic data for image super-resolution: Link

  • Real data for image super-resolution: RealSet65 | RealSet80

  • Synthetic data for natural image inpainting: Link

  • Synthetic data for face image inpainting: Link

  • Synthetic data for blind face restoration: Link

🚀 Image super-resolution

Reproduce the results in Table 3 of our NeurIPS paper:

python inference_resshift.py -i [image folder/image path] -o [result folder] --task realsr --scale 4 --version v1 --chop_size 64 --chop_stride 64 --bs 64

Reproduce the results in Table 4 of our NeurIPS paper:

python inference_resshift.py -i [image folder/image path] -o [result folder] --task realsr --scale 4 --version v1 --chop_size 512 --chop_stride 448

Reproduce the results in Table 2 of our Journal paper:

python inference_resshift.py -i [image folder/image path] -o [result folder] --task realsr --scale 4 --version v3 --chop_size 64 --chop_stride 64 --bs 64

Reproduce the results in Table 3 of our Journal paper:

python inference_resshift.py -i [image folder/image path] -o [result folder] --task realsr --scale 4 --version v3 --chop_size 512 --chop_stride 448

Model card:
  • version-1: Conference paper, 15 diffusion steps, trained for 300k iterations.
  • version-2: Conference paper, 15 diffusion steps, trained for 500k iterations.
  • version-3: Journal paper, 4 diffusion steps.

✈️ Image inpainting

Reproduce the results in Table 4 of our Journal paper:

python inference_resshift.py -i [image folder/image path] -o [result folder] --mask_path [mask path] --task inpaint_imagenet --scale 1 --chop_size 256 --chop_stride 256 --bs 32

Reproduce the results in Table 5 of our Journal paper:

python inference_resshift.py -i [image folder/image path] -o [result folder] --mask_path [mask path] --task inpaint_face --scale 1 --chop_size 256 --chop_stride 256 --bs 32

⛵ Blind Face Restoration

Reproduce the results in Table 6 of our Journal paper (arXiv):

python inference_resshift.py -i [image folder/image path] -o [result folder] --task faceir --scale 1 --chop_size 256 --chop_stride 256 --bs 16

License

This project is licensed under NTU S-Lab License 1.0. Redistribution and use should follow this license.

Acknowledgement

This project is based on Improved Diffusion Model, LDM, and BasicSR. We also adopt Real-ESRGAN to synthesize the training data for real-world super-resolution. Thanks for their awesome work.

Contact

If you have any questions, please feel free to contact me via [email protected].

resshift's People

Contributors

chenxwh, diveshjain-phy, eltociear, wyf0912, zsyOAOA


resshift's Issues

Questions about training configs

Hello, I'd like to ask about the diffusion.step setting during training. I see the default is 15; would a larger value yield better results? I currently set diffusion.step to 50 and also use step=50 for inference, but my trained model can hardly match the performance of your pre-trained model. Should I reduce diffusion.step during training?

Issues about Image Size

Dear author, thanks for your excellent work!
After carefully reading your paper and code, I see that training is based on the 64-256 SR task. But it seems that you apply the pretrained weights (VQGAN encoder/decoder and SwinUnetModel) from the 64-512 SR task to the 128-512 SR task. How does applying the same weights to different tasks work? Or have I misunderstood your work? Thank you!

About inference time

Thank you for the excellent work! I used 'CUDA_VISIBLE_DEVICES=gpu_id python inference_resshift.py -i [image folder/image path] -o [result folder] --scale 4 --task realsrx4 --chop_size 512' to super-resolve ImageNet-Test on a Tesla V100 GPU, and it took about 55 minutes, which is much more than the time reported in the paper. Is something wrong with the inference code? Thank you for your reply in advance.

Dataset of Blind Face Restoration

Hi,

Thanks for releasing the code for the journal version.

Could you upload the blind face restoration dataset? I can't find the download link.

The question about the noisy pixels

Thanks for your great work. I find your results are always perceptually better than StableSR's, but some output images contain noisy pixels, as in the attached examples. I wonder why this happens, and how I can fix or mitigate this artifact by adjusting the parameters or re-training the model?
[two example images attached]

Dependencies in environment.yml

I've seen and installed all sorts of projects, including Stable Diffusion, traiNNer, chaiNNer, and various upscaler tools, but in none of them have I seen so many libraries listed as dependencies.
Have you tried removing unnecessary and redundant libraries?
Or do you really need every library ever written for Python just to run/test ResShift?
It is a huge pile of Python libraries of all kinds, and they are also tightly version- and OS-specific.

Hello, a question about a problem encountered during training

During training, since I don't have the VQGAN weights, I train the diffusion model directly in pixel space. At around step 1966 the loss becomes NaN. I checked my own data and found no abnormal values. Do you know why this happens?

Have you tried training on predicting 'epsilon' instead of 'xstart'?

Very awesome work, and it inspired me a lot!!

I have a question regarding the experiment on training objectives. Have you tried training on reconstructing 'epsilon'? To me, it's not very intuitive why the model needs to output the same 'x_0' at different time steps.

I would appreciate it if you have further insights!
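
For readers following this thread, a generic sketch of the two parameterizations being contrasted (standard diffusion training objectives written for illustration, not the repository's actual loss code; model is assumed to be a network taking (x_t, t)):

    import torch.nn.functional as F

    def loss_x0(model, x_t, t, x0):
        # x0-parameterization: the network predicts the clean image at every step t
        return F.mse_loss(model(x_t, t), x0)

    def loss_eps(model, x_t, t, eps):
        # epsilon-parameterization: the network predicts the injected noise instead
        return F.mse_loss(model(x_t, t), eps)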

about faceir test results

Hi, thanks for offering such great work.
I'm facing a test problem: the result is a completely black image. Is there a configuration problem with the faceir task?

test dataset

Can you kindly provide the ImageNet-Test dataset?

May I ask the author, can this model improve the resolution of real person images?

May I ask the author: can this model improve the resolution of real-person images? I have a bunch of screenshots from real-life videos with poor quality. Can I use this model to achieve better image quality?

openxlab demo error

@zsyOAOA
Thank you for sharing your work. I am getting errors when trying the demo on OpenXLab. Is there a specific input size or something?

ResShift without the autoencoder

Hi

Thanks for the great work! I have a small query on training the model without the autoencoder. If I directly declare it to be none in the config file, i.e.,

autoencoder: None

it errors out due to the other dependencies on the autoencoder config params in the script. I also tried making only the target none and leaving the rest, but that doesn't seem to work. Could you please guide me on how to train ResShift without the autoencoder, i.e., in image space instead of latent space? Thanks a ton, and I hope to hear from you soon.

Discrepancies in CLIPIQA and MUSIQ Scores When Testing ResShift on RealSR65

Hi @zsyOAOA,

I am experiencing inconsistencies in the evaluation metrics while testing ResShift with the RealSR65 dataset. Below is a detailed description of my process and the issues encountered:

  1. Data Verification and Command Execution:

    • Confirmed the presence of the dataset in ./testdata/RealSet65.
    • Ran the ResShift inference using the following command:
      CUDA_VISIBLE_DEVICES=0 python inference_resshift.py -i testdata/RealSet65 -o result/RealSet65 --scale 4 --task realsrx4 --chop_size 512
  2. Evaluation Metrics Assessment:

    • Utilized IQA-PyTorch for computing CLIPIQA and MUSIQ metrics.
    • Obtained the following results for the RealSR65 dataset:
      CLIPIQA: 0.6418642669916153 (expected 0.6537)
      MUSIQ: 58.211212921142575 (expected 61.330)
    • Additionally, I observed these results for another subset of RealSR:
      CLIPIQA: 0.5409876523911953 (expected 0.5958)
      MUSIQ: 53.28555391311645 (expected 59.873)
  3. Issue and Inquiry:

    • Despite varying the random seed with the --seed option, the scores did not align with the reported values.
    • This discrepancy persists across different datasets and metrics, prompting me to question if a step was missed or executed incorrectly.

Questions:

  • Could there be an oversight in my testing methodology or a specific procedure I should follow?
  • Is evaluating CLIPIQA and MUSIQ on the Y channel necessary or recommended for accurate results?

I am keen to understand and rectify these discrepancies and would greatly appreciate your insights.

Thank you for your assistance.
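
For reference, a minimal sketch of computing these no-reference metrics with IQA-PyTorch; the image path is hypothetical, and pyiqa's default weights and preprocessing vary between versions, which alone can shift the scores.

    import pyiqa

    clipiqa = pyiqa.create_metric('clipiqa', device='cuda')
    musiq = pyiqa.create_metric('musiq', device='cuda')
    img_path = 'result/RealSet65/example.png'   # hypothetical output of the inference command above
    print(clipiqa(img_path).item(), musiq(img_path).item())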

How long does it take to train on imagenet?

Thank you for your awesome research.
I'm trying to train on ImageNet with your code, and it turns out that it takes at least 14 days to run 50k iterations.
Can you tell me how long it took you to train?
And is there a technique that can expedite training?

Gradio version issue

I'm not sure which Gradio version the author used. The latest version raises AttributeError: module 'gradio' has no attribute 'outputs', while switching to an older version raises AttributeError: module 'gradio' has no attribute 'Image'.
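
For context, a minimal sketch of the component-based API used by recent Gradio releases (3.x and later); the upscale function here is a placeholder rather than the repo's app.py.

    import gradio as gr

    def upscale(image):
        # Placeholder: the real demo would call the ResShift pipeline here.
        return image

    demo = gr.Interface(fn=upscale, inputs=gr.Image(type="pil"), outputs=gr.Image(type="pil"))
    demo.launch()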

NameError: name 'vqgan_dir' is not defined

$ CUDA_VISIBLE_DEVICES=0 python inference_resshift.py -i input -o output --task realsrx4 --chop_size 512
C:\Python310\lib\site-packages\torchvision\transforms\functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
  warnings.warn(
Downloading: "https://github.com/zsyOAOA/ResShift/releases/download/v1.0/resshift_realsrx4_s15.pth" to C:\ResShift-master\weights\resshift_realsrx4_s15.pth

100%|##########| 456M/456M [00:17<00:00, 27.0MB/s]
Traceback (most recent call last):
  File "C:\ResShift-master\inference_resshift.py", line 101, in <module>
    main()
  File "C:\ResShift-master\inference_resshift.py", line 87, in main
    configs, chop_stride, chop_bs = get_configs(args)
  File "C:\ResShift-master\inference_resshift.py", line 56, in get_configs
    model_dir=vqgan_dir,
NameError: name 'vqgan_dir' is not defined

How to fix this? Thanks

Res-Shift weights without VQGAN

Thanks for the good work! I assessed the quality of the VQGAN on my data and it was really poor, which also caused poor quality when I used your model. So I want to stop using any autoencoder, and I was wondering whether you have released model weights trained without an autoencoder, since in Figure 2 of the official paper you said that using an autoencoder is optional. I would really appreciate it; otherwise I would need to train the model from scratch...

How to test ResShift in RealSR x4 dataset?

The LR resolution and GT resolution are the same in the RealSRx4 [1] Dataset.

It doesn't work when I just set the "--scale" parameter from 4 to 1 in inference.py.

Maybe I should downsample the LR images first?

Looking forward to your reply!

Thank you!

[1]. Jianrui Cai, Hui Zeng, Hongwei Yong, Zisheng Cao, and Lei Zhang. Toward real-world single image super-resolution: A new benchmark and a new model. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 3086–3095, 2019.

Test error

I want to test faceir. When I run "python inference_resshift.py -i /home/xielangren/project/ResShift/testdata/eval15/low --task faceir --scale 1", I get the error:

Traceback (most recent call last):
  File "/home/xielangren/project/ResShift/inference_resshift.py", line 197, in <module>
    main()
  File "/home/xielangren/project/ResShift/inference_resshift.py", line 170, in main
    resshift_sampler = ResShiftSampler(
  File "/home/xielangren/project/ResShift/sampler.py", line 53, in __init__
    self.setup_dist()  # setup distributed training: self.num_gpus, self.rank
  File "/home/xielangren/project/ResShift/sampler.py", line 72, in setup_dist
    rank = int(os.environ['LOCAL_RANK'])
  File "/home/xielangren/miniconda3/envs/resshift/lib/python3.10/os.py", line 680, in __getitem__
    raise KeyError(key) from None
KeyError: 'LOCAL_RANK'
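
One possible workaround, offered as an assumption rather than the maintainer's fix: sampler.py reads LOCAL_RANK from the environment, which torchrun would normally set, so providing it before the sampler is constructed (e.g. near the top of inference_resshift.py) may unblock single-GPU runs.

    import os
    # Hypothetical single-GPU workaround: supply the variable torchrun would otherwise set.
    os.environ.setdefault('LOCAL_RANK', '0')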

Seek help

Hello author!
Can image denoising and enhancement be performed on the following images to enhance their clarity?
Thank you!|
image

weird pink shade images

Hi,

I trained on my dataset and get a pink-ish shade in some images, and I don't know the cause of it.
Did this happen to any of you?
Thanks

Reproducing the results on ImageNet-Val

Hi. To evaluate our setup, we are trying to reproduce the results mentioned in the paper. To do so, we followed the steps below from the README.

  1. Sample 3k images from the ImageNet-Val set using the script (https://github.com/zsyOAOA/ResShift/blob/master/scripts/prepare_testing_imagenet.py)
  2. Generate reconstruction images with inference script and models shared
    CUDA_VISIBLE_DEVICES=gpu_id python inference_resshift.py -i [image folder/image path] -o [result folder] --scale 4 --task realsrx4 --chop_size 512

The PSNR and SSIM from this set don't match the numbers reported in the paper. Can you confirm the steps and point out anything we are missing?

Google Colab demo

Hello, your work is very impressive; the quality of the results is really very high.

Could you please make an online demo in Google Colab?
Thank you.

Question about custom datasets

Hello author, thank you for your outstanding work.
I am trying to train my own dataset with your model.
In the README I noticed that I only need to modify
txt_file_path: [ '/mnt/lustre/zsyue/database/ImageNet/train/image_path_all.txt', '/mnt/lustre/zsyue/database/FFHQ/files_txt/ffhq256.txt', ]

But do these two txt files just contain the image paths directly? If they are all in one txt file, how are GT and LQ distinguished?
I would greatly appreciate it if you could help clarify this.

Compare with StableSR

Hi, thanks for the great work !

ResShift shows great performance compared with BSRGAN, RealESRGAN, SwinIR, LDM, etc. in your paper. Have you ever compared it with StableSR? The comparison would be very interesting.

CUDA out of memory

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 128.00 GiB (GPU 0; 24.00 GiB total capacity; 6.71 GiB already allocated; 12.96 GiB free; 8.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I only have 1 GPU with 24 GB; is there any way to run this?
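
Two things that may help, suggested by the error message and the existing inference flags rather than by an official recommendation: lower --chop_size (e.g. to 256) so smaller tiles are processed at a time, and set the allocator hint before CUDA memory is allocated.

    import os
    # Must be set before PyTorch allocates CUDA memory; the value is illustrative.
    os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:128'
    import torch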

2x Scale

Hello,
I am trying to convert the model to perform 2x scaling instead of 4x.
I am referring to this issue: #22, comment: #22 (comment)

The link provided in this comment is broken, could you please re-send it?

About batch size and microbatch size

May I ask what batch and microbatch in the config file mean? Is batch the total training batch size, and microbatch the batch size assigned to each GPU? In other words, if I train with 8 GPUs and batch is 64, should microbatch be set to 8? Is my understanding correct? Thanks!

Training Datasets Used Question

Dear author,

I hope this message finds you well. While engaging with your work and attempting to replicate the experiments based on the realsr_swinunet_realesrgan256.yaml configuration file provided, I came across a detail regarding the usage of datasets during training. The configuration file lists two dataset paths as follows:

      txt_file_path: [
                      '/mnt/lustre/zsyue/database/ImageNet/train/image_path_all.txt', 
                      '/mnt/lustre/zsyue/database/FFHQ/files_txt/ffhq256.txt',
                     ] 

However, in the relevant section of your paper, while the ImageNet dataset is mentioned as a training resource, there is no explicit indication that the FFHQ dataset is also included. Consequently, I would like to seek clarification regarding which datasets were actually used for the model training and experimental results reported in your paper. Could you please clarify whether the results presented in your paper were derived solely from the ImageNet dataset, or whether they also incorporated data from the FFHQ dataset? Thank you!

Asking about the inference time

Hi,

Thank you so much for your contributions!

I'd like to ask you about the inference time when using "realsr_swinunet_realesrgan256.yaml".

In particular, it takes me around 6 s to handle an image of size 500x400x3.
My GPU is an RTX 4090 with 24 GB.

The reason I ask is that I ran exactly the same setup yesterday, and it took less than 1 s.

Therefore, I'd like to ask whether the inference time (6 s) is normal, or whether there are any settings I should modify to speed up inference.

Encountered a memory allocation problem

I already added a memory-usage cap of 0.9 in the running code:

import torch

# Set the percentage of GPU memory that CUDA may occupy
torch.cuda.set_per_process_memory_fraction(0.9)  # set to 90% here

But I still get the error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 128.00 GiB (GPU 0; 23.99 GiB total capacity; 7.85 GiB already allocated; 14.25 GiB free; 21.59 GiB allowed; 8.14 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

train error

Hello, I am not able to start training and get this error.
Could you please help me?
Greetings
[error screenshot attached: Screenshot from 2023-08-21 15-39-42]

VRAM issue

Hello, can a GPU with 24GB of VRAM train this model?

Training sf:1 (deblurring)

Thank you for providing your code. I already tested the super-resolution, and it's great. Is it possible to adapt the config to do deblurring on lq:256 x gt:256, i.e., without any super-resolution? What would I have to change?
