
Uformer: A General U-Shaped Transformer for Image Restoration (CVPR 2022)

Zhendong Wang, Xiaodong Cun, Jianmin Bao, Wengang Zhou, Jianzhuang Liu, Houqiang Li


Paper link: [Arxiv] [CVPR]

Update:

  • 2022.07.06 Upload new code and models for Uformer.
  • 2022.04.09 Upload results of Uformer on denoising (SIDD, DND), motion deblurring (GoPro, HIDE, RealBlur-J/-R), and defocus deblurring (DPDD).
  • 2022.03.02 Uformer has been accepted by CVPR 2022! 🔥
  • 2021.11.30 Update the Uformer arXiv link. The new code, models, and results will be uploaded.
  • 2021.10.28 Release the results of Uformer32 on SIDD and DND.
  • 2021.09.30 Release pre-trained Uformer16 for SIDD denoising.
  • 2021.08.19 Release a pre-trained model (Uformer32)! Add a script for FLOPs/GMac calculation.
  • 2021.07.29 Add a script for testing the pre-trained model on arbitrary-resolution images.

In this paper, we present Uformer, an effective and efficient Transformer-based architecture, in which we build a hierarchical encoder-decoder network using the Transformer block for image restoration. Uformer has two core designs to make it suitable for this task. The first key element is a local-enhanced window Transformer block, where we use non-overlapping window-based self-attention to reduce the computational requirement and employ the depth-wise convolution in the feed-forward network to further improve its potential for capturing local context. The second key element is that we explore three skip-connection schemes to effectively deliver information from the encoder to the decoder. Powered by these two designs, Uformer enjoys a high capability for capturing useful dependencies for image restoration. Extensive experiments on several image restoration tasks demonstrate the superiority of Uformer, including image denoising, deraining, deblurring and demoireing. We expect that our work will encourage further research to explore Transformer-based architectures for low-level vision tasks.

Uformer

Package dependencies

The project is built with PyTorch 1.9.0, Python 3.7, and CUDA 11.1. Install the package dependencies with:

pip install -r requirements.txt

Pretrained model

Results from the pretrained model

Data preparation

Denoising

For the SIDD training data, you can download the SIDD-Medium dataset from the official URL, then generate training patches with:

python3 generate_patches_SIDD.py --src_dir ../SIDD_Medium_Srgb/Data --tar_dir ../datasets/denoising/sidd/train
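
For intuition, the patch-generation step crops fixed-size training patches from each full-resolution noisy/clean pair. A minimal sketch of the idea (this is not the repo's actual generate_patches_SIDD.py; the patch size and count are illustrative):

import numpy as np

def random_patches(noisy, clean, ps=256, n=300):
    # Crop n random ps x ps patch pairs from an aligned (H, W, 3) image pair;
    # assumes both images are larger than ps in each dimension.
    h, w, _ = noisy.shape
    pairs = []
    for _ in range(n):
        y = np.random.randint(0, h - ps + 1)
        x = np.random.randint(0, w - ps + 1)
        pairs.append((noisy[y:y + ps, x:x + ps], clean[y:y + ps, x:x + ps]))
    return pairs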

For evaluation on SIDD and DND, you can download data from here.

Deblurring

For training on GoPro, and evaluation on GoPro, HIDE, RealBlur-J and RealBlur-R, you can download data from here.

Then put all the denoising data into ../datasets/denoising, and all the deblurring data into ../datasets/deblurring.
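
Based on the commands above, the layout is expected to look roughly like this (the exact subdirectory names under each root are an assumption, not taken from the repo):

../datasets/
├── denoising/
│   └── sidd/
│       ├── train/   # patches produced by generate_patches_SIDD.py
│       └── val/     # SIDD/DND evaluation data
└── deblurring/
    └── GoPro/       # plus HIDE and RealBlur-J/-R for evaluation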

Training

Denoising

To train Uformer on SIDD, you can begin the training by:

sh script/train_denoise.sh

Deblurring

To train Uformer on GoPro, you can begin the training by:

sh script/train_motiondeblur.sh

Evaluation

To evaluate Uformer, you can run:

sh script/test.sh

To evaluate on each dataset, uncomment the corresponding line in the script.

Computational Cost

We provide a simple script in model.py to calculate the FLOPs ourselves. You can change the configuration and run:

python3 model.py

The manually calculated GMacs in this repo differ slightly from those in the main paper, but the difference does not affect the conclusions. We will correct the paper later.
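
As a cross-check, an issue below notes that the repo computes complexity with the ptflops package; a minimal sketch using a stand-in module (swap in a Uformer instance from model.py for the real numbers):

import torch.nn as nn
from ptflops import get_model_complexity_info

# stand-in network; replace with Uformer(...) from model.py
model = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1),
                      nn.ReLU(),
                      nn.Conv2d(32, 3, 3, padding=1))

macs, params = get_model_complexity_info(model, (3, 256, 256),
                                         as_strings=True,
                                         print_per_layer_stat=False)
print(macs, params)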

Citation

If you find this project useful in your research, please consider citing:

@InProceedings{Wang_2022_CVPR,
    author    = {Wang, Zhendong and Cun, Xiaodong and Bao, Jianmin and Zhou, Wengang and Liu, Jianzhuang and Li, Houqiang},
    title     = {Uformer: A General U-Shaped Transformer for Image Restoration},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {17683-17693}
}

Acknowledgement

This code borrows heavily from MIRNet and SwinTransformer.

Contact

Please contact us if you have any question or suggestion (Zhendong Wang [email protected], Xiaodong Cun [email protected]).


uformer's Issues

mask in test_in_any_resolution.py

Maybe you can pad the image first; otherwise invalid information will be fused into the large-scale tokens:

import math
import torch
import torch.nn.functional as F

def expand2square(timg, factor=128):
    # pad to a multiple of `factor` first, using reflection padding
    _, _, h, w = timg.size()
    mod_pad_h = (factor - h % factor) % factor
    mod_pad_w = (factor - w % factor) % factor
    timg = F.pad(timg, (0, mod_pad_w, 0, mod_pad_h), 'reflect')

    _, _, h, w = timg.size()
    X = int(math.ceil(max(h, w) / float(factor)) * factor)

    img = torch.zeros(1, 3, X, X).type_as(timg)   # square canvas for the padded image
    mask = torch.ones(1, 1, X, X).type_as(timg)   # 1 marks padding (for -inf in attention)

    img[:, :, :h, :w] = timg
    mask[:, :, :h, :w].fill_(0.0)

    return img, mask
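
A minimal usage sketch, assuming the expand2square above (and its imports) is in scope; the input shape is illustrative:

x = torch.randn(1, 3, 128, 720)           # arbitrary-resolution input
img, mask = expand2square(x, factor=128)
print(img.shape, mask.shape)              # both padded to the same square size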

Code release

This work is interesting; when will the code be released?

Image deblurring

Thank you for your excellent work in the image restoration field; the proposed Uformer is a far-reaching reference for follow-up work.
You have not yet released the code for image deblurring. Could you please release the deblurring code?
Thanks again!

Update about license?

Sir/Madam, your model is doing great and the results are crisp. Can we use it for commercial purposes?

When I test the model on images of arbitrary resolution, I meet the following problem

This is my testing code:

import torch
import math
from model import Uformer

def expand2square(timg, factor=16.0):
    _, _, h, w = timg.size()

    X = int(math.ceil(max(h, w) / float(factor)) * factor)

    img = torch.zeros(1, 3, X, X).type_as(timg)  # square canvas
    mask = torch.zeros(1, 1, X, X).type_as(timg)

    # print(img.size(), mask.size())
    # print((X - h)//2, (X - h)//2+h, (X - w)//2, (X - w)//2+w)
    img[:, :, ((X - h) // 2):((X - h) // 2 + h), ((X - w) // 2):((X - w) // 2 + w)] = timg
    mask[:, :, ((X - h) // 2):((X - h) // 2 + h), ((X - w) // 2):((X - w) // 2 + w)].fill_(1.0)

    return img, mask

# `model` is a Uformer instance; its construction was omitted in the issue
x = torch.randn(1, 3, 128, 720)

rgb_noisy, mask = expand2square(x, factor=128)

print(rgb_noisy.size())
print(mask.size())
out = model(rgb_noisy, (1 - mask))
print(out.size())

I meet the following problem:

torch.Size([1, 3, 768, 768])
torch.Size([1, 1, 768, 768])
Traceback (most recent call last):
  File "/Users/Uformer-main/my_test.py", line 69, in <module>
    out = model(rgb_noisy, (1 - mask))
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/Uformer-main/model.py", line 1336, in forward
    conv0 = self.encoderlayer_0(y, mask=mask)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/Uformer-main/model.py", line 1044, in forward
    x = blk(x, mask)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/Uformer-main/model.py", line 695, in forward
    attn_mask = attn_mask or shift_attn_mask
RuntimeError: bool value of Tensor with more than one value is ambiguous
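
For reference, the failure comes from applying Python's `or` to two multi-element tensors. A sketch of a possible in-place patch for that line in model.py (not the repo's actual fix) is to combine the additive masks explicitly:

# combine additive attention masks instead of `attn_mask or shift_attn_mask`,
# since `or` on a multi-element tensor is ambiguous
if attn_mask is None:
    attn_mask = shift_attn_mask
elif shift_attn_mask is not None:
    attn_mask = attn_mask + shift_attn_mask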

modulators

Hello, where are the modulators in your code? Is it qk_bias?

Why is the dataset small but the performance SOTA?

Hi, thanks for your meaningful work.
Previous work on Transformers in vision shares a common opinion:
a Transformer needs to be fed a huge dataset if you want its performance to be great.

In this work, you just train the network on SIDD patches, roughly 90k patches, but other works train their Transformers on nearly 1M images.

So, can you explain the reason?
Or can I say that a Transformer actually does not need that much data?

Support arbitrary input resolution?

Hi, your work is very inspiring!

I didn't find in your paper how you apply Uformer during inference. For example, on SIDD, the training patches are 128x128 and the evaluation patches are 256x256. Were you directly applying your network to the whole 256x256 patch, or in a sliding-window fashion? In other words, does Uformer support arbitrary input resolution?
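
For context, a common way to run a fixed-window restoration model on larger images is tiled (sliding-window) inference with overlap averaging. A minimal sketch with a generic model callable standing in for Uformer (this is not the repo's inference code; tile and overlap sizes are illustrative):

import torch

def tiled_inference(model, img, tile=256, overlap=32):
    # Run `model` on overlapping tiles and average the overlaps;
    # assumes H and W are both >= tile.
    _, _, h, w = img.shape
    stride = tile - overlap
    out = torch.zeros_like(img)
    weight = torch.zeros_like(img)
    ys = list(range(0, h - tile + 1, stride))
    xs = list(range(0, w - tile + 1, stride))
    if ys[-1] != h - tile:        # make the last tiles reach the borders
        ys.append(h - tile)
    if xs[-1] != w - tile:
        xs.append(w - tile)
    with torch.no_grad():
        for y in ys:
            for x in xs:
                patch = img[:, :, y:y + tile, x:x + tile]
                out[:, :, y:y + tile, x:x + tile] += model(patch)
                weight[:, :, y:y + tile, x:x + tile] += 1
    return out / weight

# toy usage with an identity "model" standing in for a restoration network
x = torch.randn(1, 3, 512, 768)
print(tiled_inference(torch.nn.Identity(), x).shape)  # torch.Size([1, 3, 512, 768])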

windowed attention

Hello.

I looked at the code and read the paper; windowed attention (Window-based Multi-head Self-Attention, W-MSA) was first proposed in "Image Restoration Using Swin Transformer."

However, you did not mention the exact difference in your paper.

What is the difference?

How to test an image whose resolution is not (256, 256)?

Hi there, thanks for providing the script test_in_any_resolution.py.
In this script, an image of arbitrary size is processed by the expand2square function,
but the resulting size can't be fed into the Uformer model.

So I wonder: can this network only process inputs of size (256, 256)?
If I want to denoise an image of arbitrary size, do I have to resize it to (256, 256)?
Thanks!

About the value of 'train_ps' in training mode

Thank you for your work. If the input size of the network is 256 x 256, should I add one more encoder and decoder layer to keep the bottleneck feature size at 8 (equal to the window size)? Or keep the number of layers so the bottleneck is 16? Which one performs better? The former brings more computational cost. Put another way, is it a trick that the feature size in the bottleneck equals the window size?

Problems when submitting results to the SIDD benchmark

Thanks for sharing your great work! I met some problems when submitting my result to the SIDD benchmark: it told me that my submission file (960 MB) was over the limit (900 MB). I used the SIDD Benchmark Data (noisy sRGB data).
To generate the .mat files, I used Python to convert the images into .mat (the same way as MPRNet)...
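
One way to shrink a .mat submission below the size limit is to enable compression when saving. A sketch, assuming the denoised patches are already collected in a NumPy array (the array shape and the .mat key follow common SIDD submissions and are an assumption, not official tooling):

import numpy as np
from scipy.io import savemat

# denoised: uint8 array of restored benchmark blocks (placeholder data here)
denoised = np.zeros((40, 32, 256, 256, 3), dtype=np.uint8)
savemat('SubmitSrgb.mat', {'DenoisedBlocksSrgb': denoised},
        do_compression=True)  # compressed MAT files are substantially smaller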

Computational complexity

Hi there! Thanks for your wonderful work!
It is amazing to train a Transformer on just SIDD (not a huge dataset).
But when I read the paper, I noticed that window-based self-attention can reduce the computational cost.

For a feature map X of shape (C, H, W):
global self-attention cost: O(H²W²C)
window-based self-attention cost: O((HW/M²)·M⁴·C) = O(M²HWC)

I have no idea why global is O(H²W²C) and why window-based is O((HW/M²)·M⁴·C).
Could you please explain the details of this formulation?
Expecting your reply!
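
For reference, both costs follow from the fact that self-attention over N tokens with channel dimension C costs O(N²C); a worked derivation:

% Global attention treats the whole H x W map as one token set, N = HW:
%   O\big((HW)^2 C\big) = O(H^2 W^2 C).
% Window attention splits the map into HW/M^2 non-overlapping M x M
% windows, each attending over N = M^2 tokens:
%   \frac{HW}{M^2} \cdot O\big((M^2)^2 C\big)
%     = O\Big(\frac{HW}{M^2} \, M^4 \, C\Big) = O(M^2 HW C),
% which is linear in HW for a fixed window size M.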

When will the updated model codes be released?

Hi, @ZhendongWang6 Thanks a lot for your wonderful work on Uformer. I have noticed that a new version of Uformer was posted on arXiv about one month ago. The modulator seems to improve the performance a lot. So I wonder, when will the new version of the code be released?

Again, thanks a lot for your excellent job :)

Which SIDD data do you use exactly?

Hi, you use the SIDD-Medium sRGB data, right? But there are mirror 1 and mirror 2, so which mirror do you use? And what's the difference between mirror 1 and mirror 2?

About multi-scale modulation

Hello, I see that your arXiv paper includes multi-scale modulation, but this code no longer seems to contain that module, and the figure attached to this repo has also changed quite a bit. I hope to get your answer; thank you very much!

Trying to reproduce Uformer_B model for deblurring

I trained the model you provided on the GoPro dataset.
I trained for 3000 epochs with Charbonnier loss and edge loss as in the MPRNet code, and I also used the same scheduler as the MPRNet code,

but I got a PSNR of 31.1 dB on the GoPro test set.

Does this happen because of the absence of the modulator?
Or is there some other problem?

Non-square input

Hi, your code currently doesn't support non-square input; even if I change img_size to img_size_H and img_size_W, there is still an error because the input is not square. This is a very big limitation of your code right now.

Kind regards,
Sébastien de Blois

TypeError: forward() takes 2 positional arguments but 3 were given

test_in_any_resolution.py raises an error; this is the full error traceback:

Traceback (most recent call last):
  File "/content//Uformer/test_in_any_resolution.py", line 104, in <module>
    rgb_restored = model_restoration(rgb_noisy, 1 - mask)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)

TypeError: forward() takes 2 positional arguments but 3 were given

Actually, I was trying with my custom data, but its shape is the same as the SIDD data, so I can't understand why.

Deblurring pretrained model weights?

Hi,

Thanks for your wonderful work on Uformer!
I'm wondering, is there a plan to release the pretrained weights trained on GoPro?
Thank you!

Image blocking effect

When I trained on other datasets, blocking artifacts appeared in the results. Have you ever encountered a similar situation? How did you solve it?
Looking forward to your reply. Thank you.

How do you compute the FLOPs of Uformer16 and Uformer32?

I used the same package as your code (from ptflops import get_model_complexity_info)

and got 2.51 GMac, 5.15 M params for Uformer16, and 9.98 GMac, 20.47 M params for Uformer32.

It seems that I need to define some ops in this model. Can you provide a solution or the relevant code for computing FLOPs and params?

Thanks!

How do you output full large-resolution images at test time?

Hello, from the code I see that the data are split into patches before being fed to the model, and the same happens at test time. But then how do you obtain the complete image?
I tried stitching the patches together, but that leaves obvious seams, so I'd like to ask how the results in the paper were obtained.

About epochs during training

Hi friend:
Uformer is interesting. The paper reports that you train Uformer16 for 250 epochs with batch size 32 to get 39.66 PSNR on SIDD. So how many iterations are in the training phase? What is the PSNR after 40 epochs of training? I just want to reproduce this result in a short time.

Dehazing

Do you think retraining your model for dehazing would be useful?

Some problems with the FLOPs calculation

In model.py at line 475 (the FLOPs of LeFF):

    def flops(self, H, W):
        flops = 0
        # fc1
        flops += H*W*self.dim*self.hidden_dim 
        # dwconv
        flops += H*W*self.hidden_dim*3*3
        # fc2
        flops += H*W*self.hidden_dim*self.dim
        print("LeFF:{%.2f}"%(flops/1e9))
        return flops

The FLOPs of the dwconv are calculated by

flops += H*W*self.hidden_dim*3*3

Shouldn't it be

flops += H*W*self.hidden_dim*self.hidden_dim*3*3

since the FLOPs of a convolution are computed as c_in * k_h * k_w * c_out * H * W?
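
For reference, the distinction at play is standard versus depthwise convolution; since the LeFF uses a depth-wise convolution (as stated in the abstract above), the count in the code corresponds to the depthwise case:

% Standard convolution, C_in -> C_out channels, k_h x k_w kernel:
%   \text{FLOPs} = H \cdot W \cdot C_{in} \cdot C_{out} \cdot k_h \cdot k_w
% Depthwise convolution (groups = C, one k_h x k_w filter per channel):
%   \text{FLOPs} = H \cdot W \cdot C \cdot k_h \cdot k_w
% With C = hidden_dim and a 3x3 kernel this gives
%   H \cdot W \cdot \text{hidden\_dim} \cdot 9,
% matching flops += H*W*self.hidden_dim*3*3 in the code above.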

Training for deblurring

Hi,

How can we train Uformer for deblurring tasks?

I mean, train Uformer on the GoPro dataset.

Thanks

Using pretrained weights, but a RuntimeError is raised

Hello! Thanks for your devotion.
I trained Uformer on SIDD on 2 V100s as you suggested.
I trained for nearly 69 epochs and then stopped, so I got a weight file.
I validated it and it performs well.
But when I want to finetune on 2 V100s, I add the --resume command-line flag.

Then train.py line 169, loss_scaler(loss, optimizer, parameters=model_restoration.parameters()),
raises a RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu.

I don't know how to solve this; have you met this problem?
Thx!
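
For reference, this error commonly appears when the optimizer state loaded from a checkpoint is still on the CPU. A sketch of a typical workaround (not the repo's actual fix; the checkpoint key name is an assumption):

import torch

def optimizer_to(optimizer, device):
    # move every tensor in the optimizer state (e.g. Adam moments) to `device`
    for state in optimizer.state.values():
        for k, v in state.items():
            if torch.is_tensor(v):
                state[k] = v.to(device)

# after resuming:
# optimizer.load_state_dict(checkpoint['optimizer'])  # key name is an assumption
# optimizer_to(optimizer, torch.device('cuda:0'))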

Rain removal

Hi, thanks for making your awesome work public.
The provided denoising model is very nice, but it does not work well on rainy images. May I ask if you will release other checkpoints, e.g. for deraining or deblurring? Or could you give some suggestions for training them with your model? Thanks.
