jingyunliang / swinir

SwinIR: Image Restoration Using Swin Transformer (official repository)

Home Page: https://arxiv.org/abs/2108.10257

License: Apache License 2.0

Python 97.87% Shell 2.13%

image-super-resolution image-denoising compression-artifact-reduction image-deblocking transformer real-world-image-super-resolution lightweight-image-super-resolution image-restoration low-level-vision vision-transformer

swinir's Introduction

Jingyun Liang


I am currently a PhD Student at Computer Vision Lab, ETH Zürich, Switzerland. I am co-supervised by Prof. Luc Van Gool and Prof. Radu Timofte. I also work closely with Dr. Kai Zhang. I mainly focus on low-level vision research, especially on image and video restoration, such as

image/video super-resolution (SR)
image/video deblurring
image/video denoising
...

🚀 News

2022-10-04: Our new paper RVRT (NeurIPS2022) achieves SOTA video restoration results with balanced size, memory and runtime.
2022-08-30: See our papers on real-world image denoising (SCUNet) and video denoising (ReViD).
2022-07-30: Three papers, including EFNet (event-based image deblurring, oral), DATSR (reference image SR) and DAVSR (video SR), accepted by ECCV2022.
2022-01-28: Our new paper VRT outperforms previous Video SR/ deblurring/ denoising/ frame interpolation/ space-time video SR methods by up to 😍 2.16dB. 😍
2021-10-20: SwinIR is awarded the best paper prize in ICCV-AIM2021.
2021-08-01: Three papers (HCFlow, MANet and BSRGAN) accepted by ICCV2021.
2021-03-29: One paper (FKP) accepted by CVPR2021.

🌱 Repositories

Topic	Title	Badge
real-world video denoising	Practical Real Video Denoising with Realistic Degradation Model
event-based image deblurring	Event-based Fusion for Motion Deblurring with Cross-modal Attention, ECCV2022
reference image SR	Reference-based Image Super-Resolution with Deformable Attention Transformer, ECCV2022
interpretable video restoration	Towards Interpretable Video Super-Resolution via Alternating Optimization, ECCV2022
transformer-based video restoration	Recurrent Video Restoration Transformer with Guided Deformable Attention
transformer-based video restoration	VRT: A Video Restoration Transformer
transformer-based image restoration	SwinIR: Image Restoration Using Swin Transformer
real-world image denoising	Practical Blind Denoising via Swin-Conv-UNet and Data Synthesis
real-world image SR	Designing a Practical Degradation Model for Deep Blind Image Super-Resolution, ICCV2021
blind image SR	Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution, ICCV2021
blind image SR	Flow-based Kernel Prior with Application to Blind Super-Resolution, CVPR2021
normalizing flow-based image SR and image rescaling	Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling, ICCV2021
image/ video restoration	Image/ Video Restoration Toolbox


swinir's Issues

Trained model from KAIR (40000_optimizerG.pth) gives error on testing

Instead of using the pre-trained models, I trained a model with the KAIR code and used the resulting checkpoints for testing.

The KAIR training code produced 3 models:
40000_optimizerG.pth
40000_G.pth
40000_E.pth

When I test with 40000_optimizerG.pth using the code in this repository, I get this error:

(pytorch-gpu) C:\Users\Downloads\SwinIR-main>python main_test_swinir.py --task classical_sr --scale 2 --training_patch_size 48 --folder_lq testsets/Set5/LR_bicubic/X2 --folder_gt testsets/Set5/HR
loading model from model_zoo/swinir/40000_optimizerG.pth
Traceback (most recent call last):
  File "C:\Users\Downloads\SwinIR-main\main_test_swinir.py", line 253, in <module>
    main()
  File "C:\Users\Downloads\SwinIR-main\main_test_swinir.py", line 42, in main
    model = define_model(args)
  File "C:\Users\Downloads\SwinIR-main\main_test_swinir.py", line 174, in define_model
    model.load_state_dict(pretrained_model[param_key_g] if param_key_g in pretrained_model.keys() else pretrained_model, strict=True)
  File "C:\Users\anaconda3\envs\pytorch-gpu\lib\site-packages\torch\nn\modules\module.py", line 1406, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for SwinIR:
        Missing key(s) in state_dict: "conv_first.weight", "conv_first.bias", "patch_embed.norm.weight", "patch_embed.norm.bias", "layers.0.residual_group.blocks.0.norm1.weight", "layers.0.residual_group.blocks.0.norm1.bias", "layers.0.residual_group.blocks.0.attn.relative_position_bias_table",

40000_G.pth and 40000_E.pth load and test fine.
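
Judging by its file name, 40000_optimizerG.pth probably holds optimizer state rather than network weights, which would explain the missing keys. The sketch below is a heuristic check, assuming the checkpoint has already been loaded into a plain dict; the function name is hypothetical, and the only PyTorch fact relied on is that an optimizer's state dict contains the keys "state" and "param_groups".

```python
def looks_like_optimizer_state(checkpoint):
    """Heuristic: PyTorch optimizer state dicts contain 'state' and 'param_groups'."""
    if not isinstance(checkpoint, dict):
        return False
    return {"state", "param_groups"} <= set(checkpoint.keys())


# Stand-ins for loaded checkpoints (hypothetical contents).
optimizer_like = {"state": {}, "param_groups": [{"lr": 2e-4}]}
weights_like = {"conv_first.weight": [0.0], "conv_first.bias": [0.0]}

print(looks_like_optimizer_state(optimizer_like))  # True
print(looks_like_optimizer_state(weights_like))    # False
```

If the check returns True, the file is not something a network's load_state_dict can consume.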

Cuda out of memory for photos larger than 640x480px on RTX 3060 12GB

Thanks for the great code, released to the public for free. I tried the real-world 4x large model and the results are great, even better than well-known commercial products.
Real-ESRGAN, for example, has a --tile option to work around CUDA out-of-memory errors. With your code I can't upscale photos larger than VGA resolution on an RTX 3060 12GB. Is there a way to tile, or do I need to change img_size, and if so to what values?
Thank you very much in advance!
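
Tiling, as asked about above, usually means running the model on overlapping crops and merging the outputs. The sketch below only computes where the tiles would start along one axis; the function name and the tile/overlap values are illustrative assumptions, not options of this repository.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of tiles of size `tile` that cover [0, length).

    Neighbouring tiles share at least `overlap` pixels; the last tile is
    placed flush with the border so nothing is left uncovered.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if tile >= length:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)
    return starts


print(tile_starts(640, 256, 32))  # [0, 224, 384]
```

Each tile is then restored independently, and pixels covered by more than one tile are typically averaged, which keeps peak memory tied to the tile size instead of the full image.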

Looking forward to your training code

Training time of SwinIR; Impact of learning rate (fix the lr to 1e-5 for x4 fine-tuning is slightly better)

As I understand it, transformers use a lot of memory, and the paper points out that although SwinIR has fewer parameters, it is much slower than RCAN. So I'm curious about the cost of training. Thank you.

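PSNR and SSIM do not converge to the performance reported in the paper

Hello,

Thank you for the excellent work. I read the SwinIR paper and starred this repository. When I tested your pre-trained model on Set5, and when I trained classical image SR (x2) on the DIV2K and Flickr2K datasets, the PSNR did not reach the values in the paper. Is this caused by my hyperparameter settings, or are there some training tricks?

Testing the official pre-trained model (001_classicalSR_DF2K_s64w8_SwinIR-M_x2.pth) on Set5:

Average PSNR: 36.21 dB;

Training classical image SR (x2) on the DIV2K and Flickr2K datasets with the training code from https://github.com/cszn/KAIR:

Average PSNR converged to 36.15 dB, which is below the performance in the paper.

Thanks!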

It seems unfair to use a different training set from EDSR or other methods.

An apples-to-apples comparison on the same training set is needed to show whether the method is actually better.

about test

The SwinIR model has an img_size parameter. If it is 128, for example, then in the Swin layer input_resolution=(128, 128). If my input image at test time is not (128, 128), the attention computation has a branch: if self.input_resolution == x_size, else attn_windows = self.attn(x_windows, mask=self.calculate_mask(x_size).to(x.device)). When the image size is not equal to self.input_resolution=(128, 128), what is the mask that gets passed in?
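
The mask in question is the attention mask used for shifted windows. The sketch below shows, in plain Python, how such a mask is typically built in Swin-style models. Because it depends on the feature-map size, it would have to be rebuilt when x_size differs from input_resolution. This illustrates the general scheme and is not the repository's calculate_mask; both function names are hypothetical.

```python
def band_labels(size, window_size, shift_size):
    """Label each position along one axis as band 0, 1 or 2.

    Band 0: everything before the last window; band 1: the last window
    minus its final shift_size positions; band 2: those final positions.
    """
    labels = []
    for i in range(size):
        if i < size - window_size:
            labels.append(0)
        elif i < size - shift_size:
            labels.append(1)
        else:
            labels.append(2)
    return labels


def shifted_window_mask(height, width, window_size, shift_size, blocked=-100.0):
    """One additive mask per window; height and width must be multiples of window_size.

    Entry [i][j] is 0.0 when pixels i and j of a window lie in the same
    region of the cyclically shifted image, and `blocked` otherwise.
    """
    rows = band_labels(height, window_size, shift_size)
    cols = band_labels(width, window_size, shift_size)
    masks = []
    for top in range(0, height, window_size):
        for left in range(0, width, window_size):
            region = [rows[y] * 3 + cols[x]
                      for y in range(top, top + window_size)
                      for x in range(left, left + window_size)]
            masks.append([[0.0 if a == b else blocked for b in region]
                          for a in region])
    return masks


masks = shifted_window_mask(8, 8, window_size=4, shift_size=2)
print(len(masks), len(masks[0]), len(masks[0][0]))  # 4 16 16
```

Adding the blocked value to the attention logits before the softmax stops pixels that were not neighbours before the cyclic shift from attending to each other.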

IndexError: index 2080 is out of bounds for dimension 2 with size 2080

Hey, thanks for this awesome code.
It always worked great for me, but now I get this error in the Colab regardless of which image I try.
In 3. Inference:

/content/Real-ESRGAN/BSRGAN
LogHandlers setup!
21-10-08 17:53:58.872 : Model Name : BSRGAN
21-10-08 17:53:58.873 : GPU ID : 0
[3, 3, 64, 23, 32, 4]
21-10-08 17:54:01.995 : Input Path : testsets/RealSRSet
21-10-08 17:54:01.995 : Output Path : testsets/RealSRSet_results_x4
21-10-08 17:54:01.996 : ---1 --> BSRGAN --> x4--> adsads.png
/content/Real-ESRGAN
Testing 0 adsads
loading model from experiments/pretrained_models/003_realSR_BSRGAN_DFO_s64w8_SwinIR-M_x4_GAN.pth
Traceback (most recent call last):
  File "SwinIR/main_test_swinir.py", line 287, in <module>
    main()
  File "SwinIR/main_test_swinir.py", line 73, in main
    output = test(img_lq, model, args, window_size)
  File "SwinIR/main_test_swinir.py", line 259, in test
    output = model(img_lq)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/Real-ESRGAN/SwinIR/models/network_swinir.py", line 839, in forward
    return x[:, :, H*self.upscale, W*self.upscale]
IndexError: index 2080 is out of bounds for dimension 2 with size 2080
loading model from experiments/pretrained_models/003_realSR_BSRGAN_DFOWMFC_s64w8_SwinIR-L_x4_GAN.pth
Traceback (most recent call last):
  File "SwinIR/main_test_swinir.py", line 287, in <module>
    main()
  File "SwinIR/main_test_swinir.py", line 73, in main
    output = test(img_lq, model, args, window_size)
  File "SwinIR/main_test_swinir.py", line 259, in test
    output = model(img_lq)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/Real-ESRGAN/SwinIR/models/network_swinir.py", line 839, in forward
    return x[:, :, H*self.upscale, W*self.upscale]
IndexError: index 2080 is out of bounds for dimension 2 with size 2080
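
The message (index 2080 on a dimension of size 2080) is what one would see if an integer index were used where a slice was intended. That is a reading of the error text, not a confirmed diagnosis. The sketch below reproduces the difference with a plain Python list standing in for one tensor dimension.

```python
row = list(range(2080))  # stands in for a dimension of size 2080
n = 2080

cropped = row[:n]        # slice: keeps the first n entries, never raises
print(len(cropped))      # 2080

try:
    row[n]               # integer index: valid positions are 0 .. n-1
except IndexError as exc:
    message = str(exc)
    print("IndexError:", message)
```

Cropping an output back to its target size needs the slice form; the integer form asks for a single element one past the end.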

Comparison with IPT

Hi,

Thanks for sharing this interesting work. Table 6 (CBSD68, sigma=50) shows IPT achieving 28.39 dB PSNR. However, the original IPT paper reports 29.88 dB (their Table 2). Is there a difference between the two settings?

runtimeerror when using other dataset?

Hi.
I want to train the model with my own dataset.
However, it keeps reporting
RuntimeError: stack expects each tensor to be equal size, but got [3, 256, 256] at entry 0 and [3, 256, 252] at entry 1
Is one of my settings wrong?
Thanks.

The relevant part of the JSON config:
"datasets": {
  "train": {
    "name": "train_dataset"           // just name
    , "dataset_type": "sr"            // "dncnn" | "dnpatch" | "fdncnn" | "ffdnet" | "sr" | "srmd" | "dpsr" | "plain" | "plainpatch" | "jpeg"
    , "dataroot_H": "HR"              // path of H training dataset. DIV2K (800 training images)
    , "dataroot_L": "LR"              // path of L training dataset

    , "H_size": 256                   // 96/144|192/384 | 128/192/256/512. LR patch size is set to 48 or 64 when compared with RCAN or RRDB.

    , "dataloader_shuffle": true
    , "dataloader_num_workers": 16
    , "dataloader_batch_size": 8      // batch size 1 | 16 | 32 | 48 | 64 | 128. Total batch size =4x8=32 in SwinIR
  }
  , "test": {
    "name": "test_dataset"            // just name
    , "dataset_type": "sr"            // "dncnn" | "dnpatch" | "fdncnn" | "ffdnet" | "sr" | "srmd" | "dpsr" | "plain" | "plainpatch" | "jpeg"
    , "dataroot_H": "testsets/Set5/HR"  // path of H testing dataset
    , "dataroot_L": "testsets/Set5/LR_bicubic/X4"  // path of L testing dataset
  }
}

, "netG": {
  "net_type": "swinir"
  , "upscale": 4                      // 2 | 3 | 4 | 8
  , "in_chans": 3
  , "img_size": 64                    // For fair comparison, LR patch size is set to 48 or 64 when compared with RCAN or RRDB.
  , "window_size": 8
  , "img_range": 1.0
  , "depths": [6, 6, 6, 6, 6, 6]
  , "embed_dim": 180
  , "num_heads": [6, 6, 6, 6, 6, 6]
  , "mlp_ratio": 2
  , "upsampler": "pixelshuffle"       // "pixelshuffle" | "pixelshuffledirect" | "nearest+conv" | null
  , "resi_connection": "1conv"        // "1conv" | "3conv"

  , "init_type": "default"
}
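
One possible cause of the stack error (an assumption, since the dataset itself is not shown) is that some HR image is narrower than H_size. A 256-wide crop is then impossible and that sample comes out 252 wide, so the batch cannot be stacked. The sketch below flags such images from a table of sizes; the function name, file names and sizes are hypothetical.

```python
def too_small_for_patch(image_sizes, h_size):
    """Names of images whose height or width is below the HR patch size."""
    return sorted(
        name
        for name, (height, width) in image_sizes.items()
        if height < h_size or width < h_size
    )


# Hypothetical (height, width) values gathered from an HR folder.
sizes = {
    "0001.png": (1404, 2040),
    "0002.png": (300, 252),
    "0003.png": (256, 256),
}
print(too_small_for_patch(sizes, h_size=256))  # ['0002.png']
```

Flagged images could be dropped, or H_size lowered so that every image can supply a full patch.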

Training efficiency

Thanks for the great work again.

When training SwinIR with the KAIR toolbox, I found that CPU utilization was very high while the GPU was mostly idle, so training was very inefficient. Could the author share the GPU and CPU configuration used, and the training time?

patch_size

I found that patch_size in the network settings defaults to 1, so each pixel becomes a token. Why not use an image block (e.g., 4*4) as the token?

About the FLOPs of SwinIR

Hi,

Thanks for sharing the code of this interesting work. Would you mind helping provide the FLOPs cost of SwinIR? E.g., FLOPs under 256x256x3 images. Thanks!

Looking forward to the open training part of the code

Training question

Thanks for your amazing work! I have a question regarding your training:
How many GPUs did you use for parallel training? How many hours does it take before early stopping?

testing dataset downsampled image

Thanks for releasing the wonderful code and datasets.
I ran into one problem while testing: Set5 and Set14 come with x2, x3 and x4 down-sampled images, but the other datasets (Urban100, Manga109 and BSDS100) do not. Would it be possible for you to share the down-sampled images for these datasets? I can down-sample them myself, but the results may then differ from those reported in the paper.

GPU numbers

Dear author,
I would like to know how many GPUs you used to train the SwinIR network.
Thank you.

Training settings for SwinIR light

Thanks for sharing your work.

I notice that the training config file for the lightweight model in KAIR may not be consistent with the statement in the paper. Could you double check that?
In particular, both the batch size and patch size are set to 64, and embed_dim is 180. Is this the correct setting?

About test part in training

Thanks for your code first!
I ran the lightweight super-resolution code, and there is an error in the testing step during training:

Traceback (most recent call last):
  File "main_train_psnr.py", line 291, in <module>
    main()
  File "main_train_psnr.py", line 190, in main
    current_psnr = util.calculate_psnr(E_img, H_img, border=border)
  File "/home/ET/huiyuxiang/KAIR/utils/utils_image.py", line 632, in calculate_psnr
    raise ValueError('Input images must have the same dimensions.')
ValueError: Input images must have the same dimensions.

So I printed the shapes and found that the LR image is padded to a multiple of 8, but the HR image is not.
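
A common way to keep the two sizes consistent is to pad the LR input up to a multiple of the window size and then crop the network output back to the original LR size times the scale factor. The sketch below works through that arithmetic only; the function names are hypothetical and this is not the KAIR code.

```python
def padded_size(h, w, window_size):
    """Smallest (h, w) >= the input that is a multiple of window_size."""
    h_pad = -(-h // window_size) * window_size  # ceiling division
    w_pad = -(-w // window_size) * window_size
    return h_pad, w_pad


def crop_size(h, w, scale):
    """Size of the output region that corresponds to the unpadded LR image."""
    return h * scale, w * scale


h, w, scale, window = 125, 64, 4, 8
print(padded_size(h, w, window))  # (128, 64)
print(crop_size(h, w, scale))     # (500, 256)
```

With the crop applied, the estimate has the same dimensions as an HR image of size (h * scale, w * scale), which is what a PSNR routine that insists on equal dimensions needs.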

About training

Thank you for your work. I tried to train SwinIR and found that although training the small SwinIR is smooth, the loss often suddenly doubles when the embedding dim is changed to 180. Because of memory limits, I use batch_size=16 and lr=1e-4. Are there any tricks to make the training stable?

About drop path rate

I notice that you use the same parameters as Swin Transformer and set the drop path rate to 0.1. Does super-resolution really need drop path?

Problem about testing my trained model

Thanks for the training code.
I trained a classical SR model and got 500000_optimizerG.pth, 500000_G.pth and 500000_E.pth.
Which .pth file should I use for testing?
When I run
python main_test_swinir.py --task classical_sr --scale 4 --training_patch_size 64 --model_path superresolution/swinir_sr_classical_patch64_x4_l1/models/500000_G.pth --folder_lq testsets/real3wx4/test_LR_crop
the model fails to load:
model.load_state_dict(torch.load(args.model_path)['params'], strict=True)
KeyError: 'params'
May I know how to solve this?
Thanks.
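
The KeyError suggests that this checkpoint stores the weights at the top level rather than under a 'params' entry. A tolerant loader can fall back to the checkpoint itself when the key is absent. The sketch below shows that fallback on plain dicts; the function name is hypothetical.

```python
def select_state_dict(checkpoint, key="params"):
    """Return checkpoint[key] if present, otherwise the checkpoint itself."""
    if isinstance(checkpoint, dict) and key in checkpoint:
        return checkpoint[key]
    return checkpoint


# Stand-ins for two checkpoint layouts (hypothetical contents).
wrapped = {"params": {"conv_first.weight": [0.0]}}
bare = {"conv_first.weight": [0.0]}

print(select_state_dict(wrapped) == select_state_dict(bare))  # True
```

The selected dict is what would then be passed to load_state_dict.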

about the color enhancement task

Hi！
Thank you for sharing your code! Can this be applied to the color enhancement task?

Thanks

Does the Classical Image Super-Resolution have color enhancement function?

I trained the classical image super-resolution model with other training sets.
When I tested the model, I noticed that the colors had become sharper.
Does classical image super-resolution also perform color enhancement?

The left-hand side is swinir result.

Supplements

Hello, in this great paper some details are deferred to the supplement. Where can I find the supplementary material? Thank you.

swin layer

In Swin Transformer, the self-attention blocks come in two kinds: plain window self-attention and shifted-window self-attention. In that code, the plain window attention has no attn_mask, while the shifted-window attention has one. In your code, however, every self-attention layer seems to have an attn_mask. Does that mean no layer uses plain window self-attention, and every layer uses shifted windows instead? Thank you.
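
For reference, Swin-style models usually alternate the two block types by index: even-numbered blocks use no shift and odd-numbered blocks shift by half a window. The sketch below states that schedule as recalled from the Swin Transformer reference implementation. It should be checked against the actual source, and the function name is hypothetical.

```python
def shift_sizes(depth, window_size):
    """Shift used by each block in a stage: 0 for even blocks, half a window for odd ones."""
    return [0 if i % 2 == 0 else window_size // 2 for i in range(depth)]


print(shift_sizes(6, 8))  # [0, 4, 0, 4, 0, 4]
```

Under this schedule, a block with shift 0 has nothing to mask, so a mask argument being present in every block does not by itself imply that every block is shifted.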

About Resi-connection

Hi there, thanks for the amazing work!
The section 'Impact of residual connection and convolution layer in RSTB' of the paper says that a 1x1 conv or a 3x3 conv is added at the residual connection.
the result shows that 3x3 is better than 1x1, also the inverted-bottleneck 3x3.

Back to the code itself.
When I first read the 'resi_connection' argument of the SwinIR class, I thought '1conv' meant a 1x1 conv and '3conv' meant a 3x3 conv.
After reading more of the code, I realized that '1conv' actually means one 3x3 conv and '3conv' means three conv layers (3x3, 1x1, 3x3 in the code below).

# build the last conv layer in deep feature extraction
if resi_connection == '1conv':
    self.conv_after_body = nn.Conv2d(embed_dim, embed_dim, 3, 1, 1)
elif resi_connection == '3conv':
    # to save parameters and memory
    self.conv_after_body = nn.Sequential(nn.Conv2d(embed_dim, embed_dim // 4, 3, 1, 1),
                                         nn.LeakyReLU(negative_slope=0.2, inplace=True),
                                         nn.Conv2d(embed_dim // 4, embed_dim // 4, 1, 1, 0),
                                         nn.LeakyReLU(negative_slope=0.2, inplace=True),
                                         nn.Conv2d(embed_dim // 4, embed_dim, 3, 1, 1))

I think the naming is a little confusing.
It might be clearer to call them '1conv3' and '3conv3', or something similar.

I just wanted to point out the confusing part.
Thanks.
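
The "to save parameters and memory" comment in the snippet can be checked with simple arithmetic. The sketch below counts weights and biases for the two variants using the kernel sizes and channel widths shown in the code; the helper names are hypothetical.

```python
def conv2d_params(c_in, c_out, kernel):
    """Weights plus biases of a standard 2D convolution."""
    return c_in * c_out * kernel * kernel + c_out


def one_conv_params(embed_dim):
    """'1conv': a single 3x3 conv, embed_dim -> embed_dim."""
    return conv2d_params(embed_dim, embed_dim, 3)


def three_conv_params(embed_dim):
    """'3conv': 3x3 down to embed_dim // 4, a 1x1 conv, then 3x3 back up."""
    mid = embed_dim // 4
    return (conv2d_params(embed_dim, mid, 3)
            + conv2d_params(mid, mid, 1)
            + conv2d_params(mid, embed_dim, 3))


print(one_conv_params(180))    # 291780
print(three_conv_params(180))  # 148095
```

For embed_dim = 180, the three-layer bottleneck uses roughly half the parameters of the single 3x3 conv.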

[SUGGESTION] Optimized version for videos

Hi there, SwinIR is really cool!

Since it has been "ported" to VapourSynth (thanks to @HolyWu), some interesting discussions, with tests too, about its effectiveness on videos have started:

It seems the main issue is processing speed, and some argue that the algorithm is not (yet?) optimized for videos...

About speed: @xinntao may be able to help implement an NCNN-Vulkan version (as already done for Real-ESRGAN)...

About video optimizations: a collaboration with @ding3820 of MIMO-VRN project may help...

Hope that inspires.

The input size during test

Hi, Jingyun, nice work! I wonder why SwinIR needs 'img_size' to be set. It is somewhat inconvenient, especially for testing, since we usually want to test on images of different sizes, right? Is there a particular reason for this? Swin Transformer does not need it because it uses padding operations. Also, are there any requirements on the input size, e.g., must it be a multiple of some number? Thanks.

A question about the framework

Hi, @JingyunLiang

I appreciate your fabulous work but I have a question about the framework. Did you ever try the Unet-like framework or encoder-decoder one for the Deep Feature Extraction Block (the whole transformer block)? As your framework is all of the same RSTB blocks, I am wondering if the encoder-decoder idea is helpful for the performance gain?

Thank you very much.

FLOPs?

Hi,

Thanks for this great work! Could you provide the FLOPs/MACs of your SwinIR model?

About #Parameters in the model

Thanks for providing the code for SwinIR!

I calculated the #Params and #FLOPS for the lightweight SwinIR model using KAIR. However, I'm not able to replicate the numbers mentioned in table 3 of the paper.
For example, I get the #Params as 910.2K instead of 878K in the table. The same happens with #FLOPS too. Could you please guide me on how to reproduce the results? Thanks!

About training code

I downloaded the training code from https://github.com/cszn/KAIR, but it generates both G and E models when I run an SR task. Did I download the wrong code?

Problem when saving the model

Hi, thanks for the training code.
I have a problem when the iteration count reaches 5000 and the model is saved.
  File "/KAIR/models/network_swinir.py", line 254, in forward
    x_windows = window_partition(shifted_x, self.window_size) # nW*B, window_size, window_size, C
  File "/KAIR/models/network_swinir.py", line 42, in window_partition
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
RuntimeError: shape '[1, 111, 8, 143, 8, 180]' is invalid for input of size 184459500
How can I fix it?
Thanks.
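
The numbers in the error are consistent with a feature map whose height and width are not multiples of the window size. One factorization that matches, inferred from the message and not confirmed by the poster, is H = 895 and W = 1145 with 180 channels. The sketch below redoes that arithmetic; the function name is hypothetical.

```python
def window_view_numel(h, w, window_size, channels, batch=1):
    """Element count implied by view(B, H // ws, ws, W // ws, ws, C)."""
    return (batch * (h // window_size) * window_size
            * (w // window_size) * window_size * channels)


h, w, channels, window = 895, 1145, 180, 8  # assumed size, inferred from the error
actual = h * w * channels
requested = window_view_numel(h, w, window, channels)

print(actual)                      # 184459500
print(h // window, w // window)    # 111 143
print(requested == actual)         # False
```

Floor division drops the leftover rows and columns, so the requested shape holds fewer elements than the tensor and the view fails. Padding the input to a multiple of the window size avoids the mismatch.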

Can gan be finetuned on own dataset?

When I try to set the path of a pretrained model (003_realSR_BSRGAN_DFO_s64w8_SwinIR-M_x4_GAN.pth) in KAIR's training options file:

, "path": {
"root": "superresolution" // "denoising" | "superresolution" | "dejpeg"
, "pretrained_netG": null // path of pretrained model
, "pretrained_netD": null // path of pretrained model
, "pretrained_netE": null // path of pretrained model
}

it starts training from scratch anyway. Copying the model from model_zoo straight into /superresolution/swinir_sr_realworld_x4_gan/models/ does not work either.

JSONDecodeError when training swinir

Hi @JingyunLiang, I am using the training code main_train_psnr.py in KAIR and changed only the dataroot and other necessary settings. The training command is python main_train_psnr.py --opt options/swinir/train_swinir_sr_classical.json. My environment is CUDA10.1+Pytorch1.7.1+Python3.7. When training the SwinIR model, I got this error:

So I searched for it and changed json_path='options/train_msrresnet_psnr.json' to json_path="options/train_msrresnet_psnr.json" in main_train_psnr.py. As you can see, line 34 is the json_path.

But I still get the same error over and over again. Could you please provide some suggestions? Thanks a lot.

Residual connection resulting in bad result

Thanks for sharing your work. I tried adding residual connections in the RSTB block and the STL layer, but got bad results. The residual connections were added as in figures (1) and (2).

My question is:

I added residual connections in both the RSTB and the STL (I also tried adding them only in the RSTB or only in the STL, but the results were bad as well). The figure shows the result of adding residual connections with adaptive parameters only in the RSTB (the red line; the blue line is the original SwinIR network from your paper). Your paper has only one global residual connection in the RSTB (from the input to the RSTB output) but gets excellent results. So I would like to know whether you ever tried adding more residual connections as above and also got bad results.
If you tried it and got impressive results, could you tell me how you did it?
Thanks ~
:-D

Illustration in the README

For info, you have linked to the wrong image for your result with SwinIR-Large.

|Real-World Image (x4)|[BSRGAN, ICCV2021](https://github.com/cszn/BSRGAN)|[Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN)|SwinIR (ours)|SwinIR-Large (ours)|
|       :---       |     :---:        |        :-----:         |        :-----:         |        :-----:         | 
|<img width="200" src="figs/ETH_LR.png">|<img width="200" src="figs/ETH_BSRGAN.png">|<img width="200" src="figs/ETH_realESRGAN.jpg">|<img width="200" src="figs/ETH_SwinIR.png">|<img width="200" src="figs/ETH_realESRGAN.jpg">|<img width="200" src="figs/ETH_SwinIR-L.png">|
|<img width="200" src="figs/OST_009_crop_LR.png">|<img width="200" src="figs/OST_009_crop_BSRGAN.png">|<img width="200" src="figs/OST_009_crop_realESRGAN.png">|<img width="200" src="figs/OST_009_crop_SwinIR.png">|<img width="200" src="figs/OST_009_crop_SwinIR-L.png">|

That is because you have an extra column on the row of the building. Check the end of the row:

|<img width="200" src="figs/ETH_SwinIR.png">|<img width="200" src="figs/ETH_realESRGAN.jpg">|<img width="200" src="figs/ETH_SwinIR-L.png">|

about charbonnierloss

The Charbonnier loss has an extra parameter eps. In the paper eps is 1e-3, and the value actually used should be (1e-3)^2, but your code may not apply the ^2. I don't know whether the difference matters.
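
The two readings of eps can be compared directly. The sketch below evaluates the per-pixel Charbonnier penalty both ways, with eps squared inside the root and with eps added as is. Both functions are illustrative, and neither is claimed to be what the repository or the paper implements.

```python
import math


def charbonnier_squared_eps(diff, eps=1e-3):
    """sqrt(diff^2 + eps^2): eps is squared inside the root."""
    return math.sqrt(diff * diff + eps * eps)


def charbonnier_plain_eps(diff, eps=1e-3):
    """sqrt(diff^2 + eps): eps is added without squaring."""
    return math.sqrt(diff * diff + eps)


for diff in (0.0, 0.01, 0.5):
    print(diff, charbonnier_squared_eps(diff), charbonnier_plain_eps(diff))
```

At zero error the first form bottoms out at eps and the second at sqrt(eps). The two differ mainly for errors that are small relative to sqrt(eps) and are almost the same for large errors.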

Issues about the patch embedding.

Hi, thanks very much for sharing this wonderful work. Judging by the definition of PatchEmbed(nn.Module), parameters such as patch_size and img_size are not actually used. It seems that the performance improvements of SwinIR come from the MSA and MLP layers. Of course, the multiple skip connections in the RSTB and STL also help. I am curious why SwinIR does not form patches from multiple pixels, as done for example by the PatchEmbed method in
"Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions".

train code

When will you release the training code? I would like to use it soon. Thank you.

one question

Hi, this is great work. I want to use this network for single-image deraining. Which parts of the code should I modify? Or do you have any suggestions? Thanks!

High CPU Usage

Thanks for sharing your work. I have a problem with very high CPU usage. When I set H_size > 64 (e.g., 96 or 128), CPU usage is about 500%. I would like to know why, and which GPU was used for the experiments in your paper. Could this be caused by my GPU lacking computing power (it is an NVIDIA RTX 2080 Ti)?
Thanks~

denoising training code

I haven't found the training code for the denoising task in KAIR. Has it not been released yet?

Training dataset - patch creation

It would be really helpful if you could explain how to create patches when the image is smaller than 128 x 128 (the patch size mentioned in the training settings). Should such images be zero-padded or excluded, for example the 120 x 80 images in the BSD500 dataset?

Why SwinIR can be directly (not patch by patch) tested on images with arbitrary sizes?

To my knowledge, a transformer's input must have a fixed resolution, so at test time images are usually processed as overlapping patches. How does your code handle this, and what is the idea behind it? It looks like an image of any resolution can be fed into SwinIR. How is that done?
Looking forward to your reply, thank you!

About use_checkpoint

The use_checkpoint code path is missing one argument:

The original code in network_swinir.py line 399:

x = checkpoint.checkpoint(blk, x)

Should be:

x = checkpoint.checkpoint(blk, x, x_size)

network interpolation

When denoising images, I need more noise levels, such as 20 and 35. I think a network interpolation function could produce an approximate model for them.
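
Network interpolation in its simplest form blends two sets of weights parameter by parameter. The sketch below shows that blend on plain dicts of float lists standing in for two state dicts, for example models trained for a lower and a higher noise level. The function name is hypothetical, and whether the blended network actually denoises well at an intermediate level is not established here.

```python
def interpolate_weights(weights_a, weights_b, alpha):
    """Blend two weight dicts key by key: alpha * a + (1 - alpha) * b."""
    if weights_a.keys() != weights_b.keys():
        raise ValueError("checkpoints must have identical parameter names")
    return {
        name: [alpha * a + (1.0 - alpha) * b
               for a, b in zip(weights_a[name], weights_b[name])]
        for name in weights_a
    }


# Toy stand-ins for two trained models (hypothetical values).
low_noise = {"conv.weight": [0.0, 2.0], "conv.bias": [1.0]}
high_noise = {"conv.weight": [2.0, 4.0], "conv.bias": [3.0]}

print(interpolate_weights(low_noise, high_noise, alpha=0.5))
# {'conv.weight': [1.0, 3.0], 'conv.bias': [2.0]}
```

Both models must share the same architecture, and alpha would be chosen according to where the target noise level sits between the two trained levels.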

About ape

Thanks for your code first!
After reading your code, I would like to know why ape (absolute position embedding) is not used, since the option is False by default.
I would also like to confirm: if I train the model with ape on 128-size images, can I change the image size when I evaluate the model? I thought the length of the position embedding depends on the number of patches, and the number of patches depends on the image size.
I hope you can help with my question!
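
The dependency described above can be made concrete. The sketch assumes the absolute position embedding holds one vector per patch, i.e. a table of shape (1, num_patches, embed_dim), which is the usual layout. The function name is hypothetical.

```python
def num_patches(img_size, patch_size=1):
    """Number of tokens for a square image split into square patches."""
    per_side = img_size // patch_size
    return per_side * per_side


print(num_patches(128))  # 16384
print(num_patches(256))  # 65536
```

A table learned for 16384 positions does not line up with 65536 tokens, so evaluating at a different size would need the embedding to be resized or interpolated. That may be one reason to leave ape off when test images vary in size.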