jingyunliang / swinir

SwinIR: Image Restoration Using Swin Transformer (official repository)

Home Page: https://arxiv.org/abs/2108.10257

License: Apache License 2.0

Languages: Python 97.87%, Shell 2.13%
Topics: image-super-resolution, image-denoising, compression-artifact-reduction, image-deblocking, transformer, real-world-image-super-resolution, lightweight-image-super-resolution, image-restoration, low-level-vision, vision-transformer

swinir's Introduction

Jingyun Liang

I am currently a PhD student at the Computer Vision Lab, ETH Zürich, Switzerland. I am co-supervised by Prof. Luc Van Gool and Prof. Radu Timofte. I also work closely with Dr. Kai Zhang. I mainly focus on low-level vision research, especially image and video restoration, such as

  • image/video super-resolution (SR)
  • image/video deblurring
  • image/video denoising
  • ...

🚀 News

  • 2022-10-04: Our new paper RVRT (NeurIPS2022) achieves SOTA video restoration results with balanced size, memory and runtime.
  • 2022-08-30: See our papers on real-world image denoising (SCUNet) and video denoising (ReViD).
  • 2022-07-30: Three papers, including EFNet (event-based image deblurring, oral), DATSR (reference image SR) and DAVSR (video SR), accepted by ECCV2022.
  • 2022-01-28: Our new paper VRT outperforms previous video SR / deblurring / denoising / frame interpolation / space-time video SR methods by up to 😍 2.16dB. 😍
  • 2021-10-20: SwinIR is awarded the best paper prize in ICCV-AIM2021.
  • 2021-08-01: Three papers (HCFlow, MANet and BSRGAN) accepted by ICCV2021.
  • 2021-03-29: One paper (FKP) accepted by CVPR2021.

🌱 Repositories

Topic | Title
real-world video denoising | Practical Real Video Denoising with Realistic Degradation Model
event-based image deblurring | Event-based Fusion for Motion Deblurring with Cross-modal Attention, ECCV2022
reference image SR | Reference-based Image Super-Resolution with Deformable Attention Transformer, ECCV2022
interpretable video restoration | Towards Interpretable Video Super-Resolution via Alternating Optimization, ECCV2022
transformer-based video restoration | Recurrent Video Restoration Transformer with Guided Deformable Attention
transformer-based video restoration | VRT: A Video Restoration Transformer
transformer-based image restoration | SwinIR: Image Restoration Using Swin Transformer
real-world image denoising | Practical Blind Denoising via Swin-Conv-UNet and Data Synthesis
real-world image SR | Designing a Practical Degradation Model for Deep Blind Image Super-Resolution, ICCV2021
blind image SR | Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution, ICCV2021
blind image SR | Flow-based Kernel Prior with Application to Blind Super-Resolution, CVPR2021
normalizing flow-based image SR and image rescaling | Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling, ICCV2021
image/video restoration | Image/Video Restoration Toolbox

swinir's People

Contributors

ak391, chenxwh, jingyunliang, liuyinglao

swinir's Issues

Trained model from KAIR (40000_optimizerG.pth) gives error on testing

Instead of using the pre-trained models, I trained with the KAIR code and used the resulting checkpoints for testing.

The KAIR training code produced 3 files:
40000_optimizerG.pth
40000_G.pth
40000_E.pth

Testing this repository's code with 40000_optimizerG.pth fails with this error:

(pytorch-gpu) C:\Users\Downloads\SwinIR-main>python main_test_swinir.py --task classical_sr --scale 2 --training_patch_size 48 --folder_lq testsets/Set5/LR_bicubic/X2 --folder_gt testsets/Set5/HR
loading model from model_zoo/swinir/40000_optimizerG.pth
Traceback (most recent call last):
  File "C:\Users\Downloads\SwinIR-main\main_test_swinir.py", line 253, in <module>
    main()
  File "C:\Users\Downloads\SwinIR-main\main_test_swinir.py", line 42, in main
    model = define_model(args)
  File "C:\Users\Downloads\SwinIR-main\main_test_swinir.py", line 174, in define_model
    model.load_state_dict(pretrained_model[param_key_g] if param_key_g in pretrained_model.keys() else pretrained_model, strict=True)
  File "C:\Users\anaconda3\envs\pytorch-gpu\lib\site-packages\torch\nn\modules\module.py", line 1406, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for SwinIR:
        Missing key(s) in state_dict: "conv_first.weight", "conv_first.bias", "patch_embed.norm.weight", "patch_embed.norm.bias", "layers.0.residual_group.blocks.0.norm1.weight", "layers.0.residual_group.blocks.0.norm1.bias", "layers.0.residual_group.blocks.0.attn.relative_position_bias_table",

40000_G.pth and 40000_E.pth test fine.
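
A likely explanation, offered as an assumption rather than a confirmed diagnosis: the `_optimizerG` file holds optimizer state, not network weights. A PyTorch optimizer's `state_dict()` has the top-level keys `state` and `param_groups`, so none of the `conv_first.*` or `layers.*` keys the model expects are present. The sketch below uses plain dicts as stand-ins for loaded checkpoints; `looks_like_optimizer_state` is a hypothetical helper, not part of SwinIR or KAIR.

```python
def looks_like_optimizer_state(checkpoint):
    """Return True if a loaded checkpoint has the layout of a PyTorch
    optimizer state_dict (top-level keys 'state' and 'param_groups')."""
    return isinstance(checkpoint, dict) and {"state", "param_groups"} <= set(checkpoint)


# Stand-ins for what torch.load(...) might return (illustrative values only).
optimizer_ckpt = {"state": {}, "param_groups": [{"lr": 2e-4}]}
model_ckpt = {"conv_first.weight": [0.0], "conv_first.bias": [0.0]}

print(looks_like_optimizer_state(optimizer_ckpt))  # True
print(looks_like_optimizer_state(model_ckpt))      # False
```

A check like this before `load_state_dict` would turn the long "Missing key(s)" message into a clearer one.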

CUDA out of memory for photos larger than 640x480px on RTX 3060 12GB

Thanks for releasing this great code to the public for free. I tried the real-world 4x large model. The results are great, even better than well-known commercial products.
Real-ESRGAN, for example, has a --tile option to avoid CUDA out-of-memory errors. With your code I can't upscale photos larger than VGA resolution on an RTX 3060 12GB. Is there a way to tile, or do I need to change img_size, and to what values?
Thank you very much in advance!
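
Tiling of the kind asked about here can be sketched independently of any model: split each axis into overlapping tiles, run the network on each tile, and blend the overlapping outputs (typically by averaging). The helper below only computes tile start offsets; `tile_starts` and the tile/overlap values are hypothetical, and for a window-based model the tile size would presumably need to be a multiple of the window size.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of tiles of size `tile` covering [0, length),
    with neighbouring tiles sharing at least `overlap` pixels."""
    if overlap < 0 or tile <= overlap:
        raise ValueError("need 0 <= overlap < tile")
    tile = min(tile, length)
    if tile == length:
        return [0]
    stride = tile - overlap
    # Regular grid, then one final tile flush with the far edge.
    return list(range(0, length - tile, stride)) + [length - tile]


# A 640x480 input split into 256-pixel tiles with 32 pixels of overlap.
print(tile_starts(640, 256, 32))  # [0, 224, 384]
print(tile_starts(480, 256, 32))  # [0, 224]
```

Peak memory then depends on the tile size, not on the full image size.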

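PSNR and SSIM do not converge to the performance reported in the paper

Hello authors,

Thank you for the excellent work. I read the SwinIR paper and starred this repository. When I tested your pretrained model on Set5 and trained classical image SR (x2) on DIV2K and Flickr2K, the PSNR did not reach the values reported in the paper. Is this because of my hyperparameter settings, or are there training tricks I am missing?

Using the official pretrained model (001_classicalSR_DF2K_s64w8_SwinIR-M_x2.pth), tested on Set5:
[image]

Average PSNR: 36.21 dB;

Training classical image SR (x2) on DIV2K and Flickr2K with the training code from https://github.com/cszn/KAIR:
[image]

Average PSNR converges to 36.15 dB, which falls short of the paper.

Thanks!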

about test

The SwinIR model has an img_size parameter, for example 128, so in the Swin layer input_resolution=(128, 128). When my test image is not (128, 128), the attention computation has a branch: if self.input_resolution == x_size, else attn_windows = self.attn(x_windows, mask=self.calculate_mask(x_size).to(x.device)). When the image size differs from self.input_resolution=(128, 128), what is the mask that gets passed in?

IndexError: index 2080 is out of bounds for dimension 2 with size 2080

Hey, thanks for this awesome code.
It always worked great for me, but now I'm getting this error regardless of which image I try in the colab.
In 3. Inference:

/content/Real-ESRGAN/BSRGAN
LogHandlers setup!
21-10-08 17:53:58.872 : Model Name : BSRGAN
21-10-08 17:53:58.873 : GPU ID : 0
[3, 3, 64, 23, 32, 4]
21-10-08 17:54:01.995 : Input Path : testsets/RealSRSet
21-10-08 17:54:01.995 : Output Path : testsets/RealSRSet_results_x4
21-10-08 17:54:01.996 : ---1 --> BSRGAN --> x4--> adsads.png
/content/Real-ESRGAN
Testing 0 adsads
loading model from experiments/pretrained_models/003_realSR_BSRGAN_DFO_s64w8_SwinIR-M_x4_GAN.pth
Traceback (most recent call last):
  File "SwinIR/main_test_swinir.py", line 287, in <module>
    main()
  File "SwinIR/main_test_swinir.py", line 73, in main
    output = test(img_lq, model, args, window_size)
  File "SwinIR/main_test_swinir.py", line 259, in test
    output = model(img_lq)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(input, **kwargs)
  File "/content/Real-ESRGAN/SwinIR/models/network_swinir.py", line 839, in forward
    return x[:, :, H*self.upscale, W*self.upscale]
IndexError: index 2080 is out of bounds for dimension 2 with size 2080
loading model from experiments/pretrained_models/003_realSR_BSRGAN_DFOWMFC_s64w8_SwinIR-L_x4_GAN.pth
Traceback (most recent call last):
  File "SwinIR/main_test_swinir.py", line 287, in <module>
    main()
  File "SwinIR/main_test_swinir.py", line 73, in main
    output = test(img_lq, model, args, window_size)
  File "SwinIR/main_test_swinir.py", line 259, in test
    output = model(img_lq)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(input, **kwargs)
  File "/content/Real-ESRGAN/SwinIR/models/network_swinir.py", line 839, in forward
    return x[:, :, H*self.upscale, W*self.upscale]
IndexError: index 2080 is out of bounds for dimension 2 with size 2080

Comparison with IPT

Hi,

Thanks for sharing this interesting work. Table 6 (CBSD68, sigma=50) shows that IPT achieves 28.39 PSNR. However, the original IPT paper reports 29.88 (in their Table 2). Is there any difference between the two settings?

runtimeerror when using other dataset?

Hi.
I want to train the model with my own dataset.
However, it keeps reporting:
RuntimeError: stack expects each tensor to be equal size, but got [3, 256, 256] at entry 0 and [3, 256, 252] at entry 1
Is one of my settings wrong?
Thanks.

Part of the JSON:
"datasets": {
  "train": {
    "name": "train_dataset"           // just name
    , "dataset_type": "sr"            // "dncnn" | "dnpatch" | "fdncnn" | "ffdnet" | "sr" | "srmd" | "dpsr" | "plain" | "plainpatch" | "jpeg"
    , "dataroot_H": "HR"              // path of H training dataset. DIV2K (800 training images)
    , "dataroot_L": "LR"              // path of L training dataset
    , "H_size": 256                   // 96/144|192/384 | 128/192/256/512. LR patch size is set to 48 or 64 when compared with RCAN or RRDB.
    , "dataloader_shuffle": true
    , "dataloader_num_workers": 16
    , "dataloader_batch_size": 8      // batch size 1 | 16 | 32 | 48 | 64 | 128. Total batch size =4x8=32 in SwinIR
  }
  , "test": {
    "name": "test_dataset"            // just name
    , "dataset_type": "sr"            // "dncnn" | "dnpatch" | "fdncnn" | "ffdnet" | "sr" | "srmd" | "dpsr" | "plain" | "plainpatch" | "jpeg"
    , "dataroot_H": "testsets/Set5/HR"             // path of H testing dataset
    , "dataroot_L": "testsets/Set5/LR_bicubic/X4"  // path of L testing dataset
  }
}

, "netG": {
  "net_type": "swinir"
  , "upscale": 4                      // 2 | 3 | 4 | 8
  , "in_chans": 3
  , "img_size": 64                    // For fair comparison, LR patch size is set to 48 or 64 when compared with RCAN or RRDB.
  , "window_size": 8
  , "img_range": 1.0
  , "depths": [6, 6, 6, 6, 6, 6]
  , "embed_dim": 180
  , "num_heads": [6, 6, 6, 6, 6, 6]
  , "mlp_ratio": 2
  , "upsampler": "pixelshuffle"       // "pixelshuffle" | "pixelshuffledirect" | "nearest+conv" | null
  , "resi_connection": "1conv"        // "1conv" | "3conv"
  , "init_type": "default"
}
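
The mismatched shapes ([3, 256, 256] vs. [3, 256, 252]) are consistent with an HR image whose width is smaller than H_size, so a full 256x256 patch cannot be cropped from it. This is an assumption, since the issue does not list the image sizes. A minimal sketch that flags such images before training, with sizes passed in as plain (height, width) tuples and hypothetical file names:

```python
def too_small_for_patch(image_sizes, patch_size):
    """Return names of images from which a patch_size x patch_size crop cannot be taken."""
    return [name for name, (h, w) in image_sizes.items()
            if h < patch_size or w < patch_size]


# Hypothetical HR image sizes as (height, width).
sizes = {"0001.png": (1404, 2040), "0002.png": (300, 252)}
print(too_small_for_patch(sizes, 256))  # ['0002.png']
```

Flagged images could then be excluded, or H_size reduced to fit the smallest image.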

Training efficiency

Hi

Thanks for the great work again.

When training SwinIR with the KAIR toolbox, I found that CPU utilization was very high while the GPU was mostly idle, and training was very slow. Could you tell me the GPU and CPU configurations you used, and the training time?

patch_size

I found that the network setting uses a patch_size of 1 by default, so each pixel becomes a token. Why not use an image block (e.g., 4*4) as the token?

About the FLOPs of SwinIR

Hi,

Thanks for sharing the code of this interesting work. Could you provide the FLOPs of SwinIR, e.g., for 256x256x3 images? Thanks!

Training question

Thanks for your amazing work! I have a question regarding your training:
How many GPUs did you use for parallel training? How many hours does it take before early stopping?

testing dataset downsampled image

Thanks for releasing the wonderful code and datasets.
I ran into one problem while testing: Set5 and Set14 have x2, x3 and x4 down-sampled images, but other datasets, namely Urban100, Manga109 and BSDS100, do not. Could you share the down-sampled images for these datasets? I can down-sample them myself, but the results may then differ from those reported in the paper.

GPU numbers

Dear author,
I would like to know how many GPUs you used to train the SwinIR network.
Thank you.

Training settings for SwinIR light

Thanks for sharing your work.

I notice that the training config file for the lightweight model in KAIR may not be consistent with the statement in the paper. Could you double check that?
In particular, both the batch size and patch size are set to 64, and embed_dim is 180. Is this the correct setting?

About test part in training

Thanks for your code first!
I ran the lightweight super-resolution part of the code, and the testing step during training raises an error:

Traceback (most recent call last):
  File "main_train_psnr.py", line 291, in <module>
    main()
  File "main_train_psnr.py", line 190, in main
    current_psnr = util.calculate_psnr(E_img, H_img, border=border)
  File "/home/ET/huiyuxiang/KAIR/utils/utils_image.py", line 632, in calculate_psnr
    raise ValueError('Input images must have the same dimensions.')
ValueError: Input images must have the same dimensions.

So I printed the shapes and found that the LR image is padded to a multiple of 8 while the HR image is not.

About training

Thank you for your work. While training SwinIR, I found that training the small model is smooth, but the loss often suddenly doubles when the dim is changed to 180. Because of memory limits I use batch_size=16 and lr=1e-4. Are there any tricks to make the training stable?

About drop path rate

I notice that you use the same parameters as Swin Transformer and set the drop path rate to 0.1. Does super-resolution really need drop path?

Problem about testing my trained model

Thanks for the training code.
I trained a classical SR model and got 500000_optimizerG.pth, 500000_G.pth and 500000_E.pth.
Which .pth file should I use for testing?
When I run
python main_test_swinir.py --task classical_sr --scale 4 --training_patch_size 64 --model_path superresolution/swinir_sr_classical_patch64_x4_l1/models/500000_G.pth --folder_lq testsets/real3wx4/test_LR_crop
the model cannot be loaded:
model.load_state_dict(torch.load(args.model_path)['params'], strict=True)
KeyError: 'params'
May I know how to solve this?
Thanks.
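
The traceback in the first issue on this page shows main_test_swinir.py choosing the weights with `pretrained_model[param_key_g] if param_key_g in pretrained_model.keys() else pretrained_model`, which tolerates checkpoints saved without a wrapper key. A minimal sketch of that fallback, with plain dicts standing in for loaded checkpoints; `select_weights` is a hypothetical name.

```python
def select_weights(checkpoint, param_key="params"):
    """Return checkpoint[param_key] if the checkpoint is wrapped, else the checkpoint itself."""
    return checkpoint[param_key] if param_key in checkpoint else checkpoint


wrapped = {"params": {"conv_first.weight": [0.0]}}
bare = {"conv_first.weight": [0.0]}

print(select_weights(wrapped) == select_weights(bare))  # True
```

With this fallback, a checkpoint saved as a bare state dict no longer raises KeyError: 'params'.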

Supplements

Hello, the paper says that some details are given in the supplement. Where can I find it? Thank you.

swin layer

In Swin Transformer, the self-attention module consists of two sub-blocks: one uses plain window self-attention, the other uses shifted windows. In that code, the first (plain window) block has no attn_mask, while the second (shifted window) block has a mask. In your code, it seems that every self-attention layer has an attn_mask. Does that mean no layer uses plain window self-attention, and every layer uses shifted windows instead? Thank you.
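
The alternating pattern the question describes for Swin Transformer can be sketched as follows: blocks alternate between a shift of 0 (plain window attention, no mask needed) and a shift of half the window (shifted windows, mask needed). Whether SwinIR follows exactly this pattern is an assumption here, not something the issue confirms; `block_shift_sizes` is a hypothetical helper.

```python
def block_shift_sizes(depth, window_size):
    """Shift per block: 0 for even-indexed blocks, window_size // 2 for odd ones."""
    return [0 if i % 2 == 0 else window_size // 2 for i in range(depth)]


shifts = block_shift_sizes(6, 8)
print(shifts)                   # [0, 4, 0, 4, 0, 4]
print([s > 0 for s in shifts])  # which blocks would need an attention mask
```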

About Resi-connection

Hi there, thanks for the amazing work!
The section 'Impact of residual connection and convolution layer in RSTB' of the paper says that a 1x1 conv or a 3x3 conv is added at the residual connection.
The results show that 3x3 is better than 1x1, also the inverted-bottleneck 3x3.

Back to the code itself.
When I first read the 'resi_connection' argument of the SwinIR class, I thought that '1conv' meant a 1x1 conv and '3conv' meant a 3x3 conv.
After reading more of the code, I realized that '1conv' actually means 'one 3x3 conv' and '3conv' means 'three 3x3 conv'.

# build the last conv layer in deep feature extraction
if resi_connection == '1conv':
    self.conv_after_body = nn.Conv2d(embed_dim, embed_dim, 3, 1, 1)
elif resi_connection == '3conv':
    # to save parameters and memory
    self.conv_after_body = nn.Sequential(nn.Conv2d(embed_dim, embed_dim // 4, 3, 1, 1),
                                         nn.LeakyReLU(negative_slope=0.2, inplace=True),
                                         nn.Conv2d(embed_dim // 4, embed_dim // 4, 1, 1, 0),
                                         nn.LeakyReLU(negative_slope=0.2, inplace=True),
                                         nn.Conv2d(embed_dim // 4, embed_dim, 3, 1, 1))

I think the naming could be a little confusing.
Names like '1conv3' and '3conv3' might be clearer.

I just wanted to point out the confusing part.
Thanks.
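
To make the "to save parameters and memory" comment in the quoted snippet concrete, the parameter counts of the two variants can be worked out by hand. The sketch assumes embed_dim = 180, the value in the configs quoted elsewhere on this page; `conv2d_params` is a hypothetical helper. Note that in the quoted snippet the middle layer of '3conv' is a 1x1 conv, so the stack is 3x3, 1x1, 3x3.

```python
def conv2d_params(c_in, c_out, kernel):
    """Parameter count of a 2D convolution with bias."""
    return c_in * c_out * kernel * kernel + c_out


embed_dim = 180  # assumed, taken from the configs quoted on this page

one_conv = conv2d_params(embed_dim, embed_dim, 3)
three_conv = (conv2d_params(embed_dim, embed_dim // 4, 3)
              + conv2d_params(embed_dim // 4, embed_dim // 4, 1)
              + conv2d_params(embed_dim // 4, embed_dim, 3))

print(one_conv)    # 291780
print(three_conv)  # 148095
```

Under these assumptions the bottleneck variant uses roughly half the parameters of the single 3x3 conv.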

[SUGGESTION] Optimized version for videos

Hi there, SwinIR is really cool!

Since it has been "ported" to VapourSynth (thanks to @HolyWu), some interesting discussions, with tests too, about its effectiveness on videos have started.

It seems the main issue is processing speed, and some argue that the algorithm is not (yet?) optimized for videos...

About speed: @xinntao may be able to help implement an NCNN-Vulkan version (as already done for Real-ESRGAN)...

About video optimizations: a collaboration with @ding3820 of the MIMO-VRN project may help...

Hope that inspires.

The input size during test

Hi, Jingyun, nice work! I wonder why the SwinIR function needs 'img_size' to be set. It is rather inconvenient, especially for testing, since we usually want to test on images of different sizes. Is there a particular reason for this? Swin Transformer does not need it because it uses padding operations. Also, are there any requirements on the input size, e.g., must it be a multiple of some number? Thanks.

A question about the framework

Hi, @JingyunLiang

I appreciate your fabulous work, but I have a question about the framework. Did you ever try a U-Net-like or encoder-decoder framework for the Deep Feature Extraction Block (the whole transformer block)? Since your framework consists entirely of identical RSTB blocks, I wonder whether the encoder-decoder idea would improve performance.

Thank you very much.

FLOPs?

Hi,

Thanks for this great work! Could you provide the FLOPs/MACs of your SwinIR model?

About #Parameters in the model

Thanks for providing the code for SwinIR!

I calculated the #Params and #FLOPS for the lightweight SwinIR model using KAIR. However, I'm not able to replicate the numbers mentioned in table 3 of the paper.
For example, I get the #Params as 910.2K instead of 878K in the table. The same happens with #FLOPS too. Could you please guide me on how to reproduce the results? Thanks!

Problem when saving the model

Hi, thanks for the training code.
I hit a problem when training reaches iteration 5000 and saves the model:
  File "/KAIR/models/network_swinir.py", line 254, in forward
    x_windows = window_partition(shifted_x, self.window_size) # nW*B, window_size, window_size, C
  File "/KAIR/models/network_swinir.py", line 42, in window_partition
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
RuntimeError: shape '[1, 111, 8, 143, 8, 180]' is invalid for input of size 184459500
How can I fix it?
Thanks.
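
The numbers in the error are consistent with an input whose height and width are not multiples of the window size: 184459500 / 180 = 1024775 = 895 x 1145, while the view expects 111 x 8 = 888 by 143 x 8 = 1144. One common workaround, sketched here as an assumption and not as the repository's confirmed fix, is to pad the input up to the next multiple of window_size before the forward pass and crop the output afterwards. `pad_amount` is a hypothetical helper.

```python
def pad_amount(size, window_size):
    """Pixels to add so that size becomes a multiple of window_size."""
    return (window_size - size % window_size) % window_size


h, w, window_size = 895, 1145, 8
print(h + pad_amount(h, window_size))  # 896
print(w + pad_amount(w, window_size))  # 1152
```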

Can the GAN model be fine-tuned on my own dataset?

I tried to set the path of the pretrained model (003_realSR_BSRGAN_DFO_s64w8_SwinIR-M_x4_GAN.pth) in KAIR's training option file:

, "path": {
  "root": "superresolution"   // "denoising" | "superresolution" | "dejpeg"
  , "pretrained_netG": null   // path of pretrained model
  , "pretrained_netD": null   // path of pretrained model
  , "pretrained_netE": null   // path of pretrained model
}

but it starts training from scratch anyway. Copying the model from model_zoo directly into /superresolution/swinir_sr_realworld_x4_gan/models/ does not work either.

JSONDecodeError when training swinir

Hi @JingyunLiang, I use the training code main_train_psnr.py in KAIR and changed only the dataroot and other necessary settings. The training command is python main_train_psnr.py --opt options/swinir/train_swinir_sr_classical.json, and my environment is CUDA10.1+Pytorch1.7.1+Python3.7. When training the SwinIR model, I got this error:
[image]

I searched for it and changed json_path='options/train_msrresnet_psnr.json' to json_path="options/train_msrresnet_psnr.json" in main_train_psnr.py. As you can see, line 34 is the json_path.
[image]

But I still get the same error every time. Could you please offer some suggestions? Thanks a lot.
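
The option files quoted on this page carry `//` comments, which Python's standard `json.loads` rejects with a `JSONDecodeError`. Whether that is what happened here cannot be confirmed, since the screenshots are missing. As a minimal sketch, a loader can strip such comments before parsing; `load_commented_json` and the sample values are hypothetical.

```python
import json


def load_commented_json(text):
    """Parse JSON text after dropping everything from '//' to end of line.

    Caveat: a '//' inside a string value (for example a URL) is cut too.
    """
    stripped = "\n".join(line.split("//")[0] for line in text.splitlines())
    return json.loads(stripped)


sample = '''{
  "task": "swinir_sr_classical"  // hypothetical value
  , "scale": 2                   // 2 | 3 | 4 | 8
}'''

print(load_commented_json(sample))  # {'task': 'swinir_sr_classical', 'scale': 2}
```

If the option file is already being preprocessed this way, a stray character introduced while editing (a missing comma or quote, say) would be the next thing to check.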

Residual connection resulting in bad result

Thanks for sharing your work. I tried adding residual connections in the RSTB block and the STL layer, but got bad results. The residual connections were added as in figures (1) and (2).
[image]
My questions are:

  • I added residual connections in both the RSTB and the STL (I also tried adding them only in the RSTB or only in the STL, but the results were bad as well). The figure shows the result of adding a residual connection with adaptive parameters only in the RSTB (the red line; the blue line is the original SwinIR network from your paper). Your paper has only one global residual connection in the RSTB (from the RSTB input to its output), yet it gets excellent results. Have you ever tried adding more residual connections as described above, and did you also get bad results?
    [image]

  • If you tried it and got good results, could you tell me how you did it?
    Thanks ~
    :-D

Illustration in the README

For info, you have linked to the wrong image for your result with SwinIR-Large.

Bug

Pic

|Real-World Image (x4)|[BSRGAN, ICCV2021](https://github.com/cszn/BSRGAN)|[Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN)|SwinIR (ours)|SwinIR-Large (ours)|
|       :---       |     :---:        |        :-----:         |        :-----:         |        :-----:         |
|<img width="200" src="figs/ETH_LR.png">|<img width="200" src="figs/ETH_BSRGAN.png">|<img width="200" src="figs/ETH_realESRGAN.jpg">|<img width="200" src="figs/ETH_SwinIR.png">|<img width="200" src="figs/ETH_realESRGAN.jpg">|<img width="200" src="figs/ETH_SwinIR-L.png">|
|<img width="200" src="figs/OST_009_crop_LR.png">|<img width="200" src="figs/OST_009_crop_BSRGAN.png">|<img width="200" src="figs/OST_009_crop_realESRGAN.png">|<img width="200" src="figs/OST_009_crop_SwinIR.png">|<img width="200" src="figs/OST_009_crop_SwinIR-L.png">|

That is because you have an extra column on the row of the building. Check the end of the row:

|<img width="200" src="figs/ETH_SwinIR.png">|<img width="200" src="figs/ETH_realESRGAN.jpg">|<img width="200" src="figs/ETH_SwinIR-L.png">|

about charbonnierloss

The Charbonnier loss has an extra parameter eps. In the paper eps is 1e-3, so the value actually used should be (1e-3)^2, but your code may not apply the ^2. I don't know whether its influence is significant.
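
The two readings of eps can be compared numerically. The sketch below evaluates the per-pixel Charbonnier penalty both ways: with the offset eps^2 and with the offset eps. It makes no claim about which form the repository uses; `charbonnier` and its `square_eps` flag are hypothetical.

```python
import math


def charbonnier(diff, eps, square_eps=True):
    """Charbonnier penalty sqrt(diff^2 + eps^2), or sqrt(diff^2 + eps) if square_eps is False."""
    offset = eps * eps if square_eps else eps
    return math.sqrt(diff * diff + offset)


eps = 1e-3
for diff in (0.0, 0.01, 0.1):
    print(diff, charbonnier(diff, eps, True), charbonnier(diff, eps, False))
```

At zero error the penalty is 0.001 with eps^2 and about 0.0316 with eps. The two forms converge as the error grows, so the choice mainly affects behaviour near zero error.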

Issues about the patch embedding.

Hi, thanks very much for sharing this wonderful work. According to the definition of PatchEmbed(nn.Module), it seems that parameters such as patch_size and img_size are not used. It seems that the performance improvements of SwinIR come from the MSA and MLP layers. Of course, the multiple skip connections in the RSTB and STL are also helpful. I am curious why SwinIR does not form patches from multiple pixels, as done, for example, by the PatchEmbed method used in "Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions".

train code

When will you release the training code? I would like it soon. Thank you.

one question

Hi, this is great work. I want to use this network for single image deraining. Which parts of the code should I modify? Do you have any suggestions? Thanks!

High CPU Usage

Thanks for sharing your work. I have a problem: the CPU usage is too high. When I set H_size > 64 (e.g., 96 or 128), the CPU usage is about 500%. I would like to know why, and which GPU was used for the experiments in your paper. I also wonder whether this problem is caused by a weak GPU (mine is an NVIDIA RTX 2080 Ti).
Thanks~

denoising training code

I haven't found the training code for the denoising task in KAIR. Has it not been released yet?

Training dataset - patch creation

It would be really helpful if you could explain how to create patches when the image is smaller than 128 x 128 (the patch size mentioned in the training settings). Should such images be zero-padded or excluded, like those in the BSD500 dataset of size 120 x 80?

About use_checkpoint

The use_checkpoint code path is missing one argument:

The original code in network_swinir.py line 399:

x = checkpoint.checkpoint(blk, x)

Should be:

x = checkpoint.checkpoint(blk, x, x_size)

network interpolation

When denoising images, I need more noise levels, such as 20 and 35. I think a network interpolation function could produce an approximate model.
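
Network interpolation as suggested here usually means a linear blend of the parameters of two models with identical architecture. The sketch below shows only the arithmetic, using toy scalar weights. It assumes checkpoints trained at two neighbouring noise levels (15 and 25 are chosen for illustration; the issue does not say which levels are available). Whether the blended model denoises well at the intermediate level is not guaranteed. `interpolate_weights` is a hypothetical helper.

```python
def interpolate_weights(weights_a, weights_b, alpha):
    """Blend two weight dicts with identical keys: (1 - alpha) * a + alpha * b."""
    if weights_a.keys() != weights_b.keys():
        raise KeyError("both models must have the same parameter names")
    return {k: (1 - alpha) * weights_a[k] + alpha * weights_b[k] for k in weights_a}


# Toy scalar "weights"; real checkpoints would hold tensors under the same keys.
sigma_15 = {"w": 1.0, "b": 0.0}
sigma_25 = {"w": 3.0, "b": 2.0}

alpha = (20 - 15) / (25 - 15)  # target noise level 20 lies halfway between 15 and 25
print(interpolate_weights(sigma_15, sigma_25, alpha))  # {'w': 2.0, 'b': 1.0}
```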

About ape

Thanks for your code first!
After reading your code, I want to know why ape (absolute position embedding) is not used, since the option is False by default.
I also want to confirm: if I train the model with ape on 128-size images, can I change the image size when evaluating the model? I think the length of the position embedding depends on the number of patches, and the number of patches depends on the image size.
Hope you can help with my question!

About training code

Hi, there. Thanks for your amazing work, but I have some questions about the training code.

  1. Do we need to modify main_train_psnr.py (KAIR) to set training iterations to 500K? It's 1M epochs in the original file.

  2. I ran training python -m torch.distributed.launch --nproc_per_node=8 --master_port=1234 main_train_psnr.py --opt options/swinir/train_swinir_sr_classical.json --dist True on 8 RTX 3090 GPUs and the dataset is DIV2K train split (default X2). The estimated training time for 500K iters is ~3.5days (1min/100 iters), much longer than your 1.8 days on 8 2080 Ti GPUs. Do you have any idea about that?
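
The estimate in the second question can be checked with a few lines of arithmetic, using only the figures quoted in the issue (500K iterations, 1 minute per 100 iterations, and the 1.8 days cited for comparison). `training_days` is a hypothetical helper.

```python
def training_days(total_iters, seconds_per_100_iters):
    """Wall-clock days for total_iters at a fixed per-100-iteration speed."""
    seconds = total_iters / 100 * seconds_per_100_iters
    return seconds / 86400


days = training_days(500_000, 60)
print(round(days, 2))  # 3.47

# Speed needed to match 1.8 days for the same 500K iterations.
print(round(1.8 * 86400 / (500_000 / 100), 1))  # 31.1
```

Finishing 500K iterations in 1.8 days would therefore take about 31 seconds per 100 iterations, roughly twice the reported speed.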
