xpixelgroup / hat

CVPR2023 - Activating More Pixels in Image Super-Resolution Transformer Arxiv - HAT: Hybrid Attention Transformer for Image Restoration

License: Apache License 2.0

Python 100.00%


hat's Issues

Training Error. Could you provide some guidance on how to fix this error?

FileNotFoundError: [Errno 2] No such file or directory: '/qfs/projects/mage/watk681/DIV2K_train_HR/DIV2K_train_HR/002116_s044.png'

However, DIV2K/DIV2K_train_HR/ uses the file names 0001.png, 0002.png, ..., 0800.png.
Is there any guidance on how to generate a compatible meta_info_DF2Ksub_GT.txt that works with those file names?
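A minimal sketch of one way to build a meta info file for a folder of images such as DIV2K/DIV2K_train_HR/. It assumes that the data loader only needs the file name as the first space-separated token on each line; the function name `write_meta_info` and the output file name are hypothetical. The name 002116_s044.png in the error suggests that the shipped meta info file lists cropped sub-images rather than the original 0001.png to 0800.png, so the folder and the list have to describe the same set of files.

```python
import os
import tempfile


def write_meta_info(image_dir, meta_path, extensions=(".png",)):
    """Write one image file name per line (sorted) and return the names."""
    names = sorted(
        name for name in os.listdir(image_dir)
        if name.lower().endswith(extensions)
        and os.path.isfile(os.path.join(image_dir, name))
    )
    with open(meta_path, "w", encoding="utf-8") as f:
        for name in names:
            f.write(name + "\n")
    return names


if __name__ == "__main__":
    # Demo on a throwaway folder that mimics the DIV2K naming scheme.
    with tempfile.TemporaryDirectory() as tmp:
        for i in (1, 2, 3):
            open(os.path.join(tmp, f"{i:04d}.png"), "wb").close()
        out = os.path.join(tmp, "meta_info_custom_GT.txt")  # hypothetical name
        print(write_meta_info(tmp, out))
```

The resulting file would then be referenced from the `meta_info_file` entry of the training config in place of meta_info_DF2Ksub_GT.txt.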

Could you share the pretrained model at scale factor x8?

How to use the pretrained model from Python?

How can I use the pretrained model from Python? I wonder if there is a script that can increase a picture's resolution.

I'm afraid that chaiNNer will destroy my Python environment. So, could you provide a Python script?

Question about "overlap_ratio"

As stated in the paper, the window size of OCAB is M_0 = (1 + 2*overlap_ratio) * M.

However, the code implementation appears to conflict with this.

HAT/hat/archs/hat_arch.py

Line 373 in 09e7973

self.overlap_win_size = int(window_size * overlap_ratio) + window_size
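To make the discrepancy concrete, the sketch below evaluates both expressions side by side. The first function copies the expression quoted from hat_arch.py, the second uses the formula as it is stated in this issue. The function names and the example values of `window_size` and `overlap_ratio` are chosen here for illustration only.

```python
def overlap_size_code(window_size, overlap_ratio):
    # Expression quoted from hat_arch.py in this issue.
    return int(window_size * overlap_ratio) + window_size


def overlap_size_issue_formula(window_size, overlap_ratio):
    # Formula as stated in this issue: (1 + 2 * overlap_ratio) * M.
    return int((1 + 2 * overlap_ratio) * window_size)


# Illustrative values only.
for m, r in [(16, 0.5), (8, 0.25)]:
    print(m, r, overlap_size_code(m, r), overlap_size_issue_formula(m, r))
```

For a window of 16 and a ratio of 0.5, the code expression gives 24 while the quoted formula gives 32, so the two only agree when the ratio is zero.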

Results are larger but not improved?

Input:

Output:

Dataset used to train HAT_GAN_Real_SRx4?

Hi,
Thanks for your great work and for sharing it with us.

I found that HAT_GAN_Real_SRx4 works really well on my custom dataset.
However, x4 is a bit overkill, so I would like to train an x2 model instead.
I am wondering which dataset was used for the real-world model, and how you collected it.
Also, do you have any general suggestions for training?

Thanks!

About position encoding and attention mask

Hello,

Thanks for your great work!

How do the position encoding and attention mask differ in overlapping cross-window attention?
Overlapping cross-window attention differs from the vanilla version because the window sizes of Q and K are different, so I think using the original RPE and attention mask does not make sense.

Could you please give me some hints? Thanks in advance.

meta_info_file

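Test metrics differ from the paper

Thank you very much for your work; your excellent results have advanced the super-resolution field!
When validating on Set5 with the pretrained models provided in this repository (HAT_SRx2.pth, HAT_SRx3.pth, HAT_SRx4.pth), I found that the numbers differ slightly from the paper. Details:

Test 1
Dataset: the Set5 HR images are the original PNGs, with no mod cropping of the image size (e.g. baby.png is 512x512); the LR images are obtained by directly downsampling the HR images (e.g. baby.png downsampled 2x is 256x256).
Results: for x2/x3/x4, PSNR is lower than the paper by 0.03, 0, 0, and SSIM is lower than the paper by 0.0001, 0.0001, 0.0001, respectively.
Test 2
Dataset: the Set5 HR images are the original PNGs cropped to mod 12 (e.g. baby.png is 504x504); the LR images are obtained by directly downsampling the HR images (e.g. baby.png downsampled 2x is 252x252).
Results: for x2/x3/x4, PSNR is lower than the paper by 0.07, 0.03, 0.03, and SSIM is higher than the paper by 0.0003, 0.0002, 0.0004, respectively.

How did you prepare the test set used to obtain the results in the paper? Looking forward to your reply!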
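The two test setups differ only in how the HR image is cropped before downsampling. The sketch below shows the size arithmetic for a 512x512 image such as baby.png; `mod_crop_size` is a hypothetical helper written for this illustration, and which cropping the paper used is not stated in this thread.

```python
def mod_crop_size(height, width, modulus):
    """Largest size not exceeding (height, width) that is divisible by modulus."""
    return height - height % modulus, width - width % modulus


# Cropping to a multiple of 12, as in the second test above.
print(mod_crop_size(512, 512, 12))

# Cropping to a multiple of each scale factor instead.
for scale in (2, 3, 4):
    h, w = mod_crop_size(512, 512, scale)
    print(scale, (h, w), (h // scale, w // scale))
```

With mod 12 the HR size becomes 504x504 for every scale, whereas cropping per scale keeps 512x512 for x2 and x4 and gives 510x510 for x3, so the evaluated pixels differ between the two conventions.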

Does HAT have tile mode?

Hello, congratulations on getting the highest score in the Papers with Code benchmark summary. I was curious about HAT, so I installed it.

I ran inference on an image:

!python hat/test.py -opt options/test/HAT_SRx4_ImageNet-LR.yml

But it failed with a CUDA out-of-memory error, since my GPU has only 2 GB of VRAM. I will try Google Colab, but it has limited time and GPU quota.

Could HAT offer a feature for tiling the image? For example:
!python hat/test.py -opt options/test/HAT_SRx4_ImageNet-LR.yml --tile 256
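As a rough illustration of what tiling means here, the sketch below splits an image into overlapping boxes that could each be upscaled separately and then stitched back together. It is not the repository's implementation; the function `tile_boxes` and its parameters are hypothetical.

```python
def tile_boxes(height, width, tile, overlap=0):
    """Return (top, left, bottom, right) boxes that together cover the image."""
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    step = tile - overlap
    boxes = []
    # Stopping at (size - overlap) avoids a last tile that adds no new pixels.
    for top in range(0, max(height - overlap, 1), step):
        for left in range(0, max(width - overlap, 1), step):
            boxes.append((top, left,
                          min(top + tile, height), min(left + tile, width)))
    return boxes


# Example: a 500x700 input split into 256-pixel tiles with a 32-pixel overlap.
for box in tile_boxes(500, 700, tile=256, overlap=32):
    print(box)
```

Each box would be upscaled on its own, which bounds peak memory by the tile size; the matching region in the output is the box multiplied by the scale factor, and the overlap helps hide seams between neighbouring tiles.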

The PSNR problem of HAT_SRx4.pth

I used the provided HAT_SRx4.pth with the Set5 dataset and ran
python hat/test.py -opt ./options/test/HAT_SRx4.yml

The results are as follows:

2023-02-15 10:15:58,541 INFO: Loading HAT model from ./experiments/pretrained_models/HAT_SRx4.pth, with param key: [params_ema].
2023-02-15 10:15:58,767 INFO: Model [HATModel] is created.
2023-02-15 10:15:58,767 INFO: Testing Set5...
2023-02-15 10:16:00,483 INFO: Validation Set5
     # psnr: 28.6280        Best: 28.6280 @ HAT_SRx4 iter
     # ssim: 0.8403 Best: 0.8403 @ HAT_SRx4 iter

The PSNR is much lower than the results reported in the paper.
How can I reproduce the reported accuracy with the pretrained models?

How long did training on ImageNet take, and how many GPUs did you use? Thank you very much!

IsADirectoryError: [Errno 21] Is a directory: '../datasets/DF2K/DF2K_HR_sub/'

2022-12-23 14:30:11,207 INFO: Use Exponential Moving Average with decay: 0.999
2022-12-23 14:30:20,212 INFO: Network [HAT] is created.
2022-12-23 14:30:20,331 INFO: Loss [L1Loss] is created.
2022-12-23 14:30:20,342 INFO: Model [HATModel] is created.
2022-12-23 14:30:28,265 INFO: Start training from epoch: 0, iter: 0
2022-12-23 14:31:45,639 INFO: [train..][epoch: 0, iter: 100, lr:(2.000e-04,)] [eta: 3 days, 21:18:13, time (data): 0.774 (0.080)] l_pix: 7.0438e-02
2022-12-23 14:32:54,092 INFO: [train..][epoch: 0, iter: 200, lr:(2.000e-04,)] [eta: 3 days, 22:09:22, time (data): 0.729 (0.041)] l_pix: 6.1880e-02
2022-12-23 14:34:01,717 INFO: [train..][epoch: 0, iter: 300, lr:(2.000e-04,)] [eta: 3 days, 22:02:51, time (data): 0.676 (0.001)] l_pix: 4.3285e-02
2022-12-23 14:35:10,241 INFO: [train..][epoch: 0, iter: 400, lr:(2.000e-04,)] [eta: 3 days, 22:17:41, time (data): 0.681 (0.001)] l_pix: 4.3658e-02
2022-12-23 14:36:19,291 INFO: [train..][epoch: 0, iter: 500, lr:(2.000e-04,)] [eta: 3 days, 22:34:53, time (data): 0.691 (0.001)] l_pix: 4.6613e-02
2022-12-23 14:37:27,054 INFO: [train..][epoch: 0, iter: 600, lr:(2.000e-04,)] [eta: 3 days, 22:28:09, time (data): 0.684 (0.001)] l_pix: 2.6919e-02
2022-12-23 14:38:35,155 INFO: [train..][epoch: 0, iter: 700, lr:(2.000e-04,)] [eta: 3 days, 22:27:02, time (data): 0.681 (0.001)] l_pix: 3.7104e-02
2022-12-23 14:39:42,829 INFO: [train..][epoch: 0, iter: 800, lr:(2.000e-04,)] [eta: 3 days, 22:21:28, time (data): 0.679 (0.001)] l_pix: 2.9578e-02
2022-12-23 14:40:52,560 INFO: [train..][epoch: 0, iter: 900, lr:(2.000e-04,)] [eta: 3 days, 22:35:53, time (data): 0.696 (0.001)] l_pix: 2.9916e-02
2022-12-23 14:42:00,809 INFO: [train..][epoch: 0, iter: 1,000, lr:(2.000e-04,)] [eta: 3 days, 22:34:53, time (data): 0.689 (0.001)] l_pix: 1.9958e-02
2022-12-23 14:43:11,217 INFO: [train..][epoch: 0, iter: 1,100, lr:(2.000e-04,)] [eta: 3 days, 22:50:09, time (data): 0.704 (0.001)] l_pix: 2.3746e-02
2022-12-23 14:44:22,437 INFO: [train..][epoch: 0, iter: 1,200, lr:(2.000e-04,)] [eta: 3 days, 23:08:18, time (data): 0.708 (0.001)] l_pix: 4.1201e-02
2022-12-23 14:45:35,395 INFO: [train..][epoch: 0, iter: 1,300, lr:(2.000e-04,)] [eta: 3 days, 23:34:35, time (data): 0.732 (0.001)] l_pix: 2.7439e-02
2022-12-23 14:46:48,474 INFO: [train..][epoch: 0, iter: 1,400, lr:(2.000e-04,)] [eta: 3 days, 23:57:40, time (data): 0.731 (0.001)] l_pix: 3.8303e-02
Traceback (most recent call last):
File "train.py", line 11, in <module>
train_pipeline(root_path)
File "/root/miniconda3/lib/python3.8/site-packages/basicsr/train.py", line 197, in train_pipeline
train_data = prefetcher.next()
File "/root/miniconda3/lib/python3.8/site-packages/basicsr/data/prefetch_dataloader.py", line 76, in next
return next(self.loader)
File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
data = self._next_data()
File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
return self._process_data(data)
File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
data.reraise()
File "/root/miniconda3/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
raise self.exc_type(msg)
IsADirectoryError: Caught IsADirectoryError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
data = fetcher.fetch(index)
File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/root/miniconda3/lib/python3.8/site-packages/basicsr/data/paired_image_dataset.py", line 75, in __getitem__
img_bytes = self.file_client.get(gt_path, 'gt')
File "/root/miniconda3/lib/python3.8/site-packages/basicsr/utils/file_client.py", line 164, in get
return self.client.get(filepath)
File "/root/miniconda3/lib/python3.8/site-packages/basicsr/utils/file_client.py", line 63, in get
with open(filepath, 'rb') as f:
IsADirectoryError: [Errno 21] Is a directory: '../datasets/DF2K/DF2K_HR_sub/'

Traceback (most recent call last):
File "/root/miniconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/miniconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 260, in <module>
main()
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 255, in main
raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/root/miniconda3/bin/python', '-u', 'train.py', '--local_rank=0', '-opt', '../options/train/train_HAT_SRx2_from_scratch.yml', '--launcher', 'pytorch']' returned non-zero exit status 1.
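The failing path is the dataset folder itself, which would be consistent with an empty file name being joined onto the root, for example from a blank line in the meta info file. That is only a guess from the traceback. Under that assumption, a small check like the following could locate such entries; `find_bad_entries` is a hypothetical helper, and it assumes the file name is the first space-separated token on each line.

```python
import os


def find_bad_entries(data_root, meta_path):
    """Return (line_number, reason) for entries that do not resolve to a file."""
    problems = []
    with open(meta_path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            # Keep only the file name; anything after the first space is ignored.
            name = line.strip().split(" ")[0]
            if not name:
                problems.append((number, "empty entry"))
            elif not os.path.isfile(os.path.join(data_root, name)):
                problems.append((number, "missing file: " + name))
    return problems
```

Calling it with the `dataroot_gt` folder and the meta info file from the training config would list the offending line numbers, if any.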

Does anyone have a Google Colab demo that uses the pretrained weights?

Thank you very much

Grayscale dataset

I'd like to ask about your great work.

Is it possible to run it on a grayscale dataset? If so, what should I change?
I changed the number of input channels, but it does not work for me.

# network structures
network_g:
  type: HAT
  upscale: 3
  in_chans: 1

I am looking forward to hearing back from you.
Thank you,

Config for Real_HAT_GAN_SRx4

Could you please provide the config for the Real_HAT_GAN_SRx4 checkpoint?

out of memory on testing

I set the batch size to 2 during training and it works fine.
I am using a single V100 16 GB GPU.
However, when I tried to test, it resulted in CUDA "out of memory" errors:

[
2022-09-21 08:54:33,454 INFO: Model [HATModel] is created.
2022-09-21 08:54:33,455 INFO: Testing open...
Traceback (most recent call last):
File "/mnt/disk2/HAT/hat/test.py", line 11, in <module>
test_pipeline(root_path)
File "/home/ubuntu/venv/lib/python3.10/site-packages/basicsr/test.py", line 40, in test_pipeline
model.validation(test_loader, current_iter=opt['name'], tb_logger=None, save_img=opt['val']['save_img'])
File "/home/ubuntu/venv/lib/python3.10/site-packages/basicsr/models/base_model.py", line 48, in validation
self.nondist_validation(dataloader, current_iter, tb_logger, save_img)
File "/home/ubuntu/venv/lib/python3.10/site-packages/basicsr/models/sr_model.py", line 157, in nondist_validation
self.test()
File "/mnt/disk2/HAT/hat/models/hat_model.py", line 29, in test
self.output = self.net_g(img)
File "/home/ubuntu/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/disk2/HAT/hat/archs/hat_arch.py", line 978, in forward
x = self.conv_after_body(self.forward_features(x)) + x
File "/mnt/disk2/HAT/hat/archs/hat_arch.py", line 964, in forward_features
x = layer(x, x_size, params)
File "/home/ubuntu/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/disk2/HAT/hat/archs/hat_arch.py", line 619, in forward
return self.patch_embed(self.conv(self.patch_unembed(self.residual_group(x, x_size, params), x_size))) + x
File "/home/ubuntu/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/disk2/HAT/hat/archs/hat_arch.py", line 530, in forward
x = self.overlap_attn(x, x_size, params['rpi_oca'])
File "/home/ubuntu/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/disk2/HAT/hat/archs/hat_arch.py", line 425, in forward
attn = attn + relative_position_bias.unsqueeze(0)
RuntimeError: CUDA out of memory. Tried to allocate 3.38 GiB (GPU 0; 15.78 GiB total capacity; 6.44 GiB already allocated; 1.18 GiB free; 13.45 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
]

How can I solve it?

How to run inference?

I am trying to run inference on some of my own images, but the README doesn't really explain how to run inference other than through the test script.
Could you briefly go over how to run inference?

Also, it seems that BasicSR must be integrated into this repository. Do I just copy all the files from the BasicSR repository into this one?

Computational costs

We want to know how many parameters and GFLOPs the working model has. Thanks.

No HAT-L_SRx4_ImageNet-pretrain.pth on the Google drive

Hello, guys! I can't find the best model on your Google Drive. Is there a reason for that, or was it simply missed? Great work, by the way.

Questions about the structure of the model

Hello author, thank you for your great work. I would like to ask a question about the model.
In your HAT architecture, the HAB includes a CAB, which I understand is derived from the RCAB in RCAN. Why did you use CAB rather than RCAB when designing the network, and have you done any ablation studies on how RCAB versus CAB affects the model?
I would be grateful for a reply!

Blocky output

The model seems to output images with many large pixel-like squares. These are clearly visible in the enlarged image. The original image is 700x500.

"No object named 'HATModel' found in 'model' registry!"

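The version numbers are all the same, so why does it still raise an error?

Are the import errors for backend, objectives, and other packages in callbacks.py caused by the dataset not being organized properly?

import backend as K
import objectives as obj
from generators import BaseGenerator
import serializations
import metrics
from abc import ABCMeta, abstractmethod
import pickle
import matplotlib.pyplot as plt
import numpy as np
import time
from supports import to_list, format_data_list, print_or_logging
import sys
from inspect import isfunction

The editor shows no errors, but running the code raises errors such as ModuleNotFoundError: No module named 'objectives'.
When installing packages with pip on the command line, some install fine while others cannot be found.
Since the code shows no errors until it is run, is the failure related to the dataset or the model rather than to installing packages with pip install?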
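A ModuleNotFoundError is raised when Python cannot locate a module on its import path, which is independent of the dataset or the model weights. As a quick diagnostic, the sketch below reports which of the imported names cannot be found from the current working directory and environment; `missing_modules` is a hypothetical helper, and the list of names is taken from the imports quoted above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Names taken from the import statements in callbacks.py.
candidates = ["backend", "objectives", "generators", "serializations",
              "metrics", "supports", "pickle", "time"]
print(missing_modules(candidates))
```

Names such as `objectives` or `supports` look like local files of that project rather than pip packages, so if they are reported as missing, the script is probably being run from a directory where those files are not visible.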

Training time issues

I would like to ask you how long did it take you to train x2SR from scratch using DF2K?

Rather than always upscaling from a height of 64, can we process 640x480 images as is?

I want to test how well it does without having to shrink the image to a height of 64.

Running on CPU for ONNX export

Hello and thank you for this great project!
My end goal is to export the models to ONNX, but first I'm trying to run this on a Mac x86 CPU. In other ML projects I can usually set the device to CPU when Torch is not compiled with CUDA enabled, but I am unable to find anything resembling that in this project. Can someone point me in the right direction for either of these?

Request for training logs

Dear author, could you please share some .log files from training the released models (e.g. HAT_SRx2.pth, HAT_SRx3.pth, HAT_SRx4.pth)? I look forward to your reply!

New Super-Resolution Benchmarks

Hello,

MSU Graphics & Media Lab Video Group has recently launched two new Super-Resolution Benchmarks.

Video Upscalers Benchmark: Quality Enhancement determines the best upscaling methods for increasing video resolution and improving visual quality.
Super-Resolution for Video Compression benchmark aims to test Super-Resolution methods on compressed videos and select the best model for each video codec standard.

If you are interested in participating, you can add your algorithm following the submission steps:

We would be grateful for your feedback on our work!

How to get the same result on multiple runs?

Hi, thanks for sharing your work!

I tried to reproduce the same training loss across runs on my custom dataset, but it didn't work.

So I wonder whether HAT can produce exactly the same training loss on repeated runs.

Any help would be much appreciated, thanks.

My environments

windows 10
python : 3.7.13
pytorch : 1.12.1+cu113
torchvision : 0.13.1+cu113
cuda : 11.3
cudnn : 8.4.1
basicsr : both 1.3.4.9 and 1.4.2 (latest version)

Methods I tried

use_hflip = False
use_rot = False
use_shuffle = False
num_worker_per_gpu = 0
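Beyond disabling augmentation and shuffling, run-to-run differences often come from unseeded random number generators and from non-deterministic cuDNN kernel selection. The sketch below is a generic seeding helper, not something taken from this repository; `seed_everything` is a hypothetical name, and NumPy and PyTorch are only seeded if they are installed.

```python
import os
import random


def seed_everything(seed):
    """Seed the random number generators available in this environment."""
    # Only affects Python processes started after this point (e.g. workers).
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Trade some speed for repeatable cuDNN kernel selection.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass


seed_everything(0)
first = random.random()
seed_everything(0)
print(first == random.random())
```

Even with all of this in place, some CUDA operations remain non-deterministic, so bit-identical losses are not guaranteed on every setup.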

Replicate demo not working

It appears the replicate demo throws an error currently?

Can you tell me how to preprocess images in HAT?

Thank you for sharing your code.

It appears to use BGR images during preprocessing.

Can I know the process of preprocessing other than this? (Example. Divide by 255)

How to test a single image and save the result

I don't understand

Every image I've tried to upscale with HAT just seems to get stretched to a larger, even blurrier size. I've tried all sorts of sizes ranging from 64x64 to 1024x1024. It seems to simply click and drag the image larger without actually enhancing anything.

Am I doing something wrong? I'd love to be able to use this project, but right now it's very confusing to me. :/

Questions related to pre-trained dataset ImageNet

Thank you for your outstanding work! I have some questions about the ImageNet dataset that I would like to ask you.

I find it a bit confusing that in the pre-training yml file there are only GT files about ImageNet and no LR files.
The ImageNet dataset contains some small images (e.g. smaller than 256x256); how do you handle these images?
By the way, can you give me an overview of the ImageNet dataset preparation?
Looking forward to your reply!

When will the training code be released?

ImageNet pre-trained HAT-L models without any fine-tuning.

Hi, thank you for your great work! I want to test the performance of ImageNet pre-trained HAT-L models without fine-tuning on DF2K. When pre-training HAT-L on the ImageNet dataset, the training time is long (about 13 days using 8 V100 GPUs). Could you please release those ImageNet pre-trained HAT-L models without any fine-tuning? Thanks for replying!

Overlapping ratio difference between paper and implementation

Hi!

I'm currently studying your implementation and your publication, and I was wondering: is there an explanation for your Eq. 9, and specifically for the factor of 2 applied to the overlapping ratio?

I'm asking because in your implementation in hat_arch.py it seems to me that you only use $(1 + \gamma) \times M$ as the formula (lines 373 and 899), which might change the interpretation of this hyperparameter in your publication (the small values), but I might also have missed something and be wrong.

Otherwise, thanks for your amazing work!

How much memory needed to run inference?

I get a GPU OOM error when running test.py. I currently have 16 GB. Is this not enough?

Why does the loss not converge?

I'm training with DF2K using 4 GPUs.

I'm following the "train_HAT_SRx4_finetune_from_ImageNet_pretrain.yml" file.

I only changed dataroot_gt and dataroot_lq for train and val,
and also changed num_worker_per_gpu and batch_size_per_gpu as follows:

### num_worker_per_gpu: 6
### batch_size_per_gpu: 4
num_worker_per_gpu: 3
batch_size_per_gpu: 8

But after 80,000 iterations, l_pix has not converged.
[
2022-09-19 20:15:50,914 INFO: [train..][epoch:738, iter: 79,000, lr:(1.000e-05,)] [eta: 6 days, 12:51:28, time (data): 3.029 (0.004)] l_pix: 2.1359e-02
2022-09-19 20:15:50,915 INFO: Saving models and training states.
2022-09-19 20:21:18,215 INFO: [train..][epoch:739, iter: 79,100, lr:(1.000e-05,)] [eta: 6 days, 12:45:51, time (data): 3.218 (0.341)] l_pix: 3.3533e-02
2022-09-19 20:26:43,030 INFO: [train..][epoch:740, iter: 79,200, lr:(1.000e-05,)] [eta: 6 days, 12:40:10, time (data): 2.068 (0.031)] l_pix: 1.9019e-02
2022-09-19 20:32:16,706 INFO: [train..][epoch:741, iter: 79,300, lr:(1.000e-05,)] [eta: 6 days, 12:34:47, time (data): 3.264 (0.337)] l_pix: 1.9070e-02
2022-09-19 20:37:43,115 INFO: [train..][epoch:742, iter: 79,400, lr:(1.000e-05,)] [eta: 6 days, 12:29:08, time (data): 3.401 (0.004)] l_pix: 1.7958e-02
2022-09-19 20:42:36,323 INFO: [train..][epoch:742, iter: 79,500, lr:(1.000e-05,)] [eta: 6 days, 12:22:19, time (data): 2.954 (0.020)] l_pix: 1.5392e-02
2022-09-19 20:48:30,628 INFO: [train..][epoch:743, iter: 79,600, lr:(1.000e-05,)] [eta: 6 days, 12:17:40, time (data): 3.378 (0.003)] l_pix: 2.8961e-02
2022-09-19 20:53:45,430 INFO: [train..][epoch:744, iter: 79,700, lr:(1.000e-05,)] [eta: 6 days, 12:11:37, time (data): 3.156 (0.225)] l_pix: 3.7259e-02
2022-09-19 20:59:13,519 INFO: [train..][epoch:745, iter: 79,800, lr:(1.000e-05,)] [eta: 6 days, 12:06:02, time (data): 3.902 (0.031)] l_pix: 2.7916e-02
2022-09-19 21:04:49,328 INFO: [train..][epoch:746, iter: 79,900, lr:(1.000e-05,)] [eta: 6 days, 12:00:44, time (data): 3.374 (0.410)] l_pix: 2.1746e-02
2022-09-19 21:10:27,211 INFO: [train..][epoch:747, iter: 80,000, lr:(1.000e-05,)] [eta: 6 days, 11:55:30, time (data): 3.748 (0.094)] l_pix: 2.1582e-02
2022-09-19 21:10:27,213 INFO: Saving models and training states.
2022-09-19 21:22:35,811 INFO: Validation open
# psnr: 20.3545 Best: 20.3660 @ 65000 iter
# ssim: 0.4768 Best: 0.4769 @ 65000 iter

2022-09-19 21:27:52,322 INFO: [train..][epoch:748, iter: 80,100, lr:(1.000e-05,)] [eta: 6 days, 12:15:17, time (data): 3.176 (0.366)] l_pix: 2.4691e-02
2022-09-19 21:33:13,818 INFO: [train..][epoch:749, iter: 80,200, lr:(1.000e-05,)] [eta: 6 days, 12:09:25, time (data): 3.303 (0.093)] l_pix: 2.2727e-02
2022-09-19 21:38:51,310 INFO: [train..][epoch:750, iter: 80,300, lr:(1.000e-05,)] [eta: 6 days, 12:04:08, time (data): 3.374 (0.419)] l_pix: 1.5810e-02
2022-09-19 21:44:40,636 INFO: [train..][epoch:751, iter: 80,400, lr:(1.000e-05,)] [eta: 6 days, 11:59:15, time (data): 3.433 (0.393)] l_pix: 1.9958e-02
2022-09-19 21:50:00,407 INFO: [train..][epoch:752, iter: 80,500, lr:(1.000e-05,)] [eta: 6 days, 11:53:20, time (data): 3.198 (0.192)] l_pix: 2.1157e-02
2022-09-19 21:55:30,407 INFO: [train..][epoch:753, iter: 80,600, lr:(1.000e-05,)] [eta: 6 days, 11:47:47, time (data): 3.248 (0.231)] l_pix: 2.8304e-02
2022-09-19 22:00:58,110 INFO: [train..][epoch:754, iter: 80,700, lr:(1.000e-05,)] [eta: 6 days, 11:42:09, time (data): 3.279 (0.391)] l_pix: 2.4832e-02
2022-09-19 22:06:35,306 INFO: [train..][epoch:755, iter: 80,800, lr:(1.000e-05,)] [eta: 6 days, 11:36:50, time (data): 3.326 (0.384)] l_pix: 2.9092e-02
2022-09-19 22:12:15,613 INFO: [train..][epoch:756, iter: 80,900, lr:(1.000e-05,)] [eta: 6 days, 11:31:38, time (data): 3.429 (0.409)] l_pix: 2.6695e-02
2022-09-19 22:17:41,607 INFO: [train..][epoch:757, iter: 81,000, lr:(1.000e-05,)] [eta: 6 days, 11:25:57, time (data): 3.343 (0.408)] l_pix: 3.1762e-02
2022-09-19 22:17:41,609 INFO: Saving models and training states.
]

Do you know why the loss does not converge?

The attached file is my .yaml file.

Please advise.
train_HAT_SRx4_my_others_to_open.yml--.log

Dataset

Hello, could you provide a download link for the complete dataset, or the specific method for preparing it? Thanks.

How to combine training?

Hello, I wonder how to combine x2, x3 and x4 SR training, and whether the model structure needs to be modified.

Looking forward to hearing from you!

Should I change the lr if I use 4 GPUs while retaining the same total batch size, i.e., 8 for each GPU? Thanks
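The arithmetic behind this question can be sketched as follows. The linear scaling rule used here is a common heuristic, not something stated by the authors in this thread, and the reference setup of 8 GPUs with 4 images each and a base learning rate of 2e-4 is an assumption made for illustration.

```python
def total_batch_size(num_gpus, batch_size_per_gpu):
    """Number of samples contributing to one optimizer step."""
    return num_gpus * batch_size_per_gpu


def linearly_scaled_lr(base_lr, base_total_batch, new_total_batch):
    # Linear scaling heuristic: lr changes in proportion to the total batch size.
    return base_lr * new_total_batch / base_total_batch


reference = total_batch_size(8, 4)  # assumed reference setup
mine = total_batch_size(4, 8)       # setup described in the question
print(reference, mine, linearly_scaled_lr(2e-4, reference, mine))
```

Under this heuristic the learning rate stays the same whenever the total batch size is unchanged, regardless of how it is split across GPUs.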

network output image code

Hello, I have a question about the code that writes the network's output image. Where is the code that saves the final output image? I want to understand exactly how the network saves the image. Why does the code only contain ['save_img'], with no explicit saving logic?

"No object named 'HATModel' found in 'model' registry!"

Hello, when I train on my custom dataset, the following error occurs:

"No object named 'HATModel' found in 'model' registry!"

How can I solve it?
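In registry-based frameworks, this kind of message typically means that the module defining the class was never imported, so its registration decorator never ran. The toy example below reproduces that behaviour; the `Registry` class here is written for illustration and is not BasicSR's actual code, and `HATModel` is only an empty stand-in.

```python
class Registry:
    """Toy name-to-class registry, written only for this illustration."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def register(self, cls):
        # Used as a decorator; runs when the defining module is imported.
        self._objects[cls.__name__] = cls
        return cls

    def get(self, key):
        if key not in self._objects:
            raise KeyError(
                f"No object named '{key}' found in '{self.name}' registry!")
        return self._objects[key]


MODEL_REGISTRY = Registry("model")

try:
    MODEL_REGISTRY.get("HATModel")
except KeyError as err:
    print("before registration:", err)


@MODEL_REGISTRY.register
class HATModel:
    """Stand-in class; the real model lives in the repository."""


print("after registration:", MODEL_REGISTRY.get("HATModel").__name__)
```

If the same mechanism applies here, it may be worth checking that training is launched through the repository's own entry script (hat/train.py and hat/test.py appear in the tracebacks of other issues) so that the hat package is imported before the model is looked up.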

RuntimeError: CUDA error: an illegal memory access was encountered

I tested an image with shape (1,3,678,1020) on the 2x bicubic SR task using tile mode, and the following error occurred:

Traceback (most recent call last):
File "/home/baifree/pycharm-community-2021.1.2/plugins/python-ce/helpers/pydev/pydevd.py", line 1483, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/baifree/pycharm-community-2021.1.2/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/baifree/codes/denoise_research/HAT-main/hat/test.py", line 11, in <module>
test_pipeline(root_path)
File "/home/baifree/anaconda3/envs/torch1.8.2-py36/lib/python3.6/site-packages/basicsr/test.py", line 40, in test_pipeline
model.validation(test_loader, current_iter=opt['name'], tb_logger=None, save_img=opt['val']['save_img'])
File "/home/baifree/anaconda3/envs/torch1.8.2-py36/lib/python3.6/site-packages/basicsr/models/base_model.py", line 48, in validation
self.nondist_validation(dataloader, current_iter, tb_logger, save_img)
File "/home/baifree/codes/denoise_research/HAT-main/hat/models/hat_model.py", line 171, in nondist_validation
self.crop_merge(self.img)
File "/home/baifree/codes/denoise_research/HAT-main/hat/models/hat_model.py", line 126, in crop_merge
outputlist = [self.crop_merge(patch) for patch in inputlist]
File "/home/baifree/codes/denoise_research/HAT-main/hat/models/hat_model.py", line 126, in <listcomp>
outputlist = [self.crop_merge(patch) for patch in inputlist]
File "/home/baifree/codes/denoise_research/HAT-main/hat/models/hat_model.py", line 123, in crop_merge
output_batch = self.net_g(input_batch)
File "/home/baifree/anaconda3/envs/torch1.8.2-py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/baifree/codes/denoise_research/HAT-main/hat/archs/hat_arch.py", line 978, in forward
x = self.conv_after_body(self.forward_features(x)) + x
File "/home/baifree/codes/denoise_research/HAT-main/hat/archs/hat_arch.py", line 964, in forward_features
x = layer(x, x_size, params)
File "/home/baifree/anaconda3/envs/torch1.8.2-py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/baifree/codes/denoise_research/HAT-main/hat/archs/hat_arch.py", line 619, in forward
return self.patch_embed(self.conv(self.patch_unembed(self.residual_group(x, x_size, params), x_size))) + x
File "/home/baifree/anaconda3/envs/torch1.8.2-py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/baifree/codes/denoise_research/HAT-main/hat/archs/hat_arch.py", line 530, in forward
x = self.overlap_attn(x, x_size, params['rpi_oca'])
File "/home/baifree/anaconda3/envs/torch1.8.2-py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/baifree/codes/denoise_research/HAT-main/hat/archs/hat_arch.py", line 425, in forward
attn = attn + relative_position_bias.unsqueeze(0)
RuntimeError: CUDA error: an illegal memory access was encountered
python-BaseException

Process finished with exit code 1

Could you please give us some help?

My GT is divided into separate domains (load, bird, car, and so on).
So, if you can, please do that.