
spsr's People

Contributors

liquidammonia, maclory


spsr's Issues

checkerboard artifacts and CUDA out of memory

Hi, thank you for your nice work.
I tested a single 550×700 image and it ran out of CUDA memory; my GPU has 24 GB.
Some results are very good, but low-quality input images show serious checkerboard artifacts.
(image attached: img_20200410_151538 432)
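One common workaround for test-time OOM on larger inputs is to run the generator on overlapping LR tiles and stitch the outputs. A minimal sketch, assuming model is the loaded generator and model(lr) returns the SR tensor (the released SPSR generator actually returns a tuple, so the call may need adapting):

import torch

def tiled_sr(model, lr, scale=4, tile=128, overlap=16):
    # lr: (1, C, H, W) tensor on the same device as the model
    _, c, h, w = lr.shape
    out = torch.zeros(1, c, h * scale, w * scale, device=lr.device)
    stride = tile - overlap
    with torch.no_grad():  # no autograd graph is needed at test time
        for top in range(0, h, stride):
            for left in range(0, w, stride):
                t = min(top, max(h - tile, 0))
                l = min(left, max(w - tile, 0))
                patch = lr[:, :, t:t + tile, l:l + tile]
                sr = model(patch)  # assumed to return the SR tensor
                # later tiles simply overwrite the overlap region
                out[:, :, t * scale:(t + tile) * scale,
                    l * scale:(l + tile) * scale] = sr
    return out

Smaller tile sizes trade a little quality at the seams for lower peak memory.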

Preprocess Datasets' Problem

I use the DIV2K dataset as the training set. In the Generate Sub-Images step I can generate 32,208 HR sub-images, but only 12 LR sub-images. I then found that the HR images are about 2040×1404 or 2040×1848 pixels, while the LR images are about 510×351 or 510×462 pixels, so one HR image yields about 40 HR sub-images while one LR image yields only one LR sub-image, or even zero. What should I do to get the same number of HR and LR sub-images? Looking forward to your reply!
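For reference, this mismatch typically means the sub-image extraction was run with the same crop size and stride on both folders; for a ×4 dataset the LR window should be 1/scale of the HR window so every HR crop has a matching LR crop. A rough sketch of aligned extraction (paths, file-name pattern and crop sizes are illustrative, not the repo's exact preprocessing script):

import os
import cv2

def extract_pairs(hr_dir, lr_dir, out_hr, out_lr, scale=4, crop=480, step=240):
    # the LR crop and stride are the HR values divided by the scale factor
    os.makedirs(out_hr, exist_ok=True)
    os.makedirs(out_lr, exist_ok=True)
    lr_crop, lr_step = crop // scale, step // scale
    for name in sorted(os.listdir(hr_dir)):
        hr = cv2.imread(os.path.join(hr_dir, name))
        lr = cv2.imread(os.path.join(lr_dir, name.replace('.png', 'x{}.png'.format(scale))))
        h, w = lr.shape[:2]
        idx = 0
        for y in range(0, h - lr_crop + 1, lr_step):
            for x in range(0, w - lr_crop + 1, lr_step):
                idx += 1
                lr_patch = lr[y:y + lr_crop, x:x + lr_crop]
                hr_patch = hr[y * scale:(y + lr_crop) * scale, x * scale:(x + lr_crop) * scale]
                cv2.imwrite(os.path.join(out_lr, '{}_s{:03d}.png'.format(name[:-4], idx)), lr_patch)
                cv2.imwrite(os.path.join(out_hr, '{}_s{:03d}.png'.format(name[:-4], idx)), hr_patch)

With crop=480 and step=240 on the HR side this uses a 120/60 window on the ×4 LR side, which yields the same number of sub-images for both folders.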

Out of memory

Hi. When I set the batch size to 16, training only takes up about 6 GB of GPU memory, but when it reaches the validation set a memory-overflow error is reported. My GPU has 11 GB. What batch size did you use during training, and how much memory does it need? Also, training is very slow; it takes a long time to run 5,000 iterations. Is there any way to save GPU memory? Thanks.
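One thing worth checking for the validation overflow is whether the validation forward pass runs inside torch.no_grad(); without it, autograd stores every intermediate activation and memory use is far higher than plain inference needs. A minimal sketch of the idea (the loop, the batch keys and the compute_psnr helper are illustrative, not the repo's exact validation code):

import torch

def validate(model, val_loader, device='cuda'):
    model.eval()
    psnr_values = []
    with torch.no_grad():  # do not build the autograd graph during validation
        for batch in val_loader:
            lr = batch['LR'].to(device)
            sr = model(lr)  # assumed to return the SR tensor
            psnr_values.append(compute_psnr(sr.cpu(), batch['HR']))  # hypothetical helper
    model.train()
    return sum(psnr_values) / len(psnr_values)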

I have some questions about training

Hello, after building the training dataset following your procedure, I ran about 5,000 iterations and found that the validation results are always black, even though the losses printed during training are all very small. Could you tell me roughly at how many iterations the visual results became normal color images in your training?

Scale=4

First of all, thanks for your great work!

I tested your trained model on the Set5 dataset. The 256×256 butterfly image runs successfully and gives excellent results, but it does not work when the image has more pixels. For example, for a 512×512 image on my single 2080Ti GPU it reports: RuntimeError: CUDA out of memory. Tried to allocate 3.50 GiB (GPU 0; 10.76 GiB total capacity; 6.27 GiB already allocated; 3.23 GiB free; 419.14 MiB cached).

I want to use your SR model to upscale my own 956×532 image by a factor of 2. Now I have two questions. First, if I want to 2× upscale a 956×532 image, how much GPU memory do I need? Second, your released model is trained for scale=4, so do I have to retrain on the DIV2K dataset for scale=2?
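On the second question: the released checkpoint is ×4 only, so a proper ×2 model needs retraining. A rough stopgap, at some cost in fidelity, is to run the ×4 model (tiled if memory is tight) and then downscale the result by 2 with bicubic interpolation; a sketch with an illustrative file path:

import cv2

sr_x4 = cv2.imread('results/my_image_x4.png')  # the x4 SR output saved by test.py (path illustrative)
h, w = sr_x4.shape[:2]
sr_x2 = cv2.resize(sr_x4, (w // 2, h // 2), interpolation=cv2.INTER_CUBIC)
cv2.imwrite('results/my_image_x2.png', sr_x2)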

Question about resume training

Hi, I noticed that when resuming training, only model_G is loaded (and at most model_D as well), while model_D_grad is never loaded.
Is this intentional?
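If the gradient-branch discriminator is supposed to survive a resume as well, the resume logic would need to save and load it alongside model_D. A rough sketch of the extra load step, using assumed attribute and option names (self.netD_grad, pretrain_model_D_grad) that may not match the repo exactly:

# inside SPSRModel.load(), next to the existing G/D loading (sketch only)
load_path_D_grad = self.opt['path'].get('pretrain_model_D_grad')  # assumed option key
if self.is_train and load_path_D_grad is not None:
    # reuse the same helper that loads netG / netD
    self.load_network(load_path_D_grad, self.netD_grad)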

The SSIM indexes seem irregular on set14 and urban100.

Hi, the SSIM values seem too high on the Set14 and Urban100 datasets.
(screenshot of the results table attached)
Especially on Urban100, SSIM = 0.95 seems too high given that the PSNR is only ~25 dB.

ESRGAN seems to obtain SSIM (Y channel) of about 0.71 and 0.73 on the Set14 and Urban100 datasets, respectively.

Please double check those numbers.

CUDA Out Of Memory on 4900th iteration.

I have tried smaller batches, even a batch size of 1, but the error persists.
use_tb_logger: True
model: spsr
scale: 2
gpu_ids: [2, 3]
datasets:[
train:[
name: exp
mode: LRHR
dataroot_HR: /home/beemap/mobin_workspace/data/exp/train_HR.lmdb
dataroot_LR: /home/beemap/mobin_workspace/data/exp/train_LR.lmdb
subset_file: None
use_shuffle: True
n_workers: 16
batch_size: 50
HR_size: 128
use_flip: True
use_rot: True
phase: train
scale: 2
data_type: lmdb
]
val:[
name: exp
mode: LRHR
dataroot_HR: /home/beemap/mobin_workspace/data/exp/test_HR.lmdb
dataroot_LR: /home/beemap/mobin_workspace/data/exp/test_LR.lmdb
phase: val
scale: 2
data_type: lmdb
]
]
path:[
root: /home/beemap/mobin_workspace/code/SPSR
pretrain_model_G: /home/beemap/mobin_workspace/code/SPSR/experiments/pretrain_models/RRDB_PSNR_x4.pth
experiments_root: /home/beemap/mobin_workspace/code/SPSR/experiments/SPSR_LR_images_gen_4m_cycleGAN
models: /home/beemap/mobin_workspace/code/SPSR/experiments/SPSR_LR_images_gen_4m_cycleGAN/models
training_state: /home/beemap/mobin_workspace/code/SPSR/experiments/SPSR_LR_images_gen_4m_cycleGAN/training_state
log: /home/beemap/mobin_workspace/code/SPSR/experiments/SPSR_LR_images_gen_4m_cycleGAN
val_images: /home/beemap/mobin_workspace/code/SPSR/experiments/SPSR_LR_images_gen_4m_cycleGAN/val_images
]
network_G:[
which_model_G: spsr_net
norm_type: None
mode: CNA
nf: 64
nb: 23
in_nc: 3
out_nc: 3
gc: 32
group: 1
scale: 2
]
network_D:[
which_model_D: discriminator_vgg_128
norm_type: batch
act_type: leakyrelu
mode: CNA
nf: 64
in_nc: 3
]
train:[
lr_G: 0.0001
lr_G_grad: 0.0001
weight_decay_G: 0
weight_decay_G_grad: 0
beta1_G: 0.9
beta1_G_grad: 0.9
lr_D: 0.0001
weight_decay_D: 0
beta1_D: 0.9
lr_scheme: MultiStepLR
lr_steps: [50000, 100000, 200000, 300000]
lr_gamma: 0.5
pixel_criterion: l1
pixel_weight: 0.01
feature_criterion: l1
feature_weight: 1
gan_type: vanilla
gan_weight: 0.005
gradient_pixel_weight: 0.01
gradient_gan_weight: 0.005
pixel_branch_criterion: l1
pixel_branch_weight: 0.5
Branch_pretrain: 1
Branch_init_iters: 5000
manual_seed: 9
niter: 500000.0
val_freq: 50000000000.0
]
logger:[
print_freq: 100
save_checkpoint_freq: 4000.0
]
is_train: True

21-05-24 00:41:34.605 - INFO: Random seed: 9
21-05-24 00:41:34.608 - INFO: Read lmdb keys from cache: /home/beemap/mobin_workspace/data/exp/train_HR.lmdb/_keys_cache.p
21-05-24 00:41:34.611 - INFO: Read lmdb keys from cache: /home/beemap/mobin_workspace/data/exp/train_LR.lmdb/_keys_cache.p
21-05-24 00:41:34.614 - INFO: Dataset [LRHRDataset - exp] is created.
21-05-24 00:41:34.614 - INFO: Number of train images: 4,630, iters: 93
21-05-24 00:41:34.614 - INFO: Total epochs needed: 5377 for iters 500,000
21-05-24 00:41:34.615 - INFO: Read lmdb keys from cache: /home/beemap/mobin_workspace/data/exp/test_HR.lmdb/_keys_cache.p
21-05-24 00:41:34.615 - INFO: Read lmdb keys from cache: /home/beemap/mobin_workspace/data/exp/test_LR.lmdb/_keys_cache.p
21-05-24 00:41:34.615 - INFO: Dataset [LRHRDataset - exp] is created.
21-05-24 00:41:34.615 - INFO: Number of val images in [exp]: 50
21-05-24 00:41:35.034 - INFO: Initialization method [kaiming]
21-05-24 00:41:38.987 - INFO: Initialization method [kaiming]
21-05-24 00:41:39.614 - INFO: Initialization method [kaiming]
21-05-24 00:41:40.026 - INFO: Loading pretrained model for G [/home/beemap/mobin_workspace/code/SPSR/experiments/pretrain_models/RRDB_PSNR_x4.pth] ...
21-05-24 00:41:42.554 - WARNING: Params [module.get_g_nopadding.weight_h] will not optimize.
21-05-24 00:41:42.554 - WARNING: Params [module.get_g_nopadding.weight_v] will not optimize.
21-05-24 00:41:42.570 - INFO: Model [SPSRModel] is created.
21-05-24 00:41:42.570 - INFO: Start training from epoch: 0, iter: 0
export CUDA_VISIBLE_DEVICES=2,3
Path already exists. Rename it to [/home/beemap/mobin_workspace/code/SPSR/experiments/SPSR_LR_images_gen_4m_cycleGAN_archived_210524-004134]
/home/beemap/miniconda3/envs/pytorch-mobin/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:416: UserWarning: To get the last learning rate computed by the scheduler, please use get_last_lr().
warnings.warn("To get the last learning rate computed by the scheduler, "
21-05-24 00:44:53.517 - INFO: <epoch: 1, iter: 100, lr:1.000e-04> l_g_pix: 3.3209e-03 l_g_fea: 1.6455e+00 l_g_gan: 7.5400e-02 l_g_pix_grad_branch: 2.6782e-02 l_d_real: 6.6168e-05 l_d_fake: 1.1682e-06 l_d_real_grad: 1.0860e-03 l_d_fake_grad: 3.4428e-05 D_real: 1.7047e+01 D_fake: 1.9670e+00 D_real_grad: 1.2006e+01 D_fake_grad: -5.3090e-01
21-05-24 00:48:04.134 - INFO: <epoch: 2, iter: 200, lr:1.000e-04> l_g_pix: 3.6927e-03 l_g_fea: 1.6852e+00 l_g_gan: 7.7636e-02 l_g_pix_grad_branch: 2.8236e-02 l_d_real: 5.6790e-06 l_d_fake: 1.5473e-06 l_d_real_grad: 1.9656e-04 l_d_fake_grad: 2.0194e-06 D_real: 1.7042e+01 D_fake: 1.5145e+00 D_real_grad: 1.5466e+01 D_fake_grad: 1.3511e+00
21-05-24 00:51:16.203 - INFO: <epoch: 3, iter: 300, lr:1.000e-04> l_g_pix: 3.1596e-03 l_g_fea: 1.7432e+00 l_g_gan: 5.4458e-02 l_g_pix_grad_branch: 2.8226e-02 l_d_real: 3.5918e-04 l_d_fake: 3.7819e-04 l_d_real_grad: 2.3085e-04 l_d_fake_grad: 1.1712e-04 D_real: 1.7641e+01 D_fake: 6.7494e+00 D_real_grad: 1.0886e+01 D_fake_grad: -3.8302e-02
21-05-24 00:54:28.644 - INFO: <epoch: 4, iter: 400, lr:1.000e-04> l_g_pix: 2.9831e-03 l_g_fea: 1.7308e+00 l_g_gan: 5.7067e-02 l_g_pix_grad_branch: 2.8616e-02 l_d_real: 3.4906e-02 l_d_fake: 8.1219e-04 l_d_real_grad: 1.1727e-02 l_d_fake_grad: 1.6730e-02 D_real: 1.5657e+01 D_fake: 4.2613e+00 D_real_grad: 7.5860e+00 D_fake_grad: -4.1845e+00
21-05-24 00:57:41.556 - INFO: <epoch: 5, iter: 500, lr:1.000e-04> l_g_pix: 3.0206e-03 l_g_fea: 1.6828e+00 l_g_gan: 3.9894e-02 l_g_pix_grad_branch: 3.1332e-02 l_d_real: 3.7866e-02 l_d_fake: 4.4069e-02 l_d_real_grad: 5.7599e-04 l_d_fake_grad: 6.5754e-04 D_real: -1.7732e+00 D_fake: -9.7111e+00 D_real_grad: 9.5656e+00 D_fake_grad: -1.5118e+00
21-05-24 01:00:54.652 - INFO: <epoch: 6, iter: 600, lr:1.000e-04> l_g_pix: 3.1056e-03 l_g_fea: 1.5872e+00 l_g_gan: 7.1355e-02 l_g_pix_grad_branch: 2.9602e-02 l_d_real: 1.1663e-01 l_d_fake: 3.5371e-01 l_d_real_grad: 3.3140e-07 l_d_fake_grad: 8.8215e-08 D_real: -6.1734e+00 D_fake: -2.0209e+01 D_real_grad: 2.6732e+00 D_fake_grad: -1.6756e+01
21-05-24 01:04:06.999 - INFO: <epoch: 7, iter: 700, lr:1.000e-04> l_g_pix: 3.3884e-03 l_g_fea: 1.6590e+00 l_g_gan: 9.8999e-02 l_g_pix_grad_branch: 2.7133e-02 l_d_real: 6.8104e-05 l_d_fake: 1.0225e-02 l_d_real_grad: 0.0000e+00 l_d_fake_grad: 0.0000e+00 D_real: 4.4747e+00 D_fake: -1.5320e+01 D_real_grad: 7.7027e+00 D_fake_grad: -2.8511e+01
21-05-24 01:07:19.649 - INFO: <epoch: 8, iter: 800, lr:1.000e-04> l_g_pix: 3.0371e-03 l_g_fea: 1.6621e+00 l_g_gan: 7.6863e-02 l_g_pix_grad_branch: 2.7822e-02 l_d_real: 1.3455e-05 l_d_fake: 1.0611e-04 l_d_real_grad: 4.3578e-04 l_d_fake_grad: 1.6385e-04 D_real: -6.1604e+00 D_fake: -2.1533e+01 D_real_grad: 3.6745e+00 D_fake_grad: -8.4098e+00
21-05-24 01:10:31.936 - INFO: <epoch: 9, iter: 900, lr:1.000e-04> l_g_pix: 2.8769e-03 l_g_fea: 1.6759e+00 l_g_gan: 2.9822e-02 l_g_pix_grad_branch: 2.8341e-02 l_d_real: 2.8139e-02 l_d_fake: 1.0946e-01 l_d_real_grad: 5.4808e-05 l_d_fake_grad: 4.0746e-05 D_real: -7.9460e+00 D_fake: -1.3842e+01 D_real_grad: 1.0622e+00 D_fake_grad: -1.2964e+01
21-05-24 01:13:44.604 - INFO: <epoch: 10, iter: 1,000, lr:1.000e-04> l_g_pix: 2.7664e-03 l_g_fea: 1.6928e+00 l_g_gan: 9.8261e-02 l_g_pix_grad_branch: 2.8014e-02 l_d_real: 3.4927e-06 l_d_fake: 3.4571e-07 l_d_real_grad: 3.7050e-06 l_d_fake_grad: 2.1339e-05 D_real: -1.1567e+01 D_fake: -3.1219e+01 D_real_grad: 1.3630e+01 D_fake_grad: -1.1159e+00
21-05-24 01:16:58.029 - INFO: <epoch: 11, iter: 1,100, lr:1.000e-04> l_g_pix: 2.6696e-03 l_g_fea: 1.6866e+00 l_g_gan: 6.8830e-02 l_g_pix_grad_branch: 3.0263e-02 l_d_real: 1.3072e-05 l_d_fake: 1.4579e-04 l_d_real_grad: 8.3739e-05 l_d_fake_grad: 5.7104e-05 D_real: -9.3927e+00 D_fake: -2.3158e+01 D_real_grad: 9.3501e+00 D_fake_grad: -3.3562e+00
21-05-24 01:20:12.842 - INFO: <epoch: 13, iter: 1,200, lr:1.000e-04> l_g_pix: 3.1989e-03 l_g_fea: 1.6771e+00 l_g_gan: 7.3309e-02 l_g_pix_grad_branch: 2.6471e-02 l_d_real: 8.8113e-02 l_d_fake: 1.5901e-04 l_d_real_grad: 2.7180e-07 l_d_fake_grad: 1.5163e-06 D_real: -2.4804e+01 D_fake: -3.9421e+01 D_real_grad: 1.7869e+01 D_fake_grad: 9.5287e-01
21-05-24 01:23:25.288 - INFO: <epoch: 14, iter: 1,300, lr:1.000e-04> l_g_pix: 2.9128e-03 l_g_fea: 1.6844e+00 l_g_gan: 5.0193e-02 l_g_pix_grad_branch: 3.2575e-02 l_d_real: 6.6567e-04 l_d_fake: 8.2944e-04 l_d_real_grad: 8.7365e-05 l_d_fake_grad: 3.7020e-04 D_real: -6.1354e+00 D_fake: -1.6173e+01 D_real_grad: 1.2349e+01 D_fake_grad: -1.4154e+00
21-05-24 01:26:38.063 - INFO: <epoch: 15, iter: 1,400, lr:1.000e-04> l_g_pix: 2.4558e-03 l_g_fea: 1.5891e+00 l_g_gan: 6.2942e-02 l_g_pix_grad_branch: 3.1616e-02 l_d_real: 2.2350e-03 l_d_fake: 1.8222e-03 l_d_real_grad: 6.5287e-04 l_d_fake_grad: 5.0378e-04 D_real: -2.3414e+01 D_fake: -3.6000e+01 D_real_grad: 3.3537e+01 D_fake_grad: 2.4417e+01
21-05-24 01:29:50.940 - INFO: <epoch: 16, iter: 1,500, lr:1.000e-04> l_g_pix: 2.6007e-03 l_g_fea: 1.6478e+00 l_g_gan: 7.8554e-02 l_g_pix_grad_branch: 3.0603e-02 l_d_real: 2.0991e-04 l_d_fake: 4.3102e-04 l_d_real_grad: 2.3365e-07 l_d_fake_grad: 1.1206e-06 D_real: -2.1310e+01 D_fake: -3.7020e+01 D_real_grad: 2.1616e+01 D_fake_grad: 3.8412e+00
21-05-24 01:33:03.328 - INFO: <epoch: 17, iter: 1,600, lr:1.000e-04> l_g_pix: 2.1862e-03 l_g_fea: 1.6181e+00 l_g_gan: 1.0025e-01 l_g_pix_grad_branch: 3.0339e-02 l_d_real: 2.0098e-06 l_d_fake: 1.3280e-06 l_d_real_grad: 2.6226e-08 l_d_fake_grad: 1.1921e-07 D_real: -2.3161e+01 D_fake: -4.3211e+01 D_real_grad: 1.6464e+01 D_fake_grad: -5.1182e+00
21-05-24 01:36:16.004 - INFO: <epoch: 18, iter: 1,700, lr:1.000e-04> l_g_pix: 2.4907e-03 l_g_fea: 1.6881e+00 l_g_gan: 1.3110e-01 l_g_pix_grad_branch: 2.8322e-02 l_d_real: 8.5831e-08 l_d_fake: 2.7250e-06 l_d_real_grad: 3.7503e-06 l_d_fake_grad: 6.0510e-06 D_real: -2.4416e+01 D_fake: -5.0636e+01 D_real_grad: 3.0785e+01 D_fake_grad: 1.6611e+01
21-05-24 01:39:27.256 - INFO: <epoch: 19, iter: 1,800, lr:1.000e-04> l_g_pix: 2.0183e-03 l_g_fea: 1.6027e+00 l_g_gan: 1.1546e-01 l_g_pix_grad_branch: 2.6887e-02 l_d_real: 5.7220e-08 l_d_fake: 3.1208e-06 l_d_real_grad: 2.0038e-05 l_d_fake_grad: 8.9218e-05 D_real: -2.0398e+01 D_fake: -4.3490e+01 D_real_grad: 1.7696e+01 D_fake_grad: 3.8131e+00
21-05-24 01:42:39.856 - INFO: <epoch: 20, iter: 1,900, lr:1.000e-04> l_g_pix: 1.8941e-03 l_g_fea: 1.6969e+00 l_g_gan: 8.9008e-02 l_g_pix_grad_branch: 2.8792e-02 l_d_real: 1.1778e-06 l_d_fake: 1.0792e-04 l_d_real_grad: 1.1804e-04 l_d_fake_grad: 4.6953e-05 D_real: -1.0110e+01 D_fake: -2.7911e+01 D_real_grad: 1.0423e+01 D_fake_grad: -2.8702e+00
21-05-24 01:45:52.129 - INFO: <epoch: 21, iter: 2,000, lr:1.000e-04> l_g_pix: 2.4722e-03 l_g_fea: 1.6711e+00 l_g_gan: 2.9332e-02 l_g_pix_grad_branch: 3.0218e-02 l_d_real: 5.2074e-02 l_d_fake: 1.1575e-01 l_d_real_grad: 0.0000e+00 l_d_fake_grad: 0.0000e+00 D_real: -9.6310e+00 D_fake: -1.5414e+01 D_real_grad: -1.2766e+01 D_fake_grad: -3.8304e+01
21-05-24 01:49:03.386 - INFO: <epoch: 22, iter: 2,100, lr:1.000e-04> l_g_pix: 2.8986e-03 l_g_fea: 1.5935e+00 l_g_gan: 5.3678e-02 l_g_pix_grad_branch: 2.7128e-02 l_d_real: 1.8270e-04 l_d_fake: 2.4126e-04 l_d_real_grad: 0.0000e+00 l_d_fake_grad: 0.0000e+00 D_real: -6.1048e+00 D_fake: -1.6840e+01 D_real_grad: 1.0796e+00 D_fake_grad: -3.7817e+01
21-05-24 01:52:15.883 - INFO: <epoch: 23, iter: 2,200, lr:1.000e-04> l_g_pix: 2.4406e-03 l_g_fea: 1.6539e+00 l_g_gan: 1.1410e-01 l_g_pix_grad_branch: 2.8961e-02 l_d_real: 2.3842e-09 l_d_fake: 2.7536e-06 l_d_real_grad: 1.0729e-07 l_d_fake_grad: 3.3379e-08 D_real: -2.5580e+01 D_fake: -4.8400e+01 D_real_grad: -2.3847e+01 D_fake_grad: -4.4304e+01
21-05-24 01:55:28.255 - INFO: <epoch: 24, iter: 2,300, lr:1.000e-04> l_g_pix: 2.4852e-03 l_g_fea: 1.7700e+00 l_g_gan: 4.3522e-02 l_g_pix_grad_branch: 3.1261e-02 l_d_real: 4.3588e-03 l_d_fake: 6.2978e-04 l_d_real_grad: 2.8310e-04 l_d_fake_grad: 9.1119e-06 D_real: -1.4005e+01 D_fake: -2.2707e+01 D_real_grad: 4.6634e-01 D_fake_grad: -1.6876e+01
21-05-24 01:58:41.680 - INFO: <epoch: 26, iter: 2,400, lr:1.000e-04> l_g_pix: 3.0689e-03 l_g_fea: 1.7215e+00 l_g_gan: 7.7397e-02 l_g_pix_grad_branch: 2.8686e-02 l_d_real: 5.9508e-06 l_d_fake: 9.8943e-07 l_d_real_grad: 1.1660e-05 l_d_fake_grad: 1.3742e-04 D_real: -1.7526e+01 D_fake: -3.3005e+01 D_real_grad: -4.2123e+00 D_fake_grad: -2.1467e+01
21-05-24 02:01:54.331 - INFO: <epoch: 27, iter: 2,500, lr:1.000e-04> l_g_pix: 2.8539e-03 l_g_fea: 1.6631e+00 l_g_gan: 8.8362e-02 l_g_pix_grad_branch: 2.9986e-02 l_d_real: 1.8161e-04 l_d_fake: 1.3661e-06 l_d_real_grad: 0.0000e+00 l_d_fake_grad: 2.3842e-09 D_real: -2.6553e+00 D_fake: -2.0328e+01 D_real_grad: -6.0097e+00 D_fake_grad: -3.2225e+01
21-05-24 02:05:06.089 - INFO: <epoch: 28, iter: 2,600, lr:1.000e-04> l_g_pix: 2.6444e-03 l_g_fea: 1.8047e+00 l_g_gan: 1.1630e-01 l_g_pix_grad_branch: 2.9866e-02 l_d_real: 0.0000e+00 l_d_fake: 4.7684e-09 l_d_real_grad: 0.0000e+00 l_d_fake_grad: 0.0000e+00 D_real: -1.0456e+01 D_fake: -3.3716e+01 D_real_grad: 4.0606e+00 D_fake_grad: -3.6053e+01
21-05-24 02:08:18.046 - INFO: <epoch: 29, iter: 2,700, lr:1.000e-04> l_g_pix: 2.0163e-03 l_g_fea: 1.6041e+00 l_g_gan: 9.4380e-02 l_g_pix_grad_branch: 2.6837e-02 l_d_real: 6.4373e-08 l_d_fake: 2.6703e-07 l_d_real_grad: 1.5494e-05 l_d_fake_grad: 1.5154e-04 D_real: -5.2771e+00 D_fake: -2.4153e+01 D_real_grad: -2.8587e+01 D_fake_grad: -4.1664e+01
21-05-24 02:11:29.657 - INFO: <epoch: 30, iter: 2,800, lr:1.000e-04> l_g_pix: 3.2022e-03 l_g_fea: 1.7363e+00 l_g_gan: 6.1157e-02 l_g_pix_grad_branch: 2.9119e-02 l_d_real: 3.5096e-04 l_d_fake: 6.9597e-05 l_d_real_grad: 9.5367e-08 l_d_fake_grad: 5.2452e-08 D_real: -8.3919e+00 D_fake: -2.0623e+01 D_real_grad: -2.7735e+01 D_fake_grad: -4.7713e+01
21-05-24 02:14:41.966 - INFO: <epoch: 31, iter: 2,900, lr:1.000e-04> l_g_pix: 2.0775e-03 l_g_fea: 1.6452e+00 l_g_gan: 7.1554e-02 l_g_pix_grad_branch: 2.6934e-02 l_d_real: 2.3431e-04 l_d_fake: 1.8326e-03 l_d_real_grad: 2.6464e-07 l_d_fake_grad: 2.8610e-08 D_real: -1.5319e+01 D_fake: -2.9629e+01 D_real_grad: -1.5372e+01 D_fake_grad: -3.6825e+01
21-05-24 02:17:53.561 - INFO: <epoch: 32, iter: 3,000, lr:1.000e-04> l_g_pix: 1.7321e-03 l_g_fea: 1.6710e+00 l_g_gan: 9.1385e-02 l_g_pix_grad_branch: 2.9790e-02 l_d_real: 9.0599e-08 l_d_fake: 6.0320e-07 l_d_real_grad: 3.6304e-05 l_d_fake_grad: 6.1511e-07 D_real: -3.0952e+01 D_fake: -4.9229e+01 D_real_grad: -1.3276e+01 D_fake_grad: -3.5428e+01
21-05-24 02:21:04.942 - INFO: <epoch: 33, iter: 3,100, lr:1.000e-04> l_g_pix: 2.5524e-03 l_g_fea: 1.6122e+00 l_g_gan: 1.7510e-01 l_g_pix_grad_branch: 2.8861e-02 l_d_real: 0.0000e+00 l_d_fake: 4.1962e-07 l_d_real_grad: 2.5749e-07 l_d_fake_grad: 2.7180e-07 D_real: -1.9678e+01 D_fake: -5.4697e+01 D_real_grad: -1.0838e+01 D_fake_grad: -2.9239e+01
21-05-24 02:24:16.634 - INFO: <epoch: 34, iter: 3,200, lr:1.000e-04> l_g_pix: 2.9174e-03 l_g_fea: 1.6195e+00 l_g_gan: 8.8550e-02 l_g_pix_grad_branch: 3.1018e-02 l_d_real: 6.4611e-07 l_d_fake: 5.9604e-07 l_d_real_grad: 5.5455e-06 l_d_fake_grad: 6.1075e-05 D_real: -1.1526e+01 D_fake: -2.9236e+01 D_real_grad: -1.6291e+01 D_fake_grad: -3.2677e+01
21-05-24 02:27:28.812 - INFO: <epoch: 35, iter: 3,300, lr:1.000e-04> l_g_pix: 2.5499e-03 l_g_fea: 1.6712e+00 l_g_gan: 6.8183e-02 l_g_pix_grad_branch: 2.7392e-02 l_d_real: 9.3864e-02 l_d_fake: 2.1278e-04 l_d_real_grad: 3.6161e-04 l_d_fake_grad: 1.2811e-05 D_real: -2.7487e+01 D_fake: -4.1077e+01 D_real_grad: -1.0106e+01 D_fake_grad: -2.5549e+01
21-05-24 02:30:41.108 - INFO: <epoch: 36, iter: 3,400, lr:1.000e-04> l_g_pix: 2.3978e-03 l_g_fea: 1.6088e+00 l_g_gan: 1.3013e-01 l_g_pix_grad_branch: 2.9338e-02 l_d_real: 0.0000e+00 l_d_fake: 4.7684e-09 l_d_real_grad: 0.0000e+00 l_d_fake_grad: 0.0000e+00 D_real: -1.9569e+01 D_fake: -4.5596e+01 D_real_grad: -1.2644e+01 D_fake_grad: -3.9879e+01
21-05-24 02:33:54.622 - INFO: <epoch: 38, iter: 3,500, lr:1.000e-04> l_g_pix: 2.4048e-03 l_g_fea: 1.5419e+00 l_g_gan: 1.2125e-01 l_g_pix_grad_branch: 2.6149e-02 l_d_real: 4.6155e-06 l_d_fake: 6.2974e-03 l_d_real_grad: 0.0000e+00 l_d_fake_grad: 0.0000e+00 D_real: -8.3029e+00 D_fake: -3.2550e+01 D_real_grad: -9.6282e+00 D_fake_grad: -4.1604e+01
21-05-24 02:37:05.900 - INFO: <epoch: 39, iter: 3,600, lr:1.000e-04> l_g_pix: 2.2153e-03 l_g_fea: 1.6592e+00 l_g_gan: 1.5681e-01 l_g_pix_grad_branch: 2.8890e-02 l_d_real: 0.0000e+00 l_d_fake: 7.1526e-09 l_d_real_grad: 3.0068e-03 l_d_fake_grad: 9.6887e-03 D_real: -5.0906e+01 D_fake: -8.2267e+01 D_real_grad: -1.8265e+01 D_fake_grad: -2.6081e+01
21-05-24 02:40:17.427 - INFO: <epoch: 40, iter: 3,700, lr:1.000e-04> l_g_pix: 2.3257e-03 l_g_fea: 1.5420e+00 l_g_gan: 7.5334e-02 l_g_pix_grad_branch: 2.7359e-02 l_d_real: 6.2178e-06 l_d_fake: 1.1802e-06 l_d_real_grad: 1.2017e-03 l_d_fake_grad: 1.9033e-03 D_real: -2.3801e+01 D_fake: -3.8868e+01 D_real_grad: -8.9928e+00 D_fake_grad: -1.8048e+01
21-05-24 02:43:28.910 - INFO: <epoch: 41, iter: 3,800, lr:1.000e-04> l_g_pix: 2.6892e-03 l_g_fea: 1.6348e+00 l_g_gan: 9.0847e-02 l_g_pix_grad_branch: 2.8078e-02 l_d_real: 2.6703e-07 l_d_fake: 3.1226e-05 l_d_real_grad: 5.3070e-06 l_d_fake_grad: 1.0695e-05 D_real: -2.3891e+01 D_fake: -4.2061e+01 D_real_grad: -1.7052e+01 D_fake_grad: -3.1479e+01
21-05-24 02:46:40.285 - INFO: <epoch: 42, iter: 3,900, lr:1.000e-04> l_g_pix: 2.5676e-03 l_g_fea: 1.8278e+00 l_g_gan: 4.1917e-02 l_g_pix_grad_branch: 3.4700e-02 l_d_real: 1.5963e-02 l_d_fake: 4.6029e-02 l_d_real_grad: 9.9656e-06 l_d_fake_grad: 4.8351e-06 D_real: -4.0825e+01 D_fake: -4.9177e+01 D_real_grad: -9.2685e+00 D_fake_grad: -2.2622e+01
21-05-24 02:49:52.089 - INFO: <epoch: 43, iter: 4,000, lr:1.000e-04> l_g_pix: 2.2216e-03 l_g_fea: 1.5846e+00 l_g_gan: 6.4105e-02 l_g_pix_grad_branch: 2.6135e-02 l_d_real: 1.3576e-03 l_d_fake: 4.8046e-04 l_d_real_grad: 3.7264e-06 l_d_fake_grad: 8.0104e-06 D_real: -1.6566e+01 D_fake: -2.9386e+01 D_real_grad: -2.1560e+01 D_fake_grad: -3.7010e+01
21-05-24 02:49:52.089 - INFO: Saving models and training states.
21-05-24 02:53:05.239 - INFO: <epoch: 44, iter: 4,100, lr:1.000e-04> l_g_pix: 2.3361e-03 l_g_fea: 1.6421e+00 l_g_gan: 5.3571e-02 l_g_pix_grad_branch: 2.9098e-02 l_d_real: 1.9744e-04 l_d_fake: 2.6551e-03 l_d_real_grad: 9.2835e-02 l_d_fake_grad: 6.1299e-03 D_real: -2.1851e+01 D_fake: -3.2563e+01 D_real_grad: -4.4997e+00 D_fake_grad: -1.3830e+01
21-05-24 02:56:16.614 - INFO: <epoch: 45, iter: 4,200, lr:1.000e-04> l_g_pix: 1.7827e-03 l_g_fea: 1.5211e+00 l_g_gan: 8.9597e-02 l_g_pix_grad_branch: 2.7135e-02 l_d_real: 3.3855e-07 l_d_fake: 4.6569e-05 l_d_real_grad: 1.3113e-07 l_d_fake_grad: 1.3590e-07 D_real: -3.6528e+01 D_fake: -5.4447e+01 D_real_grad: -2.0493e+01 D_fake_grad: -3.8345e+01
21-05-24 02:59:29.413 - INFO: <epoch: 46, iter: 4,300, lr:1.000e-04> l_g_pix: 1.8728e-03 l_g_fea: 1.6894e+00 l_g_gan: 8.1638e-02 l_g_pix_grad_branch: 3.1682e-02 l_d_real: 3.8835e-05 l_d_fake: 4.4186e-05 l_d_real_grad: 8.2023e-05 l_d_fake_grad: 4.1961e-06 D_real: -3.1155e+01 D_fake: -4.7482e+01 D_real_grad: -2.8142e+01 D_fake_grad: -4.4408e+01
21-05-24 03:02:40.728 - INFO: <epoch: 47, iter: 4,400, lr:1.000e-04> l_g_pix: 1.9327e-03 l_g_fea: 1.6821e+00 l_g_gan: 6.4633e-02 l_g_pix_grad_branch: 2.9793e-02 l_d_real: 8.4029e-05 l_d_fake: 1.5388e-03 l_d_real_grad: 3.7105e-05 l_d_fake_grad: 2.8300e-06 D_real: -3.1148e+01 D_fake: -4.4074e+01 D_real_grad: -2.0028e+01 D_fake_grad: -3.8328e+01
21-05-24 03:05:52.137 - INFO: <epoch: 48, iter: 4,500, lr:1.000e-04> l_g_pix: 2.1025e-03 l_g_fea: 1.8216e+00 l_g_gan: 5.2623e-02 l_g_pix_grad_branch: 3.2644e-02 l_d_real: 9.1258e-02 l_d_fake: 6.3421e-04 l_d_real_grad: 1.4305e-08 l_d_fake_grad: 2.8610e-08 D_real: -3.1337e+01 D_fake: -4.1816e+01 D_real_grad: -1.3150e+01 D_fake_grad: -3.2264e+01
21-05-24 03:09:04.571 - INFO: <epoch: 49, iter: 4,600, lr:1.000e-04> l_g_pix: 1.7092e-03 l_g_fea: 1.6174e+00 l_g_gan: 4.5857e-02 l_g_pix_grad_branch: 2.7879e-02 l_d_real: 1.8214e-01 l_d_fake: 2.6645e-01 l_d_real_grad: 2.6844e-05 l_d_fake_grad: 2.5817e-05 D_real: -2.9393e+01 D_fake: -3.8340e+01 D_real_grad: -2.9426e+01 D_fake_grad: -4.1452e+01
21-05-24 03:12:18.200 - INFO: <epoch: 51, iter: 4,700, lr:1.000e-04> l_g_pix: 1.8148e-03 l_g_fea: 1.7709e+00 l_g_gan: 8.0268e-02 l_g_pix_grad_branch: 2.9699e-02 l_d_real: 2.2101e-06 l_d_fake: 2.5701e-06 l_d_real_grad: 2.6035e-06 l_d_fake_grad: 5.7112e-04 D_real: -3.2691e+01 D_fake: -4.8745e+01 D_real_grad: -2.9348e+01 D_fake_grad: -4.5035e+01
21-05-24 03:15:30.126 - INFO: <epoch: 52, iter: 4,800, lr:1.000e-04> l_g_pix: 2.1427e-03 l_g_fea: 1.6868e+00 l_g_gan: 7.9818e-02 l_g_pix_grad_branch: 2.9640e-02 l_d_real: 7.7724e-07 l_d_fake: 4.0957e-06 l_d_real_grad: 5.9761e-05 l_d_fake_grad: 1.7802e-02 D_real: -2.7988e+01 D_fake: -4.3951e+01 D_real_grad: -4.4252e+01 D_fake_grad: -5.5154e+01
21-05-24 03:18:42.705 - INFO: <epoch: 53, iter: 4,900, lr:1.000e-04> l_g_pix: 2.1152e-03 l_g_fea: 1.7703e+00 l_g_gan: 1.1302e-01 l_g_pix_grad_branch: 3.1607e-02 l_d_real: 2.3842e-09 l_d_fake: 0.0000e+00 l_d_real_grad: 8.5831e-08 l_d_fake_grad: 1.0014e-07 D_real: -2.3408e+01 D_fake: -4.6011e+01 D_real_grad: -2.8532e+01 D_fake_grad: -4.5819e+01
Traceback (most recent call last):
File "train.py", line 190, in
main()
File "train.py", line 106, in main
model.optimize_parameters(current_step)
File "/home/beemap/mobin_workspace/code/SPSR/code/models/SPSR_model.py", line 251, in optimize_parameters
self.fake_H_branch, self.fake_H, self.grad_LR = self.netG(self.var_L)
File "/home/beemap/miniconda3/envs/pytorch-mobin/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/beemap/miniconda3/envs/pytorch-mobin/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 161, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/beemap/miniconda3/envs/pytorch-mobin/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 171, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/beemap/miniconda3/envs/pytorch-mobin/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/beemap/miniconda3/envs/pytorch-mobin/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/beemap/miniconda3/envs/pytorch-mobin/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/beemap/miniconda3/envs/pytorch-mobin/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/beemap/mobin_workspace/code/SPSR/code/models/modules/architecture.py", line 147, in forward
x = block_list[i+10](x)
File "/home/beemap/miniconda3/envs/pytorch-mobin/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/beemap/mobin_workspace/code/SPSR/code/models/modules/block.py", line 229, in forward
out = self.RDB3(out)
File "/home/beemap/miniconda3/envs/pytorch-mobin/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/beemap/mobin_workspace/code/SPSR/code/models/modules/block.py", line 205, in forward
x4 = self.conv4(torch.cat((x, x1, x2, x3), 1))
RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 11.91 GiB total capacity; 11.14 GiB already allocated; 36.25 MiB free; 11.25 GiB reserved in total by PyTorch)


Cannot reproduce the reported performance using the pretrained checkpoint

Hi Maclory,

Thanks for your excellent work on SISR and for generously sharing your code. However, I have met some problems reproducing the reported results. I am using the pretrained weights in experiments/pretrain_models/spsr.pth and testing on Set5/Set14/B100/Urban100, with the psnr/ssim functions provided in utils\util.py to measure the difference between the HR and SR images. The average PSNR/SSIM I get on the four datasets is consistently much lower than the values reported in the paper. Am I missing something?

I only added a few lines to test.py, and modified the config file test_spsr.json to include HR ground-truth while testing.

test.py:

sr_img = util.tensor2img(visuals['SR']) # uint8
hr_img = util.tensor2img(visuals['HR']) # uint8
psnr = util.calculate_psnr(sr_img, hr_img)
ssim = util.calculate_ssim(sr_img, hr_img)
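For what it's worth, the snippet above compares full RGB images without any cropping, while published SR numbers are usually computed on the Y channel of YCbCr after cropping `scale` pixels from each border; that difference alone can shift PSNR/SSIM noticeably. A sketch of that protocol (assuming tensor2img returns BGR uint8 images, as in ESRGAN-style code, and that util.calculate_psnr/ssim accept 2-D arrays in the 0-255 range):

import numpy as np

def to_y(img):
    # BT.601 luma from a uint8 BGR image, in the usual [16, 235] range
    img = img.astype(np.float32) / 255.
    return 24.966 * img[..., 0] + 128.553 * img[..., 1] + 65.481 * img[..., 2] + 16.0

scale = 4
sr_y = to_y(sr_img)[scale:-scale, scale:-scale]  # crop `scale` pixels from each border
hr_y = to_y(hr_img)[scale:-scale, scale:-scale]
psnr_y = util.calculate_psnr(sr_y, hr_y)
ssim_y = util.calculate_ssim(sr_y, hr_y)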

test_spsr.json:
{
"name": "SPSR",
"model": "spsr",
"scale": 4,
"gpu_ids": [0],
"suffix": "_sr",

"datasets": {
"test_1": {
"name": "set5",
"mode": "LRHR",
"dataroot_LR": "../../datasets/benchmark/Set5/LR_bicubic/X4",
"dataroot_HR": "../../datasets/benchmark/Set5/HR"
},
"test_2": {
"name": "set14",
"mode": "LRHR",
"dataroot_LR": "../../datasets/benchmark/Set14/LR_bicubic/X4",
"dataroot_HR": "../../datasets/benchmark/Set14/HR"
},
"test_3": {
"name": "b100",
"mode": "LRHR",
"dataroot_LR": "../../datasets/benchmark/B100/LR_bicubic/X4",
"dataroot_HR": "../../datasets/benchmark/B100/HR"
},
"test_4": {
"name": "urban100",
"mode": "LRHR",
"dataroot_LR": "../../datasets/benchmark/Urban100/LR_bicubic/X4",
"dataroot_HR": "../../datasets/benchmark/Urban100/HR"
}
},
"path": {
"root": "experiments/pretrain_models",
"pretrain_model_G": "experiments/pretrain_models/spsr.pth"
},

"network_G": {
"which_model_G": "spsr_net",
"norm_type": null,
"mode": "CNA",
"nf": 64,
"nb": 23,
"in_nc": 3,
"out_nc": 3,
"gc": 32,
"group": 1
}
}

Thanks a lot!

About the root entry in path

Hello, I would like to ask: how should the root field under "path" be filled in in test_spsr.json and train_spsr.json?

Is it normal for l_d_real and l_d_fake to stay at 0 during the first several epochs?

The validation images are black for all pixels.
I followed your code, and the parameters are left at their defaults.

20-04-26 15:14:21.084 - INFO: <epoch: 4, iter: 4,000> psnr: 5.9762e+00
20-04-26 15:16:38.846 - INFO: <epoch: 4, iter: 4,100, lr:1.000e-04> l_g_pix: 4.7353e-03 l_g_fea: 1.9221e+00 l_g_gan: 1.0035e-01 l_g_pix_grad_branch: 3.4439e-02 l_d_real: 0.0000e+00 l_d_fake: 0.0000e+00 l_d_real_grad: 0.0000e+00 l_d_fake_grad: 0.0000e+00 D_real: 2.8926e+01 D_fake: 8.8556e+00 D_real_grad: 3.2620e+01 D_fake_grad: -8.0358e+00
20-04-26 15:18:57.880 - INFO: <epoch: 4, iter: 4,200, lr:1.000e-04> l_g_pix: 4.5557e-03 l_g_fea: 2.1639e+00 l_g_gan: 7.4714e-02 l_g_pix_grad_branch: 5.0381e-02 l_d_real: 1.7411e-06 l_d_fake: 1.0290e-06 l_d_real_grad: 0.0000e+00 l_d_fake_grad: 0.0000e+00 D_real: 2.9009e+01 D_fake: 1.4066e+01 D_real_grad: 2.9313e+01 D_fake_grad: -7.0066e+00
20-04-26 15:21:17.073 - INFO: <epoch: 5, iter: 4,300, lr:1.000e-04> l_g_pix: 4.5929e-03 l_g_fea: 2.1910e+00 l_g_gan: 8.5829e-02 l_g_pix_grad_branch: 4.4242e-02 l_d_real: 1.3489e-07 l_d_fake: 2.9489e-07 l_d_real_grad: 0.0000e+00 l_d_fake_grad: 0.0000e+00 D_real: 2.9877e+01 D_fake: 1.2711e+01 D_real_grad: 3.1373e+01 D_fake_grad: -4.3015e+00
20-04-26 15:23:38.389 - INFO: <epoch: 5, iter: 4,400, lr:1.000e-04> l_g_pix: 4.3075e-03 l_g_fea: 2.0604e+00 l_g_gan: 1.2741e-01 l_g_pix_grad_branch: 4.0523e-02 l_d_real: 0.0000e+00 l_d_fake: 0.0000e+00 l_d_real_grad: 0.0000e+00 l_d_fake_grad: 0.0000e+00 D_real: 2.5259e+01 D_fake: -2.2342e-01 D_real_grad: 3.0326e+01 D_fake_grad: -3.8803e+00
20-04-26 15:25:56.332 - INFO: <epoch: 5, iter: 4,500, lr:1.000e-04> l_g_pix: 4.4438e-03 l_g_fea: 1.9418e+00 l_g_gan: 1.1235e-01 l_g_pix_grad_branch: 4.0387e-02 l_d_real: 0.0000e+00 l_d_fake: 0.0000e+00 l_d_real_grad: 0.0000e+00 l_d_fake_grad: 0.0000e+00 D_real: 2.7480e+01 D_fake: 5.0108e+00 D_real_grad: 3.3779e+01 D_fake_grad: -5.4768e+00
20-04-26 15:26:01.084 - INFO: # Validation # PSNR: 5.8998e+00
20-04-26 15:26:01.094 - INFO: <epoch: 5, iter: 4,500> psnr: 5.8998e+00
20-04-26 15:28:28.558 - INFO: <epoch: 5, iter: 4,600, lr:1.000e-04> l_g_pix: 4.0463e-03 l_g_fea: 2.3505e+00 l_g_gan: 7.1082e-02 l_g_pix_grad_branch: 4.0911e-02 l_d_real: 8.7050e-06 l_d_fake: 5.9227e-06 l_d_real_grad: 0.0000e+00 l_d_fake_grad: 0.0000e+00 D_real: 2.5983e+01 D_fake: 1.1766e+01 D_real_grad: 3.0323e+01 D_fake_grad: -7.1722e+00

thank you

How is the training of the model scheduled?

Hello! I have a few questions for you:

  1. I would like to retrain your model without adversarial training (training only the generator) to see how it performs. Have you tried training only the generator part?
  2. Is the training scheme in the paper to pretrain for 5,000 iterations and then train adversarially for 5e6 iterations?
  3. Following on from question 2: the paper mentions using a PSNR-oriented model as the pretrained model. During that pretraining, is only an L1 or L2 loss used?
  4. If I only want to train the generator, how should I set the numbers of pretraining and formal training iterations?

RuntimeError: unexpected EOF, expected 37475777 more bytes. The file might be corrupted.

Training breaks unexpectedly in the middle with an error while loading the VGG19 model:
"20-05-12 14:36:42.857 - INFO: Loading pretrained model for G [/home/usst/hs/spsr/SPSR-master/experiments/pretrasin_models/RRDB_PSNR_x4.pth] ...
Traceback (most recent call last):
File "train.py", line 182, in
main()
File "train.py", line 84, in main
model = create_model(opt)
File "/home/usst/hs/spsr/SPSR-master/code/models/init.py", line 12, in create_model
m = M(opt)
File "/home/usst/hs/spsr/SPSR-master/code/models/SPSR_model.py", line 131, in init
self.netF = networks.define_F(opt, use_bn=False).to(self.device)
File "/home/usst/hs/spsr/SPSR-master/code/models/networks.py", line 170, in define_F
use_input_norm=True, device=device)
File "/home/usst/hs/spsr/SPSR-master/code/models/modules/architecture.py", line 551, in init
model = torchvision.models.vgg19(pretrained=True)
File "/home/usst/anaconda3/envs/torch_hs/lib/python3.6/site-packages/torchvision/models/vgg.py", line 172, in vgg19
return _vgg('vgg19', 'E', False, pretrained, progress, **kwargs)
File "/home/usst/anaconda3/envs/torch_hs/lib/python3.6/site-packages/torchvision/models/vgg.py", line 93, in _vgg
progress=progress)
File "/home/usst/anaconda3/envs/torch_hs/lib/python3.6/site-packages/torch/hub.py", line 509, in load_state_dict_from_url
return torch.load(cached_file, map_location=map_location)
File "/home/usst/anaconda3/envs/torch_hs/lib/python3.6/site-packages/torch/serialization.py", line 593, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/usst/anaconda3/envs/torch_hs/lib/python3.6/site-packages/torch/serialization.py", line 780, in _legacy_load
deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 37475777 more bytes. The file might be corrupted."
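This error usually means the torchvision VGG19 checkpoint download was interrupted, leaving a truncated file in the torch hub cache; deleting the cached file and letting torchvision download it again normally fixes it. A sketch (torch.hub.get_dir() exists in recent PyTorch; on older versions the cache usually lives under ~/.cache/torch/checkpoints, and the exact file name can vary by torchvision version):

import glob
import os
import torch
import torchvision

cache_dir = os.path.join(torch.hub.get_dir(), 'checkpoints')
for path in glob.glob(os.path.join(cache_dir, 'vgg19-*.pth')):
    print('removing truncated checkpoint:', path)
    os.remove(path)

# trigger a fresh download of the VGG19 weights
vgg = torchvision.models.vgg19(pretrained=True)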

RuntimeError: stack expects each tensor to be equal size, but got [3, 0, 0] at entry 0 and [3, 164, 0] at entry 1

Hello, following the README I used extract_subimage.py and create_lmdb.py under preprocess to build the corresponding lmdb datasets from DIV2K, but running train.py fails with the following error:

20-11-04 20:14:59.113 - INFO: Random seed: 9
20-11-04 20:14:59.116 - INFO: Read lmdb keys from cache: ../../dataset_div2k/DIV2K_lmdb/DIV2K_train_subHR.lmdb/_keys_cache.p
20-11-04 20:14:59.139 - INFO: Read lmdb keys from cache: ../../dataset_div2k/DIV2K_lmdb/DIV2K_train_subLR_bicubicX4.lmdb/_keys_cache.p
20-11-04 20:14:59.161 - INFO: Dataset [LRHRDataset - DIV2K] is created.
20-11-04 20:14:59.161 - INFO: Number of train images: 43,866, iters: 1,463
20-11-04 20:14:59.161 - INFO: Total epochs needed: 4 for iters 5,000
20-11-04 20:14:59.162 - INFO: Dataset [LRHRDataset - DIV2K] is created.
20-11-04 20:14:59.162 - INFO: Number of val images in [DIV2K]: 100
20-11-04 20:14:59.319 - INFO: Initialization method [kaiming]
/root/anaconda3/envs/pytorch36/lib/python3.6/site-packages/torch/cuda/__init__.py:125: UserWarning: Tesla T4 with CUDA capability sm_75 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70. If you want to use the Tesla T4 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
20-11-04 20:15:06.937 - INFO: Remove feature loss.
20-11-04 20:15:06.940 - WARNING: Params [module.get_g_nopadding.weight_h] will not optimize.
20-11-04 20:15:06.940 - WARNING: Params [module.get_g_nopadding.weight_v] will not optimize.
20-11-04 20:15:06.944 - INFO: Model [SPSRModel] is created.
20-11-04 20:15:06.944 - INFO: Start training from epoch: 0, iter: 0
Traceback (most recent call last):
File "train.py", line 182, in <module>
main()
File "train.py", line 98, in main
for _, train_data in enumerate(train_loader):
File "/root/anaconda3/envs/pytorch36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
data = self._next_data()
File "/root/anaconda3/envs/pytorch36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
return self._process_data(data)
File "/root/anaconda3/envs/pytorch36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
data.reraise()
File "/root/anaconda3/envs/pytorch36/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/root/anaconda3/envs/pytorch36/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
data = fetcher.fetch(index)
File "/root/anaconda3/envs/pytorch36/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
return self.collate_fn(data)
File "/root/anaconda3/envs/pytorch36/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 74, in default_collate
return {key: default_collate([d[key] for d in batch]) for key in elem}
File "/root/anaconda3/envs/pytorch36/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 74, in <dictcomp>
return {key: default_collate([d[key] for d in batch]) for key in elem}
File "/root/anaconda3/envs/pytorch36/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [3, 0, 0] at entry 0 and [3, 164, 0] at entry 1
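The [3, 0, 0] entry means at least one LR sample reaching the collate function has zero height and width, which points back at the sub-image / lmdb preprocessing (see the sub-image extraction issue above) rather than at the training loop. A quick, hedged way to check the LR sub-image folder for degenerate files before rebuilding the lmdb (path illustrative):

import os
import cv2

lr_dir = '../../dataset_div2k/DIV2K_train_subLR_bicubicX4'  # folder of LR sub-images (illustrative)
bad = []
for name in sorted(os.listdir(lr_dir)):
    img = cv2.imread(os.path.join(lr_dir, name))
    if img is None or img.shape[0] == 0 or img.shape[1] == 0:
        bad.append(name)
print('{} sub-images are unreadable or have a zero dimension'.format(len(bad)))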

train crash

Traceback (most recent call last):
File "train.py", line 182, in
main()
File "train.py", line 134, in main
model.test()
File "/home/xxx/Repository/SR_project/SPSR/code/models/SPSR_model.py", line 388, in test
self.fake_H_branch, self.fake_H, self.grad_LR = self.netG(self.var_L)
File "/home/xxx/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in call
result = self.forward(*input, **kwargs)
File "/home/xxx/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/xxx/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in call
result = self.forward(*input, **kwargs)
File "/home/xxx/Repository/SR_project/SPSR/code/models/modules/architecture.py", line 191, in forward
x_f_cat = self.f_block(x_f_cat)
File "/home/xxx/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in call
result = self.forward(*input, **kwargs)
File "/home/xxx/Repository/SR_project/SPSR/code/models/modules/block.py", line 229, in forward
out = self.RDB3(out)
File "/home/xxx/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in call
result = self.forward(*input, **kwargs)
File "/home/xxx/Repository/SR_project/SPSR/code/models/modules/block.py", line 206, in forward
x5 = self.conv5(torch.cat((x, x1, x2, x3, x4), 1))
RuntimeError: CUDA out of memory. Tried to allocate 1.61 GiB (GPU 0; 10.73 GiB total capacity; 4.65 GiB already allocated; 1.46 GiB free; 3.77 GiB cached)

Errors occur when using the evaluation toolbox

Hi, when I use the evaluation toolbox to test one of the benchmark datasets, an error occurs:

Traceback (most recent call last):
File "evaluate_sr_results.py", line 99, in
LPIPS=CalLPIPS(j,k)
File "evaluate_sr_results.py", line 56, in CalLPIPS
dist = model.forward(imageA,imageB).detach().squeeze().numpy()
File "G:\dissertation\New Papers\SPSR\SPSR-master\metrics\LPIPS_init_.py", line 40, in forward
return self.model.forward(target, pred)
File "G:\dissertation\New Papers\SPSR\SPSR-master\metrics\LPIPS\dist_model.py", line 117, in forward
return self.net.forward(in0, in1, retPerLayer=retPerLayer)
File "G:\dissertation\New Papers\SPSR\SPSR-master\metrics\LPIPS\networks_basic.py", line 72, in forward
diffs[kk] = (feats0[kk]-feats1[kk])**2
RuntimeError: The size of tensor a (170) must match the size of tensor b (169) at non-singleton dimension 2

I followed the instructions provided, and the other datasets seem to work fine (I tested Set5, B100, etc.). Why does this happen?
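A size mismatch of one feature column (170 vs 169) usually means the SR and HR images for that dataset differ by a few pixels. Cropping both to their common size, rounded down to a multiple of the scale (the same idea as the modcrop suggestion in another issue), before computing LPIPS avoids it; a sketch on HxWxC arrays, applied before the pair is converted to tensors:

def crop_to_common(img_a, img_b, scale=4):
    # crop both images to the same size, rounded down to a multiple of `scale`
    h = min(img_a.shape[0], img_b.shape[0]) // scale * scale
    w = min(img_a.shape[1], img_b.shape[1]) // scale * scale
    return img_a[:h, :w], img_b[:h, :w]

# e.g. before CalLPIPS converts the pair to tensors:
imageA, imageB = crop_to_common(imageA, imageB)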

Question about the number of training iterations and training time

Hello, I would like to ask roughly after how many iterations SPSR converged in your training, and roughly how long the training took. At 410,000 iterations the metric curves I plot are still oscillating.

I am running out of memory on gpu

Any ideas what I am doing wrong?

(I have provided Set14 -> Set14/LRbicx4 and Set14/original for the val entry in the options.)
I also commented out pretrain_model_G in the options (RRDB_PSNR_x4.pth; I do not have this file).

20-12-26 19:11:14.119 - INFO: Random seed: 9
20-12-26 19:11:14.124 - INFO: Dataset [LRHRDataset - DIV2K] is created.
20-12-26 19:11:14.124 - INFO: Number of train images: 900, iters: 30
20-12-26 19:11:14.125 - INFO: Total epochs needed: 16667 for iters 500,000
20-12-26 19:11:14.125 - INFO: Dataset [LRHRDataset - Set14] is created.
20-12-26 19:11:14.125 - INFO: Number of val images in [Set14]: 14
20-12-26 19:11:14.275 - INFO: Initialization method [kaiming]
20-12-26 19:11:16.317 - INFO: Initialization method [kaiming]
20-12-26 19:11:16.482 - INFO: Initialization method [kaiming]
20-12-26 19:11:17.393 - WARNING: Params [module.get_g_nopadding.weight_h] will not optimize.
20-12-26 19:11:17.393 - WARNING: Params [module.get_g_nopadding.weight_v] will not optimize.
20-12-26 19:11:17.397 - INFO: Model [SPSRModel] is created.
20-12-26 19:11:17.397 - INFO: Start training from epoch: 0, iter: 0
Traceback (most recent call last):
  File "train.py", line 182, in <module>
    main()
  File "train.py", line 105, in main
    model.optimize_parameters(current_step)
  File "/home/joe/prj/SPSR/code/models/SPSR_model.py", line 251, in optimize_parameters
    self.fake_H_branch, self.fake_H, self.grad_LR = self.netG(self.var_L)
  File "/home/joe/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/joe/.local/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/joe/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/joe/prj/SPSR/code/models/modules/architecture.py", line 191, in forward
    x_f_cat = self.f_block(x_f_cat)
  File "/home/joe/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/joe/prj/SPSR/code/models/modules/block.py", line 229, in forward
    out = self.RDB3(out)
  File "/home/joe/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/joe/prj/SPSR/code/models/modules/block.py", line 206, in forward
    x5 = self.conv5(torch.cat((x, x1, x2, x3, x4), 1))
RuntimeError: CUDA out of memory. Tried to allocate 480.00 MiB (GPU 0; 10.76 GiB total capacity; 6.11 GiB already allocated; 274.69 MiB free; 6.97 GiB reserved in total by PyTorch)
make: *** [Makefile:2: all] Error 1

Sometimes I see this after a few epochs.

Unknown error occurring

`RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable`

Hey, thanks for your code. I need your help: I am new to PyTorch and do not understand what this error is saying. It happens while running the test code.
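The error comes from how Windows starts DataLoader worker processes: they are spawned by re-importing the script, so any code that builds the DataLoader at module level must be moved under an if __name__ == '__main__': guard (alternatively, use zero workers). A sketch of the shape the test script needs on Windows; the real test.py organizes this differently:

from multiprocessing import freeze_support

def main():
    # parse options, build the dataset / DataLoader and run the test loop here
    ...

if __name__ == '__main__':
    freeze_support()  # needed only for frozen Windows executables, harmless otherwise
    main()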

Evaluation code modification suggestion

The evaluation code you provided is excellent.
I have a suggestion for SPSR/metrics/MetricEvaluation/utils/calc_scores.m, line 26, where the original code is:
input_image_path = fullfile(input_dir,file_list(ii).name);
input_image = convert_shave_image(imread(input_image_path),shave_width);
In fact, many SR networks keep the sizes of SR and HR consistent by zero-padding and cropping, so the SR output size is not necessarily a multiple of the scale. I suggest the following modification:
input_image_path = fullfile(input_dir,file_list(ii).name);
input_image = modcrop(imread(input_image_path), scale);
input_image = convert_shave_image(input_image,shave_width);

About the gradient map

Hello, I used your code for generating gradient maps, but the contours in the generated gradient map are not only white; there are also lines in other colors. Could you explain why? Thanks! The code is as follows:

import cv2
import numpy as np
import torch
# Get_gradient_nopadding, tensor2img and save_img come from the SPSR code
# (models/modules/architecture.py and utils/util.py)

LR_path = '/home/linx/桌面/2020-12-01 19-19-37屏幕截图.png'
img_LR = cv2.imread(LR_path, cv2.IMREAD_UNCHANGED)
img_LR = img_LR.astype(np.float32) / 255.
H, W, C = img_LR.shape
# BGR to RGB, HWC to CHW, numpy to tensor
if img_LR.shape[2] == 3:
    img_LR = img_LR[:, :, [2, 1, 0]]
img_LR = torch.from_numpy(np.ascontiguousarray(np.transpose(img_LR, (2, 0, 1)))).float()
img_LR = torch.unsqueeze(img_LR, 0)
print(img_LR.shape)

# per-channel gradient map of the (3-channel) input
get_grad = Get_gradient_nopadding()
gradient_img = get_grad(img_LR)
print(gradient_img.shape)

out = tensor2img(gradient_img)
save_img(out, 'gradient_img.png')
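The colored contours are expected: Get_gradient_nopadding computes the gradient magnitude per channel, so a 3-channel RGB input produces a 3-channel gradient map whose channels differ wherever the R/G/B edges differ. To get a single white-on-black edge map, feed a single-channel image instead; a small sketch continuing from the snippet above (assuming Get_gradient_nopadding loops over the input channels, as the released code appears to):

# single-channel input gives a single-channel (white-on-black) gradient map
img_gray = cv2.imread(LR_path, cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.
img_gray = torch.from_numpy(img_gray).unsqueeze(0).unsqueeze(0)  # shape 1x1xHxW
gradient_gray = get_grad(img_gray)
save_img(tensor2img(gradient_gray), 'gradient_gray.png')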

About testing my own model with the evaluation toolbox

Hello, thank you very much for open-sourcing the metric-evaluation toolbox. I have two questions:
1) If I want to use your toolbox to test my own model, do I need to replace model.mat with my own, and how is a model.mat generated?
2) Once the SR and GT images are both available, shouldn't the metric computation be independent of the model? Why does model.mat need to be downloaded, and what is it used for?
Looking forward to your reply, and thanks again for open-sourcing such a great toolbox!

question about test

Hello, I would like to ask: in your test configuration, pretrained_model_G is set to spsr.pth. After I finish training on my own dataset, which file should I put here? No spsr.pth file is saved after training.

blank image

I am training with my custom data. Training succeeded and the validation results look good.
However, when I run the exact same model on a test set, all images are black. I checked the pixel values: even after normalization, only one of the three channels has values, and they lie between 0 and 20.

Any clue what might have caused this issue?

if __name__ == '__main__':freeze_support() ..RuntimeError: DataLoader worker (pid(s) 11736) exited unexpectedly

Hi @Maclory
Thank you for sharing your great work with us.
When I try to run the code on Windows 10 with CPU only (no GPU), it gives me this error:

G:\Sr\SPSR-master\code> python test.py -opt options/test/test_spsr.json
20-05-16 05:13:46.962 - INFO: name: SPSR
model: spsr
scale: 4
device: null
datasets:[
test_1:[
name: set5
mode: LR
dataroot_LR: ../Set5/Set5_LR
phase: test
scale: 4
data_type: img
]
]
path:[
root: ../release
pretrain_model_G: ../experiments/pretrain_models/spsr.pth
results_root: ../release\results\SPSR
log: ../release\results\SPSR
]
network_G:[
which_model_G: spsr_net
norm_type: None
mode: CNA
nf: 64
nb: 23
in_nc: 3
out_nc: 3
gc: 32
group: 1
scale: 4
]
is_train: False

20-05-16 05:13:46.979 - INFO: Dataset [LRDataset - set5] is created.
20-05-16 05:13:46.980 - INFO: Number of test images in [set5]: 1
20-05-16 05:13:47.784 - INFO: Loading pretrained model for G [../experiments/pretrain_models/spsr.pth] ...
20-05-16 05:13:50.471 - INFO: Model [SPSRModel] is created.
20-05-16 05:13:50.472 - INFO:
Testing [set5]...
20-05-16 05:13:51.150 - INFO: name: SPSR
model: spsr
scale: 4
device: null
datasets:[
test_1:[
name: set5
mode: LR
dataroot_LR: ../Set5/Set5_LR
phase: test
scale: 4
data_type: img
]
]
path:[
root: ../release
pretrain_model_G: ../experiments/pretrain_models/spsr.pth
results_root: ../release\results\SPSR
log: ../release\results\SPSR
]
network_G:[
which_model_G: spsr_net
norm_type: None
mode: CNA
nf: 64
nb: 23
in_nc: 3
out_nc: 3
gc: 32
group: 1
scale: 4
]
is_train: False

20-05-16 05:13:51.153 - INFO: Dataset [LRDataset - set5] is created.
20-05-16 05:13:51.154 - INFO: Number of test images in [set5]: 1
20-05-16 05:13:51.456 - INFO: Loading pretrained model for G [../experiments/pretrain_models/spsr.pth] ...
20-05-16 05:13:51.787 - INFO: Model [SPSRModel] is created.
20-05-16 05:13:51.787 - INFO:
Testing [set5]...
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\ABDO\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users\ABDO\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "C:\Users\ABDO\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\ABDO\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
run_name="mp_main")
File "C:\Users\ABDO\AppData\Local\Programs\Python\Python36\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "C:\Users\ABDO\AppData\Local\Programs\Python\Python36\lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "C:\Users\ABDO\AppData\Local\Programs\Python\Python36\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "G:\Sr\SPSR-master\code\test.py", line 43, in
for data in test_loader:
File "C:\Users\ABDO\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 193, in iter
return _DataLoaderIter(self)
File "C:\Users\ABDO\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 469, in init
w.start()
File "C:\Users\ABDO\AppData\Local\Programs\Python\Python36\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\Users\ABDO\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\ABDO\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\ABDO\AppData\Local\Programs\Python\Python36\lib\multiprocessing\popen_spawn_win32.py", line 33, in init
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\ABDO\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "C:\Users\ABDO\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Traceback (most recent call last):
File "C:\Users\ABDO\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 511, in _try_get_batch
data = self.data_queue.get(timeout=timeout)
File "C:\Users\ABDO\AppData\Local\Programs\Python\Python36\lib\multiprocessing\queues.py", line 105, in get
raise Empty
queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "test.py", line 43, in
for data in test_loader:
File "C:\Users\ABDO\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 576, in next
idx, batch = self._get_batch()
File "C:\Users\ABDO\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 553, in _get_batch
success, data = self._try_get_batch()
File "C:\Users\ABDO\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 519, in _try_get_batch
raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
RuntimeError: DataLoader worker (pid(s) 11736) exited unexpectedly

How can I fix this, please?

The model.mat for PI calculation

Thank you for sharing the metric-evaluation code. I am trying to download model.mat from the Baidu Yun link, but the share has expired. Would you mind re-sharing it?

Thank you!

About the total number of training iterations

Hello, while reading the training configuration file I noticed that the total number of iterations is 5,000,000, which would require roughly 4,660+ epochs. Is a total of 5,000,000 iterations correct?

About HR_size and CUDA memory

Hello, if I train on another dataset whose HR size is 480*480, which is different from yours, should I change "HR_size": 128 in train_spsr.json?
If I change 128 to 480, the training code no longer runs. It ends with something like:

23-06-17 14:54:34.882 - INFO: Start training from epoch: 0, iter: 0
Traceback (most recent call last):
File "train.py", line 182, in
main()
File "train.py", line 105, in main
model.optimize_parameters(current_step)
File "/home//SPSR-master/code/models/SPSR_model.py", line 282, in optimize_parameters
pred_g_fake = self.netD(self.fake_H)
File "/home/
/anaconda3/envs/RRSGAN37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(input, **kwargs)
File "/home/
/anaconda3/envs/RRSGAN37/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home//anaconda3/envs/RRSGAN37/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/
/anaconda3/envs/RRSGAN37/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home//anaconda3/envs/RRSGAN37/lib/python3.7/site-packages/torch/_utils.py", line 457, in reraise
raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/
/anaconda3/envs/RRSGAN37/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(input, **kwargs)
File "/home/
/anaconda3/envs/RRSGAN37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(input, **kwargs)
File "/home/
/SPSR-master/code/models/modules/architecture.py", line 247, in forward
x = self.classifier(x)
File "/home//anaconda3/envs/RRSGAN37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(input, **kwargs)
File "/home/
/anaconda3/envs/RRSGAN37/lib/python3.7/site-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)
File "/home/
/anaconda3/envs/RRSGAN37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(input, **kwargs)
File "/home/
/anaconda3/envs/RRSGAN37/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 103, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x115200 and 8192x100)

Note that within this error, the '1' in '1x115200' corresponds to my batch size; with the default batch size it runs out of CUDA memory, so I had to set batch_size to 1.

If I don't change HR_size (default = 128), the training code runs. But I don't know whether such training is appropriate for my dataset (HR size = 480*480).

I'm looking forward to your reply!
THANKS !
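The shape error is consistent with the discriminator being built for 128x128 crops: discriminator_vgg_128 downsamples by 32 overall and flattens 512x4x4 = 8192 features into its first linear layer, whereas a 480 crop gives 512x15x15 = 115200. Training with the default HR_size of 128 on 480x480 images should be fine if the dataset loader randomly crops 128-pixel patches (as BasicSR-style LRHR datasets do); if you really want HR_size 480, the first fully connected layer has to be sized for the new crop. A hedged sketch of the idea (the layer names in the repo's architecture.py may differ):

import torch.nn as nn

hr_size = 480  # training crop fed to the discriminator
feat_hw = hr_size // 32  # discriminator_vgg_128 halves the spatial size five times
in_features = 512 * feat_hw * feat_hw  # 512 * 15 * 15 = 115200 for a 480 crop

# replace the fixed nn.Linear(512 * 4 * 4, 100) head with one derived from the crop size
classifier = nn.Sequential(
    nn.Linear(in_features, 100),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Linear(100, 1),
)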

Cannot find the test results

Hi
Your work is amazing. I followed your instructions and the testing process runs.
However, I cannot find the ./result folder. Is something wrong?
(screenshot attached)

PSNR/SSIM scores of SOA models

Did you retrain the SRGAN and ESRGAN models on your dataset? The PSNR/SSIM scores in the original papers are higher than the ones reported here.

There is no difference between LR and SR

There is no difference between LR and SR when I test the model you provide on BaiduCloud.
Is there anything wrong?

Input: (image attached)

Output: (image attached)

Log detail:

export CUDA_VISIBLE_DEVICES=0
20-08-24 14:23:53.230 - INFO: name: SPSR
model: spsr
scale: 4
gpu_ids: [0]
datasets:[
test_1:[
name: set5
mode: LR
dataroot_LR: ../images/
phase: test
scale: 4
data_type: img
]
]
path:[
root: ./
pretrain_model_G: ../SPSR/spsr.pth
results_root: ./results/SPSR
log: ./results/SPSR
]
network_G:[
which_model_G: spsr_net
norm_type: None
mode: CNA
nf: 64
nb: 23
in_nc: 3
out_nc: 3
gc: 32
group: 1
scale: 4
]
is_train: False

20-08-24 14:23:53.231 - INFO: Dataset [LRDataset - set5] is created.
20-08-24 14:23:53.231 - INFO: Number of test images in [set5]: 1
20-08-24 14:24:01.398 - INFO: Loading pretrained model for G [../SPSR/spsr.pth] ...
20-08-24 14:24:02.428 - INFO: Model [SPSRModel] is created.
20-08-24 14:24:02.429 - INFO:
Testing [set5]...
20-08-24 14:24:03.315 - INFO: a

About evaluation toolbox configuration.

Hi,
I followed your tips, but I still cannot import matlab.engine. I would like to know how you configured it, including all the environment settings. I have tried many ways, but none succeeded; it cannot be installed for Python 3.x, only for Python 2.7. Thanks!
(screenshot attached)

lmdb preprocessing error during training

Hello, during training I get the error AssertionError: /root/SPSR-master/code/data/dataset/DIV2K800_sub_HR.lmdb/ has no valid image file.
The .lmdb folder contains only three files and indeed no image files. I followed the steps as described, so where did it go wrong? Looking forward to your answer.

I've trained the model on my own dataset and have some questions.

Hello, I trained your model at 2x on DIV2K and the results are very good. To make the model generalize better I added my own dataset, but now the results contain alternating red and green pixels (on black-and-white photos they show up as white blocks). Do you know what causes this, and how can I fix it?
(screenshot attached)
