carn-pytorch's People

Contributors

nmhkahn

carn-pytorch's Issues

activation before output

Why is there no activation function before the output? How is the output data range guaranteed?

training slow down

I ran the training code on 2 GPUs and found that the training time increases by about 7 s every 1000 steps. I tried calling torch.cuda.empty_cache() every 1000 steps, but it doesn't help. Is there a solution for this?

Thanks.

Not able to understand DataLoader part in solver.py file

self.train_loader is defined as follows in the file:

self.train_data = TrainDataset(cfg.train_data_path, scale=cfg.scale, size=cfg.patch_size)

self.train_loader = DataLoader(self.train_data, batch_size=cfg.batch_size, num_workers=1, shuffle=True, drop_last=True)

def fit(self):
        cfg = self.cfg
        refiner = nn.DataParallel(self.refiner, device_ids=range(cfg.num_gpu))
        
        learning_rate = cfg.lr
        while True:
            for inputs in self.train_loader:
                self.refiner.train()

                if cfg.scale > 0:
                    scale = cfg.scale
                    hr, lr = inputs[-1][0], inputs[-1][1]

Suppose I set batch_size = 10 in the DataLoader. Then, in each iteration of the for loop in fit(), the variable inputs will contain the data of 10 image pairs. But the code seems to take only 1 pair out of the entire batch of 10 image pairs. Am I missing something here?
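
A possible reading of the indexing, sketched in plain Python. It assumes each dataset item is a list with one (hr, lr) pair per scale, as the `item = [(self.hr[index], self.lr[i][index]) ...]` line quoted in another issue on this page suggests. Under that assumption the loader batches each position separately, so `inputs[-1][0]` is the whole HR batch for the last scale rather than a single image pair. The `collate` function is a simplified stand-in for PyTorch's default collation, and strings stand in for image tensors.

```python
def collate(samples):
    """Simplified stand-in for default collation: group values position by position."""
    first = samples[0]
    if isinstance(first, (list, tuple)):
        # Transpose: one group per position, then recurse into each group.
        return [collate(list(group)) for group in zip(*samples)]
    # Leaves (tensors in the real loader) would be stacked; here we just list them.
    return list(samples)


def make_item(index, scales=(2, 3, 4)):
    """Hypothetical dataset item: one (hr, lr) pair per scale."""
    return [(f"hr{index}", f"lr{index}_x{s}") for s in scales]


batch_size = 10
inputs = collate([make_item(i) for i in range(batch_size)])

hr, lr = inputs[-1][0], inputs[-1][1]
print(len(inputs))       # one entry per scale
print(len(hr), len(lr))  # both hold the full batch
```

With a single training scale the list has only one entry, so `inputs[-1]` is simply that entry.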

Calculate model parameters

Excuse me, I want to compare against your method in my paper. I used some tools from GitHub to calculate the model parameters, Mult-Adds, and FLOPs, but I cannot get the same results. Can you give me some detailed information on how you calculated them? Thanks.
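
Differences between counting tools often come down to conventions: whether biases are counted, how grouped convolutions are handled, and which input resolution is assumed. A minimal sketch of the usual per-layer formulas for a 2D convolution follows. The function names and layer sizes are illustrative and not taken from the repository or the paper.

```python
def conv2d_params(c_in, c_out, k, groups=1, bias=True):
    """Number of learnable parameters of a k x k 2D convolution."""
    weights = c_out * (c_in // groups) * k * k
    return weights + (c_out if bias else 0)


def conv2d_mult_adds(c_in, c_out, k, h_out, w_out, groups=1):
    """Multiply-accumulates: one k*k*(c_in/groups) dot product per output value."""
    return h_out * w_out * c_out * (c_in // groups) * k * k


# Illustrative layer: 64 -> 64 channels, 3x3 kernel, 64x64 output map.
print(conv2d_params(64, 64, 3))             # 36928
print(conv2d_params(64, 64, 3, groups=4))   # 9280
print(conv2d_mult_adds(64, 64, 3, 64, 64))  # 150994944
```

Summing these over all layers, at a stated output resolution, gives totals that can be compared across tools.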

Out of memory error arise when test pretrained models

Thank you for your creative work. When I test the pretrained models on an image (width: 960 px, height: 1707 px), an out-of-memory error arises. This is confusing because I should have enough memory.
The args are as follows:
"cuda": false,
"group": 1,
"scale": 2,
"shave": 20
I don't know if these are reasonable. Did I do something wrong?

CUDA out of memory during training

I'm getting this RuntimeError during the training process using the following command

python carn\train.py --patch_size 32 --batch_size 32 --max_steps 600000 --print_interval 50 --decay 400000 --model carn --ckpt_name carn_test --ckpt_dir checkpoint\carn --scale 2
Traceback (most recent call last):
  File "carn\train.py", line 52, in <module>
    main(cfg)
  File "carn\train.py", line 48, in main
    solver.fit()
  File "C:\Users\Omar\Desktop\CARN-pytorch\carn\solver.py", line 95, in fit
    psnr = self.evaluate("dataset/Urban100", scale=cfg.scale, num_step=self.step)
  File "C:\Users\Omar\Desktop\CARN-pytorch\carn\solver.py", line 136, in evaluate
    sr = self.refiner(lr_patch, scale).data
  File "C:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Omar\Desktop\CARN-pytorch\carn\model\carn.py", line 74, in forward
    b3 = self.b3(o2)
  File "C:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Omar\Desktop\CARN-pytorch\carn\model\carn.py", line 30, in forward
    c3 = torch.cat([c2, b3], dim=1)
RuntimeError: CUDA out of memory. Tried to allocate 195.25 MiB (GPU 0; 4.00 GiB total capacity; 2.88 GiB already allocated; 170.14 MiB free; 2.00 MiB cached)
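
The traceback shows the allocation failing inside `evaluate`, where the forward pass is followed by `.data`. One thing that may help, offered as an assumption rather than a confirmed fix, is to run evaluation under `torch.no_grad()` so that no autograd graph is kept for the large test images. A minimal sketch with a stand-in model:

```python
import torch
import torch.nn as nn

# Stand-in for the real network; any nn.Module behaves the same way here.
net = nn.Conv2d(3, 3, kernel_size=3, padding=1)
lr_patch = torch.randn(1, 3, 32, 32)

net.eval()
with torch.no_grad():
    # No intermediate activations are stored for backward inside this block.
    sr = net(lr_patch)

print(sr.requires_grad)  # False
```

On a 4 GB card, evaluating on smaller images or smaller patches may still be necessary.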

OSError

Hi, nmhkahn.
When I run "python carn/train.py --patch_size 64 --batch_size 64 --max_steps 600000 --decay 400000 --model carn --ckpt_name carn --ckpt_dir checkpoint/carn --scale 0 --num_gpu 2",
I get the error "OSError: Unable to open file (bad object header version number)".
How can I change the code to fix this? I need your help, thank you!

Question about converting the PyTorch model to ONNX

Below is my conversion code. The only difference is that training was done on a single GPU.
"""
This code is used to convert the PyTorch model into an ONNX-format model.
"""
import torch.onnx
from collections import OrderedDict
from model import carn_m
import os
import json
import time
import importlib
import argparse
import numpy as np
from collections import OrderedDict
import torch
import torch.nn as nn
from torch.autograd import Variable
from dataset import TestDataset
from PIL import Image

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str)
    parser.add_argument("--ckpt_path", type=str)
    parser.add_argument("--group", type=int, default=1)
    parser.add_argument("--scale", type=int, default=4)

    return parser.parse_args()

def main(cfg):
    module = importlib.import_module("model.{}".format(cfg.model))
    net = module.Net(multi_scale=True,
                     group=cfg.group).cuda()
    print(json.dumps(vars(cfg), indent=4, sort_keys=True))
    net.load_state_dict(torch.load(cfg.ckpt_path))
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    net = net.to(device)

    # Path where the converted model is saved
    model_path = f"../checkpoint/onnx/carn.onnx"

    dummy_input = torch.randn(1, 3, 256, 256).to("cuda")
    torch.onnx.export(net, dummy_input, model_path, export_params=True,
                      keep_initializers_as_inputs=True, # store the trained parameter weights inside the model file
                      opset_version=10,  # the ONNX version to export the model to
                      do_constant_folding=True,  # whether to execute constant folding for optimization
                      verbose = True,
                      input_names=['input'],
                      output_names=['enhance_image'])

if __name__ == "__main__":
    cfg = parse_args()
    main(cfg)

The following error occurs:
(base) hejing@tonly-Super-Server:~/TL_project/CARN-pytorch$ python carn/pytorch_to_onnx.py --model carn_m --scale 2 --ckpt_path ./checkpoint/carn_m_2000.pth --group 4
{
"ckpt_path": "./checkpoint/carn_m_2000.pth",
"group": 4,
"model": "carn_m",
"scale": 2
}
Traceback (most recent call last):
  File "carn/pytorch_to_onnx.py", line 53, in <module>
    main(cfg)
  File "carn/pytorch_to_onnx.py", line 48, in main
    output_names=['enhance_image'])
  File "/home/hejing/.local/lib/python3.6/site-packages/torch/onnx/__init__.py", line 148, in export
    strip_doc_string, dynamic_axes, keep_initializers_as_inputs)
  File "/home/hejing/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 66, in export
    dynamic_axes=dynamic_axes, keep_initializers_as_inputs=keep_initializers_as_inputs)
  File "/home/hejing/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 416, in _export
    fixed_batch_size=fixed_batch_size)
  File "/home/hejing/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 279, in _model_to_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args, training)
  File "/home/hejing/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 236, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(model, args, _force_outplace=True, _return_inputs_states=True)
  File "/home/hejing/.local/lib/python3.6/site-packages/torch/jit/__init__.py", line 277, in _get_trace_graph
    outs = ONNXTracedModule(f, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/home/hejing/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hejing/.local/lib/python3.6/site-packages/torch/jit/__init__.py", line 360, in forward
    self._force_outplace,
  File "/home/hejing/.local/lib/python3.6/site-packages/torch/jit/__init__.py", line 347, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/home/hejing/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 530, in __call__
    result = self._slow_forward(*input, **kwargs)
  File "/home/hejing/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 516, in _slow_forward
    result = self.forward(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'scale'
What is the cause of this?
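
The final `TypeError` indicates that the tracer calls `forward` with only the dummy image, while the network's `forward` also expects `scale`. One possible workaround, sketched under the assumption that `forward(x, scale)` is the signature, is to wrap the network in a small module that fixes the scale and to export the wrapper instead. `TinyNet` and `FixedScale` are hypothetical names used only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyNet(nn.Module):
    """Hypothetical stand-in for a network whose forward needs a scale argument."""

    def forward(self, x, scale):
        return F.interpolate(x, scale_factor=scale, mode="nearest")


class FixedScale(nn.Module):
    """Binds the scale so that forward takes a single tensor input."""

    def __init__(self, net, scale):
        super().__init__()
        self.net = net
        self.scale = scale

    def forward(self, x):
        return self.net(x, self.scale)


wrapped = FixedScale(TinyNet(), scale=2)
out = wrapped(torch.randn(1, 3, 8, 8))
print(out.shape)  # torch.Size([1, 3, 16, 16])
```

Passing `wrapped` instead of `net` to `torch.onnx.export` means the tracer only has to supply the image tensor.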

Typo readme.md

Thanks for your work.
I found a typo in readme.md (the --model argument); please fix it.

For CARN-M
$ python carn/train.py --patch_size 64 \
                       --batch_size 64 \
                       --max_steps 600000 \
                       --decay 400000 \
                       --model carn \        ->  --model carn_m \
                       --ckpt_name carn_m \
                       --ckpt_dir checkpoint/carn_m \
                       --scale 0 \
                       --group 4 \
                       --num_gpu 2

MeanShift(nn.Module):

Thank you for the code you provided. I am new to SR and have a question about the code: why do we apply MeanShift processing? What is its significance?

class MeanShift(nn.Module):
    def __init__(self, mean_rgb, sub):
        super(MeanShift, self).__init__()

        sign = -1 if sub else 1
        r = mean_rgb[0] * sign
        g = mean_rgb[1] * sign
        b = mean_rgb[2] * sign

        self.shifter = nn.Conv2d(3, 3, 1, 1, 0)
        self.shifter.weight.data = torch.eye(3).view(3, 3, 1, 1)
        self.shifter.bias.data = torch.Tensor([r, g, b])

        # Freeze the mean shift layer
        for params in self.shifter.parameters():
            params.requires_grad = False

    def forward(self, x):
        x = self.shifter(x)
        return x
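
As far as the quoted code shows, the layer is a frozen 1x1 convolution with identity weights and a bias of plus or minus the RGB mean. With sub=True it subtracts the mean from each channel, and with sub=False it adds the mean back. Centering inputs around zero is a common preprocessing step. A plain-Python sketch of the arithmetic for a single pixel, with an illustrative mean value:

```python
def mean_shift(pixel, mean_rgb, sub):
    """Apply identity weights plus a bias of -mean (sub=True) or +mean (sub=False)."""
    sign = -1 if sub else 1
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # the 1x1 conv weight, torch.eye(3)
    bias = [sign * m for m in mean_rgb]
    return [
        sum(w * p for w, p in zip(row, pixel)) + b
        for row, b in zip(identity, bias)
    ]


mean_rgb = (0.5, 0.25, 0.125)  # illustrative values, not the dataset statistics
pixel = [0.75, 0.5, 0.25]

centered = mean_shift(pixel, mean_rgb, sub=True)
restored = mean_shift(centered, mean_rgb, sub=False)
print(centered)  # [0.25, 0.25, 0.125]
print(restored)  # [0.75, 0.5, 0.25]
```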

About MultAdds

Thank you for your work. I would like to know how the Mult-Adds in your paper are calculated.

Error(s) in loading state_dict for Net: Missing key(s) in state_dict

Hi, thanks for your code.
I trained a checkpoint named carn_m_180000.pth, but when I tested it with sample.py, I got the following error. Could you give me some suggestions?

$ python3 carn/sample.py --model=carn_m --ckpt_path=./checkpoint/carn_m/carn_m_180000.pth --sample_dir=./sample/
using this model
{
    "ckpt_path": "./checkpoint/carn_m/carn_m_180000.pth",
    "cuda": false,
    "group": 1,
    "model": "carn_m",
    "sample_dir": "./sample/",
    "scale": 2,
    "shave": 20,
    "test_data_dir": "dataset/DIV2K_valid_LR_bicubic/X2"
}
Traceback (most recent call last):
  File "carn/sample.py", line 122, in <module>
    main(cfg)
  File "carn/sample.py", line 110, in main
    net.load_state_dict(new_state_dict)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 719, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Net:
	Missing key(s) in state_dict: "upsample.up2.body.0.bias", "upsample.up2.body.0.weight", "upsample.up3.body.0.bias", "upsample.up3.body.0.weight", "upsample.up4.body.0.bias", "upsample.up4.body.0.weight", "upsample.up4.body.3.bias", "upsample.up4.body.3.weight". 
	Unexpected key(s) in state_dict: "upsample.up.body.0.weight", "upsample.up.body.0.bias". 
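
The key names in the error hint at a mismatch in how the network was built: the checkpoint holds a single `upsample.up` branch, while the model constructed for loading expects `up2`, `up3`, and `up4`. A likely cause, stated as an assumption, is that training used a single scale whereas the loading script builds the multi-scale variant. A small helper for comparing the two key sets before calling `load_state_dict` is sketched below. The helper name and the `shared.weight` key are hypothetical.

```python
def diff_keys(model_keys, ckpt_keys):
    """Return (missing, unexpected) key lists, mirroring the load_state_dict report."""
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    missing = sorted(model_keys - ckpt_keys)
    unexpected = sorted(ckpt_keys - model_keys)
    return missing, unexpected


# Upsample key names are copied from the error message (abbreviated);
# "shared.weight" is a made-up key standing in for layers both sides agree on.
model_keys = ["upsample.up2.body.0.weight", "upsample.up3.body.0.weight",
              "upsample.up4.body.0.weight", "shared.weight"]
ckpt_keys = ["upsample.up.body.0.weight", "shared.weight"]

missing, unexpected = diff_keys(model_keys, ckpt_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

In practice the two inputs would be `net.state_dict().keys()` and the keys of the loaded checkpoint.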

Multi-scale Training

Hi,

In your solver I found the following comment about multi-scale training:
# only use one of multi-scale data

So you switch randomly between the scales, correct? Can you elaborate on what would be a better idea?

Thx a lot for sharing your code!

About the multi scale train?

Sir,

How can I train the multi-scale model? I generated the LR images (X2, X3, X4) using MATLAB and then generated the *.h5 file.

How does each training iteration fetch the training examples?

I found this (when cfg.scale == 0):

# only use one of multi-scale data
# i know this is stupid but just temporary
scale = random.randint(2, 4)
hr, lr = inputs[scale-2][0], inputs[scale-2][1]

Does this mean each iteration gets a batch (batch size 64) at one random scale?

thank you very much!
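
Reading the quoted lines, that appears to be the case: with `cfg.scale == 0` each batch carries all three scales, and one scale is drawn at random per iteration, so the whole batch for that step shares a single scale. A plain-Python sketch with strings standing in for batched tensors (`pick_scale_batch` is a hypothetical helper name):

```python
import random


def pick_scale_batch(inputs, rng=random):
    """Draw one scale in {2, 3, 4} and return it with that scale's (hr, lr) batch."""
    scale = rng.randint(2, 4)  # randint is inclusive on both ends
    hr, lr = inputs[scale - 2][0], inputs[scale - 2][1]
    return scale, hr, lr


# Stand-in for one collated batch: index 0 holds x2, index 1 holds x3, index 2 holds x4.
inputs = [("hr_batch_x2", "lr_batch_x2"),
          ("hr_batch_x3", "lr_batch_x3"),
          ("hr_batch_x4", "lr_batch_x4")]

for step in range(3):
    print(step, pick_scale_batch(inputs))
```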

Preliminary Suggestions on Model Configuration

Thanks for sharing your implementation of the model. The paper seems to suggest that the model is meant for mobile devices, but I assumed it would be used on standard desktops/laptops and added more features. I have a very limited GPU, so the following experimental results are preliminary and may well be wrong.

  1. Append Concurrent Spatial and Channel Squeeze & Excitation after each residual block;

  2. upscale the input image and add it to the final convolution output;

  3. replace all 1x1 convolutions with 3x3 grouped convolutions. 4 groups have shown clear improvement.

[Two plots comparing the configurations; images not preserved]

The standard blue line is your model configuration. "se" means "Concurrent Spatial and Channel Squeeze & Excitation", and "res" means residual add on up-scaled input image. "group1" means all 1x1 conv are replaced by 3x3.

My GPU doesn't have enough memory for validation test while training, so I only check the L1 loss. You may find more details here.

Hope they are useful.

Please tell me if I am wrong. Thanks.

The use of file---.pth

Dear nmhkahn:
When I run train.py, it produces a large number of .pth files in the checkpoint directory. However, when I run sample.py, I set "ckpt_path" to "carn.pth" (as you wrote in readme.md). That is, I don't use the output of train.py, because it causes errors such as "Error(s) in loading state_dict for Net".
Thank you, looking forward to your reply.

can not achieve psnr as the paper given

@nmhkahn Thank you for releasing the code. However, when I test the pretrained model that you released, the PSNR and SSIM results are lower than in the paper. For example, on Set5 it achieved an average PSNR of 29.665 and SSIM of 0.854. Could you share your testing configuration?

Below is the testing code that I added to sample.py after line 102:
hr_ = hr.cpu().mul(255).clamp(0, 255).byte().permute(1, 2, 0).numpy()
sr_ = sr.cpu().mul(255).clamp(0, 255).byte().permute(1, 2, 0).numpy()
im1 = hr_
im2 = sr_
v1 = psnr(im1,im2)
v2 = ssim(im1,im2)
psnrs += v1
ssims += v2
count += 1

    print('psnr: %.3f, ssim: %.3f'%(v1,v2))
print('avg psnr: %.3f, avg ssim: %.3f' % (psnrs/count, ssims/count))
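
One possible source of the gap, offered as an assumption since the paper's exact evaluation script is not shown here: super-resolution papers commonly report PSNR on the luminance (Y) channel after cropping a border of a few pixels, whereas the snippet above compares full RGB images. A plain-Python sketch of that convention follows, using the ITU-R BT.601 luma weights and tiny nested lists in place of real images.

```python
import math


def rgb_to_y(pixel):
    """Luma of an 8-bit RGB pixel with BT.601 weights (range 16-235)."""
    r, g, b = pixel
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0


def psnr_y(img1, img2, border):
    """PSNR on the Y channel, ignoring `border` pixels on each side."""
    h, w = len(img1), len(img1[0])
    sq_err, count = 0.0, 0
    for i in range(border, h - border):
        for j in range(border, w - border):
            diff = rgb_to_y(img1[i][j]) - rgb_to_y(img2[i][j])
            sq_err += diff * diff
            count += 1
    mse = sq_err / count
    return float("inf") if mse == 0 else 10.0 * math.log10(255.0 ** 2 / mse)


# Toy 4x4 images that differ only in one corner pixel.
hr_img = [[(100, 150, 200)] * 4 for _ in range(4)]
sr_img = [row[:] for row in hr_img]
sr_img[0][0] = (0, 0, 0)

print(psnr_y(hr_img, sr_img, border=0))  # finite: the corner error is counted
print(psnr_y(hr_img, sr_img, border=1))  # inf: the corner lies in the cropped border
```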

Confused with "--sample" params and can't find any outputs during testing

Thank you for sharing this fast and tidy code!

However, I am slightly confused by the "--sample" params and can't find any outputs during testing.

Here is the command I ran and its output:

(pytorch0.4.0) user_name@server_name:~/CARN$ python carn/sample.py --model carn --test_data_dir dataset/DIV2K/DIV2K_valid_LR_bicubic/X2 --scale 2 --ckpt_path checkpoint/carn.pth
{
"ckpt_path": "checkpoint/carn.pth",
"cuda": false,
"group": 1,
"model": "carn",
"sample_dir": null,
"scale": 2,
"shave": 20,
"test_data_dir": "dataset/DIV2K/DIV2K_valid_LR_bicubic/X2"
}

What should I do in this situation? Thank you for your help!

Retraining accuracy is low

I trained your model on DIV2K, and for Urban100 X4 the PSNR is only 24.5247 and the SSIM is only 0.7617.
We measured PSNR and SSIM with NTIRE2017_scoring_functions.
The PSNR and SSIM of your provided results are also low, at 24.5549 and 0.7638 respectively.

Question about shave

Why do you add the shave when you calculate the PSNR?

h_chop, w_chop = h_half + cfg.shave, w_half + cfg.shave

I hope to get your answer. Thanks.
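
From the quoted line, `shave` looks less like part of the PSNR computation and more like an overlap margin. The image is cut into four patches that each extend `shave` pixels past the midpoint, each patch is processed separately to save memory, and only the inner part of each result is kept, so pixels near the cut lines are not computed from a truncated neighborhood. A plain-Python sketch of that idea on a 2D grid follows, with an identity function standing in for the network (scale 1). The helper names are hypothetical, and the sketch assumes `shave` is at least 1 and smaller than half the image size.

```python
def crop(img, top, bottom, left, right):
    """Rows top..bottom-1 and columns left..right-1 of a 2D list."""
    return [row[left:right] for row in img[top:bottom]]


def chop_forward(img, model, shave):
    """Run `model` on four overlapping quadrants and stitch the inner parts together."""
    h, w = len(img), len(img[0])
    h_half, w_half = h // 2, w // 2
    h_chop, w_chop = h_half + shave, w_half + shave

    # Each quadrant reaches `shave` pixels beyond the midlines.
    tl = model(crop(img, 0, h_chop, 0, w_chop))
    tr = model(crop(img, 0, h_chop, w - w_chop, w))
    bl = model(crop(img, h - h_chop, h, 0, w_chop))
    br = model(crop(img, h - h_chop, h, w - w_chop, w))

    # Keep only the part of each output that lies on its own side of the midlines.
    col_off = w_chop - (w - w_half)
    row_off = h_chop - (h - h_half)
    top = [a[:w_half] + b[col_off:] for a, b in zip(tl[:h_half], tr[:h_half])]
    bottom = [a[:w_half] + b[col_off:] for a, b in zip(bl[row_off:], br[row_off:])]
    return top + bottom


image = [[r * 10 + c for c in range(7)] for r in range(6)]
result = chop_forward(image, lambda patch: patch, shave=2)
print(result == image)  # True: the stitched output covers every pixel exactly once
```

With a real network the margin gives each kept pixel some context from across the cut, which is what reduces visible seams.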

Different Multi-Adds on CARN-M

Hi,
I used TensorSummaryX to calculate the Mult-Adds of your proposed CARN-M. However, the results differ from the paper, while the results for CARN match exactly.

For example, the x4 Mult-Adds I calculated is 63G, but the paper's result is 32.5G.

index out

IndexError: Traceback (most recent call last):
  File "/home/zyc/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/zyc/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 106, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/zyc/tensorflow/SR分析/CARN/CARN-pytorch/carn/dataset.py", line 66, in __getitem__
    item = [(self.hr[index], self.lr[i][index]) for i, _ in enumerate(self.lr)]
  File "/home/zyc/tensorflow/SR分析/CARN/CARN-pytorch/carn/dataset.py", line 66, in <listcomp>
    item = [(self.hr[index], self.lr[i][index]) for i, _ in enumerate(self.lr)]
IndexError: list index out of range
@nmhkahn I don't know what to do. Can you help me?

multi-scale training and single scale training

Can single-scale training achieve a better result than multi-scale training?
I ran multi-scale training, and it seems that the PSNR is still increasing at 6×10^5 steps.
I would also like to know whether the setting mentioned in the paper is for single-scale or multi-scale training.
I am a beginner.
Thank you!

about training datasets

What is the training dataset? The paper mentions 291 images, Berkeley Segmentation, and DIV2K, but the repository only has DIV2K. Thanks.

question about the network's parameters

Recently I ran the code with the settings you gave, but something seems wrong. Can you tell me why I get 1112K parameters for CARN with scale set to 4, and 294K parameters for CARN-M with scale set to 4 and group set to 4? I never changed the code, and I ran it with the settings you put on GitHub. I am so confused.
I would be very grateful if you could answer this question. Hoping for your reply.

Group convolution in CARN

Hello there
You have proposed two models, CARN and CARN-M.
Figures 2 and 3 of your paper, and paragraph 3 of the paper, show that the residual block in your CARN uses residual-E. But in your code, only CARN-M uses group convolution. The residual module used by CARN is just a normal convolution rather than a group convolution. But the last paragraph of Section 4.4 of your paper mentions that the PSNR of CARN is 28.7. I don't understand this value, it is not found in Table 1. So, is your CARN model the same as described in the code?

Split into sub images (patches) for inference?

Sir,

When I test on DIV2K, I think a valid way is to split the image into patches and run SR on each patch independently.

Otherwise, it will run out of memory.

My question is: is this a common practice in the SR area? Will it cause artifacts such as vertical or horizontal lines?

Thank you very much, sir!

Not able to train the model

I am not able to train the model. The command used:

python carn/train.py --patch_size 64 --batch_size 64 --max_steps 600000 --decay 400000 --model carn --ckpt_name carn --ckpt_dir checkpoint/carn

And this is the output:

{
    "batch_size": 64,
    "ckpt_dir": "checkpoint/carn",
    "ckpt_name": "carn",
    "clip": 10.0,
    "decay": 400000,
    "group": 1,
    "loss_fn": "L1",
    "lr": 0.0001,
    "max_steps": 600000,
    "model": "carn",
    "num_gpu": 1,
    "patch_size": 64,
    "print_interval": 1,
    "sample_dir": "sample/",
    "scale": 2,
    "shave": 20,
    "train_data_path": "dataset/DIV2K_train.h5",
    "verbose": "store_true"
}
# of params: 964187
/home/workspace/sisr/CARN-pytorch/carn/solver.py:85: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
  nn.utils.clip_grad_norm(self.refiner.parameters(), cfg.clip)

After this nothing happens, it seems to be stuck. The process is still running and memory is allocated.

Any ideas what the problem could be?

Mobile version

I was curious if anyone has managed to transfer the CARN-M model to a mobile phone, like Android? I guess Tensorflow Mobile or Tensorflow Lite might be available tools. Is it doable?

Question about the structure of Residual-E Block

Hi.
In your paper, the Residual-E block evolved from the Residual Block.
Why is there no additional 1x1 conv between the first ReLU and the second group conv, with only one 1x1 conv at the end of the block?

Trying to reproduce your results

Hi there,

Thank you very much for releasing this code!

I'm trying to reproduce your results. However, I guess I'm missing something...

Training config is:
{
    "batch_size": 64,
    "ckpt_dir": "checkpoint/carn_baseline",
    "ckpt_name": "carn_baseline",
    "clip": 10.0,
    "decay": 400000,
    "group": 1,
    "loss_fn": "L1",
    "lr": 0.0001,
    "max_steps": 600000,
    "model": "carn",
    "num_gpu": 1,
    "patch_size": 64,
    "print_interval": 1000,
    "sample_dir": "sample/",
    "scale": 0,
    "shave": 20,
    "train_data_path": "dataset/DIV2K_train.h5",
    "verbose": true
}
# of params: 1591963

On DIV2K Bicubic. Did you use bicubic downscaling or the unknown downgrading operators?

After 575k iterations on a single Titan X, I could only achieve the following results on Urban100:

  • x2: 30.31
  • x3: 26.52
  • x4: 24.57
    which is kind of far from the paper results :-(

Is it just bad luck with the initialization, or am I missing something important?

Btw, I noticed that I can fit batch 64 / patch 64 on a single Titan X. When I use two, the second one uses only about 600 MB of memory. Is that normal behavior?

Thanks a lot for your help!

Train Loader

Hi @nmhkahn,

I'm reading your code line by line to understand it better and have a question about the length of self.train_loader in solver.py, line 36.

For the line below I checked the length, and it's 800 (since we have 800 images):

self.train_data = TrainDataset(cfg.train_data_path, 
                                       scale=cfg.scale, 
                                       size=cfg.patch_size)

But after wrapping self.train_data in a DataLoader, the length of self.train_loader becomes 25. I am curious, why 25?

self.train_loader = DataLoader(self.train_data,
                                       batch_size=cfg.batch_size,
                                       num_workers=1,
                                       shuffle=True, drop_last=True)

Thanks in advance!
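
The length of a `DataLoader` is the number of batches, not the number of images. With `drop_last=True` that is the dataset size divided by the batch size, rounded down. So 800 images give 25 batches if the batch size is 32, which is an assumption about the setting used here, based on 800 // 32 = 25. A quick check in plain Python (`num_batches` is a hypothetical helper name):

```python
import math


def num_batches(dataset_len, batch_size, drop_last):
    """Number of batches a loader yields per pass over the dataset."""
    if drop_last:
        return dataset_len // batch_size
    return math.ceil(dataset_len / batch_size)


print(num_batches(800, 32, drop_last=True))   # 25
print(num_batches(800, 64, drop_last=True))   # 12
print(num_batches(800, 64, drop_last=False))  # 13
```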

License?

Hi, what's the license of this code?
Thank you.

About the learning rate?

Sir,

Should it be:

def decay_learning_rate(self):
    lr = self.cfg.lr * (0.5 ** (self.step / self.cfg.decay))
    return lr

or:

def decay_learning_rate(self):
    lr = self.cfg.lr * (0.5 ** (self.step // self.cfg.decay))
    return lr

thank you very much!

eg:

110 // 130 =0
110 / 130 = 0.8461538461538461
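
The two operators give different schedules: `/` shrinks the rate a little at every step, reaching half after `decay` steps, while `//` keeps it constant and halves it only when `step` reaches a multiple of `decay`. Which one matches the paper is for the author to confirm. The sketch below only compares the two, using the `lr` and `decay` values that appear in the configs quoted on this page.

```python
def lr_continuous(base_lr, step, decay):
    """True division: smooth exponential decay at every step."""
    return base_lr * (0.5 ** (step / decay))


def lr_stepwise(base_lr, step, decay):
    """Floor division: halve only at multiples of `decay`."""
    return base_lr * (0.5 ** (step // decay))


base_lr, decay = 1e-4, 400000
for step in (0, 200000, 400000, 600000):
    print(step, lr_continuous(base_lr, step, decay), lr_stepwise(base_lr, step, decay))
```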

Get pre-trained model PSNR is lower than paper

Hello!
I used the command
python carn/sample.py --model carn --test_data_dir dataset/Urban100 --scale 2 --ckpt_path ./checkpoint/carn.pth --sample_dir Urban100carn
to get the SR images from the pre-trained model you provide in checkpoint.
Then I used the MATLAB code https://github.com/jbhuang0604/SelfExSR/blob/master/quant_eval
that you mentioned in another issue, but the PSNR I got is 31.91, while your paper reports 31.92 (Urban100 X2).
Am I missing something? Please point it out to me, thank you very much!
