
cain's Introduction

cain's People

Contributors

myungsub


cain's Issues

Question about Figure 5 (CA visualization) in the paper

Hi, I'm interested in Figure 5, "Visualization of internal feature maps with their channel attentions", in your paper.

Could you please tell me more details about how you calculate the activation maps for the channels? Or publish the relevant code for the activation-map calculation.

Thanks

Add License

This is a great interpolation method.
Can you please add a License to the project?

Question about InOutPaddings

Hi, thanks for your nice work!

I have a question about def InOutPaddings(x): why should the width and height be padded to a multiple of 128?
I changed this number to 32, and the PSNR on Vimeo90K improves to 34.76 dB (since 256 and 448 are already multiples of 32, no padding is actually done):

import torch.nn as nn

def InOutPaddings(x):
    w, h = x.size(3), x.size(2)
    padding_width, padding_height = 0, 0
    # (w >> 5) << 5 floors w to a multiple of 32; if not already a multiple,
    # pad up to the next one
    if w != ((w >> 5) << 5):
        padding_width = (((w >> 5) + 1) << 5) - w
    if h != ((h >> 5) << 5):
        padding_height = (((h >> 5) + 1) << 5) - h
    # reflection-pad the input roughly symmetrically...
    paddingInput = nn.ReflectionPad2d(padding=[padding_width // 2, padding_width - padding_width // 2,
                                               padding_height // 2, padding_height - padding_height // 2])
    # ...and crop the output back with the matching negative padding
    paddingOutput = nn.ReflectionPad2d(padding=[0 - padding_width // 2, padding_width // 2 - padding_width,
                                                0 - padding_height // 2, padding_height // 2 - padding_height])
    return paddingInput, paddingOutput
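The bit-shift form in the snippet rounds a size up to the next multiple of 32. For intuition, the same arithmetic in plain form (the helper name is mine, not the repo's):

```python
def pad_amount(size, multiple=32):
    # padding needed to round `size` up to a multiple of `multiple`;
    # equivalent to the `(((w >> 5) + 1) << 5) - w` bit-shift form for 32
    rem = size % multiple
    return 0 if rem == 0 else multiple - rem

# Vimeo90K frames are 448x256, already multiples of 32 -> no padding at all,
# whereas padding to a multiple of 128 adds a border to every frame
assert pad_amount(448) == 0 and pad_amount(256) == 0
assert pad_amount(448, multiple=128) == 64   # 448 -> 512
assert pad_amount(720) == 16                 # e.g. 720p height -> 736
```

This is presumably why the smaller multiple helps on Vimeo90K: the network never sees reflected borders at test time.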

Interpolation error

When I tried to test the program with test_custom.sh, I got this error:

Namespace(batch_size=64, beta1=0.9, beta2=0.99, cuda=True, data_dir='data', data_root='data/frame_seq', dataset='custom', depth=3, exp_name='CAIN_fin', fix_loaded=False, img_fmt='jpg', log_dir='logs', log_iter=20, loss='1*L1', lpips=False, lr=0.0001, max_epoch=200, mode='test', model='cain', n_resblocks=12, num_frames=3, num_gpu=1, num_workers=5, random_seed=12345, resume=True, resume_exp=None, start_epoch=0, test_batch_size=8, test_mode='hard', up_mode='shuffle', use_tensorboard=False, val_batch_size=4, viz=False)
Building model: CAIN

# of parameters: 42780432

Evaluating for epoch = 175
[0] images ready to be loaded
0it [00:00, ?it/s]
Traceback (most recent call last):
File "generate.py", line 124, in
main(args)
File "generate.py", line 120, in main
test(args, args.start_epoch)
File "generate.py", line 108, in test
print('im_processed: {:d}/{:d} {:.3f}s \r'.format(i + 1, len(test_loader), time.time() - t))
UnboundLocalError: local variable 'i' referenced before assignment

I have all the packages installed with their respective versions.
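For what it's worth, the `0it [00:00, ?it/s]` line suggests the loader found no image pairs (worth checking `data_root` and `img_fmt` against your files), so the loop body in `test()` never runs and `i` is never bound when the final print executes. A minimal reproduction and a defensive fix (hypothetical patch, not the repo's code):

```python
def count_batches(loader):
    # reproduces the bug: with an empty loader the body never executes,
    # so `i` is unbound at the statement after the loop
    for i, batch in enumerate(loader):
        pass
    return i + 1  # UnboundLocalError if loader was empty

def count_batches_fixed(loader):
    i = -1  # bind before the loop so an empty loader yields 0
    for i, batch in enumerate(loader):
        pass
    return i + 1
```

The real fix in this case is making sure the custom frames are actually discovered, but guarding the loop variable would at least turn the crash into a readable "0 images processed".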

question about Middlebury dataset

Hi, how did you get the results on the Middlebury dataset? I reproduced the results of your pretrained model on the Vimeo90K and UCF101 datasets, but got a poor result on Middlebury. Could you please provide some details?

About Pixel Shuffle

It is very interesting that you use Pixel Shuffle and Channel Attention for motion estimation without estimating optical flow.

In the paper you say that Pixel Shuffle is used to maintain a large receptive field, so I want to ask how PS can do that.

One more question: in VFI, I usually see people reuse the input images to reconstruct the color of the middle frame. So how can you synthesize the middle frame just by applying Up Shuffle?

Thank you.
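Not the authors speaking, but my reading of the PS trick, with a check that the shuffle pair is lossless: pixel-unshuffle halves H and W while quadrupling channels without discarding anything, so each 3x3 convolution applied afterwards spans twice the area of the original frame (hence the larger effective receptive field), and the final up-shuffle merely rearranges predicted channels back into pixel positions:

```python
import torch
import torch.nn.functional as F

x = torch.arange(2 * 3 * 8 * 8, dtype=torch.float32).reshape(2, 3, 8, 8)

# Down-shuffle: halve H and W, quadruple C -- no information is discarded
down = F.pixel_unshuffle(x, downscale_factor=2)   # shape (2, 12, 4, 4)
assert down.shape == (2, 12, 4, 4)

# Up-shuffle inverts it exactly: both ops are pure rearrangements
up = F.pixel_shuffle(down, upscale_factor=2)
assert torch.equal(up, x)
```

On the second question, as I understand the design: the middle frame is synthesized in the shuffled feature space; the up-shuffle itself adds no new information, it only reorders the predicted channels into spatial positions.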

Feature map attention score

Hi, thanks for your work, it is pretty interesting. I have a question about the feature maps in Figure 5. I know that to obtain the feature maps you'd use hooks, but how do you manage to obtain the attention scores for those feature maps?
Thanks again
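I'm not the author, but one way to get the attention scores without modifying the model is a forward hook on the attention block that re-applies its squeeze-excite branch to the hooked input. A toy sketch (the module and names are hypothetical stand-ins, not CAIN's actual classes):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    # toy squeeze-and-excitation style block, a stand-in for the real CA module
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x)                       # per-channel scores in [0, 1]
        return x * w.unsqueeze(-1).unsqueeze(-1)

captured = {}

def hook(module, inputs, output):
    # recompute the per-channel scores from the hooked input feature map
    captured['scores'] = module.fc(inputs[0]).detach()
    captured['features'] = inputs[0].detach()   # the maps a figure would show

ca = ChannelAttention(8)
ca.register_forward_hook(hook)
_ = ca(torch.randn(1, 8, 16, 16))
```

With the real model you would register the hook on the actual CA submodule and pair each captured feature map with its score.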

Why is the picture still blurry after training for 20 epochs?

I want to replicate your algorithm on the Vimeo90K dataset, but after training for 20 epochs the PSNR is only about 15, almost the same as after the first epoch. I see that the maximum epoch setting in the source code is 200, and I have not modified the rest of the parameters. Can I expect results similar to yours after training for 200 epochs? I would appreciate it if you could answer.

how to interpolate a frame at an arbitrary time

Hey, buddy, I like your model so much after trying some video samples.
This is a state-of-the-art model and amazing work! You are a genius, buddy.
Recently, I was wondering how to interpolate a frame at an arbitrary time like t=0.2. Unlike optical-flow methods, kernel-based methods can only interpolate a frame at t=0.5 (and recursively at t=0.25, t=0.75, ...).
Do you think it is possible to feed a temporal variable t into the model and train it? I am looking forward to your answer.
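For reference, a midpoint-only model can reach exactly the dyadic times t = k/2^d by recursive bisection, and never t = 0.2. A tiny sketch (helper name is mine) of which times recursion can hit:

```python
def reachable_times(depth):
    # times reachable by repeatedly interpolating midpoints between
    # already-available frames, starting from t=0 and t=1
    times = [0.0, 1.0]
    for _ in range(depth):
        mids = [(times[i] + times[i + 1]) / 2 for i in range(len(times) - 1)]
        times = sorted(set(times + mids))
    return times

assert 0.25 in reachable_times(2) and 0.75 in reachable_times(2)
assert 0.2 not in reachable_times(5)   # non-dyadic t is never reached exactly
```

So supporting arbitrary t would indeed require conditioning the model on t (as flow-based methods do by scaling the flow), not just deeper recursion.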

how to convert the model to onnx?

When I want to convert the model to ONNX, I can't find the input and output names of the model needed by torch.onnx.export(). Can you help me? How do I find the input and output names? Thanks!

Runtime error on test_custom.sh

I downloaded the model, uploaded 2 images to data/frame_seq (I use Colab), and ran test_custom.sh, but got the following error. What am I doing wrong? Could you help?
RuntimeError: Given groups=1, weight of size [192, 384, 3, 3], expected input[1, 128, 144, 240] to have 384 channels, but got 128 channels instead

SNU-FILM Datasets download?

The download from the provided data link does not succeed: the compressed package is damaged, and after repair the dataset is incomplete. Can you provide a new download link? Thanks

loss exploded

Hey, buddy, this is amazing work! I was training the model on the Vimeo90k dataset by running ./run.sh.
The loss gradually declined as the epochs went by, but after about 10 epochs it suddenly exploded without warning.
It printed like this:
[image]

triggered by this check in the training code:

if loss.data.item() > 10.0 * LOSS_0:
    print(max(p.grad.data.abs().max() for p in model.parameters()))
    continue

And the generated test image:
[image]

Why did the loss explode suddenly, and how can I avoid it?

Thx!
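One common mitigation for this kind of spike (a generic recipe, not necessarily what the CAIN training script does) is clipping the global gradient norm before every optimizer step, so a single bad batch cannot blow up the weights:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 4)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

x, y = torch.randn(8, 4), torch.randn(8, 4)
loss = nn.functional.l1_loss(model(x), y) * 1e6   # simulate a spiking loss
opt.zero_grad()
loss.backward()

# rescale all gradients so their global L2 norm is at most 1.0
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
post = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))
opt.step()
```

Lowering the learning rate or resuming from a checkpoint saved before the spike are the other usual remedies.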

About exploding gradient

Hello, the code runs into an exploding-gradient problem when trained to 10 epochs on the Vimeo90k dataset. If I restart training from model_best.pth, it results in CUDA out of memory.
Q1. I have set both batch_size and test_batch_size to 8. Will this affect the final result?
Q2. Is there any other solution?
I would be grateful if you could reply!

Training issue on Vimeo Dataset

Hi, thanks for sharing the code. I tried to train on the Vimeo dataset from scratch using just your script, but it seems stuck at a PSNR around 15 and SSIM around 0.5. Is there any special recipe I am missing in the training process?

I don't know where the output is generated.

Hi, @myungsub

I tried frame interpolation with your code. It runs fine using the GPU, but I don't know where the output is generated. Is the result a single video file, or is it an image sequence? I'm curious.

The command I used is test_custom.sh.

Best of luck...
@bemoregt.

Evaluation on UCF-101

Hi,

Can you explain how you evaluated on the available test split of UCF-101? Which frames are the inputs and outputs?
