
AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation

This repository contains the official implementation of the following paper:

AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation
Zhen Li*, Zuo-Liang Zhu*, Ling-Hao Han, Qibin Hou, Chun-Le Guo, Ming-Ming Cheng
(* denotes equal contribution)
Nankai University
In CVPR 2023

[Paper] [Project Page] [Web demos] [Video]

AMT is a lightweight, fast, and accurate algorithm for frame interpolation. It aims to provide a practical solution for generating video from a small number of given frames (at least two).

(Demo GIF)

Web demos

Integrated into Hugging Face Spaces 🤗 using Gradio. Try out the Web Demo on Hugging Face Spaces.

Try AMT for interpolating between two or more images at PyTTI-Tools:FILM.

Change Log

  • Apr 20, 2023: Our code is publicly available.

Method Overview

(Pipeline overview figure)

For technical details, please refer to the method.md file, or read the full report on arXiv.

Dependencies and Installation

  1. Clone Repo

    git clone https://github.com/MCG-NKU/AMT.git
  2. Create Conda Environment and Install Dependencies

    conda env create -f environment.yaml
    conda activate amt
  3. Download the pretrained models for the demos from Pretrained Models below and place them in the pretrained folder

Quick Demo

Note that the selected pretrained model ([CKPT_PATH]) needs to match the config file ([CFG]).

Creating a video demo: increasing $n$ slows down the motion in the video. (With $m$ input frames, [N_ITER] $=n$ corresponds to $2^n\times (m-1)+1$ output frames; see the quick check after the notes below.)

python demos/demo_2x.py -c [CFG] -p [CKPT] -n [N_ITER] -i [INPUT] -o [OUT_PATH] -r [FRAME_RATE]
# e.g. [INPUT]
# -i can be a video / a glob pattern / multiple images / a folder containing input frames
# -i demo.mp4 (video) / img_*.png (glob pattern) / img0.png img1.png (images) / demo_input (folder)

# e.g. a simple usage
python demos/demo_2x.py -c cfgs/AMT-S.yaml -p pretrained/amt-s.pth -n 6 -i assets/quick_demo/img0.png assets/quick_demo/img1.png
  • Note: pass --save_images to save the output images (saving slows down when there are many output frames).
  • Supported input types: a video / a glob pattern / multiple images / a folder containing input frames.
  • Results are written to the [OUT_PATH] folder (default: results/2x).
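
As a quick check of the frame-count formula above, here is a minimal sketch; the helper function is the editor's illustration, not part of the repo:

# 2**n * (m - 1) + 1 output frames from m input frames and [N_ITER] = n
def num_output_frames(m: int, n: int) -> int:
    return 2 ** n * (m - 1) + 1

print(num_output_frames(2, 6))  # 65 frames for the two quick-demo images with -n 6
print(num_output_frames(2, 1))  # 3 frames: both inputs plus one interpolated frame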

Pretrained Models

Model   🔗 Download Links                             Config file          Trained on  Arbitrary/Fixed
AMT-S   [Google Drive] [Baidu Cloud] [Hugging Face]   [cfgs/AMT-S]         Vimeo90k    Fixed
AMT-L   [Google Drive] [Baidu Cloud] [Hugging Face]   [cfgs/AMT-L]         Vimeo90k    Fixed
AMT-G   [Google Drive] [Baidu Cloud] [Hugging Face]   [cfgs/AMT-G]         Vimeo90k    Fixed
AMT-S   [Google Drive] [Baidu Cloud] [Hugging Face]   [cfgs/AMT-S_gopro]   GoPro       Arbitrary

Training and Evaluation

Please refer to develop.md to learn how to benchmark AMT and how to train a new AMT model from scratch.

Citation

If you find our repo useful for your research, please consider citing our paper:

@inproceedings{licvpr23amt,
   title={AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation},
   author={Li, Zhen and Zhu, Zuo-Liang and Han, Ling-Hao and Hou, Qibin and Guo, Chun-Le and Cheng, Ming-Ming},
   booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
   year={2023}
}

License

This code is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License for non-commercial use only. Please note that any commercial use of this code requires formal permission prior to use.

Contact

For technical questions, please contact zhenli1031[AT]gmail.com and nkuzhuzl[AT]gmail.com.

For commercial licensing, please contact cmm[AT]nankai.edu.cn

Acknowledgement

We thank Jia-Wen Xiao, Zheng-Peng Duan, Rui-Qi Wu, and Xin Jin for proofreading. We thank Zhewei Huang for his suggestions.

Here are some great resources we benefit from:

If you develop/use AMT in your projects, welcome to let us know. We will list your projects in this repository.

We also thank all of our contributors.


Issues

Green screen only

(screenshot of the green output)

With -n 1 and -r 8, I get just a video of this green screen. Any idea why?

torch.cuda.OutOfMemoryError: CUDA out of memory

Thank you very much for your AMT!

When I run python demos/demo_2x.py ..., I get this error:
torch.cuda.OutOfMemoryError: CUDA out of memory

Loading [images] from [['image\\panda_0.png', 'image\\panda_1.png']], the number of images = [2]
anchor_resolution 67108864
Loading [networks.AMT-G.Model] from [pretrained/amt-g.pth]...
Start frame interpolation:
Iter 1. input_frames=2 output_frames=3
Traceback (most recent call last):
  File "D:\Software\AI\AMT\2304\amt_1\demos\demo_2x.py", line 190, in <module>
    imgt_pred = model(in_0, in_1, embt, scale_factor=scale, eval=True)['imgt_pred']
  File "D:\Program\conda\envs\py39\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Software\AI\AMT\2304\amt_1\.\networks\AMT-G.py", line 86, in forward
    corr_fn = BidirCorrBlock(fmap0, fmap1, radius=self.radius, num_levels=self.corr_levels)
  File "D:\Software\AI\AMT\2304\amt_1\.\networks\blocks\raft.py", line 161, in __init__
    corr_T = F.avg_pool2d(corr_T, 2, stride=2)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.46 GiB (GPU 0; 8.00 GiB total capacity; 6.48 GiB already allocated; 0 bytes free; 6.50 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
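
The error message itself points to one mitigation; a minimal sketch (the placement and the 128 MB value are assumptions, not repo code):

# Reduce allocator fragmentation, as the error message suggests. This must
# run before the first CUDA allocation, e.g. at the top of demos/demo_2x.py.
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")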

error

When I run python flow_generation/gen_flow.py -r data/vimeo_triplet, there is an error:

(screenshot of the error)

The training time cost

Very nice work!

May I know how long it takes to train the proposed method on the Vimeo dataset? For example, with two 3090 GPUs.

Best regards and many thanks

NameError: name 'anchor_resolution' is not defined

I am getting this error when running the last cell in the Colab notebook:

#@title # 3. Enjoy the Smooth Video
import PIL.Image
if not hasattr(PIL.Image, 'Resampling'):  # Pillow < 9.0
    PIL.Image.Resampling = PIL.Image
model_type_upper = model_type.upper()
!python3 '/content/AMT/demos/demo_2x.py' \
    --config cfgs/{model_type_upper}.yaml \
    --ckpt pretrained/{model_type}.pth \
    --out_path {outputs_dir} \
    --frame_rate {output_video_fps}
from mediapy import read_video, show_video
video = read_video(f'{outputs_dir}/demo.avi')
show_video(video, fps=output_video_fps)

I also had to change the name of the Python file being run to demo_2x.py; the one originally written is, I believe, a typo.

Specifying the output FPS

Does the AMT demo support interpolating a video by specifying the output FPS directly, rather than by specifying the -n value?
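
Since each iteration of the demo doubles the frame count, a target FPS can be converted to an -n value; a hypothetical helper (not part of the repo), assuming the source frame rate is known:

import math

# Smallest n such that src_fps * 2**n >= target_fps; pass the result as -n
# and set -r to the desired output frame rate.
def niters_for_fps(src_fps: float, target_fps: float) -> int:
    return max(0, math.ceil(math.log2(target_fps / src_fps)))

print(niters_for_fps(30, 60))   # 1
print(niters_for_fps(24, 120))  # 3 (24 * 2**3 = 192 >= 120)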

About FLOPs calculation

First of all, thank you for your excellent work!
While reading your paper, I noticed that your FLOPs calculation is based on 720p frames (1280x720). I ran some tests on 720p frames using thop, and the results I obtained were slightly different from those in your paper. I was wondering if there is any mistake in my code:

import sys
sys.path.append('.')
import torch
import argparse
from omegaconf import OmegaConf
from thop import profile
from thop import clever_format
from utils.build_utils import build_from_cfg

parser = argparse.ArgumentParser(
                prog = 'AMT',
                description = 'Speed&parameter benchmark',
                )
parser.add_argument('-c', '--config', default='cfgs/AMT-L.yaml')
parser.add_argument('--H', default=256, type=int)
parser.add_argument('--W', default=256, type=int)
args = parser.parse_args()

cfg_path = args.config
network_cfg = OmegaConf.load(cfg_path).network
network_name = network_cfg.name
model = build_from_cfg(network_cfg)
model = model.cuda()
model.eval()
img0_720p = torch.randn(1, 3, 720, 1280).cuda()  # fixed 720p inputs as in the paper
img1_720p = torch.randn(1, 3, 720, 1280).cuda()
embt = torch.tensor(1/2).float().view(1, 1, 1, 1).cuda()  # midpoint timestep t=0.5

flops, params = profile(model, inputs=(img0_720p, img1_720p, embt, True))
flops, params = clever_format([flops, params], "%.3f")
print('(THOP) 720p Flops: ', flops)
print('(THOP) 720p Params: ', params)

As a result, I got:
For AMT-S: 0.12T vs. 0.12T in your paper
For AMT-L: 0.66T vs. 0.58T in your paper
For AMT-G: 2.24T vs. 2.07T in your paper
Could you please check my code for any possible mistakes? Also, do you have a FLOPs calculation code you could share? Thank you very much in advance!
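
One hedged guess, not confirmed by the authors: thop only counts operators it has rules for, so custom blocks such as the correlation lookup may be counted differently from the tool used in the paper. Profiling under no_grad at least matches the eval setting and keeps memory down:

# Same profile call as above, wrapped in no_grad.
with torch.no_grad():
    flops, params = profile(model, inputs=(img0_720p, img1_720p, embt, True))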

The huggingface demo is down

Hello,
The Hugging Face demo is down. The error message is "Runtime error: Memory limit exceeded (16Gi)".

(screenshot of the error)

By the way, I would be grateful if you could make AMT into an installable pip package.

Error preparing the optical flow for re-training the model on the Vimeo dataset when running "python flow_generation/gen_flow.py -r data/vimeo_triplet"

Hi,
I encountered an issue when preparing the optical flow data for the Vimeo dataset.
Specifically, I downloaded the Vimeo dataset (vimeo_septuplet.zip) and unzipped (and renamed) it as /mnt/data_nas/srwang/vimeo/vimeo_triplet/{readme.txt; sequences; tri_testlist.txt; tri_trainlist.txt}.
Following the instructions in https://github.com/MCG-NKU/AMT/blob/main/docs/develop.md to prepare the environment, I ran python flow_generation/gen_flow.py -r /mnt/data_nas/srwang/vimeo/vimeo_triplet and got the error "getopt.GetoptError: option -r not recognized".
A detailed snapshot of the bug is attached.
Thanks for your time; looking forward to your reply.

(screenshot of the error)

error

rm: cannot remove 'results': No such file or directory
Loading [images] from [['assets/quick_demo/a.JPG', 'assets/quick_demo/b.JPG']], the number of images = [2]
Traceback (most recent call last):
  File "/content/AMT/demos/demo_2x.py", line 104, in <module>
    inputs = [img2tensor(read(img_path)).to(device) for img_path in input_path]
  File "/content/AMT/demos/demo_2x.py", line 104, in <listcomp>
    inputs = [img2tensor(read(img_path)).to(device) for img_path in input_path]
  File "/content/AMT/./utils/utils.py", line 106, in read
    else: raise Exception('don't know how to read %s' % file)
Exception: don't know how to read assets/quick_demo/a.JPG

1132 if not pathlib.Path(path).is_file():
-> 1133     raise RuntimeError(f"Video file '{path}' is not found.")
1134 command = [
1135     _get_ffmpeg_path(),

RuntimeError: Video file 'results/demo_0000.mp4' is not found.

the means of Avg. flow in fig.6 of paper

Thanks for your wonderful work. I want to know the meaning of "Avg. flow" written in the caption of Fig. 6. Moreover, the last decoder in AMT-S outputs flows with 3 channels, but visualizing optical flow requires 2 channels; how do you deal with that?

Looking forward to your reply, thanks again
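
One common convention for such outputs (an assumption on the editor's part, not the authors' confirmed design) is to treat the first two channels as the (u, v) flow and the remaining channel as a per-pixel weight, visualizing only the flow part:

import torch

pred = torch.randn(1, 3, 256, 448)          # hypothetical 3-channel decoder output
flow_uv, weight = pred[:, :2], pred[:, 2:]  # 2-channel flow, 1-channel weight map
print(flow_uv.shape, weight.shape)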

Question about arbitrary time model

Why not train an arbitrary-time model on the septuplet set of Vimeo90K, which is larger than GoPro?
I did some tests, and the arbitrary-time model trained on GoPro does not perform well.

Cuda available but still not using cuda

By the way, the check of device == 'cuda' isn't working, but if I change the line marked below, it works:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('using device =', device)
cfg_path = args.config
ckpt_path = args.ckpt
input_path = args.input
out_path = args.out_path
iters = int(args.niters)
frame_rate = int(args.frame_rate)
save_images = args.save_images
print('using device =', device)
if device != 'cpu':  # <-- changed line
    print('using cuda mode')
    anchor_resolution = 1024 * 512
    anchor_memory = 1500 * 1024**2
    anchor_memory_bias = 2500 * 1024**2
    vram_avail = torch.cuda.get_device_properties(device).total_memory
    print("VRAM available: {:.1f} MB".format(vram_avail / 1024 ** 2))
else:
    print('using cpu mode')
    # Do not resize in cpu mode
    anchor_resolution = 8192*8192
    anchor_memory = 1
    anchor_memory_bias = 0
    vram_avail = 1
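
A more robust variant of that check, as a sketch rather than the repo's code: compare the device type instead of comparing the torch.device object against a string, since torch.device('cuda') == 'cuda' can evaluate to False depending on the PyTorch version:

import torch

# device.type is always the plain string 'cuda' or 'cpu'.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
if device.type == 'cuda':
    print('using cuda mode')
else:
    print('using cpu mode')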
