
AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation

This repository contains the official implementation of the following paper:

AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation
Zhen Li*, Zuo-Liang Zhu*, Ling-Hao Han, Qibin Hou, Chun-Le Guo, Ming-Ming Cheng
(* denotes equal contribution)
Nankai University
In CVPR 2023

[Paper] [Project Page] [Web demos] [Video]

AMT is a lightweight, fast, and accurate algorithm for frame interpolation. It aims to provide a practical solution for generating video from a small number of given frames (at least two).

(Demo GIF)

Web demos

Integrated into Hugging Face Spaces 🤗 using Gradio. Try out the Web Demo on Hugging Face Spaces.

Try AMT for interpolating between two or more images at PyTTI-Tools:FILM.

Change Log

  • Apr 20, 2023: Our code is publicly available.

Method Overview

(Pipeline overview figure)

For technical details, please refer to the method.md file, or read the full report on arXiv.

Dependencies and Installation

  1. Clone Repo

    git clone https://github.com/MCG-NKU/AMT.git
  2. Create Conda Environment and Install Dependencies

    conda env create -f environment.yaml
    conda activate amt
  3. Download the pretrained models for the demos from Pretrained Models below and place them in the pretrained folder

Quick Demo

Note that the selected pretrained model ([CKPT_PATH]) needs to match the config file ([CFG]).

Creating a video demo: increasing $n$ slows down the motion in the video. (With $m$ input frames, [N_ITER] $=n$ corresponds to $2^n\times (m-1)+1$ output frames; see the quick check after the notes below.)

python demos/demo_2x.py -c [CFG] -p [CKPT] -n [N_ITER] -i [INPUT] -o [OUT_PATH] -r [FRAME_RATE]
# e.g. [INPUT]
# -i can be a video / a glob pattern / multiple images / a folder containing input frames
# -i demo.mp4 (video) / img_*.png (glob pattern) / img0.png img1.png (images) / demo_input (folder)

# e.g. a simple usage
python demos/demo_2x.py -c cfgs/AMT-S.yaml -p pretrained/amt-s.pth -n 6 -i assets/quick_demo/img0.png assets/quick_demo/img1.png
  • Note: pass --save_images to save the output images (saving slows down when there are many output frames).
  • Supported input types: a video / a glob pattern / multiple images / a folder containing input frames.
  • Results are written to the [OUT_PATH] folder (default: results/2x).
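
As a quick check of the frame-count formula above, here is a minimal sketch; the helper function is the editor's illustration, not part of the repo:

# 2**n * (m - 1) + 1 output frames from m input frames and [N_ITER] = n
def num_output_frames(m: int, n: int) -> int:
    return 2 ** n * (m - 1) + 1

print(num_output_frames(2, 6))  # 65 frames for the two quick-demo images with -n 6
print(num_output_frames(2, 1))  # 3 frames: both inputs plus one interpolated frame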

Pretrained Models

Model   🔗 Download Links                             Config file          Trained on  Arbitrary/Fixed
AMT-S   [Google Drive] [Baidu Cloud] [Hugging Face]   [cfgs/AMT-S]         Vimeo90k    Fixed
AMT-L   [Google Drive] [Baidu Cloud] [Hugging Face]   [cfgs/AMT-L]         Vimeo90k    Fixed
AMT-G   [Google Drive] [Baidu Cloud] [Hugging Face]   [cfgs/AMT-G]         Vimeo90k    Fixed
AMT-S   [Google Drive] [Baidu Cloud] [Hugging Face]   [cfgs/AMT-S_gopro]   GoPro       Arbitrary

Training and Evaluation

Please refer to develop.md to learn how to benchmark AMT and how to train a new AMT model from scratch.

Citation

If you find our repo useful for your research, please consider citing our paper:

@inproceedings{licvpr23amt,
   title={AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation},
   author={Li, Zhen and Zhu, Zuo-Liang and Han, Ling-Hao and Hou, Qibin and Guo, Chun-Le and Cheng, Ming-Ming},
   booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
   year={2023}
}

License

This code is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License for non-commercial use only. Please note that any commercial use of this code requires formal permission prior to use.

Contact

For technical questions, please contact zhenli1031[AT]gmail.com and nkuzhuzl[AT]gmail.com.

For commercial licensing, please contact cmm[AT]nankai.edu.cn

Acknowledgement

We thank Jia-Wen Xiao, Zheng-Peng Duan, Rui-Qi Wu, and Xin Jin for proofreading. We thank Zhewei Huang for his suggestions.

Here are some great resources we benefit from:

If you develop/use AMT in your projects, welcome to let us know. We will list your projects in this repository.

We also thank all of our contributors.


Issues

Green screen only

(screenshot of the green output)

With -n 1 and -r 8, I get just a video of this green screen. Any idea why?

torch.cuda.OutOfMemoryError: CUDA out of memory

Thank you very much for your AMT!

When I run python demos/demo_2x.py ..., I get this error:
torch.cuda.OutOfMemoryError: CUDA out of memory

Loading [images] from [['image\\panda_0.png', 'image\\panda_1.png']], the number of images = [2]
anchor_resolution 67108864
Loading [networks.AMT-G.Model] from [pretrained/amt-g.pth]...
Start frame interpolation:
Iter 1. input_frames=2 output_frames=3
Traceback (most recent call last):
  File "D:\Software\AI\AMT\2304\amt_1\demos\demo_2x.py", line 190, in <module>
    imgt_pred = model(in_0, in_1, embt, scale_factor=scale, eval=True)['imgt_pred']
  File "D:\Program\conda\envs\py39\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Software\AI\AMT\2304\amt_1\.\networks\AMT-G.py", line 86, in forward
    corr_fn = BidirCorrBlock(fmap0, fmap1, radius=self.radius, num_levels=self.corr_levels)
  File "D:\Software\AI\AMT\2304\amt_1\.\networks\blocks\raft.py", line 161, in __init__
    corr_T = F.avg_pool2d(corr_T, 2, stride=2)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.46 GiB (GPU 0; 8.00 GiB total capacity; 6.48 GiB already allocated; 0 bytes free; 6.50 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
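
The error message itself points to one mitigation; a minimal sketch (the placement and the 128 MB value are assumptions, not repo code):

# Reduce allocator fragmentation, as the error message suggests. This must
# run before the first CUDA allocation, e.g. at the top of demos/demo_2x.py.
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")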

error

When I run python flow_generation/gen_flow.py -r data/vimeo_triplet, there is an error:

(screenshot of the error)

The training time cost

Very nice work!

May I know how long it takes to train the proposed method on the Vimeo dataset? For example, with two 3090 GPUs.

Best regards and many thanks

NameError: name 'anchor_resolution' is not defined

I am getting this error when running the last cell in the Colab notebook:

#@title # 3. Enjoy the Smooth Video
import PIL.Image
if not hasattr(PIL.Image, 'Resampling'):  # Pillow < 9.0
    PIL.Image.Resampling = PIL.Image
model_type_upper = model_type.upper()
!python3 '/content/AMT/demos/demo_2x.py' \
    --config cfgs/{model_type_upper}.yaml \
    --ckpt pretrained/{model_type}.pth \
    --out_path {outputs_dir} \
    --frame_rate {output_video_fps}
from mediapy import read_video, show_video
video = read_video(f'{outputs_dir}/demo.avi')
show_video(video, fps=output_video_fps)

I also had to change the name of the Python file being run to demo_2x.py; the one originally written is, I believe, a typo.

Specifying the output FPS

Does the AMT demo support interpolating a video by specifying the output FPS directly, rather than by specifying the -n value?
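
Since each iteration of the demo doubles the frame count, a target FPS can be converted to an -n value; a hypothetical helper (not part of the repo), assuming the source frame rate is known:

import math

# Smallest n such that src_fps * 2**n >= target_fps; pass the result as -n
# and set -r to the desired output frame rate.
def niters_for_fps(src_fps: float, target_fps: float) -> int:
    return max(0, math.ceil(math.log2(target_fps / src_fps)))

print(niters_for_fps(30, 60))   # 1
print(niters_for_fps(24, 120))  # 3 (24 * 2**3 = 192 >= 120)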

About FLOPs calculation

First of all, thank you for your excellent work!
While reading your paper, I noticed that your FLOPs calculation is based on 720p frames (1280x720). I ran some tests on 720p frames using thop, and the results I obtained were slightly different from those in your paper. I was wondering if there is any mistake in my code:

import sys
sys.path.append('.')
import torch
import argparse
from omegaconf import OmegaConf
from thop import profile
from thop import clever_format
from utils.build_utils import build_from_cfg

parser = argparse.ArgumentParser(
                prog = 'AMT',
                description = 'Speed&parameter benchmark',
                )
parser.add_argument('-c', '--config', default='cfgs/AMT-L.yaml')
parser.add_argument('--H', default=256, type=int)
parser.add_argument('--W', default=256, type=int)
args = parser.parse_args()

cfg_path = args.config
network_cfg = OmegaConf.load(cfg_path).network
network_name = network_cfg.name
model = build_from_cfg(network_cfg)
model = model.cuda()
model.eval()
img0_720p = torch.randn(1, 3, 720, 1280).cuda()  # fixed 720p inputs as in the paper
img1_720p = torch.randn(1, 3, 720, 1280).cuda()
embt = torch.tensor(1/2).float().view(1, 1, 1, 1).cuda()  # midpoint timestep t=0.5

flops, params = profile(model, inputs=(img0_720p, img1_720p, embt, True))
flops, params = clever_format([flops, params], "%.3f")
print('(THOP) 720p Flops: ', flops)
print('(THOP) 720p Params: ', params)

As a result, I got:
For AMT-S: 0.12T vs. 0.12T in your paper
For AMT-L: 0.66T vs. 0.58T in your paper
For AMT-G: 2.24T vs. 2.07T in your paper
Could you please check my code for any possible mistakes? Also, do you have a FLOPs calculation code you could share? Thank you very much in advance!
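
One hedged guess, not confirmed by the authors: thop only counts operators it has rules for, so custom blocks such as the correlation lookup may be counted differently from the tool used in the paper. Profiling under no_grad at least matches the eval setting and keeps memory down:

# Same profile call as above, wrapped in no_grad.
with torch.no_grad():
    flops, params = profile(model, inputs=(img0_720p, img1_720p, embt, True))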

The huggingface demo is down

Hello,
The Hugging Face demo is down. The error message is "Runtime error: Memory limit exceeded (16Gi)".

(screenshot of the error)

By the way, I would be grateful if you could make AMT into an installable pip package.

Error preparing the optical flow for re-training the model on the Vimeo dataset when running "python flow_generation/gen_flow.py -r data/vimeo_triplet"

Hi,
I encountered an issue when preparing the optical flow data for the Vimeo dataset.
Specifically, I downloaded the Vimeo dataset (vimeo_septuplet.zip) and unzipped (and renamed) it as /mnt/data_nas/srwang/vimeo/vimeo_triplet/{readme.txt; sequences; tri_testlist.txt; tri_trainlist.txt}.
Following the instructions in https://github.com/MCG-NKU/AMT/blob/main/docs/develop.md to prepare the environment, I ran python flow_generation/gen_flow.py -r /mnt/data_nas/srwang/vimeo/vimeo_triplet and got the error "getopt.GetoptError: option -r not recognized".
A detailed snapshot of the bug is attached.
Thanks for your time; looking forward to your reply.

(screenshot of the error)

error

rm: cannot remove 'results': No such file or directory
Loading [images] from [['assets/quick_demo/a.JPG', 'assets/quick_demo/b.JPG']], the number of images = [2]
Traceback (most recent call last):
  File "/content/AMT/demos/demo_2x.py", line 104, in <module>
    inputs = [img2tensor(read(img_path)).to(device) for img_path in input_path]
  File "/content/AMT/demos/demo_2x.py", line 104, in <listcomp>
    inputs = [img2tensor(read(img_path)).to(device) for img_path in input_path]
  File "/content/AMT/./utils/utils.py", line 106, in read
    else: raise Exception('don't know how to read %s' % file)
Exception: don't know how to read assets/quick_demo/a.JPG

1132 if not pathlib.Path(path).is_file():
-> 1133     raise RuntimeError(f"Video file '{path}' is not found.")
1134 command = [
1135     _get_ffmpeg_path(),

RuntimeError: Video file 'results/demo_0000.mp4' is not found.

the means of Avg. flow in fig.6 of paper

Thanks for your wonderful work. I want to know the meaning of "Avg. flow" written in the caption of Fig. 6. Moreover, the last decoder in AMT-S outputs flows with 3 channels, but visualizing optical flow requires 2 channels; how do you deal with that?

Looking forward to your reply, thanks again
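
One common convention for such outputs (an assumption on the editor's part, not the authors' confirmed design) is to treat the first two channels as the (u, v) flow and the remaining channel as a per-pixel weight, visualizing only the flow part:

import torch

pred = torch.randn(1, 3, 256, 448)          # hypothetical 3-channel decoder output
flow_uv, weight = pred[:, :2], pred[:, 2:]  # 2-channel flow, 1-channel weight map
print(flow_uv.shape, weight.shape)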

Question about arbitrary time model

Why not train an arbitrary-time model on the septuplet set of Vimeo90K, which is larger than GoPro?
I did some tests, and the arbitrary-time model trained on GoPro does not perform well.

Cuda available but still not using cuda

By the way, the check of device == 'cuda' isn't working, but if I change the line marked below, it works:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('using device =', device)
cfg_path = args.config
ckpt_path = args.ckpt
input_path = args.input
out_path = args.out_path
iters = int(args.niters)
frame_rate = int(args.frame_rate)
save_images = args.save_images
print('using device =', device)
if device != 'cpu':  # <-- changed line
    print('using cuda mode')
    anchor_resolution = 1024 * 512
    anchor_memory = 1500 * 1024**2
    anchor_memory_bias = 2500 * 1024**2
    vram_avail = torch.cuda.get_device_properties(device).total_memory
    print("VRAM available: {:.1f} MB".format(vram_avail / 1024 ** 2))
else:
    print('using cpu mode')
    # Do not resize in cpu mode
    anchor_resolution = 8192*8192
    anchor_memory = 1
    anchor_memory_bias = 0
    vram_avail = 1
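
A more robust variant of that check, as a sketch rather than the repo's code: compare the device type instead of comparing the torch.device object against a string, since torch.device('cuda') == 'cuda' can evaluate to False depending on the PyTorch version:

import torch

# device.type is always the plain string 'cuda' or 'cpu'.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
if device.type == 'cuda':
    print('using cuda mode')
else:
    print('using cpu mode')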
