
flavr's Introduction

FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation

WACV 2023 (Best Paper Finalist)

[Example interpolation results]

[project page] [paper] [Project Video]

FLAVR is a fast, flow-free frame interpolation method capable of single-shot multi-frame prediction. It uses a customized encoder-decoder architecture with spatio-temporal convolutions and channel gating to capture and interpolate complex motion trajectories between frames and generate realistic high-frame-rate videos. This repository contains the original source code.
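The sketch below is purely illustrative (not the actual FLAVR code) and shows the two ingredients named above on a clip tensor of shape (batch, channels, time, height, width): a spatio-temporal 3D convolution followed by a simple squeeze-and-excitation style channel gate. Layer sizes and gating details are assumptions.

    # Illustrative sketch only; not the FLAVR implementation.
    import torch
    import torch.nn as nn

    class GatedConv3D(nn.Module):
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)
            self.gate = nn.Sequential(
                nn.AdaptiveAvgPool3d(1),       # squeeze: global spatio-temporal context
                nn.Conv3d(out_ch, out_ch, 1),  # excitation: one weight per channel
                nn.Sigmoid(),
            )

        def forward(self, x):
            feat = torch.relu(self.conv(x))
            return feat * self.gate(feat)      # reweight channels by learned importance

    clip = torch.randn(1, 3, 4, 64, 64)        # a batch of one 4-frame RGB clip
    print(GatedConv3D(3, 32)(clip).shape)      # torch.Size([1, 32, 4, 64, 64])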

Inference Times

FLAVR delivers a better trade-off between speed and accuracy than prior frame interpolation methods.

Method       FPS on 512x512 image
FLAVR        3.10
SuperSloMo   3.33
QVI          1.02
DAIN         0.77

Dependencies

We used the following to train and test the model.

  • Ubuntu 18.04
  • Python==3.7.4
  • numpy==1.19.2
  • PyTorch==1.5.0, torchvision==0.6.0, cudatoolkit==10.1

Model

Training model on Vimeo-90K septuplets

For training your own model on the Vimeo-90K dataset, use the following command. You can download the dataset from this link. The results reported in the paper were trained using 8 GPUs.

python main.py --batch_size 32 --test_batch_size 32 --dataset vimeo90K_septuplet --loss 1*L1 --max_epoch 200 --lr 0.0002 --data_root <dataset_path> --n_outputs 1

Training on the GoPro dataset is similar; change n_outputs to 7 for 8x interpolation.
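A plausible GoPro invocation follows; the gopro dataset flag is an assumption, so check the dataset names registered in main.py:

python main.py --batch_size 32 --test_batch_size 32 --dataset gopro --loss 1*L1 --max_epoch 200 --lr 0.0002 --data_root <dataset_path> --n_outputs 7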

Testing using a trained model

Trained Models

You can download the pretrained FLAVR models from the following links.

Method   Trained Model
2x       Link
4x       Link
8x       Link

2x Interpolation

For testing a pretrained model on Vimeo-90K septuplet validation set, you can run the following command:

python test.py --dataset vimeo90K_septuplet --data_root <data_path> --load_from <saved_model> --n_outputs 1

8x Interpolation

For testing a multi-frame interpolation model, use the same command as above with a multi-frame FLAVR model and n_outputs changed accordingly.
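For example, an 8x test run might look like the following (again, the gopro dataset flag is an assumption; check the names registered in test.py):

python test.py --dataset gopro --data_root <data_path> --load_from <saved_model> --n_outputs 7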

Time Benchmarking

In addition to computing PSNR and SSIM values, the testing script also outputs the inference time and interpolation speed.
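For reference, a minimal sketch of the PSNR metric being reported; the repo's own implementation may differ in data range and averaging:

    import torch

    def psnr(pred, gt, max_val=1.0):
        # pred, gt: float tensors in [0, max_val] with identical shapes
        mse = torch.mean((pred - gt) ** 2)
        return 10 * torch.log10(max_val ** 2 / mse)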

Evaluation on Middlebury

To evaluate on the public Middlebury benchmark, run the following.

python Middleburry_Test.py --data_root <data_path> --load_from <model_path> 

The interpolated images will be saved to the folder Middleburry in a format that can be readily uploaded to the leaderboard.

SloMo-Filter on custom video

You can use our trained models to apply a slow-motion filter to your own video (requires OpenCV 4.2.0). For example, to convert a 30 FPS video to 240 FPS, simply use the command

python interpolate.py --input_video <input_video> --factor 8 --load_model <model_path>

which uses our pretrained model for 8x interpolation. To convert a 30 FPS video to 60 FPS, use a 2x model with --factor 2.

Baseline Models

We also trained models for several prior methods in our setting and provide the models for all of them. Complete benchmarking scripts will also be released soon.

New [April 2024]: Due to a shocking reduction of Google Drive allowance by Google to UCSD, I lost access to the pre-trained models of the other methods listed below. I hope to re-train them and publish new links in the future, but don't count on it. Sorry!

Method        PSNR on Vimeo   Trained Model
FLAVR         36.3            Model
AdaCoF        35.3            Model
QVI*          35.15           Model
DAIN          34.19           Model
SuperSloMo*   32.90           Model
  • SuperSloMo is implemented using the code repository from here. The other baselines are implemented using their official codebases.
  • The numbers presented here for the baselines are slightly better than those reported in the paper.

Google Colab

A Colab notebook to try 2x slow-motion filtering on custom videos is available in the notebooks directory of this repo.

Model for Motion-Magnification

Unfortunately, we cannot provide the trained models for motion-magnification at this time. We are working towards making a model available soon.

Acknowledgement

The code is heavily borrowed from Facebook's official PyTorch video repository and CAIN.

Cite

If this code helps in your work, please consider citing us.

@inproceedings{kalluri2023flavr,
  title={FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation},
  author={Kalluri, Tarun and Pathak, Deepak and Chandraker, Manmohan and Tran, Du},
  booktitle={WACV},
  year={2023}
}

flavr's People

Contributors

around-star, n00mkrad, tarun-kalluri, tarun005, virtualramblas


flavr's Issues

about QVI model

Hi. Thanks for your efforts benchmarking existing models. I wonder which repo you used for the quadratic video interpolation (QVI) model? Could you please share the link?

about the training tricks

Your work is impressive! Hello, I'd like to ask a few questions. When I downloaded your code for training, I set the batch size to 6 and reduced the data volume to around 20,000, still using Vimeo. But after 70-odd epochs of training, the loss on the training and test sets and the PSNR and SSIM have not converged, even though the learning rate has already dropped to a low value. I also suspect I don't have the resources to train at the necessary resolution. The final PSNR is below 20, far from the numbers in your paper. Why are my training results so poor? Can you give some advice?

periodic pause of interpolated video

sprite.mp4
sprite_FLAVR_8x.mp4

Hi,

I am using the pretrained 8x model to interpolate the demo sprite video shown on the project homepage, but I find that it seems to "pause" once every second. Do you know why? Thanks!

Do not load entire raw video into RAM

Unlike other video interpolation implementations such as RIFE (https://github.com/hzwer/Arxiv2020-RIFE), this code loads all frames from disk into RAM.

This makes it impossible to interpolate videos longer than a minute or two unless you have insane amounts of RAM.

It would be great if frames could instead be loaded on the fly using a buffer, as RIFE does.
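A minimal sketch of that idea, assuming FLAVR's 4-frame input window (details such as the window stride are simplified):

    import cv2
    from collections import deque

    def sliding_windows(path, size=4):
        # Decode frames one at a time; keep only `size` of them in memory.
        cap = cv2.VideoCapture(path)
        buf = deque(maxlen=size)
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            buf.append(frame)
            if len(buf) == size:
                yield list(buf)  # hand one window to the model
        cap.release()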

Windows version

Great work Tarun. Can you share an updated version where one can run FLAVR inference and training (using a small number of images) on Windows as well, e.g., using PyCharm?

Vimeo90K triplet test dataset performance issue

Hi,

I am impressed with your new video frame interpolation paper.

When I tested, I got 32.59 dB on the Vimeo-90K triplet test set.

Following your Middleburry.py in the dataset directory, I adapted the VimeoSepTuplet class into a VimeoTriplet class, as below.

What is the problem with my modified code?

I am also wondering if I could get custom triplet interpolation code that takes two input frames and yields the intermediate frame.

    # Imports added for completeness (to_tensor from torchvision matches
    # the usual definition in this codebase).
    import os
    from PIL import Image
    from torch.utils.data import Dataset
    from torchvision.transforms.functional import to_tensor

    class VimeoTriplet(Dataset):
        def __init__(self, data_root):
            self.data_root = data_root
            self.image_root = os.path.join(self.data_root, 'sequences')

            test_fn = os.path.join(self.data_root, 'tri_testlist.txt')

            with open(test_fn, 'r') as txt:
                self.seq_list = [line.strip() for line in txt]

        def __getitem__(self, index):
            im1 = Image.open('%s/%s/im1.png' % (self.image_root, self.seq_list[index])).convert('RGB')
            gt = Image.open('%s/%s/im2.png' % (self.image_root, self.seq_list[index])).convert('RGB')
            im3 = Image.open('%s/%s/im3.png' % (self.image_root, self.seq_list[index])).convert('RGB')

            im1, gt, im3 = map(to_tensor, (im1, gt, im3))

            # The two inputs are duplicated to fill FLAVR's 4-frame window
            return [im1, im1, im3, im3], [gt]

        def __len__(self):
            return len(self.seq_list)

Question about Training Time

Hi, thank you for releasing the code.

Your paper says "We use 8 GPUs and a mini-batch of 32 to train each model", with training completed in about 36 hours for 2x on 2080Ti GPUs.
But I found it takes at least 5 days on 8 V100 GPUs for 200-epoch training on Vimeo-90K.
Is there some problem I have overlooked?

Blur output

Input -

cat_video.mp4

Output -

cat_video_8x.mp4

Hey @tarun005, I used the 8x pretrained model on this video. The output seems blurry, mostly at the edges. Can this be improved?

silent failure

I'm running this on Ubuntu with a 3090 on a video file (4K MKV, if it somehow matters; it seems to read fine), but it fails silently at

def video_transform(videoTensor , downscale=1):

and the process is simply killed with no error, nothing indicating insufficient memory, etc.

Any thoughts on debugging?

Can't run FLAVR

At first I tried to use Flowframes, but since it gave an error I tried following your instructions on GitHub. When I ran python interpolate.py --input_video input.mp4 --factor 8 --load_model FLAVR8X.pth, I got a very similar, if not identical, error message:

13.000209881905063
Traceback (most recent call last):
  File "interpolate.py", line 133, in <module>
    videoTensor , resizes = video_transform(videoTensor , args.downscale)
  File "interpolate.py", line 121, in video_transform
    videoTensor = transforms(videoTensor)
  File "C:\Users\frangamer1892roblox\MiniConda3\lib\site-packages\torchvision\transforms\transforms.py", line 60, in __call__
    img = t(img)
  File "D:\FLAVR\dataset\transforms.py", line 333, in __call__
    return to_tensor(clip)
  File "D:\FLAVR\dataset\transforms.py", line 107, in to_tensor
    return clip.float().permute(3, 0, 1, 2) / 255.0
RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:79] data. DefaultCPUAllocator: not enough memory: you tried to allocate 25944883200 bytes.

I am not really sure what to do now. These are my specs, if they help in any way:

DxDiag.txt

Testing result on vimeo90k_septuplet

Hello, my friend! I tested your pretrained model 'FLAVR_4x.pth' on the 'vimeo90k_septuplet' dataset, and the PSNR I got was 28.376122. I don't know why this occurs.

16x or higher factor trained model

Hi Tarun,

It is remarkable that your inference speed for 16x or higher factors is faster than Super SloMo while still performing well. Will you publish trained models for 16x and higher and add support for them afterward?

Question on PSNR evaluation for 8x and 4x (Tables 2 and 3)

Hi,

I have a question about the evaluation of the 8x and 4x cases in Tables 2 and 3 regarding the Adobe dataset in the paper. It seems the 4x case has much higher PSNR than the 8x case.

Let's say the 7 intermediate frames are denoted as t1, t2, t3, t4, t5, t6, t7. To my understanding the PSNR values are normally:
(t1 close to t7) > (t2 close to t6) > (t3 close to t5) > t4
At least, this is what I have observed for DAIN, SuperSloMo, and QVI. And it is expected that as the temporal distance to the input frames increases, the interpolation quality decreases (lower PSNR).

For 4x, you would only have t2, t4, t6, so the average PSNR values should be expected to be lower than 8x.

However, for 4x in Table 3, FLAVR is 5.62 dB higher than for 8x in Table 2, and the other methods (DAIN, QVI, and SuperSloMo) all show much higher PSNR as well. To my understanding, 5.62 dB is a huge increase.

The expected trend should be similar to Table.3 in BMBC paper: https://arxiv.org/pdf/2007.12622.pdf
where PSNR(2x) < PSNR(4x) < PSNR(8x).

I am wondering if there is anything I missed for the evaluation that causes my confusion?

Thanks

forward interpolation

Due to latency constraints, can FLAVR use just past frames for multi-frame prediction?

Thanks,

error: there is no sep_trainlist.txt

Hi there,

When I try to run test.py with the pretrained model that you provided, I get an error saying there is no directory xxxxxx/sep_trainlist.txt (my input data path). Can you tell me how to fix this? Thank you.

custom video: output video size changed

Hi,
When I tested interpolate.py on a custom video, I found that the video size changed: the input size is (960, 540) but the output size is (960, 536). I traced the problem to this code:

    downscale = int(downscale * 8)
    resizes = 8 * (H // downscale), 8 * (W // downscale)

So I changed the code to

    resizes = (8 * H // downscale), (8 * W // downscale)

but that raises an error. How can I fix this?
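One hedged way to preserve the original resolution is to pad H and W up to the next multiple of 8 before inference and crop the output back afterwards, instead of rounding down as the current code does. A sketch:

    import torch.nn.functional as F

    def pad_to_multiple(video, mult=8):
        # video: (C, T, H, W); pad right/bottom so H and W divide by `mult`
        h, w = video.shape[-2:]
        ph, pw = (-h) % mult, (-w) % mult
        return F.pad(video, (0, pw, 0, ph)), (h, w)

    # after inference: out = out[..., :h, :w]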

Unable to write out results

Hey,

I've managed to get up and running with FLAVR, right up until the final stage. I'm using a directory with a PNG sequence in it, which successfully runs through the network. But when it comes to writing it out, I simply get:

Writing to  in_2xmp4.mp4
in_2xmp4: No such file or directory
Traceback (most recent call last):
  File "interpolate.py", line 164, in <module>
    os.remove(output_video)
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'in_2xmp4'

I'm reading a sequence of PNGs from a directory using is_folder, which is great; is there a way to write out a sequence of PNGs rather than a video?
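A minimal sketch of writing the interpolated frames as a PNG sequence instead of a video (assuming frames are HxWx3 uint8 BGR arrays, as OpenCV uses):

    import os
    import cv2

    def write_png_sequence(frames, out_dir):
        os.makedirs(out_dir, exist_ok=True)
        for i, frame in enumerate(frames):
            cv2.imwrite(os.path.join(out_dir, f"{i:06d}.png"), frame)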

The training batch size of 8x VFI on GoPro

Hi,
What is the batch size for training FLAVR for 8x VFI? The paper says 32 with a frame size of 512x512, but when I train on 8 GPUs (1080Ti, whose memory matches the 2080Ti), I get an OOM error, and only a batch size of 16 works. Besides that, neither random frame-order reversal nor random horizontal flipping, the augmentation strategies from the paper, can be found in GoPro.py. Can I reproduce the paper's results with this code?

about the vimeo dataset for training

To the best of my knowledge, other methods train their networks on vimeo_triplet, which contains three consecutive frames, instead of vimeo_septuplet (which is used in video super-resolution). Why do you provide the vimeo_septuplet dataset for testing? Is it a fair comparison with the existing approaches?

How to cascade different speed models?

Hi Tarun,

From issue #32 I know we can cascade different models for higher interpolation factors, e.g., cascading the 2x and 8x models for 16x interpolation. But how is the cascade done? Do I use the 2x model to generate 2x slow sequences first, and then apply the 8x model to those 2x sequences?
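A plausible two-pass pipeline (the intermediate file name depends on how interpolate.py names its output, so adjust accordingly):

python interpolate.py --input_video input.mp4 --factor 2 --load_model FLAVR_2x.pth
python interpolate.py --input_video input_2x.mp4 --factor 8 --load_model FLAVR_8x.pth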

Train on low-frame-rate video (5-10 FPS)?

I find your model behaves badly on low-frame-rate video (5-10 FPS). I wonder how to fix this? Maybe training on low-frame-rate videos would help? Thanks a lot.

Finetune problem

Hello, thanks for your brilliant work, but I have a problem with fine-tuning. When I fine-tune your model on my own dataset, the fine-tuned model predicts flickering videos, and when I inspect the output, the predicted frame is darker than its adjacent frames. I then tried training the model from scratch using UNet-34 and got similarly dark results. The PSNR and training loss keep improving, but the inference results get worse.
Could you please explain a little?
Here are the training details:
python main.py --batch_size 8 --test_batch_size 8 --dataset vimeo90K_septuplet --loss 1*L1 --max_epoch 200 --lr 0.00001 --n_outputs 1
Namespace(batch_size=8, beta1=0.9, beta2=0.99, checkpoint_dir='.', cuda=True, data_root='/vimeo_septuplet', dataset='vimeo90K_septuplet', exp_name='exp', joinType='concat', load_from=None, log_iter=60, loss='1*L1', lr=1e-05, max_epoch=200, model='unet_18', n_outputs=1, nbr_frame=4, nbr_width=1, num_gpu=1, num_workers=16, pretrained='FLAVR_2x.pth', random_seed=12345, resume=False, resume_exp=None, start_epoch=0, test_batch_size=8, upmode='transpose', use_tensorboard=False, val_freq=1)

Attention weight score

Hi, I was reading your work and was wondering how you obtain the highest attention weight for the feature maps in Figure 5. Do you just sum the tensor along the channel dimension and sort that, or do you use some other method?
Thanks!
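For reference, a hedged sketch of one common recipe (not necessarily what the paper did): rank the channels of a gated feature map by their mean gate activation and visualize the strongest one.

    import torch

    def strongest_channel(gate):
        # gate: (B, C, T, H, W) activations in [0, 1]
        scores = gate.mean(dim=(0, 2, 3, 4))  # one scalar per channel
        return torch.argmax(scores).item()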

Models trained with Huber and VGG loss

Hi Tarun,

Your work is really interesting!
I was wondering if you could share the models trained in the ablation experiments.

I am curious to see how the different loss functions affect downstream tasks like action recognition. From the paper, you mention that the L1 loss results in sharper images. But does this also translate to better action recognition results?

Please do share any experiments/insights regarding this. I would love to hear your thoughts.

UCF101 testing dataset

Hi, I found the original UCF101 dataset in AVI format and a UCF101 triplet dataset in PNG format, but there is no 5-frame dataset available. Can you provide the method used to generate the UCF101 test dataset for FLAVR?

Cannot reproduce AdaCoF results

Hi,

I used the official AdaCoF code and the weights that you provide, but only achieved 34.93 dB on the Vimeo dataset, which is much lower than 35.40. Could you please share your evaluation script with me? Many thanks!

UCF101

Hi there,

I was just wondering how you created the UCF101 dataset for your experiments? The only versions of the dataset I can find are either still in AVI form or have only 2 frames. For a fair comparison, I would like to use the same dataset as you; is it available somewhere?

About test data of Adobe

Hi, thanks for your code. I tested the pretrained 8x model on the GoPro test set and on the whole Adobe dataset, and got PSNR values of 31.31 and 31.83 respectively. The GoPro number matches the paper, but the Adobe number does not (32.20 in the paper).
I want to know whether the results in the paper were tested on the whole Adobe dataset; could you provide more details about the 8x interpolation experiment on Adobe?
Thanks!

Training issue

Hi, I've been trying to train this network on an A100 GPU. However, since torch 1.5.0 doesn't support this GPU, I am forced to use torch 1.9.0. Training is broken for torch versions > 1.5.0, but I cannot find the reason why. I have looked at the differences between the torch versions, but nothing explains why this happens. Basically, the model stays stuck at around 20 dB for the duration of training. I previously tested this code on a 1080Ti with torch 1.5.0 and it worked fine, but due to memory constraints and training time, the A100 would be the better option.
Do you have any idea why this occurs and any possible solutions?
Do you have any idea why this occurs and any possible solutions?

Thanks

training issue about PSNR

Hi Tarun, excellent work on video interpolation!
I tried running your code, but I have some trouble. I set my config as:
batch_size=2, beta1=0.9, beta2=0.99, checkpoint_dir='.', cuda=True, data_root='/home/Changchen/dataset./vimeo_septuplet', dataset='vimeo90K', exp_name='exp', joinType='concat', load_from=None, log_iter=60, loss='1*L1', lr=0.0002, max_epoch=50, model='unet_18', n_outputs=1, nbr_frame=4, nbr_width=1, num_gpu=1, num_workers=16, pretrained=None, random_seed=12345, resume=False, resume_exp=None, start_epoch=0, test_batch_size=1, upmode='transpose', use_tensorboard=False, val_freq=1
At the beginning, the PSNR was normal at about 20, but it gradually decreased to about 14. I wonder why it fails to converge.
Thank you for any help!

Training issue

Hi, author, thank you for sharing the code on GitHub. The code performs well in testing, but the PSNR value stays at about 17 dB during training. What could be the reason?

Unreliable FPS readout causes error

When I try to interpolate a video, this error pops up:

File "interpolate.py", line 120, in <module>
    videoTensor = video_to_tensor(input_video)
  File "interpolate.py", line 101, in video_to_tensor
    fps = md["video_fps"]
KeyError: 'video_fps'

I suspect it fails to read the frame rate for some reason.

This is one of the reasons I am asking for a manual input: #4
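A hedged workaround sketch: fall back to a default (or user-supplied) frame rate instead of raising KeyError when the container metadata lacks "video_fps":

    import torchvision

    frames, _, md = torchvision.io.read_video("input.mp4", pts_unit="sec")
    fps = md.get("video_fps", 30.0)  # assume 30 FPS when metadata is missing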

parameters

Hello! Since I have not trained or tested the network, I would like to know what the parameters of this model are. Thanks!

Questions about the inference time

Hi, thanks for your interesting work!
I tested the inference time on vimeo90K_septuplet using your script, and I got 0.004 s, which seems too fast.
I modified the code and tested again, and this time got 0.195 s.
So, I wonder how the times in your paper were measured?
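For what it's worth, CUDA calls are asynchronous, so timing without a synchronize only measures kernel launch, which can produce implausibly small numbers like 0.004 s. A minimal sketch of GPU-correct timing:

    import time
    import torch

    def timed_forward(model, x):
        torch.cuda.synchronize()   # finish any pending GPU work first
        t0 = time.time()
        with torch.no_grad():
            y = model(x)
        torch.cuda.synchronize()   # wait until the GPU actually finishes
        return y, time.time() - t0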

config.py not affecting interpolate.py behaviour

Hey team! Congrats on the amazing job.

When trying to modify the config file (e.g., reducing the batch size), the changes do not seem to affect the behaviour of the script (interpolate.py) when launched.

I'm pretty new to Python, so I may be misunderstanding something here, but just in case!

best

Lucien

Work with custom videos

So I wanted to run this on a custom set of videos, but I'm unsure how I'm supposed to set up the data properly to do so. I currently have a folder containing my videos, and if I run:
python test.py --dataset vimeo90K_septuplet --data_root "vimeo90K_septuplet" --load_from "./FLAVR_2x.pth" --n_outputs 1
where the vimeo90k_septuplet folder just contains my two videos, I get the error
FileNotFoundError: [Errno 2] No such file or directory: 'vimeo90K_septuplet\\sep_trainlist.txt'

I'm unsure how to set up the text file for this, and there are potentially more setup issues, but I'm not quite sure where to go. Any help would be really appreciated.
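For reference, the vimeo90K_septuplet loader expects roughly the official Vimeo-90K septuplet layout (a sketch inferred from the error above; the .txt lists contain sequence paths, one per line):

    vimeo_septuplet/
        sequences/
            00001/0001/im1.png ... im7.png
        sep_trainlist.txt    (lines like 00001/0001)
        sep_testlist.txt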
