
Self-supervised Learning for Video Correspondence Flow

This repository contains the code (in PyTorch) for the model introduced in the following paper

Self-supervised Learning for Video Correspondence Flow (BMVC Oral presentation)

by Zihang Lai, Weidi Xie


Citation

@inproceedings{Lai19,
  title={Self-supervised Learning for Video Correspondence Flow},
  author={Lai, Z. and Xie, W.},
  booktitle={BMVC},
  year={2019}
}

Contents

  1. Introduction
  2. Usage
  3. Results
  4. Contacts

Introduction

The objective of this paper is self-supervised learning of feature embeddings from videos, suitable for correspondence flow, i.e. matching correspondences between frames over the video. We leverage the natural spatial-temporal coherence of appearance in videos, to create a pointer model that learns to reconstruct a target frame by copying colors from a reference frame.

We make three contributions: First, we introduce a simple information bottleneck that forces the model to learn robust features for correspondence matching and prevents it from learning trivial solutions, e.g. matching based on low-level color information. Second, we propose to train the model over a long temporal window in videos. To make the model more robust to complex object deformation, occlusion, and tracker drift, we formulate a recursive model trained with scheduled sampling and cycle consistency. Third, we evaluate the approach by first training on the Kinetics dataset with self-supervised learning, then applying the learned features directly to DAVIS video segmentation and JHMDB keypoint tracking. On both tasks our approach achieves state-of-the-art performance; on segmentation in particular, we outperform all previous methods by a significant margin.
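
To make the copying mechanism concrete: the pointer is essentially an attention-style soft copy, in which target-frame features attend to reference-frame features and the resulting affinity transports colors from the reference frame. Below is a minimal sketch of this idea, not the repository's actual implementation; all names, shapes, and the temperature parameter are illustrative.

    import torch
    import torch.nn.functional as F

    def copy_colors(feat_ref, feat_tgt, colors_ref, temperature=1.0):
        """Reconstruct the target frame by soft-copying colors from the reference.

        feat_ref, feat_tgt: (B, C, H, W) feature maps from a shared encoder.
        colors_ref:         (B, 3, H, W) colors of the reference frame.
        """
        B, C, H, W = feat_ref.shape
        f_ref = feat_ref.flatten(2)  # (B, C, H*W)
        f_tgt = feat_tgt.flatten(2)  # (B, C, H*W)
        # Pairwise similarity between every target and reference location.
        affinity = torch.einsum('bci,bcj->bij', f_tgt, f_ref)
        attn = F.softmax(affinity / temperature, dim=2)  # over reference locations
        # Each target pixel becomes a weighted average of reference colors.
        c_tgt = torch.einsum('bij,bcj->bci', attn, colors_ref.flatten(2))
        return c_tgt.view(B, 3, H, W)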

Usage

  1. Install dependencies
  2. Download the Kinetics dataset. The videos should be decoded into frame images organized as follows:

     ROOT_FOLDER/
         class_label_1/
             video_label_1/
                 image_00001.jpg
                 image_00002.jpg
                 ...
             video_label_2/
                 ...
         class_label_2/
             video_label_1/
             ...
         class_label_3/
             video_label_1/
             ...
         ...

     We use a CSV file (functional/feeder/dataset/filelist.csv) for easier image indexing. The set of YouTube videos available in the Kinetics dataset may have changed since this file was generated, so you may need to build your own index. This is easy to do: simply loop over all classes and all videos, recording each video's relative path and its corresponding frame count (see the sketch after this list).
  3. Download the DAVIS-2017 dataset. No pre-processing is needed.
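
A minimal sketch for regenerating the index file follows; the exact CSV format expected by the data loader (here: relative path and frame count per row) is an assumption, so check it against functional/feeder/dataset/filelist.csv before use.

    import csv
    import os

    ROOT = 'path-to-kinetics'  # the ROOT_FOLDER above

    with open('filelist.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        for class_label in sorted(os.listdir(ROOT)):
            class_dir = os.path.join(ROOT, class_label)
            if not os.path.isdir(class_dir):
                continue
            for video_label in sorted(os.listdir(class_dir)):
                video_dir = os.path.join(class_dir, video_label)
                if not os.path.isdir(video_dir):
                    continue
                # Record the video's relative path and its frame count.
                n_frames = sum(1 for x in os.listdir(video_dir) if x.endswith('.jpg'))
                writer.writerow([os.path.join(class_label, video_label), n_frames])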

Dependencies

Train

  • Use the following command to train on the Kinetics dataset:

    python main.py --datapath path-to-kinetics --savepath log-path
    
  • We have also found that the OxUvA long-term tracking dataset yields comparable results. If the Kinetics dataset is too large or not accessible, you can start with OxUvA, which is significantly smaller. Train our model on OxUvA with the following command:

    python main_oxuva.py --datapath path-to-oxuva --savepath log-path
    
  • To use the OxUvA dataset, simply extract all sequences of frames (both training and validation) into the same folder; there should be 337 sequences in total. A collection-helper sketch follows the layout below.

    ROOT_FOLDER/
        vid0000/
            000000.jpeg
            000001.jpeg
            ...
        vid0001/
        ...
        vid0336/
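
A minimal helper sketch for collecting all sequences into one folder; the source layout (separate train/ and val/ directories named as below) is an assumption about how the dataset is extracted on your machine.

    import shutil
    from pathlib import Path

    SRC_DIRS = [Path('oxuva/train'), Path('oxuva/val')]  # assumed source layout
    DST = Path('ROOT_FOLDER')
    DST.mkdir(exist_ok=True)

    for src in SRC_DIRS:
        for seq in sorted(p for p in src.iterdir() if p.is_dir()):
            shutil.move(str(seq), str(DST / seq.name))

    print(len(list(DST.iterdir())), 'sequences')  # expect 337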

Test and evaluation

  • Use the following command to test our model on the DAVIS-2017 validation set for a preliminary result. (Note: this code may produce slightly different numbers from the official DAVIS benchmark evaluation code.)

    python test.py --resume path-to-checkpoint \
                   --datapath path-to-davis \
                   --savepath log-path
    
  • Use the following command to generate output for official DAVIS testing code

    python benchmark.py --resume path-to-checkpoint \
                        --datapath path-to-davis \
                        --savepath log-path
    
  • Then you can evaluate the output with the official Python evaluation code (evaluation_method.py, provided by the official DAVIS 2017 evaluation toolkit).

    python evaluation_method.py \
                    --task semi-supervised \
                    --results_path log-path
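
For intuition, the J score reported by the benchmark is the region similarity (intersection-over-union of predicted and ground-truth masks) averaged over frames, and F is a boundary F-measure. A minimal NumPy sketch of J only, not the official evaluation code:

    import numpy as np

    def region_similarity(pred, gt):
        """J score: IoU of two binary masks given as (H, W) arrays."""
        pred, gt = pred.astype(bool), gt.astype(bool)
        union = np.logical_or(pred, gt).sum()
        if union == 0:
            return 1.0  # both masks empty: treat as a perfect match
        return np.logical_and(pred, gt).sum() / union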
    

Pretrained models (no need to extract the files)

Trained on Kinetics: Google Drive
Trained on OxUvA (updated): Google Drive
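
Note that despite the .tar extension, the downloads appear to be PyTorch checkpoints rather than tar archives (see the issues below), so they should load directly with torch.load. The exact contents of the checkpoint dictionary are an assumption; inspect the keys first.

    import torch

    # Load the downloaded file directly; do not run `tar -x` on it.
    checkpoint = torch.load('kinetics.tar', map_location='cpu')
    print(checkpoint.keys() if isinstance(checkpoint, dict) else type(checkpoint))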

Todo

  • Release JHMDB testing code
  • Release larger models with higher accuracy

Results

DAVIS-2017 Results

Preliminary results on OxUvA

Dataset   J&F (Mean)   J (Mean)   J (Recall)   F (Mean)   F (Recall)
OxUvA     50.3         48.4       53.2         52.2       56.0


Issues

Missing oxuva.csv file

The .csv file for the OxUvA dataset is not present in the datas folder alongside the kinetics.csv file.

Color quantization

The OxUvA dataset also loads centroids from centroids_16k_kinetics_10000samples.npy for quantizing the color space. Shouldn't the code compute new centroids for a different dataset? Can we use the same centroids for a custom dataset that we plan to use with the code?
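
(For reference, centroids for a new dataset could be recomputed along these lines; the color space, cluster count, and sampling scheme below are assumptions, not the repository's exact procedure.)

    import random
    from pathlib import Path

    import numpy as np
    from PIL import Image
    from sklearn.cluster import KMeans

    # Sample pixels from a handful of frames and fit k-means on their colors.
    frames = list(Path('path-to-dataset').rglob('*.jpg'))
    pixels = []
    for p in random.sample(frames, min(100, len(frames))):
        img = np.asarray(Image.open(p).convert('RGB'), dtype=np.float32) / 255.0
        pixels.append(img.reshape(-1, 3))
    pixels = np.concatenate(pixels)
    idx = np.random.choice(len(pixels), min(10000, len(pixels)), replace=False)

    kmeans = KMeans(n_clusters=16, n_init=10).fit(pixels[idx])  # cluster count assumed
    np.save('centroids_custom.npy', kmeans.cluster_centers_)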

Full Code

Hi,

When will the full code be released?

The pretrained models are not valid tar files

tar -xvf kinetics.tar
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors
The same is true for the oxuva.tar file also.

OxUvA dataset training details

Are the training hyper-parameters the same for OxUvA as well? One million iterations with a batch size of 8? If not, what values do you use for 'epochs' (not iterations), 'lr', and 'bsize' in main_oxuva.py?

Saved models

Is the kinetics.tar file actually a tar file? It doesn't open with tar -xvf kinetics.tar, nor when treated as a .tar.gz.

Batch size

Hi, I tried to reproduce your experiments on the OxUvA dataset. I found the default batch size of 6 too large for a single 2080 Ti, so I used 2 GPUs, and the resulting J&F-Mean was 40.5237, lower than the paper's result (50.3).
Is this related to the batch size? Could you please tell me what batch size you used per GPU and how many GPUs you trained on?
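
(Not an answer from the authors, but a generic way to simulate a larger effective batch on limited GPU memory is gradient accumulation; a minimal self-contained sketch with toy data:)

    import torch

    # Toy stand-ins; in practice `model` and the data come from the training script.
    model = torch.nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    data = [(torch.randn(2, 10), torch.randn(2, 1)) for _ in range(6)]

    accum_steps = 3  # effective batch = per-step batch * accum_steps
    optimizer.zero_grad()
    for step, (x, y) in enumerate(data):
        loss = torch.nn.functional.mse_loss(model(x), y) / accum_steps
        loss.backward()  # gradients accumulate across micro-batches
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()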

Test the output of semi-supervised segmentation

In the readme, we can find this:

Then you can test the output with the official Python evaluation code.

python evaluation_method.py --task semi-supervised --results_path log-path

What is evaluation_method.py and where can I find it? Let's say I want to visualize the output on the OxUvA dataset.

Pytorch Correlation module on Windows

Did anybody successfully install the PyTorch Correlation module on Windows?
I get the error

  1 error detected in the compilation of "C:/Users/root/AppData/Local/Temp/tmpxft_0000bb6c_00000000-10_correlation_cuda_kernel.cpp1.ii".
  correlation_cuda_kernel.cu
  error: command 'C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.0\\bin\\nvcc.exe' failed with exit status 1

caused by

C:/Logiciels/Anaconda3/envs/torch/lib/site-packages/torch/include\torch/csrc/jit/argument_spec.h(161): error: member "torch::jit::ArgumentSpecCreator::DEPTH_LIMIT" may not be initialized

I found the same issue here pytorch/extension-cpp#37, but it is a month old with no answer...

How is video segmentation done exactly?

Hello, I really liked your paper, but I am a bit confused about how you do video segmentation. Do you encode a sequence of RGB images with the segmentation mask already applied to them, and then predict the next sequence of RGB images with the masks applied?
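
(For what it's worth, methods in this family typically keep the RGB frames and the masks separate: the affinity is computed from frame features only, and the first-frame mask, one channel per object, is transported through the same soft-copy used for colors during training. A minimal sketch mirroring the one in the introduction; all names are illustrative.)

    import torch
    import torch.nn.functional as F

    def propagate_mask(feat_ref, feat_tgt, mask_ref, temperature=1.0):
        """Transport a soft (B, K, H, W) object mask (K = objects, one-hot
        channels) from reference to target via feature affinity."""
        B, C, H, W = feat_ref.shape
        affinity = torch.einsum('bci,bcj->bij',
                                feat_tgt.flatten(2), feat_ref.flatten(2))
        attn = F.softmax(affinity / temperature, dim=2)
        mask_tgt = torch.einsum('bij,bkj->bki', attn, mask_ref.flatten(2))
        return mask_tgt.view(B, -1, H, W)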
