fingerrec / self-supervised-temporal-discriminative-representation-learning-for-video-action-recognition

[arXiv 2020] The code for our paper "Self-Supervised Temporal-Discriminative Representation Learning for Video Action Recognition" https://arxiv.org/abs/2008.02129

Home Page: https://zhuanlan.zhihu.com/p/176774543

Python 95.29% Shell 4.71%
video-action-recognition self-supervised-learning unsupervised representation-learning

Self-Supervised Temporal-Discriminative Representation Learning

The source code for our paper "Self-Supervised Temporal-Discriminative Representation Learning for Video Action Recognition".

Overview

Without using a single label, our method learns to focus strongly on motion regions.

example

Our self-supervised VTDL significantly outperforms existing self-supervised learning methods in video action recognition, and even achieves better results than fully supervised methods on UCF101 and HMDB51 when a small-scale video dataset (with only thousands of videos) is used for pre-training.

sample_acc.png

Requirements

  • Python 3
  • PyTorch 1.1+
  • PIL

Structure

  • datasets
    • list
      • hmdb51: the train/val lists of HMDB51
      • ucf101: the train/val lists of UCF101
      • kinetics-400: the train/val lists of kinetics-400
  • experiments
    • logs: detailed experiment records
    • TemporalDis
      • hmdb51
      • ucf101
      • kinetics
    • gradientes:
    • visualization
  • src
    • data: load data
    • loss: the losses evaluated in this paper
    • model: network architectures
    • scripts: train/eval scripts
    • TC: detailed implementation of spatio-temporal consistency
    • utils
    • feature_extract.py
    • main.py
    • trainer.py
    • option.py

Dataset

See dataset.md. Prepare each dataset as a txt list file in which every row has the format shown below. The splits of hmdb51/ucf101/kinetics-400 can be downloaded from Google Drive.

Each row includes:

video_path class frames_num
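
As an illustration of the row format above, here is a minimal sketch of reading and writing one list row. It assumes the three fields are separated by whitespace and that `class` and `frames_num` are integers; the names `VideoRecord`, `parse_list_line`, and `format_list_line` are hypothetical and do not come from this repository.

```python
from typing import NamedTuple


class VideoRecord(NamedTuple):
    """One row of a list file: video_path class frames_num."""
    video_path: str
    label: int       # assumed to be an integer class index
    frames_num: int  # number of frames in the video


def parse_list_line(line: str) -> VideoRecord:
    # Split from the right so a path containing spaces stays intact.
    video_path, label, frames_num = line.strip().rsplit(maxsplit=2)
    return VideoRecord(video_path, int(label), int(frames_num))


def format_list_line(record: VideoRecord) -> str:
    # Inverse of parse_list_line: one space-separated row.
    return f"{record.video_path} {record.label} {record.frames_num}"


if __name__ == "__main__":
    # The path below is a made-up example, not a real file in the splits.
    row = parse_list_line("videos/brush_hair/clip_001 0 120\n")
    print(row)
    print(format_list_line(row))
```

If the lists in this repository use a different separator or a class name instead of an index, only `parse_list_line` would need to change.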

VTDL

Network Architecture

The networks are defined in src/model/[backbone].py

Method    #logits_channel
C3D       512
R2P1D     2048
I3D       1024
R3D       2048
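
When a new classifier head is attached or extracted features are consumed downstream, the table above can be kept as a small lookup. This is only a sketch: the lowercase keys other than `i3d` (which appears as `--arch i3d` in the commands of this README) are assumed spellings, and `logits_channels` is a hypothetical helper, not part of the repository.

```python
# Feature (logits) channel count per backbone, copied from the table above.
LOGITS_CHANNELS = {
    "c3d": 512,
    "r2p1d": 2048,
    "i3d": 1024,
    "r3d": 2048,
}


def logits_channels(arch: str) -> int:
    """Return the feature dimension for a backbone name (case-insensitive)."""
    key = arch.lower()
    if key not in LOGITS_CHANNELS:
        raise ValueError(f"unknown backbone: {arch!r}")
    return LOGITS_CHANNELS[key]


if __name__ == "__main__":
    print(logits_channels("i3d"))
```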

Step1: self-supervised learning

HMDB51

bash scripts/TemporalDisc/hmdb51.sh

UCF101

bash scripts/TemporalDisc/ucf101.sh

Kinetics-400

bash scripts/TemporalDisc/kinetics.sh

Note: more training options and ablation studies can be found in scripts.

Step2: Transfer to action recognition

HMDB51

#!/usr/bin/env bash
python main.py \
--method ft \
--train_list ../datasets/lists/hmdb51/hmdb51_rgb_train_split_1.txt \
--val_list ../datasets/lists/hmdb51/hmdb51_rgb_val_split_1.txt \
--dataset hmdb51 \
--arch i3d \
--mode rgb \
--lr 0.001 \
--lr_steps 10 20 25 30 35 40 \
--epochs 45 \
--batch_size 4 \
--data_length 64 \
--workers 8 \
--dropout 0.5 \
--gpus 2 \
--logs_path ../experiments/logs/hmdb51_i3d_ft \
--print-freq 100 \
--weights ../experiments/TemporalDis/hmdb51/models/04-16-2328_aug_CJ/ckpt_epoch_48.pth

UCF101

#!/usr/bin/env bash
python main.py \
--method ft \
--train_list ../datasets/lists/ucf101/ucf101_rgb_train_split_1.txt \
--val_list ../datasets/lists/ucf101/ucf101_rgb_val_split_1.txt \
--dataset ucf101 \
--arch i3d \
--mode rgb \
--lr 0.0005 \
--lr_steps 10 20 25 30 35 40 \
--epochs 45 \
--batch_size 4 \
--data_length 64 \
--workers 8 \
--dropout 0.5 \
--gpus 2 \
--logs_path ../experiments/logs/ucf101_i3d_ft \
--print-freq 100 \
--weights ../experiments/TemporalDis/ucf101/models/04-18-2208_aug_CJ/ckpt_epoch_45.pth

Note: more training options and ablation studies can be found in scripts.
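
The fine-tuning commands above pass `--lr` together with `--lr_steps 10 20 25 30 35 40`. A common reading of such options is a step schedule in which the learning rate is multiplied by a fixed factor each time training reaches one of the listed epochs, the same rule used by PyTorch's `MultiStepLR`. The sketch below assumes that reading. The decay factor `gamma` is not stated in this README, so the value used here is a placeholder, and `lr_at_epoch` is a hypothetical helper; the trainer in this repository may behave differently.

```python
from bisect import bisect_right
from typing import Sequence


def lr_at_epoch(base_lr: float, lr_steps: Sequence[int], epoch: int,
                gamma: float = 0.1) -> float:
    """Step schedule: multiply base_lr by gamma once per milestone reached.

    gamma is an assumed placeholder; check the training scripts for the
    value that is actually used.
    """
    milestones = sorted(lr_steps)
    # Number of milestones that are <= the current epoch.
    num_decays = bisect_right(milestones, epoch)
    return base_lr * (gamma ** num_decays)


if __name__ == "__main__":
    steps = [10, 20, 25, 30, 35, 40]  # from the commands above
    for epoch in (0, 9, 10, 20, 44):
        print(epoch, lr_at_epoch(0.001, steps, epoch))
```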

Results

Step2: Transfer

Under the same experimental settings, the results are reported below:

Method                       UCF101   HMDB51
Baseline                     60.3     22.6
+ BA                         63.3     26.2
+ Temporal Discriminative    72.7     41.2
+ TCA                        82.3     52.9

trained models/logs/performance

We provide trained models, logs, and performance records on Google Drive.

Baseline + BA

BA_fine_tune_performance.png

performance;

trained_model;

logs

Baseline + BA + Temporal Discriminative

wo_TCA_fine_tune_performance.png

performance;

trained_model;

logs

Baseline + BA + Temporal Discriminative + TCA

(a). Pretrain

Loss curve:

loss.png

Ins Prob:

prob.png

pretrained_weight

This pretrained model can achieve 52.7% on HMDB51.

(b). Finetune

VTDL_fine_tune_performance.png

performance;

trained_model;

logs

The results are reported with a single video clip. At test time, we average the predictions of ten clips to obtain the final prediction, which leads to an improvement of around 2-3%.

python test.py
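
The multi-clip testing described above can be sketched as follows: per-clip class scores are averaged, and the class with the highest mean score becomes the video-level prediction. This is an illustration only and is not taken from test.py. Whether the averaged values are softmax probabilities or raw logits is an assumption left open here, and the function names are hypothetical.

```python
from typing import List, Sequence


def average_clip_scores(clip_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class scores over all clips of one video."""
    if not clip_scores:
        raise ValueError("need at least one clip")
    num_classes = len(clip_scores[0])
    if any(len(scores) != num_classes for scores in clip_scores):
        raise ValueError("all clips must score the same number of classes")
    num_clips = len(clip_scores)
    return [sum(scores[c] for scores in clip_scores) / num_clips
            for c in range(num_classes)]


def predict_video(clip_scores: Sequence[Sequence[float]]) -> int:
    """Return the class index with the highest averaged score."""
    averaged = average_clip_scores(clip_scores)
    return max(range(len(averaged)), key=averaged.__getitem__)


if __name__ == "__main__":
    # Three clips and two classes, with made-up scores.
    clips = [[0.6, 0.4], [0.1, 0.9], [0.2, 0.8]]
    print(average_clip_scores(clips), predict_video(clips))
```

In this toy input the first clip alone would vote for class 0, while the average over all clips favours class 1, which is the kind of correction multi-clip testing is meant to provide.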

Feature Extractor

As STCR can easily be extended to other video representation tasks, we provide a script for feature extraction.

python feature_extractor.py

The features will be saved as a single NumPy file with shape [video_nums, features_dim].
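
A saved feature file of this kind could be read back as sketched below. The sketch assumes the file was written with `numpy.save` as a plain `.npy` array; the file name `features.npy` and the helper `load_features` are hypothetical, and the output path actually used by the extraction script is not stated here.

```python
import os
import tempfile

import numpy as np


def load_features(path: str) -> np.ndarray:
    """Load a feature matrix and check that it is [video_nums, features_dim]."""
    features = np.load(path)
    if features.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {features.shape}")
    return features


if __name__ == "__main__":
    # Stand-in data: 5 videos with 1024-dim features (the I3D size listed earlier).
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "features.npy")  # hypothetical file name
        np.save(path, np.zeros((5, 1024), dtype=np.float32))
        features = load_features(path)
        print(features.shape)
```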

Citation

Please cite our paper if you find this code useful for your research.

@Article{wang2020self,
  author  = {Jinpeng Wang and Yiqi Lin and Andy J. Ma and Pong C. Yuen},
  title   = {Self-supervised Temporal Discriminative Learning for Video Representation Learning},
  journal = {arXiv preprint arXiv:2008.02129},
  year    = {2020},
}

Others

The project is partly based on Unsupervised Embedding Learning and MOCO.

self-supervised-temporal-discriminative-representation-learning-for-video-action-recognition's Issues

c3d model

Hi!
Thanks for your code. However, the C3D module needed to train the model is missing. Can you give me more details about this?

dataset

Hello, how is the data list generated?

No ideal results

Hi, thanks for your good work! I'm very interested in your work and am following it. I ran the self-supervised learning to obtain the pretrained weights, and in step 2 (transfer to action recognition) the best result I get on HMDB51 is 47; at test time, the result improves by less than 1%. I also used your provided pretrained weights for the transfer to action recognition, but did not get the expected results. I checked my procedure and tried increasing the number of epochs, but it does not seem to help. The only difference is that in step 1 I changed the batch size from 16 to 8. I do not know whether I missed some information or steps; could you give me some advice or help?

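About the self-supervised epochs

Hi, I see that the code sets the default number of epochs to 240. How long does this take to run? Have you tried reducing the number of epochs, and how did that affect the results?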

How to get the small scale of data

Hi, thank you for the nice work.

I am interested in the small-scale video dataset part. I think this is a subset of an original dataset such as UCF101.
Is the data sampling completely random? For example, the figure plots a result point for 1000 samples. Are these 1000 samples randomly sampled from the full UCF101 dataset? Or do you select 10 videos for each action, forming 1010 samples for training? I do not know the influence of these two sampling methods, but the latter is more balanced.

I would also appreciate it if you could provide the corresponding training list. Thank you.

What's the finetune feature?

Hi, thanks for making the code public. I am wondering what the finetune feature is. Which dataset is it finetuned on? I am actually planning to use the self-supervised pretrained features (not finetuned on anything). Could you do me a favor and post the links again for 1) the weights from self-supervised pretraining on Kinetics-400, and 2) the weights pretrained on both Kinetics-400 and UCF-101? Any feedback is appreciated. Looking forward to it!
