
action-net's Introduction

ACTION-Net

Official implementation of ACTION-Net: Multipath Excitation for Action Recognition (CVPR'21)

By Zhengwei Wang, Qi She and Aljosa Smolic

Getting Started

  • EgoGesture data folder structure
|-frames
|---Subject01
|------Scene1
|---------Color1
|------------rgb1
|---------------000001.jpg
......
|-labels
|---Subject01
|------Scene1
|---------Group1.csv
......
  • Something-Something V2
|-frames
|---1
|------000001.jpg
|------000002.jpg
|------000003.jpg
......
  • Jester
|-frames
|---1
|------000001.jpg
|------000002.jpg
|------000003.jpg
......

Requirements

Requirements are provided in action.Dockerfile.

Annotation files

Annotation files are available at this link. Please follow the annotation files to construct the frame paths; a sketch of doing so follows.
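
For reference, a minimal path-construction sketch. It assumes each pkl entry stores the relative frame folder, the frame count and the label; the exact field layout is not documented here, so inspect one entry of the downloaded files first.

import os
import pickle

# Hypothetical loader: the real field layout of the provided pkl annotation
# files may differ -- inspect one entry before relying on this sketch.
with open('train.pkl', 'rb') as f:
    annotations = pickle.load(f)

for folder, num_frames, label in annotations:
    # Frame files follow the folder structures shown above, e.g.
    # frames/Subject01/Scene1/Color1/rgb1/000001.jpg
    frame_paths = [os.path.join('frames', folder, '%06d.jpg' % i)
                   for i in range(1, num_frames + 1)]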

Usage

Run sh train_ego_8f.sh 0,1,2,3 if you use four GPUs.

Acknowledgment

Our code builds on the previous repos TSN, TSM and TEA.

Pretrained models

Currently, we do not provide pretrained models, since we restructured the code and renamed the ACTION modules for public release. You should be able to reach performance similar to that reported in the paper using the code provided above.

(Update)

  • EgoGesture using 8f

  • Jester using 8f

Citation

If you find our work useful in your research, please cite:

@InProceedings{Wang_2021_CVPR,
  author    = {Wang, Zhengwei and She, Qi and Smolic, Aljosa},
  title     = {ACTION-Net: Multipath Excitation for Action Recognition},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2021}
}

action-net's People

Contributors

villawang

action-net's Issues

How to set the learning rate?

Hello, I would like to know the basis for your fine-tuning of the learning rate. Was it determined experimentally, with an experiment performed for each parameter adjustment?

# Per-group learning-rate and weight-decay multipliers returned by the
# model's optimizer-policy method (TSN/TSM convention).
return [
    {'params': first_conv_weight, 'lr_mult': 1, 'decay_mult': 1, 'name': "first_conv_weight"},
    {'params': first_conv_bias, 'lr_mult': 2, 'decay_mult': 0, 'name': "first_conv_bias"},
    {'params': normal_weight, 'lr_mult': 1, 'decay_mult': 1, 'name': "normal_weight"},
    {'params': normal_bias, 'lr_mult': 2, 'decay_mult': 0, 'name': "normal_bias"},
    {'params': bn, 'lr_mult': 1, 'decay_mult': 0, 'name': "BN scale/shift"},
    {'params': custom_weight, 'lr_mult': 1, 'decay_mult': 1, 'name': "custom_weight"},
    {'params': custom_bn, 'lr_mult': 1, 'decay_mult': 0, 'name': "custom_bn"},
    # fully connected layer
    {'params': lr5_weight, 'lr_mult': 5, 'decay_mult': 1, 'name': "lr5_weight"},
    {'params': lr10_bias, 'lr_mult': 10, 'decay_mult': 0, 'name': "lr10_bias"},
]
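
For readers with the same question: these multipliers follow the TSN/TSM convention, where each group's effective learning rate and weight decay are the base values scaled per group. A minimal sketch of how such policies are typically fed to the optimizer; the toy parameter groups below stand in for the list returned by the model, and base_lr/base_wd correspond to the --lr and --wd command-line values.

import torch

# Toy parameter groups standing in for the list above; in the repo they
# come from the model's optimizer-policy method.
layer = torch.nn.Linear(8, 4)
policies = [
    {'params': [layer.weight], 'lr_mult': 1, 'decay_mult': 1, 'name': 'normal_weight'},
    {'params': [layer.bias],   'lr_mult': 2, 'decay_mult': 0, 'name': 'normal_bias'},
]

base_lr, base_wd = 25e-4, 1e-5  # the --lr and --wd values from the train script

# Each group's effective setting is the base value times its multiplier,
# e.g. biases here train at twice the learning rate with no weight decay.
optimizer = torch.optim.SGD(
    [{'params': g['params'],
      'lr': base_lr * g['lr_mult'],
      'weight_decay': base_wd * g['decay_mult']} for g in policies],
    momentum=0.9)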


models.py import errors

In models.py, line 132 'from models.temporal_shift_res2net import make_temporal_shift' and line 137 'from ops.non_local import make_non_local' both fail with import errors. Where should these be imported from?

About the use of TSM in your work

Thank you for your work. After checking the source code, I have some questions. In model.py, I see that you use the shift operation from TSM, but it is not explained in the paper. Were the test results obtained by adding your proposed ACTION module to the TSM model?
Thanks for your reply.

Problems with PKL file generation

Hi, I'm a newbie. May I ask which code generates the training, testing, and validation PKL files for the Jester and Something-Something V2 datasets?

How to resume training from checkpoint?

Hi, thank you for your great work.
I tried to reproduce your work, but the machine was shut down.
So I tried to resume the training process, but the epoch started at 0.
Is there any way to resume training from the checkpoint so that the epoch continues from the previous state?
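
A generic PyTorch resume sketch for anyone in the same situation. It assumes the checkpoint was saved as a dict with 'epoch', 'state_dict' and 'optimizer' keys (a common TSN/TSM convention); adjust the key names to whatever train.py actually writes.

import torch

# Stand-ins for the real ACTION-Net model and optimizer; with the real
# objects, simply load the checkpoint file that train.py wrote.
model = torch.nn.Linear(8, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=25e-4, momentum=0.9)

# Assumed checkpoint layout; verify against the dict train.py saves.
checkpoint = torch.load('checkpoint.pth.tar', map_location='cpu')
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])
start_epoch = checkpoint['epoch'] + 1  # resume from the next epoch

for epoch in range(start_epoch, 25):
    pass  # existing training-loop body goes here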

Question about the paper

Hello! I want to ask a question about the paper.

Do you directly use the result of C3D:ResNeXt101 reported in their paper, or did you train the model yourself?

Thanks!

Hyperparameters of 16-frame training

Hello, I followed your suggestion, and these are my hyperparameters for 16-frame training.

python3 train.py --is_train --is_shift \
--dataset EgoGesture --clip_len 16 \
--shift_div 8 --wd 1e-5 --dropout 0 \
--cuda_id $cuda_id --batch_size 16 --lr_steps 10 15 20 \
--lr 25e-4 --base_model resnet50 --epochs 25

The val accuracy, 91.7, is quite a bit lower than the 8-frame result of 94.4.
I really want to reproduce the 16-frame result on EgoGesture. Is there any hyperparameter I need to change?

The result of CAM

Hi! Thank you for your wonderful project! Could you release the code for the CAM picture? I want to learn more about visualization.

Can you provide the code for generating the pkl files?

Thank you very much for providing the source code of this project. I want to use it to experiment on Something-Something V1, but the Something-Something V1 annotation files are in CSV format, and I am having difficulty generating the pkl files. Could you provide the code to generate the pkl files, or the pkl files for Something-Something V1? Thanks.
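
A hypothetical conversion sketch while waiting for an official answer. The exact record layout the repo expects in its pkl files is not documented here, so inspect one of the provided pkl files first and match that layout; this sketch assumes (video_id, num_frames, label_index) records and the Something-Something V1 convention of semicolon-separated "video_id;label" CSV rows.

import csv
import os
import pickle

# Build a label-name -> index mapping from the labels file (assumed one
# label per line, as in the Something-Something V1 release).
with open('something-something-v1-labels.csv') as f:
    label_to_idx = {name.strip(): i for i, name in enumerate(f)}

records = []
with open('something-something-v1-train.csv') as f:
    for video_id, label in csv.reader(f, delimiter=';'):
        # Count the extracted jpgs under frames/<video_id>/
        num_frames = len(os.listdir(os.path.join('frames', video_id)))
        records.append((video_id, num_frames, label_to_idx[label]))

with open('train.pkl', 'wb') as f:
    pickle.dump(records, f)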

test accuracy

Hi, I have downloaded your Jester ResNet pretrained model and the Jester annotations. I find that the labels in test.pkl are all 0. When I run test_jester_8f.sh with the mode in test.py set to 'test' and the jester_annotation test.pkl, I get top1 = 0, top5 = 100.

About Visualization

Hello, after reading your code and paper, I am very interested in the visualization part of your model (Figure 1). The input is five-dimensional data that includes a time dimension, so how do you call the CAM module? Or could you open-source your adapted CAM code? Thanks a lot.
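
For anyone experimenting in the meantime, a generic Grad-CAM sketch for a TSM/ACTION-style 2D backbone; this is not the authors' CAM code. Since such backbones fold frames into the batch axis, the 5-D clip (N, T, C, H, W) is reshaped to (N*T, C, H, W) and each frame yields its own map. A plain torchvision ResNet-50 (torchvision >= 0.13) stands in for the real model, and the layer name 'layer4' is an assumption.

import torch
import torch.nn.functional as F
import torchvision

model = torchvision.models.resnet50(weights=None).eval()  # stand-in backbone
store = {}

def fwd_hook(module, inputs, output):
    # Capture the last conv feature map and its gradient on the way back.
    store['act'] = output
    output.register_hook(lambda g: store.__setitem__('grad', g))

model.layer4.register_forward_hook(fwd_hook)  # assumed last conv stage

clip = torch.randn(1, 8, 3, 224, 224)   # dummy clip, N=1, T=8
x = clip.view(-1, 3, 224, 224)          # fold time into the batch axis
logits = model(x)
score = logits.gather(1, logits.argmax(1, keepdim=True)).sum()
score.backward()                        # gradients of the predicted classes

weights = store['grad'].mean(dim=(2, 3), keepdim=True)  # GAP over space
cam = F.relu((weights * store['act']).sum(1))           # (N*T, h, w)
cam = F.interpolate(cam[:, None], size=x.shape[-2:],
                    mode='bilinear', align_corners=False)
cam = (cam - cam.amin()) / (cam.amax() - cam.amin() + 1e-8)  # normalize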

Question about TSM result

On the Jester dataset, the TSM result in Table 1 of your paper is 94.4 (ResNet50, 8 frames, 3 crops, 10 clips), but the TSM result in the original paper (Table 1) is 97.0 (ResNet50, 8 frames, full resolution, 2 clips).
I tested TSM myself with the official public TSM code and it also reaches 97.0.
What causes this gap between 94.4 and 97.0?

Pretrained models

Could you give me the pretrained MobileNetV2 models on Something-Something V2 for training?
