
temporal-shift-module's Introduction

TSM: Temporal Shift Module for Efficient Video Understanding [Website] [arXiv] [Demo]

@inproceedings{lin2019tsm,
  title={TSM: Temporal Shift Module for Efficient Video Understanding},
  author={Lin, Ji and Gan, Chuang and Han, Song},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  year={2019}
} 

[NEW!] We have updated the environment setup for the online_demo; it should now be much easier to set up. Check the folder and give it a try!

[NEW!] We have released the optical flow model pre-trained on Kinetics. We believe the pre-trained weights will help with training two-stream models on other datasets.

[NEW!] We have released the code for online hand gesture recognition on the NVIDIA Jetson Nano. It achieves real-time recognition at only 8 watts. See the online_demo folder for details. [Full Video]

[Figure: TSM online demo]

Overview

We release the PyTorch code of the Temporal Shift Module.

[Figure: TSM framework]

Content

  • Prerequisites
  • Data Preparation
  • Code
  • Pretrained Models
  • Testing
  • Training
  • Live Demo on NVIDIA Jetson Nano

Prerequisites

The code is built with the following libraries:

For video data pre-processing, you may need ffmpeg.
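
If you need a starting point for the pre-processing, here is a minimal sketch of calling ffmpeg from Python to dump frames; the output naming pattern (img_%05d.jpg) and folder layout are assumptions, so adjust them to whatever the dataloader expects.

    import os
    import subprocess

    # Minimal sketch: extract JPEG frames from one video with ffmpeg.
    # The img_%05d.jpg naming is an assumption; match it to your file lists.
    def extract_frames(video_path, out_dir, fps=None):
        os.makedirs(out_dir, exist_ok=True)
        cmd = ['ffmpeg', '-i', video_path]
        if fps is not None:
            cmd += ['-vf', 'fps={}'.format(fps)]
        cmd.append(os.path.join(out_dir, 'img_%05d.jpg'))
        subprocess.run(cmd, check=True)

    # extract_frames('video.mp4', 'frames/video')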

Data Preparation

We first need to extract videos into frames for fast reading. Please refer to the TSN repo for a detailed guide to data pre-processing.

We have successfully trained on the Kinetics, UCF101, HMDB51, Something-Something-V1/V2, and Jester datasets with this codebase. Basically, the processing of video data can be summarized into 3 steps:

  1. Extract frames from the videos (ffmpeg or the vid2img scripts can be used).
  2. Generate the train/validation file lists (annotation files) used by the dataloader.
  3. Register the frame and file-list paths for the new dataset in the codebase.
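
As an illustration of the second step, here is a minimal sketch of generating a file list; it assumes frames are stored as <root>/<class>/<video>/*.jpg and that each line follows the common TSN-style convention "<frame_folder> <num_frames> <label>". Both the layout and the line format are assumptions here, so check ops/dataset.py for the exact format the dataloader expects.

    import os

    # Minimal sketch: build a file list with lines "<frame_folder> <num_frames> <label>".
    # Assumes frames live under root/<class_name>/<video_name>/*.jpg (layout is an assumption).
    def build_file_list(root, output_txt):
        classes = sorted(os.listdir(root))
        with open(output_txt, 'w') as f:
            for label, cls in enumerate(classes):
                cls_dir = os.path.join(root, cls)
                for video in sorted(os.listdir(cls_dir)):
                    frame_dir = os.path.join(cls_dir, video)
                    num_frames = len([x for x in os.listdir(frame_dir) if x.endswith('.jpg')])
                    f.write('{} {} {}\n'.format(frame_dir, num_frames, label))

    # build_file_list('/ssd/video/kinetics/frames_val', 'val_videofolder.txt')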

Code

This code is based on the TSN codebase. The core code to implement the Temporal Shift Module is ops/temporal_shift.py. It is a plug-and-play module to enable temporal reasoning, at the cost of zero parameters and zero FLOPs.

Here we provide a naive implementation of TSM; it can be written in just a few lines of code:

import torch

def shift(x, fold_div=8):
    # shape of x: [N, T, C, H, W]
    n, t, c, h, w = x.size()
    fold = c // fold_div  # number of channels to shift in each direction
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]  # shift left
    out[:, 1:, fold: 2 * fold] = x[:, :-1, fold: 2 * fold]  # shift right
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]  # not shifted
    return out

Note that the naive implementation involves a large data copy and increases memory consumption during training. It is suggested to use the in-place version of TSM to improve speed (see ops/temporal_shift.py, Line 12, for details).
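
To illustrate the plug-and-play idea, the sketch below wraps an existing layer so that the naive shift above runs on its input. The wrapper class and the ResNet hook-up are illustrative assumptions; the actual integration lives in ops/temporal_shift.py.

    import torch.nn as nn

    class TemporalShiftWrapper(nn.Module):
        """Illustrative wrapper: reshape to [N, T, C, H, W], shift, then run the wrapped layer."""
        def __init__(self, layer, n_segment=8, fold_div=8):
            super().__init__()
            self.layer = layer
            self.n_segment = n_segment
            self.fold_div = fold_div

        def forward(self, x):
            # x: [N*T, C, H, W], as produced by a 2D backbone applied frame by frame
            nt, c, h, w = x.size()
            x = x.view(nt // self.n_segment, self.n_segment, c, h, w)
            x = shift(x, fold_div=self.fold_div)  # the naive shift defined above
            x = x.view(nt, c, h, w)
            return self.layer(x)

    # e.g. shift before the first conv of a residual block (illustrative):
    # resnet.layer1[0].conv1 = TemporalShiftWrapper(resnet.layer1[0].conv1, n_segment=8)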

Pretrained Models

Training video models is computationally expensive. Here we provide some of the pretrained models. The accuracy may vary slightly from the paper, since we re-trained some of the models.

Kinetics-400

Dense Sample

In the latest version of our paper, we report the results of TSM trained and tested with I3D dense sampling (Tables 1 & 4, 8-frame and 16-frame), using the same training and testing hyper-parameters as in the Non-local Neural Networks paper, to directly compare with I3D.

We compare against the I3D performance reported in the Non-local paper:

method            n-frame        Kinetics Acc.
I3D-ResNet50      32 * 10clips   73.3%
TSM-ResNet50      8 * 10clips    74.1%
I3D-ResNet50 NL   32 * 10clips   74.9%
TSM-ResNet50 NL   8 * 10clips    75.6%

TSM outperforms I3D under the same dense sampling protocol, and the NL TSM model also achieves better performance than the NL I3D model. The non-local module itself improves accuracy by about 1.5%.

Here is a list of the pre-trained models that we provide (see Table 3 of the paper). The accuracy is tested using the full-resolution setting. The list is continuously updated.

model               n-frame       Kinetics Acc.   checkpoint   test log
TSN ResNet50 (2D)   8 * 10clips   70.6%           link         link
TSM ResNet50        8 * 10clips   74.1%           link         link
TSM ResNet50 NL     8 * 10clips   75.6%           link         link
TSM ResNeXt101      8 * 10clips   76.3%           TODO         TODO
TSM MobileNetV2     8 * 10clips   69.5%           link         link

Uniform Sampling

We also provide checkpoints of TSN and TSM models using uniformly sampled frames, as in the Temporal Segment Networks paper, which is more sample-efficient and very useful for fine-tuning on other datasets. Our TSM module improves consistently over the TSN baseline.

model               n-frame      acc (1-crop)   acc (10-crop)   checkpoint   test log
TSN ResNet50 (2D)   8 * 1clip    68.8%          69.9%           link         link
TSM ResNet50        8 * 1clip    71.2%          72.8%           link         link
TSM ResNet50        16 * 1clip   72.6%          73.7%           link         -

Optical Flow

We provide the optical flow model pre-trained on Kinetics. The model is trained using uniform sampling. We did not carefully tune the training hyper-parameters. Therefore, the model is intended for transfer learning on other datasets but not for performance evaluation.

model          n-frame     top-1 acc   top-5 acc   checkpoint   test log
TSM ResNet50   8 * 1clip   55.7%       79.5%       link         -

Something-Something

The Something-Something V1 & V2 datasets are highly temporal-related. TSM achieves state-of-the-art performance on these datasets: first place on V1 (50.72% test acc.) and second place on V2 (66.55% test acc.), using just a ResNet-50 backbone (as of 09/28/2019).

Here we provide some of the models on these datasets. The accuracy is tested using both the efficient setting (center crop * 1clip) and the accurate setting (full resolution * 2clip).

Something-Something-V1
model           n-frame   acc (center crop * 1clip)   acc (full res * 2clip)   checkpoint   test log
TSM ResNet50    8         45.6                        47.2                     link         link1 link2
TSM ResNet50    16        47.2                        48.4                     link         link1 link2
TSM ResNet101   8         46.9                        48.7                     link         link1 link2

Something-Something-V2

On the V2 dataset, the accuracy is reported under the accurate setting (full resolution * 2clip).

model           n-frame      accuracy   checkpoint   test log
TSM ResNet50    8 * 2clip    61.2       link         link
TSM ResNet50    16 * 2clip   63.1       link         link
TSM ResNet101   8 * 2clip    63.3       link         link

Testing

For example, to test the downloaded pretrained models on Kinetics, you can run scripts/test_tsm_kinetics_rgb_8f.sh. The script tests both TSN and TSM in the 8-frame setting by running:

# test TSN
python test_models.py kinetics \
    --weights=pretrained/TSM_kinetics_RGB_resnet50_avg_segment5_e50.pth \
    --test_segments=8 --test_crops=1 \
    --batch_size=64

# test TSM
python test_models.py kinetics \
    --weights=pretrained/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.pth \
    --test_segments=8 --test_crops=1 \
    --batch_size=64

Change to --test_crops=10 for 10-crop evaluation. With the above scripts, you should get around 68.8% and 71.2% top-1 accuracy, respectively.

To get the Kinetics performance of our dense sampling model under Non-local protocol, run:

# test TSN using non-local testing protocol
python test_models.py kinetics \
    --weights=pretrained/TSM_kinetics_RGB_resnet50_avg_segment5_e50.pth \
    --test_segments=8 --test_crops=3 \
    --batch_size=8 --dense_sample --full_res

# test TSM using non-local testing protocol
python test_models.py kinetics \
    --weights=pretrained/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e100_dense.pth \
    --test_segments=8 --test_crops=3 \
    --batch_size=8 --dense_sample --full_res

# test NL TSM using non-local testing protocol
python test_models.py kinetics \
    --weights=pretrained/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e100_dense_nl.pth \
    --test_segments=8 --test_crops=3 \
    --batch_size=8 --dense_sample --full_res

You should get around 70.6%, 74.1%, 75.6% top-1 accuracy, as shown in Table 1.

For the efficient setting (center crop, 1 clip) and the accurate setting (full resolution, 2 clips) on Something-Something, you can try something like this:

# efficient setting: center crop and 1 clip
python test_models.py something \
    --weights=pretrained/TSM_something_RGB_resnet50_shift8_blockres_avg_segment8_e45.pth \
    --test_segments=8 --batch_size=72 -j 24 --test_crops=1

# accurate setting: full resolution and 2 clips (--twice_sample)
python test_models.py something \
    --weights=pretrained/TSM_something_RGB_resnet50_shift8_blockres_avg_segment8_e45.pth \
    --test_segments=8 --batch_size=72 -j 24 --test_crops=3  --twice_sample

Training

We provide several examples of training TSM with this repo:

  • To train on Kinetics from ImageNet pretrained models, you can run scripts/train_tsm_kinetics_rgb_8f.sh, which contains:

    # You should get TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.pth
    python main.py kinetics RGB \
         --arch resnet50 --num_segments 8 \
         --gd 20 --lr 0.02 --wd 1e-4 --lr_steps 20 40 --epochs 50 \
         --batch-size 128 -j 16 --dropout 0.5 --consensus_type=avg --eval-freq=1 \
         --shift --shift_div=8 --shift_place=blockres --npb

    You should get TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.pth, matching the checkpoint available for download above. Note that you should scale the learning rate linearly with the batch size; for example, with a batch size of 256 the learning rate should be 0.04 (see the sketch after this list).

  • After obtaining the Kinetics pretrained models, we can fine-tune them on other datasets. For example, we can fine-tune the 8-frame Kinetics pre-trained model on the UCF-101 dataset using uniform sampling by running:

    python main.py ucf101 RGB \
         --arch resnet50 --num_segments 8 \
         --gd 20 --lr 0.001 --lr_steps 10 20 --epochs 25 \
         --batch-size 64 -j 16 --dropout 0.8 --consensus_type=avg --eval-freq=1 \
         --shift --shift_div=8 --shift_place=blockres \
         --tune_from=pretrained/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.pth
    
  • To train on the Something-Something datasets (V1 & V2), using ImageNet pre-training is usually better:

    python main.py something RGB \
         --arch resnet50 --num_segments 8 \
         --gd 20 --lr 0.01 --lr_steps 20 40 --epochs 50 \
         --batch-size 64 -j 16 --dropout 0.5 --consensus_type=avg --eval-freq=1 \
         --shift --shift_div=8 --shift_place=blockres --npb
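
A quick note on the linear learning-rate scaling mentioned in the first example: assuming the base setting above (lr 0.02 at batch size 128), the rule is simply lr = 0.02 * batch_size / 128, e.g.:

    # Illustrative helper for the linear scaling rule (base values taken from the Kinetics script above).
    def scaled_lr(batch_size, base_lr=0.02, base_batch=128):
        return base_lr * batch_size / base_batch

    print(scaled_lr(128))  # 0.02
    print(scaled_lr(256))  # 0.04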

Live Demo on NVIDIA Jetson Nano

We have built an online hand gesture recognition demo using TSM. The model uses a MobileNetV2 backbone and is trained on the Jester dataset.

  • Recorded video of the live demo [link]
  • Code of the live demo and setup tutorial: online_demo

temporal-shift-module's People

Contributors

joshnoel, lmxyy, songhan, tonylins, willprice, yaoyaoding


temporal-shift-module's Issues

how to train mobilenetv2 model

Thank you very much for your codebase. I have trained on my own data with ResNet-50 successfully, but when I train with MobileNetV2, the accuracy is very low.

python main.py ucf101 RGB --arch mobilenetv2 --num_segments 8 --gd 20 --lr 0.001 --lr_steps 10 20 --epochs 25 --batch-size 2 -j 16 --dropout 0.8 --consensus_type=avg --eval-freq=1 --shift --shift_div=8 --shift_place=blockres

Freezing BatchNorm2D except the first one.
Epoch: [24][0/104], lr: 0.00001 Time 15.333 (15.333) Data 15.214 (15.214) Loss 0.6946 (0.6946) Prec@1 50.000 (50.000) Prec@5 100.000 (100.000)
Epoch: [24][20/104], lr: 0.00001 Time 0.085 (0.815) Data 0.000 (0.725) Loss 0.6946 (0.6896) Prec@1 50.000 (54.762) Prec@5 100.000 (100.000)
Epoch: [24][40/104], lr: 0.00001 Time 0.084 (0.459) Data 0.000 (0.371) Loss 0.6947 (0.6907) Prec@1 50.000 (53.659) Prec@5 100.000 (100.000)
Epoch: [24][60/104], lr: 0.00001 Time 0.086 (0.336) Data 0.000 (0.250) Loss 0.6946 (0.6894) Prec@1 50.000 (54.918) Prec@5 100.000 (100.000)
Epoch: [24][80/104], lr: 0.00001 Time 0.082 (0.274) Data 0.000 (0.188) Loss 0.6391 (0.6893) Prec@1 100.000 (54.938) Prec@5 100.000 (100.000)
Epoch: [24][100/104], lr: 0.00001 Time 0.084 (0.236) Data 0.000 (0.151) Loss 0.6946 (0.6926) Prec@1 50.000 (51.980) Prec@5 100.000 (100.000)
Test: [0/12] Time 2.424 (2.424) Loss 0.7487 (0.7487) Prec@1 0.000 (0.000) Prec@5 100.000 (100.000)
Testing Results: Prec@1 52.174 Prec@5 100.000 Loss 0.69226
Best Prec@1: 52.174

why?

How to load a model trained by myself into test_models.py

I used this command to train the TSM model:

# You should get TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.pth
python main.py kinetics RGB \
     --arch resnet50 --num_segments 8 \
     --gd 20 --lr 0.02 --wd 1e-4 --lr_steps 20 40 --epochs 50 \
     --batch-size 128 -j 16 --dropout 0.5 --consensus_type=avg --eval-freq=1 \
     --shift --shift_div=8 --shift_place=blockres --npb

I got a ckpt.best.pth.tar and a ckpt.pth.tar, which seem to include the model parameters along with other checkpoint information, but test_models.py only needs the parameters. I tried to save only the model parameters from ckpt.pth.tar and deleted the following lines in test_models.py:

    # base_dict = {('base_model.' + k).replace('base_model.fc', 'new_fc'): v for k, v in list(checkpoint.items())}
    base_dict = {'.'.join(k.split('.')[1:]): v for k, v in list(checkpoint.items())}
    replace_dict = {'base_model.classifier.weight': 'new_fc.weight',
                    'base_model.classifier.bias': 'new_fc.bias',
                    }
    for k, v in replace_dict.items():
        if k in base_dict:
            base_dict[v] = base_dict.pop(k)

    net.load_state_dict(base_dict)

However, I got very low accuracy. Please tell me how to load the parameters correctly. Thanks.
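
For context, here is a minimal sketch of unpacking such a training checkpoint before applying the key remapping shown above; it assumes the checkpoint is a dict carrying a 'state_dict' entry (the exact layout saved by main.py is an assumption here).

    import torch

    # Load the full training checkpoint (assumed to be a dict with a 'state_dict' entry).
    checkpoint = torch.load('checkpoint/ckpt.best.pth.tar', map_location='cpu')
    state_dict = checkpoint['state_dict'] if 'state_dict' in checkpoint else checkpoint

    # The remapping above (stripping the 'module.' prefix, renaming the classifier keys)
    # would then be applied to `state_dict` before net.load_state_dict(...).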

A question about optimizer policy

Thanks for your great work and for kindly sharing the code!
I notice that there is a complex optimizer policy in the TSN model. Part of it looks like this:
{'params': first_conv_weight, 'lr_mult': 5 if self.modality == 'Flow' else 1, 'decay_mult': 1,
'name': "first_conv_weight"},
However, I suppose the built-in PyTorch SGD optimizer cannot recognize parameters like 'lr_mult' and 'decay_mult', which come from the Caffe framework. Since there is no function overriding 'step' in the original SGD class, I suspect this complex optimizer policy has no effect.
Please correct me if I misunderstand this part.
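
For reference, multipliers like 'lr_mult' and 'decay_mult' are usually consumed not by SGD's step() but by a learning-rate adjustment routine that rewrites each param group; below is a minimal sketch of that pattern (illustrative, not the repo's exact code).

    # Illustrative pattern: apply per-group multipliers when (re)setting lr and weight decay.
    def adjust_learning_rate(optimizer, base_lr, base_wd):
        for param_group in optimizer.param_groups:
            param_group['lr'] = base_lr * param_group.get('lr_mult', 1)
            param_group['weight_decay'] = base_wd * param_group.get('decay_mult', 1)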

About 3D Network?

Hello! Thanks for your excellent work. I find there are very few 3D works for the Sth-V1/V2 datasets.
I checked the leaderboard of Sth-V2: the top methods are nearly all 2D, and the performance of 3D methods is far behind the 2D works. In fact, 3D convolution is held to be more suitable for capturing spatio-temporal information, and the top accuracies on UCF/HMDB/Kinetics all come from 3D methods.
So what is your opinion on why there are fewer 3D works, and why their accuracy is lower, on Sth-V1/V2?
Looking forward to your reply. Thanks.

How long did TSM take to train from scratch on Kinetics?

Hey Hi,

Thank you for your work.
I am trying to train the online version of TSM on Kinetics with ResNet-50, and after two days it has not finished two epochs.

How long did it take to train the TSM network from scratch (online version), for both ResNet-50 and MobileNetV2? I just want to make sure I am on the right track.

Segmentation fault in the online_demo code

Hi, thanks for sharing this work

I'm having a segmentation fault when running the online_demo code. Here is the error:

UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
  warnings.warn("The use of the transforms.Scale transform is deprecated, " +
Build Executor...
Segmentation fault (core dumped)

I found that the program crashes at line 35 of main.py. I'm currently using LLVM 4.0.0 on Ubuntu 16.04.

relay_module, params = tvm.relay.frontend.from_onnx(onnx_model, shape=input_shapes)

Can anyone replicate this problem?

Thanks for your help.

how to process the dataset ?

Hi,
I do not have the dataset and just want to use my own dataset. How should I modify the code in
vid2img_kinetics?
Alternatively, could you document the usage of the script and explain the Kinetics-400 configuration?
Thanks

the result in the arxiv v1 version is inconsistent with ICCV19 version

Thanks for your good work!

I have followed TSM since it was first submitted to arXiv. In the older version (https://arxiv.org/pdf/1811.08383v1.pdf), Table 2 reports:
TSM ResNet50 16 65G 24.3M 44.8 74.5
while in your ICCV paper, under the same setting, the result is:
TSM ResNet50 16 65G 24.3M 47.2 77.1

Since the performance of the Kinetics pretrained model does not differ between the two versions, is the improvement due to different hyper-parameters or to training on the data more thoroughly?
Looking forward to your reply! :)

The accuracy of TSM-NL is only 57.85%

When I run this command:

# test NL TSM using non-local testing protocol
python test_models.py kinetics \
    --weights=pretrained/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e100_dense_nl.pth \
    --test_segments=8 --test_crops=3 \
    --batch_size=8 --dense_sample --full_res

I got only 57.85% Overall Prec@1. Looking forward to your reply.

Build Executor... taking time

I have a 1060ti graphics card, 16 GB of RAM, and an i7 processor, but the 'Build Executor...' step takes more than 5 minutes and I don't know why. It also uses one CPU core at 100%, about 2 GB of RAM, and even 500 MB of GPU memory.

Open camera...
<VideoCapture 0x7fef3986be10>
Build transformer...
/media/mustafa/ubuntu_backup/anaconda3/envs/video_action/lib/python3.7/site-packages/torchvision/transforms/transforms.py:210: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
  warnings.warn("The use of the transforms.Scale transform is deprecated, " +
Build Executor...

set params when training the flow model

I'm trying to reproduce the two-stream results of TSM on Something-V1, but the performance of my flow model is far below that (segment-based sampling method).

I understand the 10-channel stacked optical flow (TV-L1) and the 5x learning rate in the first conv layer.

Are there any differences in parameter settings between the RGB and Flow models (e.g., epochs, learning rate, ...)?

Some doubts about the performance

Hi!
Thanks for your interesting work and the source code.
I find that the performance of TSM on Sth-V1 with 8 frames and a ResNet-50 backbone under the efficient test setting is much better than in your paper. Have you made any improvements over the original paper?
Also, could you share the training script for Sth-V1 with 8 frames and a ResNet-50 backbone that reproduces the top-1 accuracy listed on GitHub?

Thanks very much

Legacy autograd function with non-static forward method is deprecated and will be removed in 1.3. Please use new-style autograd function with static forward method

When running the repo, it keeps emitting this UserWarning:
/pytorch/torch/csrc/autograd/python_function.cpp:638: UserWarning: Legacy autograd function with non-static forward method is deprecated and will be removed in 1.3. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function)

OS: Windows 7
PyTorch: 1.2
Python: 3.5

Question about finetune on UCF101

I download the pretrained model: TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.pth
and finetune it on UCF101-split1 using the command below:
'''
python main.py ucf101 RGB
--arch resnet50 --num_segments 8
--gd 20 --lr 0.001 --lr_steps 10 20 --epochs 25
--batch-size 64 -j 16 --dropout 0.8 --consensus_type=avg --eval-freq=1
--shift --shift_div=8 --shift_place=blockres
--tune_from=pretrained/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.pth
'''
but the official UCF101 dataset does not provide a validation set, so I split UCF101-split1 9:1, with 9 parts for training and 1 for validation.

after the training process, I test the model on UCF101-split 1 using the command below:
'''
python test_models.py ucf101
--weights=checkpoint/TSM_ucf101_RGB_resnet50_shift8_blockres_avg_segment8_e25/ckpt.best.pth.tar
--test_segments=8 --batch_size=72 -j 24 --test_crops=3 --twice_sample --full_res
'''

and I only get 93.4% top-1 accuracy. I want to know what I did wrong and how I can reproduce the 95.9% top-1 accuracy reported in the paper.
I really appreciate your reply, thank you very much!

Training MobilenetV2 from scratch on Kinetics

Hi Ji,

Thanks for your novel work. I wonder, have you tried training MobileNetV2 from scratch (without any pre-trained weights) on Kinetics or UCF-101? Could you share the configuration for this setting, such as the learning rate and batch size?

Thanks!

[solved problem] for pretrained something-v2 models, it should be "n_round = 2"

Thanks for your great work!

I tried to run

python test_models.py somethingv2 \
   --weights=pretrained/TSM_somethingv2_RGB_resnet101_shift8_blockres_avg_segment8_e45.pth \
   --test_segments=8 --batch_size=24 -j 12 --full_res --test_crops=3 --twice_sample

and encountered the following error message

RuntimeError: Error(s) in loading state_dict for TSN:
        Missing key(s) in state_dict: "base_model.layer1.1.conv1.net.weight", "base_model.layer2.1.conv1.net.weight", "base_model.layer2.3.conv1.net.weight", "base_model.layer3.1.conv1.net.weight", "base_model.layer3.3.conv1.net.weight", "base_model.layer3.5.conv1.net.weight", "base_model.layer3.7.conv1.net.weight", "base_model.layer3.9.conv1.net.weight", "base_model.layer3.11.conv1.net.weight", "base_model.layer3.13.conv1.net.weight", "base_model.layer3.15.conv1.net.weight", "base_model.layer3.17.conv1.net.weight", "base_model.layer3.19.conv1.net.weight", "base_model.layer3.21.conv1.net.weight", "base_model.layer4.1.conv1.net.weight".
        Unexpected key(s) in state_dict: "base_model.layer1.1.conv1.weight", "base_model.layer2.1.conv1.weight", "base_model.layer2.3.conv1.weight", "base_model.layer3.1.conv1.weight", "base_model.layer3.3.conv1.weight", "base_model.layer3.5.conv1.weight", "base_model.layer3.7.conv1.weight", "base_model.layer3.9.conv1.weight", "base_model.layer3.11.conv1.weight", "base_model.layer3.13.conv1.weight", "base_model.layer3.15.conv1.weight", "base_model.layer3.17.conv1.weight", "base_model.layer3.19.conv1.weight", "base_model.layer3.21.conv1.weight", "base_model.layer4.1.conv1.weight".

I solved this problem by changing the line n_round = 1 in ops/temporal_shift.py to n_round = 2.

Script for training TSM on something-something-v2

Hi Ji, thank you for publishing the work.

I want to double-check the training parameters for TSM on Something-V2, which should achieve at least 58.8, the performance I got when testing with your weights (tested with a single crop and a single clip).

According to your paper, the training parameters for the something-something-v2 dataset are: 50 training epochs, initial learning rate 0.01 (decays by 0.1 at epoch 20&40), weight decay 1e-4, batch size 64, and dropout 0.5. And the model is fine-tuned from ImageNet pre-trained weights.

However, the script in the git repository indicates that the initial learning rate is 0.001, the weight decay is 5e-4, and the model is tuned from Kinetics pre-trained weights.

Due to this discrepancy, I am confused about which parameters to use to reproduce the number. Could you provide the exact parameters for training TSM on Something-V2?

Thank you.

questions on online video object detection

Congratulations on the great work!

As noted in the supplementary section: "... we inserted uni-directional TSM to the backbone, while keeping other settings the same. We used the official training code of [60] to conduct the experiments".

May I ask a few questions on online video object detection:

  1. How many frames are used during training? 21 frames the same as FGFA?
  2. What are your learning rate policy and optimizer? The same as FGFA?

how to test on a movie? mp4 or avi

Hi,
I have looked at test_models.py and find it a little difficult to read. If I just want to test on some videos, how should I modify the code?
Thanks.

Any advice or suggestion will be appreciated.

A 'video_path' argument would be more convenient; I do not want to test on the txt file below:

Traceback (most recent call last):
  File "test_models.py", line 182, in <module>
    ]), dense_sample=args.dense_sample, twice_sample=args.twice_sample),
  File ".\temporal-shift-module\ops\dataset.py", line 58, in __init__
    self._parse_list()
  File ".\temporal-shift-module\ops\dataset.py", line 96, in _parse_list
    tmp = [x.strip().split(' ') for x in open(self.list_file)]
FileNotFoundError: [Errno 2] No such file or directory: '/ssd/video/kinetics/labels/val_videofolder.txt'

Pretrain model on jester dataset ?

Hi! Thank you for sharing such great work! I'm very interested in your paper. Could you provide the pretrained model or the training script for the Jester dataset?

Training speed about Optical Flow Model.

Hi, Author.
I set num_segments=16 and batch-size=32 and trained the optical flow model on the Kinetics dataset. The model converges, but the training speed is very slow. Did you run into the same issue, and how can it be solved? Looking forward to your reply, thank you very much!

ResNet50 Pretrained Models

Hi, Thank you for your TSM code!

But I'm wondering if there is code for resnet50 pretrained models (not the weights).

Training script for online TSM

Hi!
Thanks for the impressive work with publicly accessible source code here :)
I am trying to train another application with online TSM on different datasets, with a few adjustments. The repo currently only has the training script for the offline version. Would it be possible for you to also provide the training script for online TSM?
Thank you very much!

Accuracy in the somethingv1 dataset

Can the hyper-parameter settings below reach the 47.3% accuracy on the Something-V1 dataset reported in the paper, with num_segments=8? Are 25 epochs not enough? My 25 epochs only reach 45.98%.

python main.py something RGB \
     --arch resnet50 --num_segments 8 \
     --gd 20 --lr 0.001 --lr_steps 10 20 --epochs 25 \
     --batch-size 1 -j 16 --dropout 0.5 --consensus_type=avg --eval-freq=1 \
     --shift --shift_div=8 --shift_place=blockres \
     --tune_from=pretrained/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.pth

which dataset uses the pretrained model from kinetics?

Hi, thanks for the code release. In your first version of Arxiv paper,

We then fine-tuned the model to other target datasets like Something- Something [12], UCF101 [34], and HMDB51 [22]

In the most current version

For most of the datasets, the model is fine-tuned from ImageNet pre-trained weights; while HMDB-51 [26] and UCF-101 [40] are too small and prone to over-fitting [48], we followed the common practice [48, 49] to fine-tune from Kinetics [25] pre-trained weights and freeze the Batch Normalization [22] layers.

Which datasets are trained from the pre-trained model to get the scores reported in the paper? Jester, UCF101, and HMDB? Are the parameters for Jester and HMDB the same as for UCF101?

Thanks again.

Online Demo error

I installed everything on a Nano with the Jetson SD card image r32.2.
When launching /onlinedemo/main.py with python3, the following error is raised:
CUDAError: Check failed: ret == 0 (-1 vs. 0) : cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_INVALID_PTX

CUDA is in my path.

Thanks for helping

Note that:
1. When building TVM, no /nnvm/python directory is generated.
2. I first tried to install on the latest SD card image release, r32.3.1, and did not succeed.
=> On what version of the Jetson Nano SD card image did you make it work?

how to test ?

Hi,
I do not want to test on the images listed in the txt file; I just want to test on other images or videos. How can I do that?
Please supply the file

kinetics/labels/val_videofolder.txt

or tell me what it should contain.

Thanks

IndexError: list index out of range???

Hi, thanks for sharing this work

I'm running into an error when running the online_demo code. Here is the traceback:

Traceback (most recent call last):

File "main.py", line 349, in
main()

File "main.py", line 323, in main
idx, history = process_output(idx_, history)

File "main.py", line 249, in process_output
if not (history[-1] == history[-2]): # and history[-2] == history[-3]):

IndexError: list index out of range

The model performance of train-from-scratch models

Hi,

Thanks for your amazing work!

I'm new to video analysis. I'm wondering what the model performance is if you do not load ImageNet pretrained weights. And what if you load weights pretrained on another task's dataset, e.g., detection on MS-COCO?

I did not find this reported in your paper or your code. Thanks for your help!

Can you share the repository of the Kinetics-400 dataset that you used for training?

Hi,

Thank you for your work. I was trying to use the TSM module and check against the reported accuracy, but test_models.py expects a val_folder.txt and train_folder.txt (basically the train and validation file lists).
I tried to download the Kinetics-400 dataset (using the official script from ActivityNet), but the recent version has many expired/broken YouTube links. If possible, could you please give access to the Kinetics dataset that you used for training?

Running into the following error when trying to run in parallel on two GPUs

I just ran the training script
"python3 main.py kinetics RGB --arch mobilenetv2 --num_segments 8 --gd 20 --lr 0.02 --wd 1e-4 --lr_steps 20 40 --epochs 50 --batch-size 128 -j 16 --dropout 0.5 --consensus_type=avg --eval-freq=1 --shift --shift_div=8 --shift_place=blockres --npb --gpus 1"

But I ran into the error below.

File "main.py", line 249, in train
output = model(input_var)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 146, in forward
"them on device: {}".format(self.src_device_obj, t.device))
RuntimeError: module must have its parameters and buffers on device cuda:1 (device_ids[0]) but found one of them on device: cuda:0

Also, the loss, model, input, and target are all on the GPU. What are the expected settings? Please let us know.

For Uni-directional TSM for online video detection

Thank you for the wonderful work.

For uni-directional TSM for online video detection, what network backbone is used: ResNet-101 or MobileNetV2?
Also, can you elaborate on the lines below from the paper, e.g. how the training and validation are carried out?
I am trying to reproduce the same result.

We show that we can significantly improve the performance of video detection by simply modifying the backbone with online TSM, without changing the detection module design or using optical flow features

For TSM experiments, we inserted uni-directional TSM to the backbone, while keeping other settings the same.

And if possible please release the online training script.

Segmentation fault when running demo on ubuntu

I think online_demo will only work on the Jetson Nano. How can I run this on my laptop? I installed all the packages on my laptop and am getting this error:

Open camera...
<VideoCapture 0x7f4e749c3270>
Build transformer...
/media/mustafa/ubuntu_backup/anaconda3/envs/video_action/lib/python3.7/site-packages/torchvision/transforms/transforms.py:210: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
  warnings.warn("The use of the transforms.Scale transform is deprecated, " +
Build Executor...
/media/mustafa/ubuntu_backup/Projects/video_action/temporal-shift-module/online_demo/mobilenet_v2_tsm.py:95: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  x1, x2 = x[:, : c // 8], x[:, c // 8:]
Segmentation fault

run the online_demo by using my trained model

I ran into trouble when running the online_demo with my own trained model.
My model has only 4 classes, and an unexpected error occurred. The error message is as follows:

File "main.py", line 319, in main
    cv2.putText(label, 'Prediction: ' + catigories[idx],

IndexError: list index out of range

I have modified catigories to contain 4 classes.

It seems like a simple bug but I don't have a clue about it.

Use my own data

Thanks a lot to the author for the code; the results are amazing. But I have a problem and am looking for help: I have implemented real-time pre-processing of the video frames captured by a webcam; how do I then use this model for action recognition?
Can someone give me some advice? Thank you very much!
