wanglimin / artnet

Appearance-and-Relation Networks

Shell 4.96% Python 95.04%

video video-classification deep-neural-networks

artnet's Introduction

Appearance-and-Relation Networks

We provide the code and models for the following report (arXiv Preprint):

  Appearance-and-Relation Networks for Video Classification
  Limin Wang, Wei Li, Wen Li, and Luc Van Gool
  in arXiv, 2017

Updates

November 23rd, 2017
- Initialize the repo.

Overview

ARTNet aims to learn spatiotemporal features from videos in an end-to-end manner. It is built from a newly designed module, termed the SMART block. ARTNet is a simple and general video architecture, and all the released models are trained from scratch on video data. Currently, as an engineering compromise between accuracy and efficiency, ARTNet is instantiated with the ResNet-18 architecture and trained on input volumes of 112*112*16.

Training on Kinetics

The training of ARTNet is based on our modified Caffe toolbox. Special thanks to @zbwglory for modifying this code.

The training code is in the models/ folder.

Performance on the validation set of Kinetics

Model	Backbone architecture	Spatial resolution	Top-1 accuracy (%)	Top-5 accuracy (%)
C2D	ResNet18	112*112	61.2	82.6
C3D	ResNet18	112*112	65.6	85.7
C3D	ResNet34	112*112	67.1	86.9
ARTNet (s)	ResNet18	112*112	67.7	87.1
ARTNet (d)	ResNet18	112*112	69.2	88.3
ARTNet+TSN	ResNet18	112*112	70.7	89.3

These models are trained from scratch on the Kinetics dataset and evaluated on its validation set. Training uses input volumes of 112*112*16. At test time, 25 clips are cropped from each video.
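The 25-clip test protocol can be sketched as follows. This is a minimal illustration, not the repository's test script: the evenly spaced clip positions, the averaging of per-clip scores, and the names clip_start_indices and video_prediction are all assumptions made for the example.

```python
import numpy as np

def clip_start_indices(num_frames, clip_len=16, num_clips=25):
    """Evenly spaced start frames for num_clips clips of clip_len frames each."""
    # Last start position at which a full clip still fits inside the video.
    max_start = max(num_frames - clip_len, 0)
    return np.linspace(0, max_start, num_clips).round().astype(int)

def video_prediction(clip_scores):
    """Average the per-clip class scores and return the top-1 class index."""
    mean_scores = np.asarray(clip_scores, dtype=float).mean(axis=0)
    return int(mean_scores.argmax())

# Example: a 300-frame video and three hypothetical two-class clip scores.
starts = clip_start_indices(num_frames=300)
label = video_prediction([[0.1, 0.9], [0.6, 0.4], [0.2, 0.8]])
```

For a 300-frame video the first clip starts at frame 0 and the last at frame 284, so every 16-frame clip stays inside the video.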

Fine tuning on HMDB51 and UCF101

Fine-tuning is conducted with the TSN framework, with the segment number set to 2.

The fine-tuning code is in the fine_tune/ folder.

Performance on the datasets of HMDB51 and UCF101

Model	Backbone architecture	Spatial resolution	HMDB51 accuracy (%)	UCF101 accuracy (%)
C3D	ResNet18	112*112	62.1	89.8
ARTNet (d)	ResNet18	112*112	67.6	93.5
ARTNet+TSN	ResNet18	112*112	70.9	94.3

These models, learned on the Kinetics dataset, are transferred to the HMDB51 and UCF101 datasets. Fine-tuning is done with the TSN framework, with the segment number set to 2. Performance is reported over the three splits using only RGB input.

artnet's Issues

Why are all bias terms of the convolutional layers set to "false"?

Why are all bias terms set to "false" in the conv layers?
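One common reason for disabling the convolution bias (bias_term: false in Caffe's convolution_param) is that the layer is followed by batch normalization, as in ResNet-style networks. Whether this is the authors' motivation is an assumption; the thread does not say. The sketch below uses a hypothetical batch_norm helper and random data to show that a per-channel bias added before normalization is cancelled by the mean subtraction, so it has no effect on the output.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalise each channel (last axis) to zero mean and unit variance over the batch."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
conv_out = rng.normal(size=(8, 4))        # stand-in for conv outputs: 8 samples, 4 channels
bias = np.array([0.5, -1.0, 2.0, 3.0])    # a per-channel bias the conv layer could have added

# The bias shifts the batch mean by the same amount, so normalisation removes it.
same = np.allclose(batch_norm(conv_out), batch_norm(conv_out + bias))
```

In such designs the normalization layer, or a scale layer after it, typically supplies its own learned shift, which makes a separate convolution bias redundant.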

Error parsing text-format caffe.NetParameter: 20:17: Message type "caffe.VideoDataParameter" has no field named "length_first".

Hi, I hit this error when running "ucf_tsn_112_artnet_resnet_18". What should I do to solve it?
Thanks.

I0903 20:26:30.414449  7488 caffe.cpp:190] Starting Optimization
I0903 20:26:30.414571  7488 solver.cpp:34] Initializing solver from parameters:
test_iter: 475
test_interval: 500
base_lr: 0.001
display: 20
max_iter: 3500
lr_policy: "step"
gamma: 0.1
momentum: 0.9
weight_decay: 0.0005
stepsize: 1500
snapshot: 500
snapshot_prefix: "ucf101_split1_tsn_artnet_seg_2"
solver_mode: GPU
device_id: 2
debug_info: false
net: "ucf_tsn_112_artnet_resnet_18_train_val.prototxt"
test_initialization: true
average_loss: 20
clip_gradients: 40
iter_size: 1
richness: 100
I0903 20:26:30.414624  7488 solver.cpp:75] Creating training net from net file: ucf_tsn_112_artnet_resnet_18_train_val.prototxt
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 20:17: Message type "caffe.VideoDataParameter" has no field named "length_first".
F0903 20:26:30.414790  7488 upgrade_proto.cpp:928] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: ucf_tsn_112_artnet_resnet_18_train_val.prototxt
*** Check failure stack trace: ***
    @     0x7f21220b384d  google::LogMessage::Fail()
    @     0x7f21220b561c  google::LogMessage::SendToLog()
    @     0x7f21220b343c  google::LogMessage::Flush()
    @     0x7f21220b5f2e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f2122c56f8e  caffe::ReadNetParamsFromTextFileOrDie()
    @     0x7f2122c20832  caffe::Solver<>::InitTrainNet()
    @     0x7f2122c218a3  caffe::Solver<>::Init()
    @     0x7f2122c21a76  caffe::Solver<>::Solver()
    @           0x40f820  caffe::GetSolver<>()
    @           0x4088fa  train()
    @           0x406e26  main
    @     0x7f210eaef445  __libc_start_main
    @           0x4073dd  (unknown)
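The parse error above says that the caffe.proto compiled into this Caffe binary does not declare a length_first field in VideoDataParameter. One plausible cause, stated here as an assumption, is that Caffe was built from a tree other than the modified toolbox mentioned in the README. A quick way to check is to look for the field in the tree's caffe.proto. The helpers below (message_body, has_field) and the sample excerpt are hypothetical; the parser is a rough sketch that assumes proto2 syntax and no braces inside comments.

```python
import re

def message_body(proto_text, message_name):
    """Return the text between the braces of `message <name> { ... }`, or None."""
    header = re.search(r"\bmessage\s+" + re.escape(message_name) + r"\s*\{", proto_text)
    if header is None:
        return None
    depth, pos = 1, header.end()
    # Walk forward tracking brace depth so nested enums/messages do not end the scan early.
    while pos < len(proto_text) and depth > 0:
        if proto_text[pos] == "{":
            depth += 1
        elif proto_text[pos] == "}":
            depth -= 1
        pos += 1
    return proto_text[header.end():pos - 1]

def has_field(proto_text, message_name, field_name):
    """True if the message (including anything nested in it) declares field_name."""
    body = message_body(proto_text, message_name)
    if body is None:
        return False
    fields = re.findall(r"\b(?:optional|required|repeated)\s+[\w.]+\s+(\w+)\s*=", body)
    return field_name in fields

# Illustrative stand-in for a caffe.proto excerpt; the field list is hypothetical.
SAMPLE_PROTO = """
message VideoDataParameter {
  enum Modality { RGB = 0; FLOW = 1; }
  optional string source = 1;
  optional uint32 new_length = 2 [default = 1];
}
"""

print(has_field(SAMPLE_PROTO, "VideoDataParameter", "length_first"))
print(has_field(SAMPLE_PROTO, "VideoDataParameter", "new_length"))
```

To run the check on a real checkout, read src/caffe/proto/caffe.proto from the Caffe tree you compiled and pass its contents instead of SAMPLE_PROTO. If the field is missing there, the prototxt and the binary come from different Caffe versions.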

how_to_make_with_python

@wanglimin Thanks for your nice work!
When I use CMake (cmake .. -DUSE_MPI=ON -DMPI_CXX_COMPILER=" /usr/local/bin/mpicxx") to compile the ARTNet code, I see the following log:
-- Could NOT find Boost
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE)
-- Python interface is disabled or not all required dependecies found. Building without it...

As a result, I cannot build Caffe with Python support or import caffe successfully. Could you please help me build it with Python? Should I use make rather than cmake? Thanks in advance!

maybe a small spelling mistake o(￣︶￣)o

In paragraph 2 on page 2 of the paper, I think it should read "superior performance to the existing state-of-the-art methods on this challenging benchmark under the setting of training from scratch with only RGB input."

Thank you for your outstanding work! Best regards!

Hello, may I ask what format your input data is in? I would like to build my own dataset and try out your method.

caffe out of memory

@wanglimin
I am training with https://github.com/yjxiong/caffe

During training, the memory usage is not fixed but keeps growing, and then this happened:
"""
Out of memory: Kill process 33049 (caffe) score 123 or sacrifice child
Killed process 33049 (caffe) total-vm:118020040kB, anon-rss:32252368kB, file-rss:83732kB
"""
It occurs when training ARTNet-18, but C3D-ResNet18 is fine. All models and prototxt files are the officially released ones. Is there something in the code I need to modify?
Thank you in advance!

new_length in the "C3D_ResNet18 Flow" experiment

Hi,
In your paper, Table 3 reports a top-1 error of 42.5% for C3D_ResNet18 with the Flow modality.
Could you tell me which new_length you used for this experiment: 8, 16, or some other length?

Thank you.

about the new length and the num_segments

I don't understand exactly what new_length and num_segments mean.

I downloaded "112_c3d_resnet_18_kinetics.caffemodel" and removed the FC layers.
I then used the model without the FC layers to fine-tune on UCF101:
with new_length=16, num_segments=1, top-1 reached 83%;
with new_length=1, num_segments=16, top-1 was only 78%.

I saw on the TSN GitHub that new_length=1, num_segments=3 may mean splitting the video evenly into 3 segments and taking 1 frame from each segment.
What about new_length=16, num_segments=1?
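The two settings in this question can be compared with a small sketch of TSN-style sampling: the video is split into num_segments equal parts, and one snippet of new_length consecutive frames is taken from each part. The function snippet_indices and its centred, deterministic placement are assumptions for illustration (training code usually picks a random offset inside each segment), and the 64-frame video is hypothetical.

```python
def snippet_indices(num_frames, new_length, num_segments):
    """Per segment, the frame indices of one snippet of new_length consecutive frames.

    Assumes num_frames >= new_length.
    """
    seg_len = num_frames / num_segments
    snippets = []
    for s in range(num_segments):
        # Centre the snippet in its segment, clamped so it stays inside the video.
        centre = (s + 0.5) * seg_len
        start = int(centre - new_length / 2)
        start = min(max(start, 0), num_frames - new_length)
        snippets.append(list(range(start, start + new_length)))
    return snippets

dense = snippet_indices(num_frames=64, new_length=16, num_segments=1)
sparse = snippet_indices(num_frames=64, new_length=1, num_segments=16)
```

Under these assumptions, new_length=16, num_segments=1 yields a single snippet of 16 consecutive frames (24 to 39 here), while new_length=1, num_segments=16 yields 16 single-frame snippets spaced 4 frames apart. Both read 16 frames, but only the first gives a 3D network consecutive frames to model motion, which is one possible, unverified reading of the accuracy gap reported above.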

Is there any test code/script for visualizing results as images/video?

The URL for downloading the pretrained model cannot be found

Could you provide a new URL for the pretrained model? Thanks
