
Continual 3D Convolutional Neural Networks


Continual 3D Convolutional Neural Networks (Co3D CNNs) are a novel computational formulation of spatio-temporal 3D CNNs, in which videos are processed frame-by-frame rather than by clip.

In online processing tasks that demand frame-wise predictions, Co3D CNNs dispense with the computational redundancies of regular 3D CNNs, namely the repeated convolutions over frames that appear in multiple overlapping clips.

Co3D CNNs are weight-compatible with regular 3D CNNs, do not need further training, and reduce the floating point operations for frame-wise computations by more than an order of magnitude!


Principle


Continual Convolution. An input (green d or e) is convolved with a kernel (blue α, β). The intermediate feature maps corresponding to all but the last temporal position are stored, while the last feature map and the prior memory are summed to produce the output. For a continual stream of inputs, Continual Convolutions produce outputs identical to those of regular convolutions.
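
To make the principle concrete, below is a minimal sketch of a continual temporal convolution in PyTorch. It is illustrative only: the class name ContinualConv1d and its step method are hypothetical, and the repository's actual implementation additionally handles full 3D kernels, channels, padding, strides, and batching.

    import torch
    import torch.nn.functional as F

    class ContinualConv1d:
        """Frame-by-frame temporal convolution that matches a regular
        1D convolution (no padding, stride 1) once its memory is filled.
        Hypothetical sketch; not the repository's implementation."""

        def __init__(self, weights: torch.Tensor):
            self.w = weights                    # shape: (kernel_size,)
            self.T = weights.shape[0]
            # Partial sums ("prior memory") for the next T-1 outputs.
            self.mem = [torch.zeros(()) for _ in range(self.T - 1)]
            self.frames_seen = 0

        def step(self, x: torch.Tensor):
            """Consume one frame; return an output once T frames have been seen."""
            y = (self.mem[0] if self.T > 1 else torch.zeros(())) + self.w[-1] * x
            # Update memory: each stored partial sum moves one step closer
            # to completion and receives this frame's contribution.
            new_mem = [
                self.mem[k + 1] + self.w[self.T - 2 - k] * x
                for k in range(self.T - 2)
            ]
            if self.T > 1:
                new_mem.append(self.w[0] * x)
            self.mem = new_mem
            self.frames_seen += 1
            return y if self.frames_seen >= self.T else None  # None during warm-up

    # Sanity check: identical outputs to a regular (clip-based) convolution.
    torch.manual_seed(0)
    w, stream = torch.randn(3), torch.randn(10)
    cc = ContinualConv1d(w)
    continual = [y for y in (cc.step(x) for x in stream) if y is not None]
    clip = F.conv1d(stream.view(1, 1, -1), w.view(1, 1, -1)).flatten()
    assert torch.allclose(torch.stack(continual), clip, atol=1e-5)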

Results


Accuracy/complexity trade-off for Continual X3D (CoX3D) and recent state-of-the-art 3D CNNs on Kinetics-400 using 1-clip/frame testing. For regular 3D CNNs, the FLOPs per clip ■ are noted, while the FLOPs per frame ● are shown for the Continual 3D CNNs. The CoX3D models reuse the weights of the X3D models without further fine-tuning. The global average pool size for the network is noted at each point. The diagonal and vertical arrows indicate, respectively, a transfer from a regular to a Continual 3D CNN and an extension of the receptive field.


Benchmark of state-of-the-art methods on Kinetics-400. The noted accuracy is the single-clip or single-frame top-1 score using RGB as the only input modality. The performance was evaluated using publicly available pre-trained models without any further fine-tuning. For throughput comparison, evaluations per second denote frames per second for the CoX3D models and clips per second for the remaining models. Throughput results are the mean ± std of 100 measurements. Pareto-optimal models are marked in bold. Mem. is the maximum allocated memory during inference, noted in megabytes.

Setup

  1. Clone the project code

    git clone https://github.com/LukasHedegaard/co3d
    cd co3d
  2. Create and activate a conda environment (optional)

    conda create --name co3d python=3.8
    conda activate co3d
  3. Install Python dependencies

    pip install -e .[dev]
  4. Install FFMPEG and UNRAR

  5. Fill in your dataset and log folder paths in the .env file (a sketch of how such variables can be read in Python is shown after this list):

    DATASETS_PATH=/path/to/datasets
    LOGS_PATH=/path/to/logs
    CACHE_PATH=.cache
  6. Download dataset using these instructions
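
As a reference for step 5, here is a minimal sketch of how such environment variables are commonly read in Python using python-dotenv; the project itself may load them differently, so treat the snippet as an illustration rather than the repository's mechanism.

    import os
    from pathlib import Path
    from dotenv import load_dotenv  # assumes python-dotenv is installed

    load_dotenv()  # reads the .env file in the current working directory
    datasets_path = Path(os.environ["DATASETS_PATH"])
    logs_path = Path(os.environ["LOGS_PATH"])
    cache_path = Path(os.getenv("CACHE_PATH", ".cache"))
    assert datasets_path.exists(), f"Dataset folder not found: {datasets_path}"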

Models

CoX3D is the Continual-CNN implementation of X3D. In contrast to regular 3D CNNs, which take a whole video clip as input, Continual CNNs operate frame-by-frame and can thus speed up computation by a significant margin.

CoSlow is the Continual-CNN implementation of Slow.

CoI3D is the Continual-CNN implementation of I3D.

X3D [ArXiv, Repo] is a family of 3D variants of the EfficientNet architecture, which produce state-of-the-art results for lightweight human activity recognition.

R(2+1)D [ArXiv, Repo] is a CNN for activity recognition, which separates the 3D convolution into a spatial 2D convolution and a temporal 1D convolution in order to reduce the number of parameters and increase the network efficiency.
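
To illustrate this factorisation, the sketch below contrasts a full 3D convolution with a (2+1)D block made of a spatial and a temporal convolution. It is a simplified example rather than the repository's R(2+1)D implementation, and the intermediate channel width mid_c is chosen arbitrarily.

    import torch
    import torch.nn as nn

    in_c, out_c, mid_c = 64, 64, 96  # mid_c chosen arbitrarily for illustration

    # Full 3D convolution: one k_t x k_h x k_w kernel.
    conv3d = nn.Conv3d(in_c, out_c, kernel_size=(3, 3, 3), padding=(1, 1, 1))

    # (2+1)D factorisation: spatial 2D convolution followed by temporal 1D.
    conv2plus1d = nn.Sequential(
        nn.Conv3d(in_c, mid_c, kernel_size=(1, 3, 3), padding=(0, 1, 1)),
        nn.ReLU(inplace=True),
        nn.Conv3d(mid_c, out_c, kernel_size=(3, 1, 1), padding=(1, 0, 0)),
    )

    x = torch.randn(1, in_c, 8, 56, 56)  # (batch, channels, time, height, width)
    assert conv3d(x).shape == conv2plus1d(x).shape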

I3D [ArXiv, Repo] is a 3D CNN for activity recognition, which "inflates" the weights of a 2D CNN pretrained on ImageNet to initialise the 3D CNN, thereby improving accuracy and reducing training time.

The implementation here is a port of the one found in the SlowFast Repo.
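
The inflation idea can be sketched as follows: a pretrained 2D kernel is repeated along the temporal axis and rescaled, so that the inflated 3D filter initially responds to a static clip exactly as the 2D filter responds to a single image. The helper below is a simplified illustration, not the port used in this repository.

    import torch

    def inflate_2d_to_3d(w2d: torch.Tensor, t: int) -> torch.Tensor:
        """Inflate a 2D conv kernel (out_c, in_c, kh, kw) into a 3D kernel
        (out_c, in_c, t, kh, kw) by repeating it t times along the temporal
        axis and dividing by t, so the response to a static clip is unchanged."""
        return w2d.unsqueeze(2).repeat(1, 1, t, 1, 1) / t

    w2d = torch.randn(64, 3, 7, 7)       # e.g. an ImageNet-pretrained kernel
    w3d = inflate_2d_to_3d(w2d, t=5)     # shape: (64, 3, 5, 7, 7)
    assert w3d.shape == (64, 3, 5, 7, 7)
    # Summing over the temporal axis recovers the original 2D kernel.
    assert torch.allclose(w3d.sum(dim=2), w2d, atol=1e-6)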

SlowFast [ArXiv, Repo] is a two-stream 3D CNN architecture for video recognition. It comprises two pathways, one operating at a slower frame rate than the other.
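
A rough sketch of the two-pathway frame sampling is given below; the speed ratio alpha is just an example value and the pathway networks themselves are omitted, so this only illustrates how one clip feeds both pathways at different frame rates.

    import torch

    alpha = 4  # example speed ratio: the fast pathway sees alpha x more frames

    clip = torch.randn(1, 3, 32, 224, 224)   # (batch, channels, time, H, W)
    fast_input = clip                         # all 32 frames (fewer channels in practice)
    slow_input = clip[:, :, ::alpha]          # every alpha-th frame -> 8 frames

    # In SlowFast, each input is processed by its own pathway and the fast
    # pathway's features are fused laterally into the slow pathway.
    print(fast_input.shape, slow_input.shape)
    # torch.Size([1, 3, 32, 224, 224]) torch.Size([1, 3, 8, 224, 224])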

Slow is the "slow" branch of the SlowFast network [ArXiv, Repo].

Usage

The project code is written in PyTorch and uses Ride to provide implementations of training, evaluation, and benchmarking methods. A plethora of usage options are available; these are best explored in the Ride docs or via the command-line help, e.g.:

python models/cox3d/main.py --help 

This repository contains the implementations of Continual X3D (CoX3D), CoSlow, and CoI3D, as well as a number of 3D CNN baselines.

Each model has its own folder with a self-contained implementation, scripts, weight-download utilities, hyperparameters, and profiling results. Overview tables of the scripts used to download weights, run the model test sequences, and benchmark throughput are given below:

Download weights

Model          Dataset   Download
I3D-R50        Kinetics  download
R(2+1)D-18     Kinetics  download
SlowFast-8x8   Kinetics  download
SlowFast-4x16  Kinetics  download
Slow-8x8       Kinetics  download
(Co)X3D-XS     Kinetics  download
(Co)X3D-S      Kinetics  download
(Co)X3D-M      Kinetics  download
(Co)X3D-L      Kinetics  download
(Co)Slow-8x8   Charades  download

Evaluate on Kinetics-400

Evaluate the 1-clip accuracy of pretrained models. The scripts should be executed from the project root.

Model       Script
I3D-R50     ./models/i3d/scripts/test/kinetics400.sh
R(2+1)D-18  ./models/r2plus1d/scripts/test/kinetics400.sh
SlowFast    ./models/slowfast/scripts/test/kinetics400.sh
Slow        ./models/slow/scripts/test/kinetics400.sh
X3D         ./models/x3d/scripts/test/kinetics400.sh
CoX3D       ./models/cox3d/scripts/test/kinetics400.sh
CoSlow      ./models/coslow/scripts/test/kinetics400.sh
CoI3D       ./models/coi3d/scripts/test/kinetics400.sh

Evaluate on Charades

Evaluate the 1-clip accuracy of pretrained models. The scripts should be executed from the project root.

Model         Script
(Co)Slow-8x8  ./models/coslow/scripts/test/charades.sh

Benchmark FLOPs and throughput

The scripts should be executed from the project root.

Model       Script
I3D-R50     ./models/i3d/scripts/profile/kinetics400.sh
R(2+1)D-18  ./models/r2plus1d/scripts/profile/kinetics400.sh
SlowFast    ./models/slowfast/scripts/profile/kinetics400.sh
Slow        ./models/slow/scripts/profile/kinetics400.sh
X3D         ./models/x3d/scripts/profile/kinetics400.sh
CoX3D       ./models/cox3d/scripts/profile/kinetics400.sh
CoI3D       ./models/coi3d/scripts/profile/kinetics400.sh
CoSlow      ./models/coslow/scripts/profile/kinetics400.sh

Citation

@inproceedings{hedegaard2022continual,
    title={Continual 3D Convolutional Neural Networks for Real-time Processing of Videos},
    author={Lukas Hedegaard and Alexandros Iosifidis},
    booktitle={European Conference on Computer Vision (ECCV)},
    year={2022},
}

Acknowledgement

This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871449 (OpenDR).
