fastvideofeat's Introduction

Information & Contact

An earlier version of this code was used to compute the results of the following paper:

"Efficient feature extraction, encoding and classification for action recognition",
Vadim Kantorov, Ivan Laptev,
In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2014

If you use this code, please cite our work:

@inproceedings{kantorov2014,
      author = {Kantorov, V. and Laptev, I.},
      title = {Efficient feature extraction, encoding and classification for action recognition},
      booktitle = {Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2014},
      year = {2014}
}

The paper and the poster are available at the project webpage and in this repository; the binaries are published on the repository releases page; the Hollywood-2 and HMDB-51 repro scripts are in the repro directory (http://github.com/vadimkantorov/cvpr2014/tree/master/repro/).

Please submit bugs on GitHub directly.

For any other question, please contact Vadim Kantorov at [email protected] or [email protected].

Description and usage

We release two tools in this repository. The first tool, fastvideofeat, is a motion feature extractor based on motion vectors from video compression information. The second, fastfv, is a fast Fisher vector computation tool that uses SSE2 vector CPU instructions.

We also release scripts (in the repro directory) for reproducing our results on the Hollywood-2 and HMDB-51 datasets.

All code is released under the MIT license.

fastvideofeat

The tool accepts a video file path as input and writes descriptors to standard output.

Command-line options:
Option          Description
--disableHOG    disables HOG descriptor computation
--disableHOF    disables HOF descriptor computation
--disableMBH    disables MBH descriptor computation
-f 1-10         restricts descriptor computation to the given frame range

IMPORTANT: The frame range is specified in terms of PTS (presentation time stamps), which are usually equivalent to frame indices, but not always, so beware. You can inspect the PTS values of the frames of a video using ffmpeg's ffprobe (fourth column):

$ ffprobe -print_format csv -show_packets -select_streams 0 video.mp4
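
For illustration only, a minimal Python sketch that collects the PTS values, assuming ffprobe is on the PATH and that PTS is the fourth comma-separated field, as in the command above:

    # Sketch: collect packet PTS values from ffprobe's CSV output.
    # Assumes ffprobe is installed and PTS is the fourth comma-separated field.
    import subprocess

    def packet_pts(video_path):
        out = subprocess.check_output(
            ['ffprobe', '-print_format', 'csv', '-show_packets',
             '-select_streams', '0', video_path])
        pts = []
        for line in out.decode('utf-8', 'replace').splitlines():
            fields = line.split(',')
            if fields and fields[0] == 'packet' and len(fields) > 3:
                pts.append(fields[3])  # the fourth column is pts
        return pts

    print(packet_pts('video.mp4')[:10])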

The output format (also printed to standard error):

#Descriptor format: xnorm ynorm tnorm pts StartPTS EndPTS Xoffset Yoffset PatchWidth PatchHeight hog (dim. 96) hof (dim. 108) mbhx (dim. 96) mbhy (dim. 96)

  • xnorm and ynorm are the normalized frame coordinates of the spatio-temporal (s-t) patch
  • tnorm and pts are the normalized and unnormalized frame number of the s-t patch center
  • StartPTS and EndPTS are the frame numbers of the first and last frames of the s-t patch
  • Xoffset and Yoffset are the non-normalized frame coordinates of the s-t patch
  • PatchWidth and PatchHeight are the non-normalized width and height of the s-t patch
  • descr is the array of concatenated descriptor floats. The size of this array depends on the enabled descriptor types. All values are between zero and one. The first comment line describes the enabled descriptor types, their order in the array, and the dimension of each descriptor in the array.

Every line on standard output corresponds to an extracted descriptor of a patch and consists of tab-separated floats.
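
As an illustration only, a minimal Python sketch (assuming NumPy and the default layout above, with all descriptor types enabled) that splits each line into the metadata columns and the per-descriptor slices:

    # Sketch: parse fastvideofeat output into metadata and descriptor blocks.
    # Assumes the default layout: 10 metadata columns, then hog (96), hof (108),
    # mbhx (96), mbhy (96); check the comment line of your actual output.
    import numpy as np

    def parse_descriptors(path):
        rows = []
        with open(path) as f:
            for line in f:
                if line.startswith('#') or not line.strip():
                    continue  # skip the format comment and empty lines
                rows.append([float(x) for x in line.split('\t')])
        data = np.array(rows)
        meta = data[:, :10]          # xnorm ... PatchHeight
        hog  = data[:, 10:106]       # columns 10-105
        hof  = data[:, 106:214]      # columns 106-213
        mbhx = data[:, 214:310]      # columns 214-309
        mbhy = data[:, 310:406]      # columns 310-405
        return meta, hog, hof, mbhx, mbhy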

Examples:
  • Compute HOG, HOF, MBH and save the descriptors in descriptors.txt:

    $ ./fastvideofeat video.avi > descriptors.txt

  • Compute only HOF and MBH from the first 600 frames and save the descriptors in descriptors.txt:

    $ ./fastvideofeat video.avi --disableHOG -f 1-600 > descriptors.txt

More examples in examples/compute_mpeg_features.sh.

Video format support

We have tested fastvideofeat only on videos encoded in H.264 and MPEG-4. Whether motion vectors can be extracted and processed depends entirely on FFmpeg's ability to put them into the right structures; last time we checked, this did not work for VP9, for example. In general, video reading depends fully on the FFmpeg libraries.

fastfv

The tool accepts descriptors on standard input and writes the Fisher vector (FV) to standard output. The tool consumes GMM vocabs saved by the Yael library. A sample script for building GMM vocabs with Yael is provided, along with a usage example.

IMPORTANT: The computed Fisher vectors are not normalized; apply signed square rooting / power normalization, L2 normalization, clipping, etc. before training a classifier.
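
For illustration, a minimal Python sketch of one common post-processing scheme (signed square rooting followed by L2 normalization), assuming NumPy and a Fisher vector saved by fastfv as in the examples below:

    # Sketch: typical Fisher vector post-processing before training a classifier.
    import numpy as np

    def normalize_fv(fv, alpha=0.5):
        fv = np.sign(fv) * np.abs(fv) ** alpha   # power / signed square-root normalization
        norm = np.linalg.norm(fv)
        return fv / norm if norm > 0 else fv     # L2 normalization

    fv = normalize_fv(np.loadtxt('fv.txt'))      # fv.txt as produced by fastfv below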

Command-line options:
Option                          Description
--xpos 0                        specifies the column with the x coordinate of the s-t patch in the descriptor array
--ypos 1                        specifies the column with the y coordinate of the s-t patch in the descriptor array
--tpos 2                        specifies the column with the t coordinate of the s-t patch in the descriptor array
--knn 5                         FV parts corresponding to this many closest GMM centroids are updated for every input descriptor
--vocab 10-105 10-105.hog.gmm   specifies the descriptor type location (column range) and the path to the GMM vocab; this option is mandatory, and several options of this kind are allowed
--enableflann 4 32              use FLANN instead of knn for descriptor attribution; the first argument is the number of kd-trees, the second is the number of checks performed during attribution
--enablespatiotemporalgrids     enables spatio-temporal grids: 1x1x1, 1x3x1, 1x1x2
--enablesecondorder             enables the second-order part of the Fisher vector
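
The column ranges passed to --vocab follow from the fastvideofeat output layout (10 metadata columns, then hog, hof, mbhx, mbhy); a small Python sketch deriving them, assuming all descriptor types were enabled during extraction:

    # Sketch: derive --vocab column ranges from the fastvideofeat descriptor layout
    # (10 metadata columns, then hog/hof/mbhx/mbhy), assuming all descriptors enabled.
    dims = [('hog', 96), ('hof', 108), ('mbhx', 96), ('mbhy', 96)]
    start = 10  # descriptor columns begin right after the 10 metadata columns
    for name, dim in dims:
        end = start + dim - 1
        print('%s: --vocab %d-%d' % (name, start, end))
        start = end + 1
    # hog: --vocab 10-105, hof: --vocab 106-213, mbhx: --vocab 214-309, mbhy: --vocab 310-405
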
Examples:
  • Compute Fisher vector:

    $ zcat sample_features_mpeg4.txt.gz | ../bin/fastfv --xpos 0 --ypos 1 --tpos 2 --enablespatiotemporalgrids --enableflann 4 32 --vocab 10-105 hollywood2_sample_vocabs/10-105.hog.gmm --vocab 106-213 hollywood2_sample_vocabs/106-213.hog.gmm --vocab 214-309 hollywood2_sample_vocabs/214-309.mbhx.gmm --vocab 310-405 hollywood2_sample_vocabs/310-405.mbhy.gmm > fv.txt

  • Build GMM vocab with Yael:

    $ PYTHONPATH=$(pwd)/../bin/dependencies/yael/yael:$PYTHONPATH cat features*.gz | ../src/gmm_train.py --gmm_ncomponents 256 --vocab 10-105 10-105.hog.gmm

Examples are explained in examples/compute_fisher_vector.sh.

Performance

We have not observed that enabling the second-order part boosts accuracy, so it is disabled by default. Enabling the second-order part doubles the Fisher vector size.

Simple knn descriptor attribution (the default) beats FLANN in speed by a factor of two at the cost of less than 1% accuracy degradation; it is enabled by default because of its speed.

Enabling spatio-temporal grids (disabled by default) is important for maximum accuracy (~2% gain).
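
As a rough back-of-the-envelope sketch of the resulting vector size (assuming the common convention that each grid cell contributes its own concatenated Fisher vector, with K GMM components and D-dimensional descriptors):

    # Rough sketch of Fisher vector dimensionality; assumes each spatio-temporal
    # grid cell contributes its own concatenated FV of K * D dimensions
    # (first order only), doubled if the second-order part is enabled.
    def fv_dim(K, D, grids=((1, 1, 1), (1, 3, 1), (1, 1, 2)), second_order=False):
        cells = sum(x * y * t for x, y, t in grids)    # 1 + 3 + 2 = 6 cells
        return cells * K * D * (2 if second_order else 1)

    print(fv_dim(K=256, D=96))   # e.g. the HOG channel with a 256-component vocab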

If you use FLANN, the number of checks determines the speed; try reducing it to gain speed.

Building from source

On both Linux and Windows, the binaries appear in bin after building. By default, the code links statically with the dependencies below; check the Makefiles for details.

Dependencies for fastvideofeat: OpenCV, FFmpeg

Dependencies for fastfv: OpenCV, Yael (which in turn uses ATLAS and LAPACK)

The code is known to work with OpenCV 2.4.9, FFmpeg 2.4, Yael 4.38, ATLAS 3.10.2, LAPACK 3.5.0.

Linux

Make sure you have the dependencies installed and visible to g++ (a minimal installation script is in the bin/dependencies directory). Build the tools by running make.

Windows

Only fastvideofeat builds and works on Windows; fastfv does not build because Yael currently does not support Windows.

To build fastvideofeat, set the correct paths to the dependencies, the processor architecture, and the Visual C++ version in the Makefile, and run the build from an appropriate Visual Studio Developer Command Prompt (specifically, the VS2013 x64 Native Tools Command Prompt worked for us):

$ nmake -f Makefile.nmake

Notes

For practical usage, the software should be modified to save and read features in a binary format, because the overhead of reading and writing text files is huge.
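
For example, a minimal Python sketch (one possible choice of binary container, using NumPy's .npy format) for converting a saved text descriptor file:

    # Sketch: convert fastvideofeat's text output to a binary .npy file so that
    # later runs avoid the cost of re-parsing the text.
    import numpy as np

    descriptors = np.loadtxt('descriptors.txt', comments='#')  # skips the format comment
    np.save('descriptors.npy', descriptors)

    # loading the binary file later is much faster than re-reading the text:
    descriptors = np.load('descriptors.npy')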

License and acknowledgements

All code and scripts are licensed under the MIT license.

We greatly thank Heng Wang, whose work was of significant help.

fastvideofeat's Issues

Errors Creating Vocabulary

I am running into another error when I extract the HOF vocabulary. I am using descriptors from around 400 videos (around 2-3 million descriptors) to estimate the HOF GMM vocabulary for Fisher vectors. When I run the fastfv program with the --buildGmmIndex option, I run into this problem:

$ cat vocabFileList.txt | xargs cat | ./fastfv --sigma yes --buildGmmIndex --vocab 105-212 hof_K256.vocab
...

kmeans warning: 11 empty clusters -> split

...
imfac = 6.665
keys 11: 2203.505343 -> 2194.710435
nb of 0 probabilities: 733755570 / (256*2889743) = 99.2 %
Assertion failed: (isfinite(fvec_sum(g->mu, k_d))), function gmm_compute_params, file gmm.c, line 218.
Abort trap: 6

I tried different sets of files as input, but I get the same error. Can you please help me resolve this? Thank you for your help.

cv.h #included but dependencies not built

From cv.h:

#include "opencv2/core/core_c.h"
#include "opencv2/core/core.hpp"
#include "opencv2/imgproc/imgproc_c.h"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/video/tracking.hpp"
#include "opencv2/features2d/features2d.hpp"
#include "opencv2/flann/flann.hpp"
#include "opencv2/calib3d/calib3d.hpp"
#include "opencv2/objdetect/objdetect.hpp"
#include "opencv2/legacy/compat.hpp"

But in bin/dependencies/install_dependencies_here_linux.sh, all of the corresponding OpenCV modules are set to "OFF":

wget http://downloads.sourceforge.net/project/opencvlibrary/opencv-unix/2.4.9/opencv-2.4.9.zip
unzip opencv-2.4.9.zip
cd opencv-2.4.9
mkdir build && cd build
cmake -D CMAKE_INSTALL_PREFIX=$(pwd)/../.. -D CMAKE_BUILD_TYPE=RELEASE -D BUILD_SHARED_LIBS=OFF -D BUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF -D BUILD_EXAMPLES=OFF -D BUILD_opencv_gpu=OFF -D BUILD_opencv_python=OFF -D BUILD_opencv_java=OFF -D BUILD_opencv_ml=OFF -D BUILD_opencv_contrib=OFF -D BUILD_opencv_ocl=OFF -D BUILD_opencv_legacy=OFF -D BUILD_opencv_nonfree=OFF -D BUILD_opencv_photo=OFF -D BUILD_opencv_video=OFF -D BUILD_opencv_stitching=OFF -D BUILD_opencv_superres=OFF -D BUILD_opencv_photo=OFF -D BUILD_opencv_objdetect=OFF -D BUILD_opencv_features2d=OFF -D BUILD_opencv_calib3d=OFF -D WITH_CUDA=OFF ..
make -j4 && make install
cd ../..

This results in immediate compile errors in the fastfv and fastvideofeat Makefiles.

This can be fixed by updating the dependencies script:

cd opencv-2.4.9
mkdir build && cd build
cmake -D CMAKE_INSTALL_PREFIX=$(pwd)/../.. -D CMAKE_BUILD_TYPE=RELEASE \
    -D BUILD_SHARED_LIBS=OFF -D BUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF -D BUILD_EXAMPLES=OFF -D BUILD_opencv_gpu=OFF -D BUILD_opencv_python=OFF -D BUILD_opencv_java=OFF -D BUILD_opencv_contrib=OFF -D BUILD_opencv_ocl=OFF -D BUILD_opencv_nonfree=OFF -D BUILD_opencv_photo=OFF -D BUILD_opencv_stitching=OFF -D BUILD_opencv_superres=OFF -D BUILD_opencv_photo=OFF -D WITH_CUDA=OFF \
    -D BUILD_opencv_ml=ON -D BUILD_opencv_legacy=ON -D BUILD_opencv_video=ON -D BUILD_opencv_objdetect=ON -D BUILD_opencv_features2d=ON -D BUILD_opencv_calib3d=ON \
    ..
make -j4 && make install
cd ../..

Note that -D BUILD_opencv_ml=ON is needed because the legacy module depends on ml.

MV vs RGB

Hi Vadim,

I want to understand why motion vectors are important features compared to the RGB information in video frames, or what is complementary in motion vectors compared to RGB. Can you guide me on how I can do this?

Segmentation fault

I am running into segmentation faults when I run the code in both Windows and Unix environments. Here is the output of a sample run:

xxxxxxxx$ ./fastvideofeat -i /Users/xxxxxx/DB/UCF11_updated_mpg/basketball/v_shooting_02/v_shooting_02_03.mpg > descriptors.txt
Options:
Input video: /Users/xxxxxx/DB/UCF11_updated_mpg/basketball/v_shooting_02/v_shooting_02_03.mpg
HOG's enabled: yes
HOF's enabled: yes
MBH's enabled: yes
Dense-dense-revolution: no
Interpolation: yes
Good PTS:
Frame count: 144
Original frame size: 320x240
Downsampled: 20x15
After interpolation: 39x29
CellSize: 8

read frame pts=45000

skipping

Frame number = 2

Segmentation fault: 11

The segmentation fault appears to originate from frame_reader.h, line 359:
bool good = USES_LIST(pFrame->mb_type[mb_index], direction);

Thank you for providing the code. Can you please help me resolve this issue?

Makefile needs to respect positional -L flags for systems that have OpenCV installed globally

On systems that do not have OpenCV installed globally, the current placement of the -L flags is fine. However, in the globally-installed case, each -L flag needs to immediately precede the -l flags that reference it. Otherwise, the compiler will find the libraries in the system paths rather than in the dependencies directory as intended.

This worked for me:

CFLAGS = -O3 -msse3 -std=c++0x
OPENCV_FLAGS = -I../../bin/dependencies/include -L../../bin/dependencies/lib -lopencv_flann -lopencv_core  -Wl,-Bdynamic -lpthread -lz -lm -lc -lrt
YAEL_FLAGS = -I../../bin/dependencies/yael -L../../bin/dependencies/yael/yael -lyael
LDFLAGS = -Wl,-Bstatic -lgomp -llapack -lf77blas -lcblas -latlas -lgfortran
BIN = ../../bin/fastfv

all: $(SOURCE_FILES)
    g++ main.cpp -o $(BIN) $(CFLAGS) $(LDFLAGS) $(OPENCV_FLAGS) $(YAEL_FLAGS)
clean:
    rm $(BIN)
