kinetics-i3d's Issues

Value for dropout?

The provided TensorFlow model specification for the I3D network includes dropout right before the last Unit3D layer. May I ask which value was used for this dropout during training on Kinetics and during the fine-tuning experiments?

I couldn't find it in the paper (but maybe it's there and I am just blind).

train from scratch on ucf101 dataset

We are trying to train the I3D model on UCF101 from scratch, but it converges much more slowly, with a final validation accuracy of around 60%. Can you offer some suggestions for training the I3D model without the ImageNet-pretrained model?

IDT Handcrafted Features

Hi, thank you for sharing this great work.

Could you give more details about how IDT is used to improve the results? Which library did you use to compute it? How is the merge with the pseudo features done?

How to test using the whole videos?

According to your paper, when testing, I should send the whole video to the architecture.
When training, the network produces a tensor of size B x 7 x 1 x 1 x 400, and we average along the temporal dimension and squeeze to get probabilities of size B x 400.

When testing, do we simply send the whole video to the network and average over the temporal dimension?

I hope you can tell me the correct method. Thanks for your work!
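For reference, here is a minimal sketch of the averaging step described above, assuming a logits tensor of shape (B, T, 1, 1, 400) as in the question; this is only an illustration, not the authors' evaluation code:

import tensorflow as tf

# Hypothetical logits of shape (B, T, 1, 1, 400), e.g. built with
# spatial_squeeze=False. Squeeze the spatial dimensions and average over
# time to get one prediction per video, whether a short clip or the whole
# video was fed in.
logits_in = tf.placeholder(tf.float32, shape=(None, None, 1, 1, 400))
logits = tf.squeeze(logits_in, axis=[2, 3])        # (B, T, 400)
averaged_logits = tf.reduce_mean(logits, axis=1)   # (B, 400)
probabilities = tf.nn.softmax(averaged_logits)     # (B, 400)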

Feature Extraction from Last Global Average Pooling Layer

I am trying to extract features from the last global average pooling layer, but the final tensor after

net = tf.nn.avg_pool3d(net, ksize=[1, 2, 7, 7, 1],
                             strides=[1, 1, 1, 1, 1], padding=snt.VALID)

is of size (1, 6, 1, 1, 1024). Is there a meaning in that? Am I doing something wrong? I was hoping for a single feature vector of size 1 x 1024, not 6 of them.
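For what it's worth, the 6 in that shape is the temporal axis that survives the pooling: the kernel above only spans 2 steps in time, so 7 time steps become 6, while the 7x7 spatial grid collapses to 1x1. A minimal NumPy sketch, under that assumption, of reducing the fetched tensor to a single 1024-d vector:

import numpy as np

# Hypothetical: `features` is the pooled activation fetched from the session,
# of shape (1, 6, 1, 1, 1024). Averaging over the remaining temporal (and
# singleton spatial) axes gives one video-level descriptor.
features = np.zeros((1, 6, 1, 1, 1024), dtype=np.float32)  # placeholder data
video_feature = features.mean(axis=(1, 2, 3))[0]            # shape (1024,)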

train on my own dataset

Hello! I trained the I3D model on my own dataset: 2 classes, each with about 50 videos. The two classes are similar, like opening the door / closing the door. After 40 epochs, train accuracy is 90+%, but the validation accuracy is just 50%, so the model didn't learn anything useful. What should I do?

about batch size

Hi,

In the paper "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset", it is said you used 64 GPUs to train 3D ConvNets.
But in readme, you said minibatch size is 6.
Does it mean that during multi-gpu training, the batch size is 6*64 = 384 ?

Thanks!

RGB or BGR ?

Hi,
I wonder whether the image color order you used is RGB or BGR.
OpenCV and PIL in Python use different channel orders, and I'm not sure how much difference this will cause.
Any help will be appreciated.
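For context, frames read with OpenCV arrive in BGR order, while PIL returns RGB; a generic conversion sketch (this says nothing about which order the released checkpoints actually expect):

import cv2

# 'frame_0001.jpg' is a hypothetical file name.
frame_bgr = cv2.imread('frame_0001.jpg')                 # OpenCV decodes to BGR
frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)   # reorder channels to RGB
# PIL's Image.open already yields RGB, so no conversion is needed there.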

Shape of *.npy file?

In the sample code, the example video has been preprocessed, with RGB and Flow NumPy arrays provided.
I want to test my own video, so I figured one way would be to generate my own NumPy arrays and replace the example ones.
For RGB, the provided *.npy file has shape (1, num_frames, 224, 224, 3). It seems that 'num_frames' is the number of frames, '224, 224' are the height and width, and '3' is the number of channels (RGB). I'm confused about the '1': what does this dimension mean, and what should its value be?

By the way, what is the formula for the norm of the logits tensor?
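Purely as an illustration of one way such an array could be assembled (the leading 1 is presumably a batch axis holding a single clip, matching the placeholder shapes in evaluate_sample.py, and the printed norm appears to be computed with np.linalg.norm, i.e. the Euclidean norm of the logits; both points are a reading of the sample code, not an official answer):

import numpy as np

# Hypothetical: preprocessed frames, each (224, 224, 3), already rescaled to [-1, 1].
frames = [np.zeros((224, 224, 3), dtype=np.float32) for _ in range(79)]

rgb_sample = np.stack(frames, axis=0)        # (num_frames, 224, 224, 3)
rgb_sample = rgb_sample[np.newaxis, ...]     # (1, num_frames, 224, 224, 3), batch of one clip
np.save('my_video_rgb.npy', rgb_sample)      # 'my_video_rgb.npy' is a hypothetical path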

Is it better to train from scratch on Kinetics-600?

Hi,
I wonder why you only release a Kinetics-600 checkpoint trained from scratch, and not one initialized from ImageNet pre-trained parameters.
In the paper, performance is better with ImageNet pre-training on the Kinetics-400 dataset.
Is it better to train from scratch on the Kinetics-600 dataset?

Thanks!

What regularizers do you use when training?

I noticed that in the Conv3D and BatchNorm modules, the default regularizers are None.
Do you use regularizers in Conv3D or BatchNorm when training? If so, do you use L1 or L2 regularizers?

depth data

Hi.
I'm only allowed to use the depth data and not the RGB (due to privacy issues). Could you please tell me if I can still use kinetics-i3d?

Thanks,
Sanaz

automatic sign language recognition.

I'm trying to do sign language recognition at run time, so I'm wondering whether this model is the right choice here, and what kind of GPUs are necessary to train such a model.

thanks.

Dependencies

Hi, I am having difficulties running the code because of library issues. I am currently using Tensorflow-gpu 0.11.0, Tensorflow-probability-gpu 0.4.0, and Sonnet 1.29. Can anybody tell me the combination of Tensorflow-gpu, Tensorflow-probability-gpu, and Sonnet versions that worked for you when running the code? Thanks.

Missing data in Kinetics 400

Thanks much for the great work.
I found that about 10% of the data is missing from the train set of Kinetics-400.
Is this consistent with your findings, or should I look into improving my download scripts? :)
Thanks.

Image preprocessing

The README says to scale the RGB values between -1 and 1. Does this mean x / 128.0 - 1.0, where x is a uint8 image?
I'm more used to seeing images normalized with a mean and standard deviation, so I want to make sure.
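A small sketch of the two obvious readings of "scale between -1 and 1" (neither is the authors' confirmed preprocessing; x is a uint8 frame):

import numpy as np

x = np.zeros((224, 224, 3), dtype=np.uint8)          # hypothetical uint8 frame

# Reading used in the question: maps [0, 255] to [-1.0, ~0.99]
scaled_a = x.astype(np.float32) / 128.0 - 1.0

# Exact mapping of [0, 255] onto [-1.0, 1.0]
scaled_b = x.astype(np.float32) / 255.0 * 2.0 - 1.0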

Finetune the pretrained model on UCF101

Hi,
When I fine-tune the pretrained model on UCF101, I adapt evaluate_sample.py, use only the RGB input, change `_NUM_CLASSES` to 101, add a loss and an optimizer after the logits, and feed the training data and labels to the net, but I encounter these error messages:

2017-09-02 15:28:24.771133: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 15:28:24.771169: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 15:28:24.771190: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 15:28:24.771194: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 15:28:24.771198: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 15:28:25.113985: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:84:00.0
Total memory: 10.91GiB
Free memory: 2.11GiB
2017-09-02 15:28:25.114035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2017-09-02 15:28:25.114043: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y
2017-09-02 15:28:25.114067: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:84:00.0)
INFO:tensorflow:Restoring parameters from data/checkpoints/rgb_scratch/model.ckpt
Traceback (most recent call last):
  File "i3d_finetune_ucf101.py", line 175, in <module>
    tf.app.run(main)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "i3d_finetune_ucf101.py", line 133, in main
    feed_dict={rgb_input:batch_xs, rgb_y: batch_ys})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1100, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (8, 101) for Tensor u'Placeholder_1:0', which has shape '(?, 400)'

Here is my Python file:

# Copyright 2017 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Loads a sample video and classifies using a trained Kinetics checkpoint."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import tensorflow as tf

import i3d
from  dataset import Dataset

batch_size = 8
training_iter = 1000
learning_rate = 0.001

_IMAGE_SIZE = 227
_NUM_CLASSES = 101

_SAMPLE_VIDEO_FRAMES = 79
_SAMPLE_PATHS = {
    'rgb': 'data/v_CricketShot_g04_c01_rgb.npy',
    'flow': 'data/v_CricketShot_g04_c01_flow.npy',
}

_CHECKPOINT_PATHS = {
    'rgb': 'data/checkpoints/rgb_scratch/model.ckpt',
    'flow': 'data/checkpoints/flow_scratch/model.ckpt',
    'rgb_imagenet': 'data/checkpoints/rgb_imagenet/model.ckpt',
    'flow_imagenet': 'data/checkpoints/flow_imagenet/model.ckpt',
}

_LABEL_MAP_PATH = 'data/label_map.txt'

FLAGS = tf.flags.FLAGS

tf.flags.DEFINE_string('eval_type', 'rgb', 'rgb, flow, or joint')
tf.flags.DEFINE_boolean('imagenet_pretrained', True, '')


def main(unused_argv):
  tf.logging.set_verbosity(tf.logging.INFO)
  eval_type = FLAGS.eval_type
  imagenet_pretrained = FLAGS.imagenet_pretrained

  if eval_type not in ['rgb', 'flow', 'joint']:
    raise ValueError('Bad `eval_type`, must be one of rgb, flow, joint')

  kinetics_classes = [x.strip() for x in open(_LABEL_MAP_PATH)]

  if eval_type in ['rgb', 'joint']:
    # RGB input has 3 channels.
    rgb_input = tf.placeholder(
        tf.float32,
        shape=(batch_size, 10, _IMAGE_SIZE, _IMAGE_SIZE, 3))
    rgb_y = tf.placeholder(tf.float32, [None, _NUM_CLASSES])
    with tf.variable_scope('RGB'):
      rgb_model = i3d.InceptionI3d(
          _NUM_CLASSES, spatial_squeeze=False, final_endpoint='Logits')
      rgb_logits, _ = rgb_model(
          rgb_input, is_training=True, dropout_keep_prob=1.0)
    rgb_variable_map = {}
    for variable in tf.global_variables():
      if variable.name.split('/')[0] == 'RGB':
        rgb_variable_map[variable.name.replace(':0', '')] = variable
        print('===variable:', variable)
    rgb_saver = tf.train.Saver(var_list=rgb_variable_map, reshape=True)
#    print('=====variables', rgb_variable_map)

  if eval_type in ['flow', 'joint']:
    # Flow input has only 2 channels.
    flow_input = tf.placeholder(
        tf.float32,
        shape=(1, _SAMPLE_VIDEO_FRAMES, _IMAGE_SIZE, _IMAGE_SIZE, 2))
    with tf.variable_scope('Flow'):
      flow_model = i3d.InceptionI3d(
          _NUM_CLASSES, spatial_squeeze=True, final_endpoint='Logits')
      flow_logits, _ = flow_model(
          flow_input, is_training=False, dropout_keep_prob=1.0)
    flow_variable_map = {}
    for variable in tf.global_variables():
      if variable.name.split('/')[0] == 'Flow':
        flow_variable_map[variable.name.replace(':0', '')] = variable
    flow_saver = tf.train.Saver(var_list=flow_variable_map, reshape=True)

  if eval_type == 'rgb':
    model_logits = rgb_logits
  elif eval_type == 'flow':
    model_logits = flow_logits
  else:
    model_logits = rgb_logits + flow_logits
  model_predictions = tf.nn.softmax(model_logits)
  print( '===model_predictions.shape:', model_predictions.shape)
  model_predictions = tf.reduce_mean(model_predictions, (1,2))
  print( '===model_predictions.shape:', model_predictions.shape)
  loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=model_predictions, labels=rgb_y))
  optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(loss)

  dataset = Dataset('data/rgb_train_split1.txt', 'data/rgb_test_split1.txt')
  config = tf.ConfigProto()
  config.gpu_options.allow_growth = True
  with tf.Session(config=config) as sess:
    step = 1
    while step < training_iter:
      batch_xs, batch_ys = dataset.next_batch(batch_size, 'train')
      rgb_saver.restore(sess, _CHECKPOINT_PATHS['rgb'])
      sess.run(
        optimizer,
        feed_dict={rgb_input:batch_xs, rgb_y: batch_ys})

if __name__ == '__main__':
  tf.app.run(main)
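For readers who hit a similar class-count mismatch: one common (unofficial) fine-tuning pattern is to restore only the convolutional trunk from the Kinetics checkpoint and attach a fresh head for the new number of classes, so that the restored variables and the label placeholder never disagree about the class count. A rough sketch, assuming the i3d.InceptionI3d API from this repository (the head names and hyperparameters below are hypothetical):

import tensorflow as tf
import i3d

_NUM_CLASSES = 101  # UCF101

rgb_input = tf.placeholder(tf.float32, shape=(None, None, 224, 224, 3))
labels = tf.placeholder(tf.float32, shape=(None, _NUM_CLASSES))

with tf.variable_scope('RGB'):
  # Build only the feature trunk; 'Mixed_5c' is the last endpoint before the
  # classification head in i3d.py.
  trunk = i3d.InceptionI3d(400, final_endpoint='Mixed_5c')
  features, _ = trunk(rgb_input, is_training=True, dropout_keep_prob=1.0)

# New 101-way head, created outside the 'RGB' scope so it is not part of the
# variables restored from the checkpoint.
pooled = tf.reduce_mean(features, axis=[1, 2, 3])                  # global average pool
logits = tf.layers.dense(pooled, _NUM_CLASSES, name='ucf101_logits')

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
train_op = tf.train.GradientDescentOptimizer(1e-3).minimize(loss)

# Restore only the trunk ('RGB/...') from the 400-class checkpoint; the new
# head is initialized from scratch before training.
trunk_vars = {v.name.replace(':0', ''): v
              for v in tf.global_variables() if v.name.startswith('RGB/')}
saver = tf.train.Saver(var_list=trunk_vars)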

Generating Optical Flow

Hi,

How can I generate optical flow using GPUs? This seems difficult to achieve using Python. Can you help with this?

Thank you

LiteFlowNet over OpenCV TV-L1 optical flow algorithm. Can we ?

I have read about other people's problematic experiences with OpenCV's TV-L1 optical flow algorithm and how time-consuming it is, and I have also witnessed it first-hand.
My question is simple: is it legitimate to choose another optical flow estimation method, for instance LiteFlowNet (http://mmlab.ie.cuhk.edu.hk/projects/LiteFlowNet/), which is currently the state of the art at CVPR 2018? Will this affect the results, especially when I intend to use the Flow Kinetics-i3d model solely for feature extraction purposes?
Thanks in advance.

META DATA FILES OF UCF-101 DATASET FOR ACTION RECOGNITION

Hello @derpson

I'm doing research on action recognition.
I just downloaded the UCF-101 dataset for action recognition,
but I need the metadata, JSON files, and description files.
If anyone can help, please forward the files.

Can anyone also help me with data augmentation for this dataset?

How can I apply the first N layers of the model to a video file?

Hi all,

I want to use the pre-trained model to process several video files, but I don't want to classify them. I only want to extract the outputs of the first N layers (2-3 layers) to see whether there are differences between the different video files (they are very similar).

After the prediction function runs, how can I extract the outputs of the first few layers?

Thank you in advance.
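One hedged possibility: in this repository's i3d.py the model returns an end_points dictionary alongside the logits, so intermediate activations can be fetched directly. A sketch under that assumption (the endpoint names below are taken from i3d.py's VALID_ENDPOINTS but should be double-checked there):

import tensorflow as tf
import i3d

rgb_input = tf.placeholder(tf.float32, shape=(1, 79, 224, 224, 3))

with tf.variable_scope('RGB'):
  model = i3d.InceptionI3d(400, spatial_squeeze=True, final_endpoint='Logits')
  logits, end_points = model(rgb_input, is_training=False, dropout_keep_prob=1.0)

# end_points maps endpoint names to intermediate activations; the early layers
# can be fetched like any other tensors.
early_layers = [end_points['Conv3d_1a_7x7'],
                end_points['Conv3d_2b_1x1'],
                end_points['Conv3d_2c_3x3']]

# After restoring a checkpoint into `sess`:
#   outputs = sess.run(early_layers, feed_dict={rgb_input: my_video_array})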

The difference between the 3D Inception Module and the 2D.

Inception_v1 has 3x3 and 5x5 convolution layers, and BN-Inception has a 3x3 and two stacked 3x3 conv layers in the middle branches of each Inception module. But in the 3D Inception module, you only keep two 3x3 conv layers, one for each branch. Can you tell me why you do this? And when you transfer the 2D BN-Inception parameters to the 3D model, do you just ignore the second 3x3 conv layer in the second branch?
Thanks!
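For context on the parameter transfer itself, the paper describes bootstrapping each 3D filter from its 2D counterpart by repeating the 2D weights N times along the time dimension and dividing by N, so that a video of identical repeated frames gives the same response as the single image did. A toy NumPy sketch of that inflation (illustrative only, not the code used to produce the released checkpoints):

import numpy as np

def inflate_2d_filter(w2d, time_kernel):
    # w2d: 2D conv kernel of shape (kh, kw, c_in, c_out).
    # Repeat along a new leading time axis and rescale by the kernel length.
    w3d = np.repeat(w2d[np.newaxis, ...], time_kernel, axis=0)  # (t, kh, kw, c_in, c_out)
    return w3d / float(time_kernel)

w2d = np.random.randn(3, 3, 64, 128).astype(np.float32)  # hypothetical 2D kernel
w3d = inflate_2d_filter(w2d, time_kernel=3)               # shape (3, 3, 3, 64, 128)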

Regarding TV-L1 Optical Flow

Hi,

Due to the nature of the TV-L1 optical flow algorithm, it is quite time-consuming to run (and I have more than 100k videos I must process, which makes it quite frustrating to watch):

  1. Are you aware of any code or methods to speed up the process (apart from changing the size of the input)?
  2. This may be a no-brainer, but how problematic would it be to feed the pretrained model flow produced by a different dense optical flow algorithm, such as Farneback?

Thank you,

Data-preprocessing for kinetics-400

Hi, I would like to know how to preprocess Kinetics-400 to reproduce the results. I found that extracting TV-L1 flow before rescaling the RGB images leads to worse flow recognition accuracy.
So, currently, I first resample the videos at 25 fps. Then I extract RGB frames and resize them so the shorter side is 256 pixels. I am using the OpenCV 3.4 version of cv::cuda::OpticalFlowDual_TVL1 for flow extraction on the resized gray-scale frames. All pixel values are rescaled as mentioned in the project. Are there any details I am missing in this preprocessing procedure? Or am I extracting optical flow the right way? Thanks.
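As a sanity check of the last step only (the truncation and rescaling described in this repository's README), a small NumPy sketch, assuming `raw_flow` is the TV-L1 output for one frame pair:

import numpy as np

# Hypothetical raw TV-L1 output for one frame pair, shape (H, W, 2).
raw_flow = np.random.randn(256, 456, 2).astype(np.float32) * 10.0

# Truncate to [-20, 20], then rescale to [-1, 1].
flow = np.clip(raw_flow, -20.0, 20.0) / 20.0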

Regarding the 2 dimensions of the Optical Flow

Hi, I have a question regarding the explanation of the optical flow used. The GitHub page states:

We only use the first two output dimensions, and apply the same cropping as for RGB. The provided .npy file thus has shape (1, num_frames, 224, 224, 2)

However, I was wondering what this refers to exactly. Is this the stack of u and v, the output of TV-L1? (If that is the case, in what order?) Or do you make it into an RGB image and use just the R and G channels?

This was a little unclear to me, thanks.
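For context only (this reflects OpenCV's own convention, not necessarily how the released sample was generated): OpenCV's dense flow APIs return an (H, W, 2) array whose channel 0 is the horizontal displacement (u) and channel 1 the vertical one (v), so assembling the array shape quoted above would look roughly like this:

import numpy as np

# Hypothetical per-frame-pair flow fields, each (224, 224, 2) after cropping,
# already truncated and rescaled to [-1, 1].
per_frame_flows = [np.zeros((224, 224, 2), dtype=np.float32) for _ in range(79)]

flow_sample = np.stack(per_frame_flows, axis=0)   # (num_frames, 224, 224, 2)
flow_sample = flow_sample[np.newaxis, ...]        # (1, num_frames, 224, 224, 2)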
