google-deepmind / kinetics-i3d
Convolutional neural network model for video classification trained on the Kinetics dataset.
License: Apache License 2.0
The provided TensorFlow model specification for the I3D network includes dropout right before the last Unit3D layer. May I ask which value was used for this dropout during training on Kinetics and during the fine-tuning experiments?
I couldn't find it in the paper (but maybe it's there and I am just blind).
Could you please provide the code that loads the 2D parameters (pretrained on ImageNet) into the I3D model (especially the handling of BN/GN)?
Thank you very much!
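For reference, the paper describes bootstrapping 3D filters from 2D ones by repeating them along the time axis and dividing by the temporal extent; here is a minimal NumPy sketch of that inflation step (the function name and shapes are mine, not from the repo; per-channel BatchNorm parameters should carry over unchanged, as far as I can tell):

```python
import numpy as np

def inflate_2d_to_3d(w2d, time_dim):
    """Inflate a 2D conv kernel (kh, kw, cin, cout) to 3D (t, kh, kw, cin, cout).

    The 2D filter is tiled along the new temporal axis and divided by the
    temporal extent, so a "boring" (temporally constant) video produces the
    same activations as the original image model.
    """
    w3d = np.tile(w2d[np.newaxis, ...], (time_dim, 1, 1, 1, 1))
    return w3d / time_dim

# Example: inflate a 7x7 ImageNet-pretrained kernel to 7x7x7.
w2d = np.random.randn(7, 7, 3, 64).astype(np.float32)
w3d = inflate_2d_to_3d(w2d, time_dim=7)
assert w3d.shape == (7, 7, 7, 3, 64)
# Summing over time recovers the 2D response.
assert np.allclose(w3d.sum(axis=0), w2d, atol=1e-4)
```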
How can I train my own model?
We tried to train the I3D model on UCF-101 from scratch, but it converges much more slowly, with a final validation accuracy of around 60%. Can you offer some suggestions on training the I3D model without the ImageNet-pretrained model?
Hi, thank you for sharing this great work.
Would you give more details about how IDT is used to improve the results? Which library did you use to compute it? How is the merge with the pseudo features done?
According to your paper, at test time I should feed the whole video to the architecture.
During training, the network produces a tensor of size B x 7 x 1 x 1 x 400, and we average along the temporal dimension and squeeze to get probabilities of size B x 400.
At test time, do we simply feed the whole video to the network and average over the temporal dimension?
I hope you can confirm the correct method. Thanks for your work!
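For what it's worth, the averaging step described above can be sketched like this (NumPy only, with random logits standing in for the network output; this averages logits over time and then applies softmax, which is my reading of the sample code, not an official answer):

```python
import numpy as np

def video_predictions(logits):
    """Average per-frame logits (B, T, 1, 1, C) into per-video probabilities (B, C)."""
    # Squeeze the spatial dims, average over time, then apply softmax.
    logits = logits.reshape(logits.shape[0], logits.shape[1], logits.shape[-1])
    mean_logits = logits.mean(axis=1)  # (B, C)
    e = np.exp(mean_logits - mean_logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

probs = video_predictions(np.random.randn(2, 7, 1, 1, 400))
assert probs.shape == (2, 400)
assert np.allclose(probs.sum(axis=-1), 1.0)
```

Averaging probabilities instead of logits makes little practical difference, but the order above matches the shape bookkeeping in the question.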
I am trying to extract features from the last global average pooling layer, but the final tensor after

net = tf.nn.avg_pool3d(net, ksize=[1, 2, 7, 7, 1],
                       strides=[1, 1, 1, 1, 1], padding=snt.VALID)

is of size (1, 6, 1, 1, 1024).
Is there a meaning in that? Am I doing something wrong? I was hoping for a single feature vector of size 1 x 1024, not six of them.
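If it helps, the 6 looks like the remaining temporal dimension (the pooling kernel above only spans 2 time steps, so a longer clip leaves several temporal positions). A common workaround, sketched here on a dummy NumPy array rather than the real network output, is to average over that axis to get one 1024-d vector per clip:

```python
import numpy as np

# Dummy feature map with the shape reported above: (batch, time, h, w, channels).
features = np.random.randn(1, 6, 1, 1, 1024).astype(np.float32)

# Average over the remaining temporal positions, then squeeze the spatial dims.
clip_feature = features.mean(axis=1).reshape(1, 1024)
assert clip_feature.shape == (1, 1024)
```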
Hello! I trained the I3D model on my own dataset: 2 classes with about 50 videos each. The two classes are similar, like opening a door vs. closing a door. After 40 epochs the training accuracy is above 90%, but the validation accuracy is only 50%, so the model didn't learn anything useful. What can I do?
Hello,
Where can I get the .pb TensorFlow file, or is there any way to convert the .ckpt file to .pb?
Thanks,
Do I need to normalize the RGB images to [0, 1] from [0, 255] before calculating optical flow?
I will follow the code here: https://github.com/opencv/opencv/blob/master/samples/gpu/optical_flow.cpp
And 'd_flow' is the only output of TV-L1.
How do I split it into dx_flow and dy_flow?
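For what it's worth, TV-L1 in OpenCV returns a single two-channel array, so splitting is just channel indexing. A NumPy sketch, including the truncation to [-20, 20] and rescaling to [-1, 1] that this repo's README describes (the array here is random stand-in data, not real flow):

```python
import numpy as np

# Dummy two-channel flow field (h, w, 2), as returned by TV-L1.
d_flow = np.random.uniform(-30, 30, size=(224, 224, 2)).astype(np.float32)

dx_flow = d_flow[..., 0]  # horizontal (u) component
dy_flow = d_flow[..., 1]  # vertical (v) component

# Truncate to [-20, 20] and rescale to [-1, 1], as the README describes.
flow = np.clip(d_flow, -20.0, 20.0) / 20.0
assert dx_flow.shape == dy_flow.shape == (224, 224)
assert flow.min() >= -1.0 and flow.max() <= 1.0
```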
Thank you for this wonderful work!
Will you release the fine-tuned models on UCF-101 and HMDB-51?
In the paper it says:
"Models were trained for up to 5k steps on UCF-101 and HMDB-51"
Dear Editor,
I want to know where to get the pre-trained model on the Kinetics dataset.
Hi,
In the paper "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset", it is said that you used 64 GPUs to train the 3D ConvNets.
But in the README, you said the minibatch size is 6.
Does that mean that during multi-GPU training the effective batch size is 6 × 64 = 384?
Thanks!
Hello,
In order to reproduce the results, I would like a link to the code used to generate the optical flow.
Thank you
Hello,
Dear Sir/Madam,
Can the pre-trained model run on a PC with an i7 and a GTX 1080 for fine-tuning? The new dataset will have about 100 samples.
Thanks.
Regards,
Stephen
Hi,
I wonder whether the image color format you used is RGB or BGR.
OpenCV and PIL in Python use different color orders, and I'm not sure how much difference this causes.
Any help would be appreciated.
Hi, thank you for your great work. Could you provide a Caffe version of the I3D model?
In the sample code, the example video has been preprocessed, with RGB and flow NumPy arrays provided.
I want to test my own video, so I figure one way is to generate my own NumPy arrays and replace the example ones.
For RGB, the provided *.npy file has shape (1, num_frames, 224, 224, 3). It seems that 'num_frames' is the number of frames, '224, 224' the height and width, and '3' the channels (RGB). I'm confused about the '1': what does it mean, and what should its value be?
By the way, what is the equation for the norm of the logits tensor?
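For reference, my understanding (not an official answer) is that the leading 1 is a batch dimension, so it stays 1 when you feed a single clip. A sketch of packing frames into the expected array, with random data standing in for the decoded frames, which you could then pass to np.save to produce your own .npy file:

```python
import numpy as np

num_frames = 79
# Stand-in for decoded, resized, center-cropped frames:
# (num_frames, 224, 224, 3), uint8 in [0, 255].
frames = np.random.randint(0, 256, size=(num_frames, 224, 224, 3), dtype=np.uint8)

# Rescale pixel values to [-1, 1], as the README asks for RGB input.
rgb = frames.astype(np.float32) / 127.5 - 1.0

# Add the leading batch dimension (batch size 1, one clip per forward pass).
rgb = rgb[np.newaxis, ...]
assert rgb.shape == (1, num_frames, 224, 224, 3)
assert rgb.min() >= -1.0 and rgb.max() <= 1.0
```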
Hi,
I wonder why you only released a checkpoint for Kinetics-600 trained from scratch, but not one initialized from ImageNet-pretrained parameters.
In the paper, performance is better with ImageNet pre-training on Kinetics-400 dataset.
Is it better to train from scratch on Kinetics-600 dataset?
Thanks!
I noticed that in the Conv3D and BatchNorm modules, the default regularizers are None.
Did you use regularizers in Conv3D or BatchNorm during training? If so, did you use L1 or L2 regularizers?
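For context on what such a regularizer would add to the loss (a generic sketch, not this repo's actual training configuration):

```python
import numpy as np

def l2_penalty(weights, weight_decay=1e-7):
    """Standard L2 regularization term added to the training loss."""
    return weight_decay * sum((w ** 2).sum() for w in weights)

# Toy example: two weight tensors of all ones.
ws = [np.ones((3, 3)), np.ones((2,))]
assert np.isclose(l2_penalty(ws, weight_decay=0.5), 0.5 * (9 + 2))
```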
Hi.
I'm only allowed to use the depth data and not the RGB (due to privacy issues). Could you please tell me if I can still use kinetics-i3d?
Thanks,
Sanaz
I'm trying to do sign language recognition in real time, so I'm wondering whether this model is the right choice here, and what kind of GPUs are necessary to train such a model.
thanks.
Hi, I am having difficulties running the code because of library issues. I am currently using tensorflow-gpu 0.11.0, tensorflow-probability-gpu 0.4.0, and Sonnet 1.29. Can anybody tell me a combination of tensorflow-gpu, tensorflow-probability-gpu, and Sonnet versions that works for running the code? Thanks.
Can someone help me how to train kinetics-i3d on my own custom dataset?
Thanks much for the great work.
I found that about 10% of the training set of Kinetics-400 is missing.
Is that consistent with your findings, or should I look into improving my download scripts? :)
Thanks.
I think the currently public code is only part of the whole project. Will the full project, including standard training and testing code, be released in the future?
Hi
Would you please tell me the accuracy of the 3D Inception-v1 model trained from scratch on the UCF-101 dataset? I mean the model that isn't inflated via the Kinetics pre-trained weights.
The README says to scale the RGB values between -1 and 1. Does this mean x / 128.0 - 1.0, where x is a uint8 image?
I'm more used to normalizing images with mean and std, so I want to make sure.
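A note (my reading, not an official answer): x / 128.0 - 1.0 maps [0, 255] to [-1, 0.992], while 2 * x / 255.0 - 1.0 maps it exactly onto [-1, 1]. The difference is negligible in practice, but a quick check:

```python
import numpy as np

x = np.arange(256, dtype=np.float32)  # all possible uint8 values

a = x / 128.0 - 1.0        # the formula asked about
b = 2.0 * x / 255.0 - 1.0  # exact [-1, 1] mapping

assert a.min() == -1.0 and a.max() < 1.0   # tops out just below 1
assert b.min() == -1.0 and b.max() == 1.0  # hits both endpoints
assert np.abs(a - b).max() < 0.01          # difference is negligible
```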
Is there a way to generate the .npy files for videos other than the ones in the standard datasets?
We want to use I3D on UCF-101. How can we fine-tune the I3D model on UCF-101?
Hi ,
When I fine-tune the pretrained model on UCF-101, I adapt evaluate_sample.py: I only use the RGB input, change `_NUM_CLASSES` to 101, add a loss and optimizer after the logits, and feed the training data and labels to the net, but I encounter these error messages:
2017-09-02 15:28:24.771133: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 15:28:24.771169: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 15:28:24.771190: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 15:28:24.771194: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 15:28:24.771198: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 15:28:25.113985: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:84:00.0
Total memory: 10.91GiB
Free memory: 2.11GiB
2017-09-02 15:28:25.114035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2017-09-02 15:28:25.114043: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y
2017-09-02 15:28:25.114067: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:84:00.0)
INFO:tensorflow:Restoring parameters from data/checkpoints/rgb_scratch/model.ckpt
Traceback (most recent call last):
File "i3d_finetune_ucf101.py", line 175, in <module>
tf.app.run(main)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "i3d_finetune_ucf101.py", line 133, in main
feed_dict={rgb_input:batch_xs, rgb_y: batch_ys})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1100, in _run
% (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (8, 101) for Tensor u'Placeholder_1:0', which has shape '(?, 400)'
Here is my python file:
# Copyright 2017 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Loads a sample video and classifies using a trained Kinetics checkpoint."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import tensorflow as tf

import i3d
from dataset import Dataset

batch_size = 8
training_iter = 1000
learning_rate = 0.001

_IMAGE_SIZE = 227
_NUM_CLASSES = 101
_SAMPLE_VIDEO_FRAMES = 79

_SAMPLE_PATHS = {
    'rgb': 'data/v_CricketShot_g04_c01_rgb.npy',
    'flow': 'data/v_CricketShot_g04_c01_flow.npy',
}

_CHECKPOINT_PATHS = {
    'rgb': 'data/checkpoints/rgb_scratch/model.ckpt',
    'flow': 'data/checkpoints/flow_scratch/model.ckpt',
    'rgb_imagenet': 'data/checkpoints/rgb_imagenet/model.ckpt',
    'flow_imagenet': 'data/checkpoints/flow_imagenet/model.ckpt',
}

_LABEL_MAP_PATH = 'data/label_map.txt'

FLAGS = tf.flags.FLAGS

tf.flags.DEFINE_string('eval_type', 'rgb', 'rgb, flow, or joint')
tf.flags.DEFINE_boolean('imagenet_pretrained', True, '')


def main(unused_argv):
  tf.logging.set_verbosity(tf.logging.INFO)
  eval_type = FLAGS.eval_type
  imagenet_pretrained = FLAGS.imagenet_pretrained

  if eval_type not in ['rgb', 'flow', 'joint']:
    raise ValueError('Bad `eval_type`, must be one of rgb, flow, joint')

  kinetics_classes = [x.strip() for x in open(_LABEL_MAP_PATH)]

  if eval_type in ['rgb', 'joint']:
    # RGB input has 3 channels.
    rgb_input = tf.placeholder(
        tf.float32,
        shape=(batch_size, 10, _IMAGE_SIZE, _IMAGE_SIZE, 3))
    rgb_y = tf.placeholder(tf.float32, [None, _NUM_CLASSES])
    with tf.variable_scope('RGB'):
      rgb_model = i3d.InceptionI3d(
          _NUM_CLASSES, spatial_squeeze=False, final_endpoint='Logits')
      rgb_logits, _ = rgb_model(
          rgb_input, is_training=True, dropout_keep_prob=1.0)
    rgb_variable_map = {}
    for variable in tf.global_variables():
      if variable.name.split('/')[0] == 'RGB':
        rgb_variable_map[variable.name.replace(':0', '')] = variable
        print('===variable:', variable)
    rgb_saver = tf.train.Saver(var_list=rgb_variable_map, reshape=True)
    # print('=====variables', rgb_variable_map)

  if eval_type in ['flow', 'joint']:
    # Flow input has only 2 channels.
    flow_input = tf.placeholder(
        tf.float32,
        shape=(1, _SAMPLE_VIDEO_FRAMES, _IMAGE_SIZE, _IMAGE_SIZE, 2))
    with tf.variable_scope('Flow'):
      flow_model = i3d.InceptionI3d(
          _NUM_CLASSES, spatial_squeeze=True, final_endpoint='Logits')
      flow_logits, _ = flow_model(
          flow_input, is_training=False, dropout_keep_prob=1.0)
    flow_variable_map = {}
    for variable in tf.global_variables():
      if variable.name.split('/')[0] == 'Flow':
        flow_variable_map[variable.name.replace(':0', '')] = variable
    flow_saver = tf.train.Saver(var_list=flow_variable_map, reshape=True)

  if eval_type == 'rgb':
    model_logits = rgb_logits
  elif eval_type == 'flow':
    model_logits = flow_logits
  else:
    model_logits = rgb_logits + flow_logits
  model_predictions = tf.nn.softmax(model_logits)
  print('===model_predictions.shape:', model_predictions.shape)
  model_predictions = tf.reduce_mean(model_predictions, (1, 2))
  print('===model_predictions.shape:', model_predictions.shape)

  loss = tf.reduce_mean(
      tf.nn.softmax_cross_entropy_with_logits(
          logits=model_predictions, labels=rgb_y))
  optimizer = tf.train.GradientDescentOptimizer(
      learning_rate=learning_rate).minimize(loss)

  dataset = Dataset('data/rgb_train_split1.txt', 'data/rgb_test_split1.txt')

  config = tf.ConfigProto()
  config.gpu_options.allow_growth = True
  with tf.Session(config=config) as sess:
    step = 1
    while step < training_iter:
      batch_xs, batch_ys = dataset.next_batch(batch_size, 'train')
      rgb_saver.restore(sess, _CHECKPOINT_PATHS['rgb'])
      sess.run(
          optimizer,
          feed_dict={rgb_input: batch_xs, rgb_y: batch_ys})


if __name__ == '__main__':
  tf.app.run(main)
Hi, may I know what parameters you used for fine-tuning on UCF-101 and JHMDB? Like learning rate, momentum, weight decay, and number of steps. Thanks.
Hi,
How can I generate optical flow using GPUs? This seems difficult to achieve in Python. Can you help with this?
Thank you
I have read about other people's problematic experiences with OpenCV's TV-L1 optical flow algorithm and how time-consuming it is, and I have witnessed it myself first-hand.
My question is simple: is it legitimate to choose another optical flow estimation method, for instance LiteFlowNet (http://mmlab.ie.cuhk.edu.hk/projects/LiteFlowNet/), which is currently the state of the art at CVPR 2018? Will this affect the results, especially since I intend to use the flow Kinetics-I3D model solely for feature extraction?
Thanks in advance.
Hello @derpson,
I'm doing research on action recognition.
I just downloaded the UCF-101 dataset for action recognition.
But I need the metadata, JSON files, and description files.
If anyone can help, please forward the files.
Can anyone also help me with data augmentation for this dataset?
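Not an official recipe, but here is a minimal sketch of two augmentations commonly applied to video clips (a random spatial crop plus a random horizontal flip applied consistently across all frames), on a dummy clip array:

```python
import numpy as np

def augment_clip(clip, crop_size=224, rng=None):
    """Randomly crop and horizontally flip a video clip of shape (T, H, W, C)."""
    rng = rng or np.random.default_rng()
    t, h, w, c = clip.shape
    # Pick one crop window and apply it to every frame in the clip.
    top = rng.integers(0, h - crop_size + 1)
    left = rng.integers(0, w - crop_size + 1)
    clip = clip[:, top:top + crop_size, left:left + crop_size, :]
    # Flip the whole clip (not per-frame) half the time.
    if rng.random() < 0.5:
        clip = clip[:, :, ::-1, :]
    return clip

clip = np.random.randint(0, 256, size=(16, 256, 340, 3), dtype=np.uint8)
out = augment_clip(clip)
assert out.shape == (16, 224, 224, 3)
```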
Do you plan to add trained checkpoints for the optical flow stream from the Kinetics-600 dataset? Especially, imagenet + kinetics-600?
Hi all,
I want to use the pre-trained model to process several video files, but I don't want to classify them. I only want to extract the activations of the first N layers (2-3 layers) to see whether there are differences between the different video files (they are very similar).
After the prediction function, how can I extract the outputs of those first layers?
Thank you in advance.
Hello,
Thank you for this work.
Sorry for my beginner question. I'm not used to tensorflow.
I would like to test your model on thousands of samples. How can I adapt evaluate_sample.py for multiple samples? https://github.com/deepmind/kinetics-i3d/blob/master/evaluate_sample.py#L29
Thank you a lot
Inception-v1 has 3x3 and 5x5 convolution layers, while BN-Inception has a 3x3 and two 3x3 conv layers in the middle branch of each Inception module. But in the 3D Inception module, you only keep two 3x3 conv layers, one per branch. Can you tell me why you do this, and when you transfer the 2D BN-Inception parameters to the 3D model, do you just ignore the second 3x3 conv layer in the second branch?
Thanks!
Hi,
Due to the nature of the TV-L1 optical flow algorithm, it is quite time-consuming to run (and I have more than 100k videos to process, which makes it quite frustrating to watch).
Thank you,
Hi,
Could anybody please share the hyperparameters for fine-tuning UCF-101 and HMDB? There is a huge gap in my experiments.
Thanks a lot!
Xudong
Hi, I would like to know how to preprocess Kinetics-400 to reproduce the results. I found that extracting TV-L1 flow before rescaling the RGB images leads to worse flow recognition accuracy.
So currently I first resample the videos at 25 fps. Then I extract RGB frames and resize them so the shorter side is 256 pixels. I use the OpenCV 3.4 version of cv::cuda::OpticalFlowDual_TVL1 for flow extraction on the resized gray-scale frames. All pixel values are rescaled as mentioned in the project. Are there any details I am missing in this preprocessing procedure? Am I extracting the optical flow the right way? Thanks.
Hi, I have a question regarding the explanation of the optical flow used. The git page states:
We only use the first two output dimensions, and apply the same cropping as for RGB. The provided .npy file thus has shape (1, num_frames, 224, 224, 2)
However, I was wondering what this refers to exactly. Is it the stack of u and v, the output of TV-L1 (and if so, in what order)? Or do you convert the flow into an RGB image and use just the R and G channels?
This was a little unclear to me, thanks.