Comments (7)
@Anirudh58 and @rishabh2301, I added the max_pool3d(...) line that estathop mentioned here to get the (1, 1, 1, 1, 1024) vector (in i3d.py):
...
end_point = 'Logits'
with tf.variable_scope(end_point):
  net = tf.nn.avg_pool3d(net, ksize=[1, 2, 7, 7, 1], strides=[1, 1, 1, 1, 1], padding=snt.VALID)
  # To get the features, I add this line because I have a vector (1, x, 1, 1, 1024) otherwise.
  # cf. https://github.com/deepmind/kinetics-i3d/issues/40
  net = tf.nn.max_pool3d(net, ksize=[1, net.get_shape()[1], 1, 1, 1], strides=[1, 1, 1, 1, 1], padding=snt.VALID)
...
from kinetics-i3d.
@joaoluiscarreira but my input is a numpy array of size (1, 53, 224, 224, 2). The number of frames is 53, so the video is short, and the resolution is fixed at (224, 224). The paper I am trying to implement says:
The motion features are computed using a Kinetics pre-trained I3D flow network [12].
We extract the 1024-dimensional features from the last global average pooling layer
I was wondering: if I remove the
logits = Unit3D(output_channels=self._num_classes,
                kernel_shape=[1, 1, 1],
                activation_fn=None,
                use_batch_norm=False,
                use_bias=True,
                name='Conv3d_0c_1x1')(net, is_training=is_training)
call, and go from
net = tf.nn.avg_pool3d(net, ksize=[1, 2, 7, 7, 1],
                       strides=[1, 1, 1, 1, 1], padding=snt.VALID)
net = tf.nn.dropout(net, dropout_keep_prob)
directly to:
if self._spatial_squeeze:
  logits = tf.squeeze(logits, [2, 3], name='SpatialSqueeze')
averaged_logits = tf.reduce_mean(logits, axis=1)
will that give the 1x1024 feature vector the authors mention?
I'm not entirely sure what they do in the paper, but if you just want a vector then yes, averaging them over time should do the job. The easiest approach would be to call tf.reduce_mean(net, axis=[1, 2, 3]) directly and skip the avg_pool, dropout, etc.
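To illustrate what that reduction does to the shape, here is a minimal sketch that uses NumPy as a stand-in for TensorFlow. The (1, 7, 7, 7, 1024) input shape and the helper name global_average_features are assumptions made for illustration; the values are random, not real I3D activations.

```python
import numpy as np


def global_average_features(net):
    """Average a (batch, time, height, width, channels) array over time and space.

    NumPy counterpart of tf.reduce_mean(net, axis=[1, 2, 3]).
    """
    return net.mean(axis=(1, 2, 3))


# Hypothetical activations: random values with an assumed shape, for illustration only.
rng = np.random.default_rng(0)
net = rng.standard_normal((1, 7, 7, 7, 1024)).astype(np.float32)

features = global_average_features(net)
print(features.shape)  # one 1024-dimensional vector per batch item
```

Because the mean runs over the whole time axis, the output shape does not depend on how many frames the clip has.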
It was more of a TensorFlow question, but the second dimension of that tensor grew with the length of the video; it was not linearly proportional to the duration, so you don't actually need to care about its exact value. You perform a global max pooling with
net = tf.nn.max_pool3d(net, ksize=[1, net.get_shape()[1], 1, 1, 1], strides=[1, 1, 1, 1, 1], padding=snt.VALID)
and you have the temporally aggregated feature vector you need.
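The shape behaviour described here can be checked with a small NumPy sketch, used as a stand-in for the TensorFlow ops. It assumes a 7x7 spatial map before the Logits block, and the temporal lengths 4, 7 and 10 are hypothetical; the helper names are illustrative and not part of i3d.py.

```python
import numpy as np


def avg_pool_time2_full_space(net):
    """Mimic avg_pool3d(ksize=[1, 2, 7, 7, 1], strides of 1, VALID) on a 7x7 map."""
    assert net.shape[2:4] == (7, 7), "sketch assumes a 7x7 spatial map"
    assert net.shape[1] >= 2, "VALID pooling with a window of 2 needs 2+ time steps"
    # Mean over the full 7x7 spatial window: (B, T, 1, 1, C).
    spatial_mean = net.mean(axis=(2, 3), keepdims=True)
    # Mean over each pair of neighbouring time steps: (B, T - 1, 1, 1, C).
    return 0.5 * (spatial_mean[:, :-1] + spatial_mean[:, 1:])


def global_max_over_time(net):
    """Mimic max_pool3d with ksize=[1, T, 1, 1, 1]: collapse the time axis to 1."""
    return net.max(axis=1, keepdims=True)


rng = np.random.default_rng(0)
shapes = {}
for time_steps in (4, 7, 10):  # hypothetical temporal lengths
    net = rng.standard_normal((1, time_steps, 7, 7, 1024))
    pooled = avg_pool_time2_full_space(net)
    feature = global_max_over_time(pooled)
    shapes[time_steps] = (pooled.shape, feature.shape)
    print(time_steps, pooled.shape, feature.shape)
```

Whatever the temporal length, the average pool leaves one fewer time step than it received, and the global max pool then reduces that axis to 1, giving a (1, 1, 1, 1, 1024) array.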
Hi @estathop, I am working on a similar problem: extracting features with the Kinetics-pretrained I3D model. For now I only want features for the RGB frames. Can you explain how you used this code for feature extraction? Thank you very much.
Hello @rishabh2301, I am working on a similar problem. Did you figure out how to use this code to extract features?