
Comments (7)

mourad1081 avatar mourad1081 commented on September 24, 2024 1

@Anirudh58 and @rishabh2301, I added the max_pool3d(...) line that estathop mentioned here to get the (1, 1, 1, 1, 1024) vector (in i3d.py):

...
end_point = 'Logits'
with tf.variable_scope(end_point):
    net = tf.nn.avg_pool3d(net, ksize=[1, 2, 7, 7, 1], strides=[1, 1, 1, 1, 1], padding=snt.VALID)
    # Added to get the features; without this line the output has shape (1, x, 1, 1, 1024).
    # cf. https://github.com/deepmind/kinetics-i3d/issues/40
    net = tf.nn.max_pool3d(net, ksize=[1, net.get_shape()[1], 1, 1, 1], strides=[1, 1, 1, 1, 1], padding=snt.VALID)
    ...

from kinetics-i3d.

joaoluiscarreira avatar joaoluiscarreira commented on September 24, 2024


estathop avatar estathop commented on September 24, 2024

@joaoluiscarreira but my input is a NumPy array of size (1, 53, 224, 224, 2). The number of frames is 53, so the video is short, and the resolution is fixed at (224, 224). The paper I am trying to implement says:

"The motion features are computed using a Kinetics pre-trained I3D flow network [12]. We extract the 1024-dimensional features from the last global average pooling layer"

I was wondering, if I remove the

logits = Unit3D(output_channels=self._num_classes,
                      kernel_shape=[1, 1, 1],
                      activation_fn=None,
                      use_batch_norm=False,
                      use_bias=True,
                      name='Conv3d_0c_1x1')(net, is_training=is_training)

and from

net = tf.nn.avg_pool3d(net, ksize=[1, 2, 7, 7, 1],
                            strides=[1, 1, 1, 1, 1], padding=snt.VALID)
net = tf.nn.dropout(net, dropout_keep_prob)

move directly to:

if self._spatial_squeeze:
    logits = tf.squeeze(logits, [2, 3], name='SpatialSqueeze')
averaged_logits = tf.reduce_mean(logits, axis=1)

will that give the 1x1024 feature vector the authors are referring to?


joaoluiscarreira avatar joaoluiscarreira commented on September 24, 2024

I'm not entirely sure about what they do in the paper but if you just want a vector then yes, averaging them in time should do the job. Easiest would be to directly do tf.reduce_mean(net, axis=[1,2,3]) and skip the avg_pool, dropout, etc.
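To make the difference between the two aggregation choices in this thread concrete, here is a minimal pure-Python sketch on made-up numbers. It is not the I3D code: `features`, `temporal_mean`, and `temporal_max` are hypothetical names, and the toy data uses 3 channels instead of 1024. It only shows what averaging over time does to a (time, channels) block compared with taking the maximum over time.

```python
# Toy stand-in for the pooled I3D output: one row per time step,
# one column per channel. Real features would have 1024 channels;
# these values are invented for illustration.
features = [
    [0.0, 2.0, 4.0],
    [1.0, 2.0, 0.0],
    [2.0, 2.0, 2.0],
]


def temporal_mean(steps):
    """Average each channel over time (the reduce_mean idea)."""
    return [sum(channel) / len(channel) for channel in zip(*steps)]


def temporal_max(steps):
    """Keep each channel's largest value over time (the global max pool idea)."""
    return [max(channel) for channel in zip(*steps)]


mean_vector = temporal_mean(features)
max_vector = temporal_max(features)
print(mean_vector)  # [1.0, 2.0, 2.0]
print(max_vector)   # [2.0, 2.0, 4.0]
```

Both reductions return one value per channel regardless of how many time steps go in, which is why either one yields a fixed-size vector for videos of different lengths.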


estathop avatar estathop commented on September 24, 2024

It was more of a TensorFlow question. The second dimension of that tensor grows with the length of the video, but not in linear proportion to time, so you don't actually need to care about its size. You perform a global max pooling with
net = tf.nn.max_pool3d(net, ksize=[1, net.get_shape()[1], 1, 1, 1], strides=[1, 1, 1, 1, 1], padding=snt.VALID)
and you get the temporally aggregated feature vector you need.
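As a rough sketch of why the second dimension is not simply proportional to the frame count, the length after the avg_pool3d call can be estimated with plain arithmetic. The stride pattern below is an assumption based on my reading of i3d.py (three layers that halve the time axis with SAME padding before the 'Logits' block), not something confirmed in this thread, and `temporal_length_after_avg_pool` is a hypothetical helper name.

```python
import math

# Assumption: three layers before 'Logits' use a temporal stride of 2
# with SAME padding, so each one maps T to ceil(T / 2).
TEMPORAL_STRIDES = (2, 2, 2)

# Temporal kernel size of the average pool, from ksize=[1, 2, 7, 7, 1].
AVG_POOL_TEMPORAL_KERNEL = 2


def temporal_length_after_avg_pool(num_frames):
    """Estimate the size of the time axis after the VALID, stride-1 avg pool."""
    length = num_frames
    for stride in TEMPORAL_STRIDES:
        length = math.ceil(length / stride)  # SAME padding rounds up
    # VALID padding with stride 1 shrinks the axis by (kernel - 1).
    return length - AVG_POOL_TEMPORAL_KERNEL + 1


for frames in (16, 53, 64):
    print(frames, temporal_length_after_avg_pool(frames))
```

Under these assumptions a 53-frame clip leaves 6 time steps, so the tensor would be (1, 6, 1, 1, 1024) before the global pooling step collapses it.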


rishabh2301 avatar rishabh2301 commented on September 24, 2024

Hi @estathop, I am working on a similar problem: extracting features with the Kinetics pre-trained I3D model. For now I only want features for the RGB frames. Can you explain how you used this code for feature extraction? Thank you very much.


Anirudh58 avatar Anirudh58 commented on September 24, 2024

Hello @rishabh2301, I am working on a similar problem. Did you figure out how to use this code to extract features?

