
action-recognition-visual-attention's People

Contributors

kracwarlock

action-recognition-visual-attention's Issues

UCF11 evaluation problem

[screenshot]

After update 100 the model begins to predict, but as shown above, the prediction has been running for over 24 hours without progressing to a new model update. Is that normal? If not, what might be the main cause of this problem? I'm using your code from issue #6 for the data preprocessing, with a 6:2:2 train/validation/test split. Thanks!

Results reproducibility

Hi @kracwarlock! Thank you for sharing the code for your amazing paper! To reproduce your published results, I was wondering how you selected the validation split for the HMDB-51 and Hollywood2 datasets. For those two datasets, could you please share your valid_labels.txt, train_labels.txt, test_labels.txt, train_filenames.txt, test_filenames.txt and valid_filenames.txt files? I would really appreciate it :)

How is the soft attention model implemented in this project?

Hi, @kracwarlock. I am confused about the implementation of the soft attention model. Why is the code related to alpha (pstate & pctx) in the lstm_cond_layer function different from equation (4) in your paper? I hope you can explain in more detail how the weights W_i, mapping to the i-th element of the location softmax, are implemented in this project. Thanks a lot.
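As I read equation (4) of the paper, the location softmax puts a distribution over the K = 49 spatial locations using one weight vector W_i per location applied to the previous hidden state. A minimal numpy sketch of that equation (shapes are illustrative; this is not the repo's Theano code, whose pstate & pctx terms follow the Show, Attend and Tell formulation):

```python
import numpy as np

def location_softmax(h_prev, W):
    # Eq. (4): l_{t,i} = exp(W_i^T h_{t-1}) / sum_j exp(W_j^T h_{t-1})
    # h_prev: (dim,) previous hidden state; W: (K, dim), one row W_i per location
    scores = W @ h_prev
    scores -= scores.max()          # numerical stability; doesn't change the result
    e = np.exp(scores)
    return e / e.sum()              # (K,) non-negative, sums to 1

rng = np.random.default_rng(0)
l_t = location_softmax(rng.normal(size=512), rng.normal(size=(49, 512)))
```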

Your time for one 128-batch?

Hi @kracwarlock ,

This is my first time training a network with Theano. I wonder if my setup is wrong, because training is very slow even though Theano prints that my GPU is being used; training networks in Caffe is much faster for me. Do you remember roughly how many seconds one 128-image batch took during your training? It takes me about 60 seconds per batch.

Thank you.

Multi-layer LSTM

Hi @kracwarlock, sorry to bother you again. I am opening another issue here, related to your comment:

I see that I did not release the multi-layer LSTM code. I will try to do that as soon as I have time. Till then this is how it is done https://github.com/kelvinxu/arctic-captions/blob/master/capgen.py#L542-L548. In the paper the X means the feature of a single sample. In the code everything is done on a batch.

I tried the approach you suggested, but soon realized that it cannot work: by simply replicating the lstm_cond_layer, you also replicate the theano.scan that iterates over the n_steps. It seems to me that this prevents the upper LSTM layer from providing the location softmax to the lower one at each time step. I reckon the multiple layers should be implemented inside a single theano.scan instance.

Could you please either comment on that or provide the original code of the multilayer LSTM?
Thanks in advance!
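For what it's worth, one way to satisfy that constraint is to put all three layers inside a single step function, so only one loop iterates over time and the top layer's previous hidden state is available when computing the location softmax for the next step. A runnable numpy stand-in for the scan (a sketch with toy sizes, not the authors' code):

```python
import numpy as np

def lstm_cell(x, h, c, W):
    # Minimal LSTM cell; W maps concat([x, h]) to the 4*dim gate pre-activations.
    dim = h.shape[0]
    z = W @ np.concatenate([x, h])
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    i, f, o, g = z[:dim], z[dim:2*dim], z[2*dim:3*dim], z[3*dim:]
    c_new = sig(f) * c + sig(i) * np.tanh(g)
    return sig(o) * np.tanh(c_new), c_new

def step(ctx, states, Ws, W_att):
    # ONE time step for the whole 3-layer stack: the location softmax is
    # computed from the TOP layer's previous hidden state, then every layer
    # is updated in sequence -- so a single outer loop (one theano.scan)
    # iterates over time.
    scores = W_att @ states[-1][0]
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                    # location softmax over K locations
    x = alpha @ ctx                         # attended feature vector
    new_states = []
    for (h, c), W in zip(states, Ws):
        h, c = lstm_cell(x, h, c, W)
        new_states.append((h, c))
        x = h                               # feed the layer above
    return new_states, alpha

# smoke run with toy sizes (dim, K, feat are hypothetical)
dim, K, feat = 8, 49, 16
rng = np.random.default_rng(1)
Ws = [rng.normal(scale=0.1, size=(4 * dim, feat + dim))] + \
     [rng.normal(scale=0.1, size=(4 * dim, 2 * dim)) for _ in range(2)]
W_att = rng.normal(scale=0.1, size=(K, dim))
states = [(np.zeros(dim), np.zeros(dim))] * 3
ctx = rng.normal(size=(K, feat))
for t in range(5):                          # the single "scan" over time
    states, alpha = step(ctx, states, Ws, W_att)
```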

Can anyone share the code mentioned in closed issue #6 that combines the features into h5 format?

I am interested in the combination of deep learning and attention mechanisms; I think it is a good way to see what a deep network focuses on. I have extracted the features from the convolutional layers of GoogLeNet, but the second link in issue #6 is no longer available ("To combine the individual files generated by this script that he sent me I used https://gist.github.com/kracwarlock/96499936487d6125dd010319669c6648").
Can anyone share this code again?
Thanks very much!


Input data

Hello @kracwarlock
I have two questions about the input data:

  1. When training on Hollywood2, I get a memory error ("Cannot allocate memory") before the training loop starts, probably due to the large amount of data. During pre-processing I sparsify the features, and my training feature file is about 10 GB. How big is this file for you?
  2. If I use a subset of the dataset (to avoid the memory problem above), training starts fine, but at Epoch 0, Update 1240 an error occurs: "NaN detected in cost".

Thanks!
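One workaround worth trying for the first problem (an assumption about the cause, not a confirmed fix) is to avoid loading the whole feature file into memory and instead let h5py read only the slice needed for each batch:

```python
import h5py, numpy as np, os, tempfile

# build a small stand-in train_features.h5 (assumed layout: one flattened
# 7*7*1024 row per frame; 100 frames here instead of the full dataset)
path = os.path.join(tempfile.mkdtemp(), "train_features.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("features",
                     data=np.random.rand(100, 7 * 7 * 1024).astype("float32"))

with h5py.File(path, "r") as f:
    dset = f["features"]     # a handle only -- nothing is loaded yet
    batch = dset[10:26]      # h5py reads just these 16 rows from disk
```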

Initialization of LSTM layers

How did you initialize the cell state and the hidden state of the LSTM layers? You give an equation but don't explain it much. I wonder what the f_init function is; I read the code, and my guess is that it is a tanh layer. How did you do this separately for the 3 layers? I also don't know what X means: is it the feature map of a single sample, or of a batch?
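From reading arctic-captions, f_init appears to be a tanh layer applied to the feature map averaged over spatial locations, with a separate pair of init networks for h_0 and c_0. A hedged numpy sketch under that reading (shapes are illustrative; a separate weight set per layer is the natural extension for the 3-layer case):

```python
import numpy as np

def f_init(X, W_h, b_h, W_c, b_c):
    # X: (K, D) feature map of ONE sample (K = 49 locations, D = 1024).
    # h_0 and c_0 each come from a tanh layer on the mean annotation vector.
    mean_ctx = X.mean(axis=0)               # average over the K locations
    h0 = np.tanh(W_h @ mean_ctx + b_h)
    c0 = np.tanh(W_c @ mean_ctx + b_c)
    return h0, c0

rng = np.random.default_rng(0)
X = rng.normal(size=(49, 1024))
h0, c0 = f_init(X,
                rng.normal(scale=0.01, size=(512, 1024)), np.zeros(512),
                rng.normal(scale=0.01, size=(512, 1024)), np.zeros(512))
```

In the batched code X would instead be (batch, K, D) with the mean taken over axis 1, which matches the comment that "in the paper the X means the feature of a single sample" while "in the code everything is done on a batch".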

How to create the h5 features file

dataspace = H5S.create_simple( frames, [7 7 1024], {'H5S_UNLIMITED' 'H5S_UNLIMITED'} );
fid = H5F.create('train_features.h5', 'H5F_ACC_TRUNC', 'H5P_DEFAULT', 'H5P_DEFAULT');
dataset = H5D.create( fid, 'features', 'H5T_IEEE_F32LE', dataspace );

This MATLAB snippet does not work from Python. How can the h5 features file be created?
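For reference, an equivalent file can be created from Python with h5py. This is a sketch under the assumption that the dataset is named 'features' (matching dataset_name in the repo's config) and that features are stored as one flattened 7×7×1024 vector per frame:

```python
import h5py, numpy as np, os, tempfile

frames = 12                                            # e.g. from train_framenum.txt
feats = np.random.rand(frames, 7 * 7 * 1024).astype("float32")  # placeholder data

path = os.path.join(tempfile.mkdtemp(), "train_features.h5")
with h5py.File(path, "w") as f:
    # resizable along the frame axis, like H5S_UNLIMITED maxdims in the MATLAB code
    f.create_dataset("features", data=feats, maxshape=(None, feats.shape[1]))
```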

Possible bug in data_handler: Reset()

Hi @kracwarlock!
I'd like to check with you the following:

  • On data_handler.py, the Reset() function shuffles self.frame_indices_ and self.labels_
  • Then on GetBatch(), it retrieves some values from these arrays to build the batch :
    start = self.frame_indices_[self.frame_row_]
    label = self.labels_[self.frame_row_]
    length = self.lengths_[self.frame_row_]
  • And based on the length, it decides which features to include:
    if length >= self.seq_length_ * skip: ...
    else:

Shouldn't self.lengths_ be shuffled in Reset() like the other two arrays?

Thank you!
Nuno Garcia
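If that is indeed the bug, the usual fix is to draw one permutation and apply it to every per-video array, so row i still refers to the same video after shuffling. A small sketch (hypothetical array contents):

```python
import numpy as np

def reset(frame_indices, labels, lengths, rng):
    # Shuffle all per-video arrays with ONE shared permutation so that
    # frame_indices_[i], labels_[i] and lengths_[i] keep referring to the
    # same video (the suspected bug is lengths_ being left unshuffled).
    perm = rng.permutation(len(labels))
    return frame_indices[perm], labels[perm], lengths[perm]

rng = np.random.default_rng(0)
fi = np.arange(5) * 100                 # fi[i] == 100 * labels[i] by construction
lb = np.array([0, 1, 2, 3, 4])
ln = np.array([30, 40, 50, 60, 70])     # ln[i] == 30 + 10 * labels[i]
fi2, lb2, ln2 = reset(fi, lb, ln, rng)
# rows stay aligned after shuffling
assert all(ln2[i] == 30 + 10 * lb2[i] for i in range(5))
```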

Location softmax

Hi, thanks for sharing the code!
It looks like the location softmax implemented in the conditional LSTM is not the one you describe in the paper 'Action Recognition with Visual Attention' (eq. 4), but rather the one described in eqns. 4, 5, 6 and 9 of 'Show, Attend and Tell: Neural Image Caption Generation with Visual Attention'. Could you please comment on that? Much appreciated, thanks!

Python Scripts for CNN Encoding Generation

Hi,

I am using TensorFlow in Python to generate CNN encodings for the video sequences in the UCF101 dataset, dumping the outputs sequentially into an HDF5 file. My code currently takes 20 s per video to store its data in the HDF5 file, so for 9500 video sequences it will take a very long time.

Can someone share their experience in it?
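In my experience (a guess at the bottleneck, not a diagnosis of this particular script), per-frame writes and repeated file opens dominate such timings; preallocating a chunked dataset and writing one video's features per slice assignment is usually much faster:

```python
import h5py, numpy as np, os, tempfile

path = os.path.join(tempfile.mkdtemp(), "features.h5")
frames_per_video, n_videos, D = 30, 10, 7 * 7 * 32   # small stand-in sizes

with h5py.File(path, "w") as f:
    dset = f.create_dataset("features",
                            shape=(frames_per_video * n_videos, D),
                            dtype="float32",
                            chunks=(frames_per_video, D))   # one chunk per video
    for v in range(n_videos):
        feats = np.random.rand(frames_per_video, D).astype("float32")
        # one slice write per video instead of one write per frame
        dset[v * frames_per_video:(v + 1) * frames_per_video] = feats
```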

How to get the h5 features file

Hello, after extracting the features in matrix form, I tried to convert them to an HDF5 file with MATLAB, but I don't know whether I got the right format. Does this look right?

dataspace = H5S.create_simple( frames,[7 7 1024],{'H5S_UNLIMITED' 'H5S_UNLIMITED'} );
fid = H5F.create('train_features.h5','H5F_ACC_TRUNC', 'H5P_DEFAULT', 'H5P_DEFAULT');
dataset = H5D.create( fid,'features',H5T_IEEE_F32LE,dataspace);

where frames is read from train_framenum.txt.
Could you please also add some files showing how to generate the features file? Thank you!

Enquiries regarding Data Preprocessing

Thanks for making this interesting project open source. I am trying to replicate the work discussed in the paper; however, training on the Hollywood2 and UCF11 datasets does not converge. I suspect that something is wrong with my extracted features.

  • I use the Python interface of Caffe to extract the features from the "inception_5b/output" layer of GoogLeNet. The shape of the features is (1024, 7, 7), but according to other forum posts it should be (7, 7, 1024), so I swapped the axes accordingly. Is that the difference between the MATLAB and Python interfaces?
  • Among the 1024 feature maps, approximately 35% consist only of zeros. Is that normal?
  • In the MATLAB script, how do you specify the name of the feature layer you intend to use, such as "inception_5b/output"? The script simply calls scores = caffe('forward', {input_data{i}});.

Any help would be greatly appreciated :-)
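On the first point, the (1024, 7, 7) vs (7, 7, 1024) difference is most likely just memory order: Caffe's Python blobs are channel-first, while MATLAB's column-major arrays come out spatial-first. Swapping axes with transpose (not reshape, which would scramble the values) preserves each location's channel vector; a quick numpy check:

```python
import numpy as np

caffe_feat = np.random.rand(1024, 7, 7).astype("float32")  # Caffe's C x H x W output
feat = caffe_feat.transpose(1, 2, 0)                       # -> (7, 7, 1024)

assert feat.shape == (7, 7, 1024)
# sanity check: channel c at spatial location (i, j) is preserved
assert feat[3, 4, 100] == caffe_feat[100, 3, 4]
```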

How to get the dataset?

How did you obtain the dataset files, e.g. train_features.h5, train_framenum.txt, train_labels.txt, etc.?

self.data_file = '/ais/gobi3/u/shikhar/ucf11/dataset/train_features.h5'
self.num_frames_file = '/ais/gobi3/u/shikhar/ucf11/dataset/train_framenum.txt'
self.labels_file = '/ais/gobi3/u/shikhar/ucf11/dataset/train_labels.txt'
self.vid_name_file = '/ais/gobi3/u/shikhar/ucf11/dataset/train_filename.txt'
self.dataset_name = 'features'

I don't know how to get these. Do I download them from the web, or generate them myself? Could you tell me how? Thanks.
