srijandas07 / vpn

Pose driven attention mechanism

Python 99.61% Shell 0.39%

vpn's Issues

Requirements

Can you please provide requirements file for the packages and the versions used in this project?

Which TensorFlow version is required?

I cannot import the 'Merge' layer from 'keras.layers' as your code does. I have tried several TensorFlow versions (v1, v2, ...), but it still doesn't work.
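
A note that may help narrow this down: as far as I know, 'Merge' belongs to the older Keras API and was dropped during the Keras 2.x line, so recent Keras and TensorFlow releases would not provide it. The sketch below only probes whether an installed module still exposes the name. `has_merge_layer` is a hypothetical helper, not part of this repository.

```python
import importlib


def has_merge_layer(module_name="keras.layers"):
    """Return True/False if the module imports, or None if it cannot be imported.

    Hypothetical helper for illustration; it is not part of the vpn repository.
    """
    try:
        layers = importlib.import_module(module_name)
    except Exception:
        # Covers a missing package as well as a broken installation.
        return None
    return hasattr(layers, "Merge")


status = has_merge_layer()
if status is None:
    print("keras could not be imported in this environment")
elif status:
    print("this Keras still provides Merge")
else:
    print("this Keras has no Merge; an older Keras release is probably needed")
```

If the probe reports that 'Merge' is missing, the options are presumably to install an older Keras release or to port the merge calls to the functional merge layers; which one matches the authors' setup is not stated in this thread.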

Format of input data

Hello,

Could you please describe the expected format of the input data? Specifically, how should the cropped images be named, and how should they be stored?

Thanks!

Question about discrepancy with the published paper

First of all, thank you for sharing your good research.
For the spatial embedding, the paper uses the Euclidean distance between L2-normalized features.
But the code in the repository uses the Manhattan distance.
I'd like to know whether this change is intentional and what difference the changed metric makes.

Thank you in advance.
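
To make the two metrics in this question concrete, here is a minimal sketch in plain Python that L2-normalizes two toy vectors and compares the Euclidean and Manhattan distances between them. The vectors and function names are made up for illustration and are not taken from the repository or the paper.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def euclidean(a, b):
    """L2 distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """L1 distance: sum of the absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Toy features (illustrative values only).
a = l2_normalize([3.0, 4.0])  # becomes [0.6, 0.8]
b = l2_normalize([1.0, 0.0])  # stays [1.0, 0.0]

print("euclidean:", euclidean(a, b))
print("manhattan:", manhattan(a, b))
```

For any pair of vectors the Manhattan distance is at least as large as the Euclidean distance, so swapping one for the other changes the scale of the embedding loss as well as how it weights individual coordinates.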

requirements.txt

Could you add a requirements.txt to the project? The dependencies are quite complicated for me to handle.

Question about the Smarthome dataset

Hi, thank you for your good work. I am using the Smarthome dataset now. Could you list the specific 19 activities in CV1 and CV2?

Pretrained models links

Hello, the links to the pretrained models in the README are incorrect. Can you please fix them?

Questions about attention network

Thank you for your great work! I have two questions:
Q1: Looking at the code below, I don't understand why As (z1) and At (z2) denote spatial and temporal features, respectively. As I see it, spatial features come from a convolution layer along the channel axis, while temporal features come from a pooling layer. Am I misunderstanding something or missing some details?

    # model_gcnn and attention_reg are defined elsewhere in the repository code.
    # z1 and z2 are two separate projections of the same 'gcnn_out' features.
    z1 = Dense(256, activation='tanh', name='z1_layer', trainable=True)(model_gcnn.get_layer('gcnn_out').output)
    z2 = Dense(128, activation='tanh', name='z2_layer', trainable=True)(model_gcnn.get_layer('gcnn_out').output)

    # Spatial attention head: 49 sigmoid outputs computed from z1.
    fc_main_spatial = Dense(49, activity_regularizer=attention_reg,
                            kernel_initializer='zeros', bias_initializer='zeros',
                            activation='sigmoid', trainable=True, name='dense_spatial')(z1)
    # Temporal attention head: 2 softmax outputs computed from z2.
    fc_main_temporal = Dense(2, activity_regularizer=attention_reg,
                             kernel_initializer='zeros', bias_initializer='zeros',
                             activation='softmax', trainable=True, name='dense_temporal')(z2)

Q2: What does the 2D CNN layer after the Pose Backbone contribute? How does it help the spatio-temporal coupler?
Looking forward to your reply! Thank you!
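
One detail can be read directly from the quoted snippet: both attention heads use zero kernel and bias initializers, so before any training step their pre-activations are all zero whatever z1 and z2 contain. The sketch below reproduces only the two activation functions in plain Python (no Keras) to show what the heads would output at initialization. It does not settle which head is "spatial" or "temporal" in the authors' sense.

```python
import math


def sigmoid(x):
    """Logistic function, as used by the 49-unit head."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    """Numerically stable softmax, as used by the 2-unit head."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# With zero weights and zero biases, every pre-activation is 0.
spatial_logits = [0.0] * 49
temporal_logits = [0.0] * 2

spatial_weights = [sigmoid(x) for x in spatial_logits]
temporal_weights = softmax(temporal_logits)

print(spatial_weights[:3], temporal_weights)
```

So both heads start from uniform weights, and any difference between positions or segments has to be learned through z1 and z2.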

How to test the pre-trained model?

Hi, I am wondering how to test the pre-trained models, especially on the Smarthome dataset?

Data Normalization

Hi, I am trying to train your model on NTU, but the code does not show how the x, y, z coordinates of the poses are normalized. Could you share your normalization method?
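
For reference while waiting for an answer, here is a sketch of one common way to normalize 3D skeletons: translate every joint so a chosen root joint sits at the origin, then divide by the largest joint distance from that root. This is an assumption for illustration only; the thread does not say which method the authors used, and `normalize_pose` and `root_index` are hypothetical names.

```python
import math


def normalize_pose(joints, root_index=0):
    """Center joints on a root joint and scale the farthest joint to distance 1.

    Assumed scheme for illustration; not confirmed as the authors' method.
    """
    rx, ry, rz = joints[root_index]
    centered = [(x - rx, y - ry, z - rz) for x, y, z in joints]
    scale = max(math.sqrt(x * x + y * y + z * z) for x, y, z in centered)
    if scale == 0.0:
        # All joints coincide with the root; nothing to scale.
        return centered
    return [(x / scale, y / scale, z / scale) for x, y, z in centered]


# Toy skeleton with three joints (illustrative values only).
pose = [(1.0, 1.0, 1.0), (1.0, 3.0, 1.0), (2.0, 1.0, 1.0)]
print(normalize_pose(pose))
```

Other schemes, such as scaling by a fixed bone length or using dataset-wide statistics, are equally plausible, so results may differ from the paper until the authors confirm their choice.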

Question about NUCLA dataset

Hi, could you please give me the list of videos for the training set and validation set (cross-view method)? I found that the original NUCLA dataset has 1463 videos, but in your paper, there are only 1194.
Thank you for your help!

About the NW-UCLA Dataset

Hello! Thanks for your work! I want to use the NW-UCLA dataset, but the data downloaded from [https://wangjiangb.github.io/my_data.html] cannot be unzipped. Could you please tell me how to get this dataset? Thanks a lot!

How can I convert NTU dataset into patches_full_body

Hello there!

Thank you for sharing this great piece of research! I was trying to train the model from scratch, so I downloaded the NTU dataset from the official website. Your code requires the dataset to be in the form of images cropped around the human patches (in the folder "patches_full_body"). Could you please share the script that generates these cropped images, so I can make sure that I'm doing everything exactly as in the paper? How are the patches generated, and how do you deal with multiple people in the scene?

Thank you
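
Until the original script is shared, one plausible way to produce such crops is to take the 2D joint positions of each person, build a padded bounding box around them, and merge the boxes when several people appear in a frame. The sketch below shows only that box arithmetic. The function names, the margin value, and the union strategy are all assumptions and are not taken from the repository.

```python
def person_box(joints_2d, image_w, image_h, margin=0.25):
    """Padded box (left, top, right, bottom) around one person's 2D joints.

    right and bottom are exclusive, and the box is clipped to the image.
    Assumed scheme for illustration; not the authors' script.
    """
    xs = [x for x, _ in joints_2d]
    ys = [y for _, y in joints_2d]
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    pad_x = margin * (x1 - x0)
    pad_y = margin * (y1 - y0)
    left = max(0, int(x0 - pad_x))
    top = max(0, int(y0 - pad_y))
    right = min(image_w, int(x1 + pad_x) + 1)
    bottom = min(image_h, int(y1 + pad_y) + 1)
    return left, top, right, bottom


def union_box(boxes):
    """Smallest box covering all given boxes (one option for multi-person frames)."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


# Two toy people in a 1920x1080 frame (illustrative pixel coordinates).
box_a = person_box([(100, 50), (200, 250)], 1920, 1080)
box_b = person_box([(10, 400), (60, 500)], 1920, 1080)
print(box_a, box_b, union_box([box_a, box_b]))
```

Cropping each person separately instead of taking the union is an equally reasonable choice, so the multi-person handling in particular should be checked against the authors' answer.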