
human-motion-prediction's Introduction

human-motion-prediction

This is the code for the paper

Julieta Martinez, Michael J. Black, Javier Romero. On human motion prediction using recurrent neural networks. In CVPR 17.

It can also be found on arXiv: https://arxiv.org/pdf/1705.02445.pdf

The code in this repository was written by Julieta Martinez and Javier Romero.

Dependencies

  • Tensorflow 1.x (the code uses the tf.contrib and legacy_seq2seq APIs)
  • h5py (used to save model samples)

Get this code and the data

First things first: clone this repo and get the Human3.6M dataset in exponential map format.

git clone https://github.com/una-dinosauria/human-motion-prediction.git
cd human-motion-prediction
mkdir data
cd data
wget http://www.cs.stanford.edu/people/ashesh/h3.6m.zip
unzip h3.6m.zip
rm h3.6m.zip
cd ..

Quick demo and visualization

For a quick demo, you can train for a few iterations and visualize the outputs of your model.

To train, run

python src/translate.py --action walking --seq_length_out 25 --iterations 10000

To save some samples of the model, run

python src/translate.py --action walking --seq_length_out 25 --iterations 10000 --sample --load 10000

Finally, to visualize the samples run

python src/forward_kinematics.py

This should create a visualization similar to this one (animation omitted):



Running average baselines

To reproduce the running average baseline results from our paper, run

python src/baselines.py

RNN models

To train and reproduce the results of our models, use the following commands

| model | arguments | training time (gtx 1080) | notes |
| --- | --- | --- | --- |
| Sampling-based loss (SA) | python src/translate.py --action walking --seq_length_out 25 | 45s / 1000 iters | Realistic long-term motion, loss computed over 1 second. |
| Residual (SA) | python src/translate.py --residual_velocities --action walking | 35s / 1000 iters | |
| Residual unsup. (MA) | python src/translate.py --residual_velocities --learning_rate 0.005 --omit_one_hot | 65s / 1000 iters | |
| Residual sup. (MA) | python src/translate.py --residual_velocities --learning_rate 0.005 | 65s / 1000 iters | best quantitative. |
| Untied | python src/translate.py --residual_velocities --learning_rate 0.005 --architecture basic | 70s / 1000 iters | |

You can substitute the --action walking parameter for any action in

["directions", "discussion", "eating", "greeting", "phoning",
 "posing", "purchases", "sitting", "sittingdown", "smoking",
 "takingphoto", "waiting", "walking", "walkingdog", "walkingtogether"]

or --action all (default) to train on all actions.

The code will log the error in Euler angles for each action to TensorBoard. You can track progress during training by running tensorboard --logdir experiments in the terminal and opening http://127.0.1.1:6006/ in your browser (occasionally, TensorBoard might pick another URL).

Citing

If you use our code, please cite our work

@inproceedings{julieta2017motion,
  title={On human motion prediction using recurrent neural networks},
  author={Martinez, Julieta and Black, Michael J. and Romero, Javier},
  booktitle={CVPR},
  year={2017}
}

Other implementations

  • PyTorch: a community port exists (see the "pytorch version" issue below).

Acknowledgments

The pre-processed Human3.6M dataset and some of our evaluation code (especially under src/data_utils.py) were ported/adapted from SRNN by @asheshjain399.

Licence

MIT

human-motion-prediction's People

Contributors

panispani, seleucia, una-dinosauria


human-motion-prediction's Issues

Figure3

@una-dinosauria Is it possible to get the code corresponding to this figure? I would appreciate it if you could provide it as a reference for further comparisons.

A question on Encoder cells and Decoder cells

Hi @una-dinosauria,

I have a question about how the encoder cells and decoder cells are modelled, which I still have not been able to resolve. Let me explain it like this.

When the seq2seq model is created, it first creates the GRU cell like below:

cell = tf.contrib.rnn.GRUCell( self.rnn_size )

Then it adds the Linear Space decoder wrapper to the GRU cell, like below:

cell = rnn_cell_extensions.LinearSpaceDecoderWrapper( cell, self.input_size )

After that, it adds the residual wrapper to model velocities, as below:

cell = rnn_cell_extensions.ResidualWrapper( cell )

Finally, that cell is used to build the full seq2seq model, like below:

outputs, self.states = tf.contrib.legacy_seq2seq.tied_rnn_seq2seq( enc_in, dec_in, cell, loop_function=lf )

According to the attached figure (screenshot omitted), the cell wrapped by the Linear Space decoder wrapper and the residual wrapper is only used in the decoder, while the encoder uses a plain cell.

My question is: according to this implementation, do we expect the residual wrapper and the Linear Space decoder wrapper to run inside both the encoder and the decoder?
Your input is highly appreciated.

Thank you,
Kavindu
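
A minimal NumPy illustration (not the repository's TensorFlow code) of what the two wrappers do around a generic cell, using toy sizes and a stand-in cell function. Since the tied seq2seq variant steps one and the same wrapped cell over both the encoder inputs and the decoder inputs, the wrappers appear to be active in both phases; only the loop function that feeds outputs back as inputs is decoder-specific.

    import numpy as np

    rng = np.random.default_rng(0)
    rnn_size, pose_size = 8, 4              # toy sizes; the real model uses e.g. 1024 and 54

    def toy_cell(x, state):
        # Stand-in for the GRU cell: anything mapping (input, state) -> (hidden, new state).
        state = np.tanh(state + x.sum())
        return np.full(rnn_size, state), state

    # Linear-space decoder wrapper, conceptually: a linear layer that projects the hidden
    # state back to pose space. Residual wrapper, conceptually: add the input frame, so the
    # cell only has to predict an offset (a "velocity").
    W = rng.standard_normal((rnn_size, pose_size)) * 0.01
    b = np.zeros(pose_size)

    def wrapped_cell(x, state):
        h, state = toy_cell(x, state)
        delta = h @ W + b                   # hidden state -> pose-space offset
        return x + delta, state             # residual connection: output = input + offset

    x, state = np.zeros(pose_size), 0.0
    y, state = wrapped_cell(x, state)
    print(y.shape)                          # (4,)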

Relationship about joint and dim_to_use?

Hi there,
I am new here and quite interested in this research topic. However, there is one question I cannot figure out, so I am asking for help.

In your code, you throw away dimensions whose std < 1e-4. Say there are 99 dims, of which 45 are unused; then 54 dims are left, right?

As you said:

Regarding the 32 joints, I believe only 17 are independent, and the rest are end effectors as you call them.

However, when I print the indices of the ignored dims, I get:
[10 11 16 17 18 19 20 25 26 31 32 33 34 35 48 49 50 58 59 63 64 65 66 67
68 69 70 71 72 73 74 82 83 87 88 89 90 91 92 93 94 95 96 97 98]

I found that some of them do not correspond to whole joints! For example, joint 3 should correspond to dimensions 9, 10, 11; however, only dimensions 10 and 11 are in dims-to-ignore here.

This means that by ignoring these dimensions you break the one-joint-to-three-dimensions correspondence, right?

The last question: although the input dimension is 54, which DOES NOT represent 17 whole joints, the output dimension is also 54, which DOES represent 17 joints, am I right?

This is a link from another issue.

Regarding the 32 joints, I believe only 17 are independent, and the rest are end effectors as you call them. IIRC some joints are repeated -- I remember observing this when I plotted the index in 3d as I was going down the tree, but you may want to confirm it yourself.

Originally posted by @una-dinosauria in #23 (comment)
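
For readers hitting the same confusion, here is a minimal sketch of the per-dimension filtering described above (the repository's version lives in data_utils.normalization_stats; the array here is synthetic). Because the filter works dimension by dimension on the standard deviation, nothing forces the surviving dimensions to come in joint-aligned triplets, which is exactly the effect noted in the question.

    import numpy as np

    # Synthetic stand-in for the training frames, shape (n_frames, 99).
    data = np.random.default_rng(0).standard_normal((1000, 99))
    data[:, [9]] = 0.123                        # make one dimension constant, as an example

    stds = np.std(data, axis=0)
    dims_to_ignore = np.where(stds < 1e-4)[0]   # near-constant dimensions are dropped
    dims_to_use    = np.where(stds >= 1e-4)[0]

    # The filter is per-dimension, so a joint can lose 1 or 2 of its 3 expmap values
    # (e.g. dimension 9 here) while the others are kept.
    print(dims_to_ignore)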

lr_flip

Originally posted by @jutanke in #46

a question about fkl code?

Hi Julieta,

Thank you for releasing your code.

I am a little bit confused about this code in fkl:

xangle = angles[ rotInd[i][0]-1 ]
yangle = angles[ rotInd[i][1]-1 ]
zangle = angles[ rotInd[i][2]-1 ]

thisPosition = np.array([xangle, yangle, zangle])

In the angles variable, there should only be the expmap of the rotation for each joint, except for the first three elements. Why does this code use some of the elements as the positions of the joints?

Thank you
Wei

Shared Variables between decoder and encoder in the basic architecture

@una-dinosauria

Hi,
I congratulate you on this paper being cited many times recently.

I am confused about the basic architecture in the code. As mentioned, in the basic form the variables of the encoder and decoder are not shared, meaning each of them has its own variables. But when I checked the basic architecture, the same GRU cell is used for both the decoder and the encoder, and the number of variables is the same as in the "tied" architecture.
So I guess that in the basic architecture the variables of the encoder and decoder are not actually separated??
I would appreciate your help in understanding my mistake.

Thanks,

Can I link with OpenPose?

I would like to predict human behavior using 3D coordinates obtained with OpenPose; is that possible?

question about the srnn_loss

In translate.py, when you test with the SRNN seeds, it looks like the srnn_loss you print out is only for the last action instead of an average over all actions.

How to get the Prediction results?

Sorry @una-dinosauria, this is a very basic question, but I need to ask it.
Could you please tell me where to find the prediction results in your code?

I want to try your code with another dataset but I don't know how to get the prediction results.

Thanks

Reading Input File

Hey, I am trying to recreate the algorithm from scratch, and for this I need to read the Human3.6M dataset. Opening a file just gives me a long text file with no indication of which value corresponds to which joint or how I am supposed to use it. Could you please let me know what the input and output of your RNN are?
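
For what it's worth, each exported file is plain comma-separated text with one frame per row and 99 values per frame, so it can be loaded directly; a minimal sketch (the path is an example and depends on where the dataset was unzipped):

    import numpy as np

    # Each row of the H3.6M export is one frame: 99 comma-separated floats.
    seq = np.loadtxt("data/h3.6m/dataset/S1/walking_1.txt", delimiter=",")
    print(seq.shape)   # (n_frames, 99)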

Visualization

Looking forward to seeing the visualization. Thanks~

How to visualise a motion from the dataset

I simply want to visualise a motion from the dataset without having to train a model.
From the flow of your script it is not obvious how to do this; could you elaborate on how to do it?
Thanks.

TXT file in Human3.6m

Hello, first of all, thank you very much for your open-source code. I am a newcomer to research; may I ask some questions? What is stored in each frame of each TXT file in Human3.6M? I am a bit confused about the mathematical meaning, and about why some of the data can be removed directly. Thank you very much.

target_seq_len or source_seq_len in get_batch_srnn ?

Hi Julieta,

In the following line (line 558 of seq2seq_model.py, in the get_batch_srnn function), shouldn't it be target_seq_len in place of source_seq_len?

decoder_outputs[i, :, :] = data_sel[source_seq_len:, :]

Thanks.

Visualization Code and Data Clarification

Thank you for taking the time to make your code publicly available! I also really liked your paper and found it very interesting.

I am a bit confused regarding the data representation, though, and how the visualization works. Specifically, I am referring to this code snippet in forward_kinematics.fkl:

    for i in np.arange(njoints):

        if not rotInd[i]:  # If the list is empty
            xangle, yangle, zangle = 0, 0, 0
        else:
            xangle = angles[rotInd[i][0] - 1]
            yangle = angles[rotInd[i][1] - 1]
            zangle = angles[rotInd[i][2] - 1]

        r = angles[expmapInd[i]]

        thisRotation = data_utils.expmap2rotmat(r)
        thisPosition = np.array([xangle, yangle, zangle])

        if parent[i] == -1:  # Root node
            xyzStruct[i]['rotation'] = thisRotation
            xyzStruct[i]['xyz'] = np.reshape(offset[i, :], (1, 3)) + thisPosition
        else:
            xyzStruct[i]['xyz'] = (offset[i, :] + thisPosition).dot(xyzStruct[parent[i]]['rotation']) + \
                                  xyzStruct[parent[i]]['xyz']
            xyzStruct[i]['rotation'] = thisRotation.dot(xyzStruct[parent[i]]['rotation'])

What confuses me is the fact that thisPosition = np.array([xangle, yangle, zangle]) is added to the offset, i.e. the final 3D position of each joint. The data (i.e. angles) has shape (99,). I believe that the first three dimensions are the position of the root (I read this somewhere in a comment, but forgot where :)). So the remaining 96 dimensions are the 32 exponential map coordinates for the 32 joints, right? rotInd points into angles, so thisPosition = np.array([xangle, yangle, zangle]) is actually joint angle data and thus should not be added to a position vector in my opinion. In fact, I tried to just set thisPosition to zero and the plot looks very similar. I assume the angles (given in radians) are just small enough to not make a huge difference.

Another thing that confuses me is that we seem to have 32*3 exponential map coordinates, implying we have 32 joints. However the H3.6M skeletons only have 25 joints (I checked this by downloading the H3.6M code files). I believe that the remaining 7 "joints" are in fact end effector nodes, for which joint angles are typically not defined. This is also confirmed by the contents of your rotInd matrix (the end effectors being the entries in rotInd that are empty). I checked the contents of S1 walking_1.txt and the data corresponding to the 7 end effectors, i.e. indices [5, 10, 15, 21, 23, 29, 31], is empty anyways (i.e. zero vectors in every frame). This is a minor thing, as the visualization is not impacted by that and because I believe that you remove those entries from the data before you feed it to the model. However, this confused me a lot, so I just wanted to ask if you could confirm that and I also wanted to write it down somewhere for reference for future readers.
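
For readers following the snippet above: the conversion it relies on, data_utils.expmap2rotmat, is the standard Rodrigues formula mapping an exponential-map 3-vector to a rotation matrix. A self-contained sketch of that formula (not the repository's exact implementation):

    import numpy as np

    def expmap_to_rotmat(r):
        """Convert a 3-vector exponential map (axis * angle) to a 3x3 rotation matrix."""
        theta = np.linalg.norm(r)
        if theta < 1e-8:
            return np.eye(3)
        k = r / theta                                   # unit rotation axis
        K = np.array([[0, -k[2], k[1]],
                      [k[2], 0, -k[0]],
                      [-k[1], k[0], 0]])                # cross-product (skew-symmetric) matrix
        # Rodrigues' rotation formula
        return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * K @ K

    print(expmap_to_rotmat(np.array([0.0, 0.0, np.pi / 2])).round(3))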

Question about the variables (self.HUMAN_SIZE) and (dimensions_to_use)

Hi @una-dinosauria

I've noticed, when using your code, that the variable dimensions_to_use in the normalization_stats function (in the data_utils file) has the same value regardless of the actions used in the training process (whether it's all actions or walking only):
dimensions_to_use = 54, which is the same as the value of self.HUMAN_SIZE in the seq2seq_model file.
But when I try to train on a quaternion dataset, I get different values of dimensions_to_use when training on all actions versus training on walking only.
This difference causes an error, because dimensions_to_use is always supposed to equal self.HUMAN_SIZE.

So I'm not sure whether self.HUMAN_SIZE and dimensions_to_use should always be constant regardless of the actions used in training, or whether they might differ for different actions?

Thanks in advance.

Inference on a video/webcam

Is it possible to parse video from a file or an IP camera using OpenCV and then get a prediction of the motion?
I want to be able to classify an action in a video.
Sorry, I'm new to computer vision. Thank you.

Can't reproduce paper's result.

Hi, I like your work and am trying to build my own research on top of your idea, but I simply couldn't reproduce your paper's results.

Here is what I have done:
python3 src/translate.py --action walking --seq_length_out 25
python3 src/translate.py --residual_velocities --action walking

What I got:
Aside from plausible animations for each action, the following tables are what I got from my experiments.

'Long term' and 'YOUR WORK' are the sampling-based loss (SA) results from my experiment and your reported results, respectively; the last row is the SRNN paper's motion forecasting error.

| Walking \ time (ms) | 80 | 160 | 320 | 400 | 560 | 1000 |
| --- | --- | --- | --- | --- | --- | --- |
| Long term | 1.004 | 1.190 | 1.473 | 1.594 | 1.794 | 2.027 |
| YOUR WORK | 0.92 | 0.98 | 1.02 | 1.20 | | |
| SRNN paper's | 1.08 | 1.34 | 1.60 | --- | 1.90 | 2.13 |

| Eating \ time (ms) | 80 | 160 | 320 | 400 | 560 | 1000 |
| --- | --- | --- | --- | --- | --- | --- |
| Long term | 1.195 | 1.473 | 1.998 | 2.184 | 2.316 | 2.336 |
| YOUR WORK | 0.98 | 0.99 | 1.18 | 1.31 | | |
| SRNN paper's | 1.35 | 1.71 | 2.12 | --- | 2.28 | 2.58 |

| Smoking \ time (ms) | 80 | 160 | 320 | 400 | 560 | 1000 |
| --- | --- | --- | --- | --- | --- | --- |
| Long term | 1.282 | 1.572 | 2.486 | 2.609 | 3.258 | 2.861 |
| YOUR WORK | 1.38 | 1.39 | 1.56 | 1.65 | | |
| SRNN paper's | 1.90 | 2.30 | 2.90 | --- | 3.21 | 3.23 |

| Discussion \ time (ms) | 80 | 160 | 320 | 400 | 560 | 1000 |
| --- | --- | --- | --- | --- | --- | --- |
| Long term | 1.605 | 1.986 | 2.513 | 2.702 | 3.087 | 3.187 |
| YOUR WORK | 1.78 | 1.80 | 1.83 | 1.90 | | |
| SRNN paper's | 1.67 | 2.03 | 2.20 | --- | 2.39 | 2.43 |

I am puzzled about:

  1. According to your results, I suppose my numbers are wrong, but tolerable when compared with the SRNN paper's results. Can you give some advice for correcting my setup?

  2. I think an iteration count of about 1e5 may be too large, because I noticed that the error grows as the number of iterations increases.

Looking forward to your reply. Sincere thanks!!!

=====================================
UPDATE

(number in boldface indicates the best result)
| Walking \ time (ms) | 80 | 160 | 320 | 400 | 560 | 1000 |
| --- | --- | --- | --- | --- | --- | --- |
| 1e4th iteration | 1.306 | 1.360 | 1.362 | 1.380 | 1.381 | 1.488 |
| 2e4th iteration | 1.195 | 1.276 | 1.318 | 1.345 | 1.401 | 1.554 |
| YOUR WORK | 0.92 | 0.98 | 1.02 | 1.20 | | |

| Eating \ time (ms) | 80 | 160 | 320 | 400 | 560 | 1000 |
| --- | --- | --- | --- | --- | --- | --- |
| 1e4th iteration | 1.126 | 1.189 | 1.300 | 1.380 | 1.507 | 1.752 |
| 2e4th iteration | 1.043 | 1.162 | 1.379 | 1.497 | 1.674 | 2.036 |
| YOUR WORK | 0.98 | 0.99 | 1.18 | 1.31 | | |

| Smoking \ time (ms) | 80 | 160 | 320 | 400 | 560 | 1000 |
| --- | --- | --- | --- | --- | --- | --- |
| 1e4th iteration | 1.514 | 1.597 | 1.752 | 1.789 | 1.862 | 2.257 |
| 2e4th iteration | 1.238 | 1.357 | 1.593 | 1.640 | 1.738 | 2.196 |
| YOUR WORK | 1.38 | 1.39 | 1.56 | 1.65 | | |

| Discussion \ time (ms) | 80 | 160 | 320 | 400 | 560 | 1000 |
| --- | --- | --- | --- | --- | --- | --- |
| 1e4th iteration | 1.682 | 1.803 | 1.847 | 1.825 | 1.952 | 2.185 |
| 2e4th iteration | 1.439 | 1.603 | 1.710 | 1.728 | 1.938 | 2.196 |
| YOUR WORK | 1.78 | 1.80 | 1.83 | 1.90 | | |

Taking your advice, I checked the results at the 10000th and 20000th iterations, and they have improved performance, thanks!! I suppose the 20000th iteration is the better choice for the sampling-based loss experiment,
but gaps still exist, especially for the walking and eating actions. Is this normal?

Really sorry to bother you: is the code on GitHub your final version or just a demo? If not, what changes should I make? By the way, is the number of iterations uniform across all experiments? For example, do the seq2seq architecture with sampling-based loss (SA) and the residual architecture (Residual (SA)) both use 10000 iterations?

`num_layers > 1` not really supported

thanks heaps for putting this code online!

I've been playing around with this code, and when trying to add more layers I get a size mismatch:

ValueError: Trying to share variable basic_rnn_seq2seq/rnn/gru_cell/gates/kernel, but specified shape (2048, 2048) and found shape (1060, 2048).

I believe the problem is in rnn_cell_extensions.LinearSpaceDecoderWrapper.

Basically, it seems to conjure up some magic variables, proj_w_out and proj_b_out, but their sizes are wrong because I've stacked more than one GRU cell;

i.e. 1060 = 1024 + my input size, while it expects (2048, 2048) because num_layers == 2.

I think it should be an easy fix, but I'm not entirely sure what it is, because I'm not sure where proj_w_out comes from ...

Question about the zero velocity baseline

I have read your paper, but I don't really understand the zero-velocity baseline; it is only briefly described in the paper. I am not sure what "predict the first frame with zero velocity" means: does it mean setting the first frame of the predicted motion clip to be exactly the same as the last frame of the input motion clip?
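
As described in the paper, the zero-velocity baseline simply repeats the last observed frame for the entire prediction horizon (so yes, the first predicted frame equals the last input frame, and so do all the following ones). A minimal sketch, with hypothetical array shapes:

    import numpy as np

    def zero_velocity_baseline(observed, horizon):
        """Repeat the last observed frame for `horizon` future steps.

        observed: array of shape (seq_len, n_dims); returns (horizon, n_dims).
        """
        return np.tile(observed[-1], (horizon, 1))

    observed = np.random.default_rng(0).standard_normal((50, 54))
    prediction = zero_velocity_baseline(observed, horizon=25)
    print(np.allclose(prediction, observed[-1]))   # True: every predicted frame is the last input frame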

Can the preprocessed data be converted to quaternions?

May I ask you @una-dinosauria a question?

Can the preprocessed data be converted to quaternions instead of exponential maps?
Is there code for the conversion?
I would be very thankful if you could provide me with the same data you used, but represented with quaternions instead of exponential maps.

Sorry If I'm asking the wrong question.
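
The repository does not appear to ship a conversion script, but the math itself is standard: an exponential-map vector r encodes a rotation of angle ||r|| about the axis r/||r||, which maps to a unit quaternion as below. This is an illustrative sketch, not code from the repo:

    import numpy as np

    def expmap_to_quaternion(r):
        """Map a 3-vector exponential map to a unit quaternion (w, x, y, z)."""
        theta = np.linalg.norm(r)
        if theta < 1e-8:
            return np.array([1.0, 0.0, 0.0, 0.0])      # identity rotation
        axis = r / theta
        return np.concatenate(([np.cos(theta / 2.0)], np.sin(theta / 2.0) * axis))

    print(expmap_to_quaternion(np.array([np.pi, 0.0, 0.0])).round(3))  # 180 deg about x -> (0, 1, 0, 0)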

Dataset space

Hi @una-dinosauria
Is there any reason for representing the joint data as angles? I mean, why did you choose to work with the data as exponential maps of joint angles?

left-right flip

It seems like left and right are flipped in your 3D representation, e.g. the following code

self.LR = np.array([1,1,1,0,0,0,0, 0, 0, 0, 0, 0, 0, 1, 1, 1], dtype=bool)

seems to conflict with the Human3.6M Matlab code:

switch part
        case 'rootpos'
          joints = 1;
        case 'rootrot'
          joints = 1;
        case 'leftarm'
          joints = 18:24;% p/p2/a fine
        case 'rightarm'
          joints = 26:32;% p/p2/a fine
        case 'head'
          joints = 14:16;% p/p2/a fine
        case 'rightleg'
          joints = 2:6;% p/p2/a fine
        case 'leftleg'
          joints = 7:11;% p/p2/a fine
        case 'upperbody'
          joints = [14:32];% p/p2/a fine
        case 'arms'
          joints = [16:32];% p/p2/a fine
        case 'legs'
          joints = 1:11;% p/p2/a fine
        case 'body'
          joints = [1 2 3 4 7 8 9 13 14 15 16 18 19 20 26 27 28];% p/p2/a fine
        otherwise
          error('Unknown');
      end

The plot below shows the original 3D ground-truth locations ('Ground Truth') from Human3.6M, the particular frame of the video, and the 3D pose in expmap format transformed to 3D joint coordinates using the forward kinematics function provided with your repository ('ExpMaps').

(plot omitted: lr_flip)

Is there an intention behind the flip?

Test time prediction inputs

Hello Julieta,

I'm a bit confused by this line: here we are feeding "encoder_inputs, decoder_inputs, decoder_outputs" to the network to make the prediction, and the model makes its prediction from these data.
As I understand from that line, decoder_inputs contains data one step behind decoder_outputs. In this configuration, does the model only forecast one step ahead?
PS: If I understood correctly, the model makes the prediction at this line; as far as I can see, it uses both encoder_inputs and decoder_outputs.

Reason to use 54 dimensions

Hi Una-dinosauria,

Can you please explain the reason for using only 54 of the 99 dimensions? I tried to find it in the paper and the code, but I was unable to. Your response is highly appreciated.

Thank you,
Kavindu

"forward_only" option is not in Seq2SeqModel.__init__

Hello Julieta, thanks for sharing your code; excellent job and paper.

However, I'm wondering if there is an error in baselines.py, where you initialize the Seq2SeqModel class with a parameter "forward_only" that is not included in Seq2SeqModel.__init__.

Thank you very much :)

Why only use the even line of data in data_util

I noticed that this method in data_utils only selects the even-indexed lines of the original data for the training and test sets. I don't quite understand that; I would have thought all the lines should be selected directly. Also, for the n-by-d data in each txt file, does n mean there are n frames?

    even_list = range(0, n, 2)
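
A short, hedged sketch of what that line does in context (variable names follow data_utils; the sequence here is synthetic): keeping only the even-indexed rows halves the effective frame rate of each sequence.

    import numpy as np

    action_sequence = np.random.default_rng(0).standard_normal((200, 99))  # stand-in for one loaded sequence
    n = action_sequence.shape[0]

    even_list = range(0, n, 2)                       # indices 0, 2, 4, ...: every other frame
    downsampled = action_sequence[list(even_list), :]
    print(downsampled.shape)                         # (100, 99): half the frames, same dimensions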

A question about sampling_based loss

Hi @una-dinosauria

I have a question about sampling_based loss.

According to my understanding, the sampling_based loss means feeding the previous output as the current input inside the decoder. I also think this is done inside the seq2seq_model.py file, and the corresponding line (the loop function) is shown below:

def lf(prev, i): # function for sampling_based loss

Could you please tell me whether my understanding is correct or wrong?
If I am wrong, can you please explain how the sampling_based loss is calculated in the code?

Your response is highly appreciated.

Thank you,
Kavindu
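
That reading matches how a loop function is normally used in this kind of seq2seq decoder: during decoding, each prediction is fed back as the next input, and the loss is then computed on the self-generated sequence. A minimal, framework-free sketch of the idea (the step function here is a hypothetical stand-in, not the repo's wrapped GRU cell):

    import numpy as np

    def decode_with_sampling(step, state, first_input, horizon):
        """Roll out `horizon` frames, feeding each prediction back as the next input."""
        outputs, prev = [], first_input
        for _ in range(horizon):
            prev, state = step(prev, state)   # the model's own output becomes the next input
            outputs.append(prev)
        return np.stack(outputs), state

    # Toy step function: drift the pose slightly each step (placeholder for the real cell).
    toy_step = lambda x, s: (x + 0.01, s)
    pred, _ = decode_with_sampling(toy_step, state=None, first_input=np.zeros(54), horizon=25)
    print(pred.shape)  # (25, 54)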

Understanding the structure of H3.6M dataset

Hi Una-dinosauria,

First of all, I need to thank you for making this fantastic paper and code publicly available; it is really helpful for my studies.

I have a very basic question. Can you please explain the structure of the dataset and the meaning of the 99 numbers in each row? On the website they describe 96 values, but in the H3.6M dataset there are 99 values in each row.
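
A hedged sketch of the common reading of each 99-value row, consistent with the fkl discussion elsewhere on this page: 3 root-translation values followed by 32 exponential-map 3-vectors, several of which belong to end effectors and stay zero. Treat this as an interpretation, not official documentation:

    import numpy as np

    frame = np.zeros(99)                      # one row of an H3.6M export file
    root_translation = frame[:3]              # global position of the root joint
    joint_expmaps = frame[3:].reshape(32, 3)  # one exponential-map 3-vector per joint
                                              # (the root's rotation is the first of these)
    print(root_translation.shape, joint_expmaps.shape)   # (3,) (32, 3)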

needs .txt files but human 3.6m has .cdf files

Hi Julieta,

Thanks for the code to your paper.

I was trying to run your code to pre-process the data. Specifically, I run into an issue with the following lines of code in the load_data function (in data_utils.py):

filename = '{0}/S{1}/{2}_{3}.txt'.format( path_to_dataset, subj, action, subact )
action_sequence = readCSVasFloat(filename)

Can you please explain how you converted the .cdf files to .txt files for the above usage?

Thanks.

Sampling rate of the data

I would like to know: what is the actual sampling rate of the data?

I think you pointed out to the author of Structural-RNN in the following thread that it is 25 fps, but it is not very clear to me:
asheshjain399/RNNexp#6

I check your source code in

If the original data is 200fps, I think the sampling rate of your data should be 100 fps, not 25 fps as mentioned in src/translate.py#L34.

Thank you.

Problem in restoring the checkpoint

Thank you for posting the code. May I ask whether it is right to use "...--sample --load 100000" to get the prediction results of iteration 100000? When I try to restore the checkpoint at 100000, 95000, or others, they always seem to return the same results as the last checkpoint (100000). I find the problem occurs at line 108 in translate.py. Should it be changed to "model.saver.restore(session, ckpt_name)" so that each time it loads the requested iteration instead of the last checkpoint, or am I feeding in a wrong argument?

Start and end joint

Why don't the start and end joints in viz cover all 32 joints?

self.I   = np.array([1,2,3,1,7,8,1, 13,14,15,14,18,19,14,26,27])-1
self.J   = np.array([2,3,4,7,8,9,13,14,15,16,18,19,20,26,27,28])-1
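
The I and J arrays list the start and end joint indices of the bones that get drawn; joints that never appear in either array are simply not connected by a line (several entries are end effectors or duplicated joints). A hedged, illustrative sketch of how such index pairs can be used to draw one frame with matplotlib (the xyz array is a stand-in, and this is not viz.py itself):

    import numpy as np
    import matplotlib.pyplot as plt

    I = np.array([1, 2, 3, 1, 7, 8, 1, 13, 14, 15, 14, 18, 19, 14, 26, 27]) - 1   # bone start joints
    J = np.array([2, 3, 4, 7, 8, 9, 13, 14, 15, 16, 18, 19, 20, 26, 27, 28]) - 1  # bone end joints

    xyz = np.random.default_rng(0).standard_normal((32, 3))  # stand-in for one frame of joint positions

    ax = plt.figure().add_subplot(111, projection="3d")
    for a, b in zip(I, J):                     # draw only the listed bones
        ax.plot([xyz[a, 0], xyz[b, 0]],
                [xyz[a, 1], xyz[b, 1]],
                [xyz[a, 2], xyz[b, 2]])
    plt.show()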

Encoder inputs are less than source_seq_len by 1

Hi @una-dinosauria
I'm wondering why you chose encoder_inputs to be shorter than source_seq_len by 1 (line 453),
while you chose decoder_outputs and decoder_inputs to have the same length as target_seq_len.

def get_batch( self, data, actions ):
  """Get a random batch of data from the specified bucket, prepare for step.

  Args
    data: a list of sequences of size n-by-d to fit the model to.
    actions: a list of the actions we are using
  Returns
    The tuple (encoder_inputs, decoder_inputs, decoder_outputs);
    the constructed batches have the proper format to call step(...) later.
  """

  # Select entries at random
  all_keys    = list(data.keys())
  chosen_keys = np.random.choice( len(all_keys), self.batch_size )

  # How many frames in total do we need?
  total_frames = self.source_seq_len + self.target_seq_len

  encoder_inputs  = np.zeros((self.batch_size, self.source_seq_len-1, self.input_size), dtype=float)
  decoder_inputs  = np.zeros((self.batch_size, self.target_seq_len, self.input_size), dtype=float)
  decoder_outputs = np.zeros((self.batch_size, self.target_seq_len, self.input_size), dtype=float)

TF Saver has issues on Windows

Hi there,

Quick post below that I'll update later when I have some more time;

Thanks for this paper and for sharing your code. I'm trying to replicate your results on Windows 10, and the TensorFlow Saver class that saves the model as it is training seems to have an issue. Either the path name or the file name of the files is far too long for Windows or NTFS (I haven't determined which yet). To help me debug this, can you let me know what operating system and file system you were running this code on?

Thank you, I'll share more info later.

Calculating residual velocity

Hi Una-dinosauria,

According to my understanding, to eliminate the first-frame discontinuity a residual velocity should be calculated (please correct me if I am wrong). But in the rnn_cell_extensions.py file I cannot find the place where the residual velocity is calculated; I can only see the code related to the residual connection. Can you please point me to the correct place where the residual velocity is calculated?

Thank you,
Kavindu

Human3.6M D3 angles

Hi,

Somewhere in your paper, you mention that H3.6M has 54 independent joint angles in 3D space. Do these correspond to 18 joints? If possible, I would really appreciate it if you could give me a link to data that contains these 54 joint angles (with their corresponding joint indices) per sequence.

I was looking at H3.6M both on its website and at the link you mention in your repo, and the values differ between the sources. Moreover, in the dataset you used there are 99 values per frame, of which you select 54. However, I'm not sure what these 54 values per frame are. Do they correspond to 18 joints in 3D angle space? If yes, would you please let me know the (x, y, z) triplet indices corresponding to each joint?

Thanks

pytorch version

Hi!

I implemented a pytorch version of the code in this fork. The results are similar for the demos I tried so far.

TypeError: 'dict_keys' object does not support indexing when trying to train.

Hello,

I got a TypeError when trying to run the training demo from the readme.md.
I've already installed tensorflow and h5py.
Here's the traceback when running the training:

`File "translate.py", line 700, in
tf.app.run()

File "/home/lenovo-4/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))

File "translate.py", line 697, in main
train()

File "translate.py", line 153, in train
encoder_inputs, decoder_inputs, decoder_outputs = model.get_batch( train_set, not FLAGS.omit_one_hot )

File "/home/lenovo-4/Shiangyoung/human-motion-prediction-master/seq2seq_model.py", line 462, in get_batch
the_key = all_keys[ chosen_keys[i] ]
TypeError: 'dict_keys' object does not support indexing
`

Would be great if anyone could help out.

Thank you.

Edit: fixed by changing line 450 of seq2seq_model.py to all_keys = list(data.keys())

Doubt on quantitative result table.

Hi,

Firstly, I want to congratulate you on this paper being cited many times recently.

However, I am still confused about some details. I double-checked your paper, where you successfully reproduce previous work (SRNN, CVPR 2016) and publish its results (screenshot omitted).

Are those results from optimizing the models over 400 ms or over 1000 ms? I guess the result above is the one optimized over 400 ms, while the one I traced here on GitHub through the source code is optimized over 1000 ms, which means the model predicts long-term. I also attached your posted results (screenshot omitted).

Can you help out here?

Best regards,
CHELSEA234

Small bug in the code

Hello,
Thank you very much for the code of your paper (the paper was quite good also). I think there is a small bug in your code: while computing the error, you check the std of the estimated values and use it to select the dimensions to use. I think this is not the right way to select dimensions (when your model estimates very small numbers you can still get good results). The referenced code also finds the dimensions to use in the same way.

Bug line

PS: I fixed the code and tried to run the model; it seems it still converges quite well :)
Have a nice time in Hawaii.

About fkl function

Hello Julieta,
I am a graduate student who is interested in motion prediction.
I have questions about the fkl function in forward_kinematics.py.

  1. The angles vector received by the fkl function has dimension 99: dims 1, 2 and 3 represent the translation of the root joint, dims 4, 5 and 6 represent the exponential-map coordinates of the root joint, and dims 7-99 represent the exponential-map coordinates of the remaining joints. Is that correct?

  2. I do not know what rotInd and expmapInd represent. Looking at line 48 of forward_kinematics.py, the position is updated with thisPosition = np.array([xangle, yangle, zangle]), where xangle, yangle, and zangle are exponential-map coordinates. If they are exponential-map coordinates, why does this represent a position?
     It also does not make sense to me that line 52 updates xyz by adding the offset (bone length).

  3. I do not fully understand the forward-kinematics part at lines 54 and 55. I would be very grateful if you could explain it.

Slight unnecessary hardcode

# Reproducing SRNN's sequence subsequence selection as done in
# https://github.com/asheshjain399/RNNexp/blob/master/structural_rnn/CRFProblems/H3.6m/processdata.py#L343
for i in xrange( batch_size ):

  _, subsequence, idx = seeds[i]
  idx = idx + 50

  data_sel = data[ (subject, action, subsequence, 'even') ]
  data_sel = data_sel[(idx-source_seq_len):(idx+target_seq_len) ,:]

  encoder_inputs[i, :, :]  = data_sel[0:source_seq_len-1, :]
  decoder_inputs[i, :, :]  = data_sel[source_seq_len-1:(source_seq_len+target_seq_len-1), :]
  decoder_outputs[i, :, :] = data_sel[source_seq_len:, :]

In the above, at line 550, I believe the 50 should actually be source_seq_len?
