
human-motion-prediction's Introduction

human-motion-prediction

This is the code for the paper

Julieta Martinez, Michael J. Black, Javier Romero. On human motion prediction using recurrent neural networks. In CVPR 17.

It can also be found on arXiv: https://arxiv.org/pdf/1705.02445.pdf

The code in this repository was written by Julieta Martinez and Javier Romero.

Dependencies

  • Tensorflow 1.x (the code uses the tf.contrib and legacy_seq2seq APIs)
  • h5py (used to save model samples)

Get this code and the data

First things first: clone this repo and get the Human3.6M dataset in exponential map format.

git clone https://github.com/una-dinosauria/human-motion-prediction.git
cd human-motion-prediction
mkdir data
cd data
wget http://www.cs.stanford.edu/people/ashesh/h3.6m.zip
unzip h3.6m.zip
rm h3.6m.zip
cd ..

Quick demo and visualization

For a quick demo, you can train for a few iterations and visualize the outputs of your model.

To train, run

python src/translate.py --action walking --seq_length_out 25 --iterations 10000

To save some samples of the model, run

python src/translate.py --action walking --seq_length_out 25 --iterations 10000 --sample --load 10000

Finally, to visualize the samples run

python src/forward_kinematics.py

This should create a visualization similar to this one (animation omitted):



Running average baselines

To reproduce the running average baseline results from our paper, run

python src/baselines.py

RNN models

To train and reproduce the results of our models, use the following commands

| model | arguments | training time (gtx 1080) | notes |
| --- | --- | --- | --- |
| Sampling-based loss (SA) | python src/translate.py --action walking --seq_length_out 25 | 45s / 1000 iters | Realistic long-term motion, loss computed over 1 second. |
| Residual (SA) | python src/translate.py --residual_velocities --action walking | 35s / 1000 iters | |
| Residual unsup. (MA) | python src/translate.py --residual_velocities --learning_rate 0.005 --omit_one_hot | 65s / 1000 iters | |
| Residual sup. (MA) | python src/translate.py --residual_velocities --learning_rate 0.005 | 65s / 1000 iters | best quantitative. |
| Untied | python src/translate.py --residual_velocities --learning_rate 0.005 --architecture basic | 70s / 1000 iters | |

You can substitute the --action walking parameter for any action in

["directions", "discussion", "eating", "greeting", "phoning",
 "posing", "purchases", "sitting", "sittingdown", "smoking",
 "takingphoto", "waiting", "walking", "walkingdog", "walkingtogether"]

or --action all (default) to train on all actions.

The code will log the error in Euler angles for each action to TensorBoard. You can track progress during training by running tensorboard --logdir experiments in the terminal and opening http://127.0.1.1:6006/ in your browser (occasionally, TensorBoard might pick another URL).

Citing

If you use our code, please cite our work

@inproceedings{julieta2017motion,
  title={On human motion prediction using recurrent neural networks},
  author={Martinez, Julieta and Black, Michael J. and Romero, Javier},
  booktitle={CVPR},
  year={2017}
}

Other implementations

  • PyTorch: a community port exists (see the "pytorch version" issue below).

Acknowledgments

The pre-processed Human3.6M dataset and some of our evaluation code (especially under src/data_utils.py) were ported/adapted from SRNN by @asheshjain399.

Licence

MIT

human-motion-prediction's People

Contributors

panispani, seleucia, una-dinosauria


human-motion-prediction's Issues

Figure3

@una-dinosauria Is it possible to get the code corresponding to this figure? I would appreciate it if you could provide it as a reference for further comparisons.

A question on Encoder cells and Decoder cells

Hi @una-dinosauria,

I have a question about how the encoder cells and decoder cells are modelled, which I still have not been able to resolve. Let me explain it like this.

When the seq2seq model is created, it first creates the GRU cell like below:

cell = tf.contrib.rnn.GRUCell( self.rnn_size )

Then it adds the Linear Space decoder wrapper to the GRU cell, like below:

cell = rnn_cell_extensions.LinearSpaceDecoderWrapper( cell, self.input_size )

After that, it adds the residual wrapper to model velocities, as below:

cell = rnn_cell_extensions.ResidualWrapper( cell )

Finally, that cell is used to build the full seq2seq model, like below:

outputs, self.states = tf.contrib.legacy_seq2seq.tied_rnn_seq2seq( enc_in, dec_in, cell, loop_function=lf )

According to the attached figure (screenshot omitted), the cell wrapped by the Linear Space decoder wrapper and the residual wrapper is only used in the decoder, while the encoder uses a plain cell.

My question is: according to this implementation, do we expect the residual wrapper and the Linear Space decoder wrapper to run inside both the encoder and the decoder?
Your input is highly appreciated.

Thank you,
Kavindu
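
A minimal NumPy illustration (not the repository's TensorFlow code) of what the two wrappers do around a generic cell, using toy sizes and a stand-in cell function. Since the tied seq2seq variant steps one and the same wrapped cell over both the encoder inputs and the decoder inputs, the wrappers appear to be active in both phases; only the loop function that feeds outputs back as inputs is decoder-specific.

    import numpy as np

    rng = np.random.default_rng(0)
    rnn_size, pose_size = 8, 4              # toy sizes; the real model uses e.g. 1024 and 54

    def toy_cell(x, state):
        # Stand-in for the GRU cell: anything mapping (input, state) -> (hidden, new state).
        state = np.tanh(state + x.sum())
        return np.full(rnn_size, state), state

    # Linear-space decoder wrapper, conceptually: a linear layer that projects the hidden
    # state back to pose space. Residual wrapper, conceptually: add the input frame, so the
    # cell only has to predict an offset (a "velocity").
    W = rng.standard_normal((rnn_size, pose_size)) * 0.01
    b = np.zeros(pose_size)

    def wrapped_cell(x, state):
        h, state = toy_cell(x, state)
        delta = h @ W + b                   # hidden state -> pose-space offset
        return x + delta, state             # residual connection: output = input + offset

    x, state = np.zeros(pose_size), 0.0
    y, state = wrapped_cell(x, state)
    print(y.shape)                          # (4,)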

Relationship about joint and dim_to_use?

Hi there,
I am new here and quite interested in this research topic. However, there is one question I cannot figure out, so I am asking for help.

In your code, you throw away dimensions whose std < 1e-4. Say there are 99 dims, of which 45 are unused; then 54 dims are left, right?

As you said:

Regarding the 32 joints, I believe only 17 are independent, and the rest are end effectors as you call them.

However, when I print the indices of the ignored dims, I get:
[10 11 16 17 18 19 20 25 26 31 32 33 34 35 48 49 50 58 59 63 64 65 66 67
68 69 70 71 72 73 74 82 83 87 88 89 90 91 92 93 94 95 96 97 98]

I found that some of them do not correspond to whole joints! For example, joint 3 should correspond to dimensions 9, 10, 11; however, only dimensions 10 and 11 are in dims-to-ignore here.

This means that by ignoring these dimensions you break the one-joint-to-three-dimensions correspondence, right?

The last question: although the input dimension is 54, which DOES NOT represent 17 whole joints, the output dimension is also 54, which DOES represent 17 joints, am I right?

This is a link from another issue.

Regarding the 32 joints, I believe only 17 are independent, and the rest are end effectors as you call them. IIRC some joints are repeated -- I remember observing this when I plotted the index in 3d as I was going down the tree, but you may want to confirm it yourself.

Originally posted by @una-dinosauria in #23 (comment)
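
For readers hitting the same confusion, here is a minimal sketch of the per-dimension filtering described above (the repository's version lives in data_utils.normalization_stats; the array here is synthetic). Because the filter works dimension by dimension on the standard deviation, nothing forces the surviving dimensions to come in joint-aligned triplets, which is exactly the effect noted in the question.

    import numpy as np

    # Synthetic stand-in for the training frames, shape (n_frames, 99).
    data = np.random.default_rng(0).standard_normal((1000, 99))
    data[:, [9]] = 0.123                        # make one dimension constant, as an example

    stds = np.std(data, axis=0)
    dims_to_ignore = np.where(stds < 1e-4)[0]   # near-constant dimensions are dropped
    dims_to_use    = np.where(stds >= 1e-4)[0]

    # The filter is per-dimension, so a joint can lose 1 or 2 of its 3 expmap values
    # (e.g. dimension 9 here) while the others are kept.
    print(dims_to_ignore)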

lr_flip

Originally posted by @jutanke in #46

a question about fkl code?

Hi Julieta,

Thank you for releasing your code.

I am a little bit confused about this code in fkl:

xangle = angles[ rotInd[i][0]-1 ]
yangle = angles[ rotInd[i][1]-1 ]
zangle = angles[ rotInd[i][2]-1 ]

thisPosition = np.array([xangle, yangle, zangle])

In the angles variable, there should only be the expmap of the rotation for each joint, except for the first three elements. Why does this code use some of the elements as the positions of the joints?

Thank you
Wei

Shared Variables between decoder and encoder in the basic architecture

@una-dinosauria

Hi,
I congratulate you on this paper being cited many times recently.

I am confused about the basic architecture in the code. As mentioned, in the basic form the variables of the encoder and decoder are not shared, meaning each of them has its own variables. But when I checked the basic architecture, the same GRU cell is used for both the decoder and the encoder, and the number of variables is the same as in the "tied" architecture.
So I guess that in the basic architecture the variables of the encoder and decoder are not actually separated??
I would appreciate your help in understanding my mistake.

Thanks,

Can I link with OpenPose?

I would like to predict human behavior using 3D coordinates obtained with OpenPose; is that possible?

question about the srnn_loss

In translate.py, when you test with the SRNN seeds, it looks like the srnn_loss you print out is only for the last action instead of an average over all actions.

How to get the Prediction results?

Sorry @una-dinosauria, this is a very basic question, but I need to ask it.
Could you please tell me where to find the prediction results in your code?

I want to try your code with another dataset but I don't know how to get the prediction results.

Thanks

Reading Input File

Hey, I am trying to recreate the algorithm from scratch, and for this I need to read the Human3.6M dataset. Opening a file just gives me a long text file with no indication of which value corresponds to which joint or how I am supposed to use it. Could you please let me know what the input and output of your RNN are?
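
For what it's worth, each exported file is plain comma-separated text with one frame per row and 99 values per frame, so it can be loaded directly; a minimal sketch (the path is an example and depends on where the dataset was unzipped):

    import numpy as np

    # Each row of the H3.6M export is one frame: 99 comma-separated floats.
    seq = np.loadtxt("data/h3.6m/dataset/S1/walking_1.txt", delimiter=",")
    print(seq.shape)   # (n_frames, 99)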

Visualization

Looking forward to seeing the visualization. Thanks~

How to visualise a motion from the dataset

I simply want to visualise a motion from the dataset without having to train a model.
From the flow of your script it is not obvious how to do this; could you elaborate on how to do it?
Thanks.

TXT file in Human3.6m

Hello, first of all, thank you very much for your open-source code. I am a newcomer to research; may I ask some questions? What is stored in each frame of each TXT file in Human3.6M? I am a bit confused about the mathematical meaning, and about why some of the data can be removed directly. Thank you very much.

target_seq_len or source_seq_len in get_batch_srnn ?

Hi Julieta,

In the following line (line 558 of seq2seq_model.py, in the get_batch_srnn function), shouldn't it be target_seq_len in place of source_seq_len?

decoder_outputs[i, :, :] = data_sel[source_seq_len:, :]

Thanks.

Visualization Code and Data Clarification

Thank you for taking the time to make your code publicly available! I also really liked your paper and found it very interesting.

I am a bit confused regarding the data representation, though, and how the visualization works. Specifically, I am referring to this code snippet in forward_kinematics.fkl:

    for i in np.arange(njoints):

        if not rotInd[i]:  # If the list is empty
            xangle, yangle, zangle = 0, 0, 0
        else:
            xangle = angles[rotInd[i][0] - 1]
            yangle = angles[rotInd[i][1] - 1]
            zangle = angles[rotInd[i][2] - 1]

        r = angles[expmapInd[i]]

        thisRotation = data_utils.expmap2rotmat(r)
        thisPosition = np.array([xangle, yangle, zangle])

        if parent[i] == -1:  # Root node
            xyzStruct[i]['rotation'] = thisRotation
            xyzStruct[i]['xyz'] = np.reshape(offset[i, :], (1, 3)) + thisPosition
        else:
            xyzStruct[i]['xyz'] = (offset[i, :] + thisPosition).dot(xyzStruct[parent[i]]['rotation']) + \
                                  xyzStruct[parent[i]]['xyz']
            xyzStruct[i]['rotation'] = thisRotation.dot(xyzStruct[parent[i]]['rotation'])

What confuses me is the fact that thisPosition = np.array([xangle, yangle, zangle]) is added to the offset, i.e. the final 3D position of each joint. The data (i.e. angles) has shape (99,). I believe that the first three dimensions are the position of the root (I read this somewhere in a comment, but forgot where :)). So the remaining 96 dimensions are the 32 exponential map coordinates for the 32 joints, right? rotInd points into angles, so thisPosition = np.array([xangle, yangle, zangle]) is actually joint angle data and thus should not be added to a position vector in my opinion. In fact, I tried to just set thisPosition to zero and the plot looks very similar. I assume the angles (given in radians) are just small enough to not make a huge difference.

Another thing that confuses me is that we seem to have 32*3 exponential map coordinates, implying we have 32 joints. However the H3.6M skeletons only have 25 joints (I checked this by downloading the H3.6M code files). I believe that the remaining 7 "joints" are in fact end effector nodes, for which joint angles are typically not defined. This is also confirmed by the contents of your rotInd matrix (the end effectors being the entries in rotInd that are empty). I checked the contents of S1 walking_1.txt and the data corresponding to the 7 end effectors, i.e. indices [5, 10, 15, 21, 23, 29, 31], is empty anyways (i.e. zero vectors in every frame). This is a minor thing, as the visualization is not impacted by that and because I believe that you remove those entries from the data before you feed it to the model. However, this confused me a lot, so I just wanted to ask if you could confirm that and I also wanted to write it down somewhere for reference for future readers.
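
For readers following the snippet above: the conversion it relies on, data_utils.expmap2rotmat, is the standard Rodrigues formula mapping an exponential-map 3-vector to a rotation matrix. A self-contained sketch of that formula (not the repository's exact implementation):

    import numpy as np

    def expmap_to_rotmat(r):
        """Convert a 3-vector exponential map (axis * angle) to a 3x3 rotation matrix."""
        theta = np.linalg.norm(r)
        if theta < 1e-8:
            return np.eye(3)
        k = r / theta                                   # unit rotation axis
        K = np.array([[0, -k[2], k[1]],
                      [k[2], 0, -k[0]],
                      [-k[1], k[0], 0]])                # cross-product (skew-symmetric) matrix
        # Rodrigues' rotation formula
        return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * K @ K

    print(expmap_to_rotmat(np.array([0.0, 0.0, np.pi / 2])).round(3))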

Question about the variables (self.HUMAN_SIZE) and (dimensions_to_use)

Hi @una-dinosauria

I've noticed, when using your code, that the variable dimensions_to_use in the normalization_stats function (in the data_utils file) has the same value regardless of the actions used in the training process (whether it's all actions or walking only):
dimensions_to_use = 54, which is the same as the value of self.HUMAN_SIZE in the seq2seq_model file.
But when I try to train on a quaternion dataset, I get different values of dimensions_to_use when training on all actions versus training on walking only.
This difference causes an error, because dimensions_to_use is always supposed to equal self.HUMAN_SIZE.

So I'm not sure whether self.HUMAN_SIZE and dimensions_to_use should always be constant regardless of the actions used in training, or whether they might differ for different actions?

Thanks in advance.

Inference on a video/webcam

Is it possible to parse video from a file or an IP camera using OpenCV and then get a prediction of the motion?
I want to be able to classify an action in a video.
Sorry, I'm new to computer vision. Thank you.

Can't reproduce paper's result.

Hi, I like your work and am trying to build my own research on top of your idea, but I simply couldn't reproduce your paper's results.

Here is what I have done:
python3 src/translate.py --action walking --seq_length_out 25
python3 src/translate.py --residual_velocities --action walking

What I got:
Aside from plausible animations for each action, the following tables are what I got from my experiments.

'Long term' and 'YOUR WORK' are the sampling-based loss (SA) results from my experiment and your reported results, respectively; the last row is the SRNN paper's motion forecasting error.

| Walking \ time (ms) | 80 | 160 | 320 | 400 | 560 | 1000 |
| --- | --- | --- | --- | --- | --- | --- |
| Long term | 1.004 | 1.190 | 1.473 | 1.594 | 1.794 | 2.027 |
| YOUR WORK | 0.92 | 0.98 | 1.02 | 1.20 | | |
| SRNN paper's | 1.08 | 1.34 | 1.60 | --- | 1.90 | 2.13 |

| Eating \ time (ms) | 80 | 160 | 320 | 400 | 560 | 1000 |
| --- | --- | --- | --- | --- | --- | --- |
| Long term | 1.195 | 1.473 | 1.998 | 2.184 | 2.316 | 2.336 |
| YOUR WORK | 0.98 | 0.99 | 1.18 | 1.31 | | |
| SRNN paper's | 1.35 | 1.71 | 2.12 | --- | 2.28 | 2.58 |

| Smoking \ time (ms) | 80 | 160 | 320 | 400 | 560 | 1000 |
| --- | --- | --- | --- | --- | --- | --- |
| Long term | 1.282 | 1.572 | 2.486 | 2.609 | 3.258 | 2.861 |
| YOUR WORK | 1.38 | 1.39 | 1.56 | 1.65 | | |
| SRNN paper's | 1.90 | 2.30 | 2.90 | --- | 3.21 | 3.23 |

| Discussion \ time (ms) | 80 | 160 | 320 | 400 | 560 | 1000 |
| --- | --- | --- | --- | --- | --- | --- |
| Long term | 1.605 | 1.986 | 2.513 | 2.702 | 3.087 | 3.187 |
| YOUR WORK | 1.78 | 1.80 | 1.83 | 1.90 | | |
| SRNN paper's | 1.67 | 2.03 | 2.20 | --- | 2.39 | 2.43 |

I am puzzled about:

  1. According to your results, I suppose my numbers are wrong, but tolerable when compared with the SRNN paper's results. Can you give some advice for correcting my setup?

  2. I think an iteration count of about 1e5 may be too large, because I noticed that the error grows as the number of iterations increases.

Looking forward to your reply. Sincere thanks!!!

=====================================
UPDATE

(number in boldface indicates the best result)
| Walking \ time (ms) | 80 | 160 | 320 | 400 | 560 | 1000 |
| --- | --- | --- | --- | --- | --- | --- |
| 1e4th iteration | 1.306 | 1.360 | 1.362 | 1.380 | 1.381 | 1.488 |
| 2e4th iteration | 1.195 | 1.276 | 1.318 | 1.345 | 1.401 | 1.554 |
| YOUR WORK | 0.92 | 0.98 | 1.02 | 1.20 | | |

| Eating \ time (ms) | 80 | 160 | 320 | 400 | 560 | 1000 |
| --- | --- | --- | --- | --- | --- | --- |
| 1e4th iteration | 1.126 | 1.189 | 1.300 | 1.380 | 1.507 | 1.752 |
| 2e4th iteration | 1.043 | 1.162 | 1.379 | 1.497 | 1.674 | 2.036 |
| YOUR WORK | 0.98 | 0.99 | 1.18 | 1.31 | | |

| Smoking \ time (ms) | 80 | 160 | 320 | 400 | 560 | 1000 |
| --- | --- | --- | --- | --- | --- | --- |
| 1e4th iteration | 1.514 | 1.597 | 1.752 | 1.789 | 1.862 | 2.257 |
| 2e4th iteration | 1.238 | 1.357 | 1.593 | 1.640 | 1.738 | 2.196 |
| YOUR WORK | 1.38 | 1.39 | 1.56 | 1.65 | | |

| Discussion \ time (ms) | 80 | 160 | 320 | 400 | 560 | 1000 |
| --- | --- | --- | --- | --- | --- | --- |
| 1e4th iteration | 1.682 | 1.803 | 1.847 | 1.825 | 1.952 | 2.185 |
| 2e4th iteration | 1.439 | 1.603 | 1.710 | 1.728 | 1.938 | 2.196 |
| YOUR WORK | 1.78 | 1.80 | 1.83 | 1.90 | | |

Taking your advice, I checked the results at the 10000th and 20000th iterations, and they have improved performance, thanks!! I suppose the 20000th iteration is the better choice for the sampling-based loss experiment,
but gaps still exist, especially for the walking and eating actions. Is this normal?

Really sorry to bother you: is the code on GitHub your final version or just a demo? If not, what changes should I make? By the way, is the number of iterations uniform across all experiments? For example, do the seq2seq architecture with sampling-based loss (SA) and the residual architecture (Residual (SA)) both use 10000 iterations?

`num_layers > 1` not really supported

thanks heaps for putting this code online!

I've been playing around with this code, and when trying to add more layers I get a size mismatch:

ValueError: Trying to share variable basic_rnn_seq2seq/rnn/gru_cell/gates/kernel, but specified shape (2048, 2048) and found shape (1060, 2048).

I believe the problem is in rnn_cell_extensions.LinearSpaceDecoderWrapper.

Basically, it seems to conjure up some magic variables, proj_w_out and proj_b_out, but their sizes are wrong because I've stacked more than one GRU cell;

i.e. 1060 = 1024 + my input size, while it expects (2048, 2048) because num_layers == 2.

I think it should be an easy fix, but I'm not entirely sure what it is, because I'm not sure where proj_w_out comes from ...

Question about the zero velocity baseline

I have read your paper, but I don't really understand the zero-velocity baseline; it is only briefly described in the paper. I am not sure what "predict the first frame with zero velocity" means: does it mean setting the first frame of the predicted motion clip to be exactly the same as the last frame of the input motion clip?
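
As described in the paper, the zero-velocity baseline simply repeats the last observed frame for the entire prediction horizon (so yes, the first predicted frame equals the last input frame, and so do all the following ones). A minimal sketch, with hypothetical array shapes:

    import numpy as np

    def zero_velocity_baseline(observed, horizon):
        """Repeat the last observed frame for `horizon` future steps.

        observed: array of shape (seq_len, n_dims); returns (horizon, n_dims).
        """
        return np.tile(observed[-1], (horizon, 1))

    observed = np.random.default_rng(0).standard_normal((50, 54))
    prediction = zero_velocity_baseline(observed, horizon=25)
    print(np.allclose(prediction, observed[-1]))   # True: every predicted frame is the last input frame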

Can the preprocessed data be converted to quaternions?

May I ask you @una-dinosauria a question?

Can the preprocessed data be converted to quaternions instead of exponential maps?
Is there code for the conversion?
I would be very thankful if you could provide me with the same data you used, but represented with quaternions instead of exponential maps.

Sorry If I'm asking the wrong question.
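
The repository does not appear to ship a conversion script, but the math itself is standard: an exponential-map vector r encodes a rotation of angle ||r|| about the axis r/||r||, which maps to a unit quaternion as below. This is an illustrative sketch, not code from the repo:

    import numpy as np

    def expmap_to_quaternion(r):
        """Map a 3-vector exponential map to a unit quaternion (w, x, y, z)."""
        theta = np.linalg.norm(r)
        if theta < 1e-8:
            return np.array([1.0, 0.0, 0.0, 0.0])      # identity rotation
        axis = r / theta
        return np.concatenate(([np.cos(theta / 2.0)], np.sin(theta / 2.0) * axis))

    print(expmap_to_quaternion(np.array([np.pi, 0.0, 0.0])).round(3))  # 180 deg about x -> (0, 1, 0, 0)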

Dataset space

Hi @una-dinosauria
Is there any reason for representing the joint data as angles? I mean, why did you choose to work with the data as exponential maps of joint angles?

left-right flip

It seems like left and right are flipped in your 3D representation, e.g. the following code

self.LR = np.array([1,1,1,0,0,0,0, 0, 0, 0, 0, 0, 0, 1, 1, 1], dtype=bool)

seems to conflict with the Human3.6M Matlab code:

switch part
        case 'rootpos'
          joints = 1;
        case 'rootrot'
          joints = 1;
        case 'leftarm'
          joints = 18:24;% p/p2/a fine
        case 'rightarm'
          joints = 26:32;% p/p2/a fine
        case 'head'
          joints = 14:16;% p/p2/a fine
        case 'rightleg'
          joints = 2:6;% p/p2/a fine
        case 'leftleg'
          joints = 7:11;% p/p2/a fine
        case 'upperbody'
          joints = [14:32];% p/p2/a fine
        case 'arms'
          joints = [16:32];% p/p2/a fine
        case 'legs'
          joints = 1:11;% p/p2/a fine
        case 'body'
          joints = [1 2 3 4 7 8 9 13 14 15 16 18 19 20 26 27 28];% p/p2/a fine
        otherwise
          error('Unknown');
      end

The plot below shows the original 3D ground-truth locations ('Ground Truth') from Human3.6M, the particular frame of the video, and the 3D pose in expmap format transformed to 3D joint coordinates using the forward kinematics function provided with your repository ('ExpMaps').

(plot omitted: lr_flip)

Is there an intention behind the flip?

Test time prediction inputs

Hello Julieta,

I'm a bit confused by this line: here we are feeding "encoder_inputs, decoder_inputs, decoder_outputs" to the network to make the prediction, and the model makes its prediction from these data.
As I understand from that line, decoder_inputs contains data one step behind decoder_outputs. In this configuration, does the model only forecast one step ahead?
PS: If I understood correctly, the model makes the prediction at this line; as far as I can see, it uses both encoder_inputs and decoder_outputs.

Reason to use 54 dimensions

Hi Una-dinosauria,

Can you please explain the reason for using only 54 of the 99 dimensions? I tried to find it in the paper and the code, but I was unable to. Your response is highly appreciated.

Thank you,
Kavindu

"forward_only" option is not in Seq2SeqModel.__init__

Hello Julieta, thanks for sharing your code; excellent job and paper.

However, I'm wondering if there is an error in baselines.py, where you initialize the Seq2SeqModel class with a parameter "forward_only" that is not included in Seq2SeqModel.__init__.

Thank you very much :)

Why only use the even line of data in data_util

I noticed that this method in data_utils only selects the even-indexed lines of the original data for the training and test sets. I don't quite understand that; I would have thought all the lines should be selected directly. Also, for the n-by-d data in each txt file, does n mean there are n frames?

    even_list = range(0, n, 2)
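
A short, hedged sketch of what that line does in context (variable names follow data_utils; the sequence here is synthetic): keeping only the even-indexed rows halves the effective frame rate of each sequence.

    import numpy as np

    action_sequence = np.random.default_rng(0).standard_normal((200, 99))  # stand-in for one loaded sequence
    n = action_sequence.shape[0]

    even_list = range(0, n, 2)                       # indices 0, 2, 4, ...: every other frame
    downsampled = action_sequence[list(even_list), :]
    print(downsampled.shape)                         # (100, 99): half the frames, same dimensions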

A question about sampling_based loss

Hi @una-dinosauria

I have a question about sampling_based loss.

According to my understanding, the sampling_based loss means feeding the previous output as the current input inside the decoder. I also think this is done inside the seq2seq_model.py file, and the corresponding line (the loop function) is shown below:

def lf(prev, i): # function for sampling_based loss

Could you please tell me whether my understanding is correct or wrong?
If I am wrong, can you please explain how the sampling_based loss is calculated in the code?

Your response is highly appreciated.

Thank you,
Kavindu
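
That reading matches how a loop function is normally used in this kind of seq2seq decoder: during decoding, each prediction is fed back as the next input, and the loss is then computed on the self-generated sequence. A minimal, framework-free sketch of the idea (the step function here is a hypothetical stand-in, not the repo's wrapped GRU cell):

    import numpy as np

    def decode_with_sampling(step, state, first_input, horizon):
        """Roll out `horizon` frames, feeding each prediction back as the next input."""
        outputs, prev = [], first_input
        for _ in range(horizon):
            prev, state = step(prev, state)   # the model's own output becomes the next input
            outputs.append(prev)
        return np.stack(outputs), state

    # Toy step function: drift the pose slightly each step (placeholder for the real cell).
    toy_step = lambda x, s: (x + 0.01, s)
    pred, _ = decode_with_sampling(toy_step, state=None, first_input=np.zeros(54), horizon=25)
    print(pred.shape)  # (25, 54)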

Understanding the structure of H3.6M dataset

Hi Una-dinosauria,

First of all, I need to thank you for making this fantastic paper and code publicly available; it is really helpful for my studies.

I have a very basic question. Can you please explain the structure of the dataset and the meaning of the 99 numbers in each row? On the website they describe 96 values, but in the H3.6M dataset there are 99 values in each row.
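
A hedged sketch of the common reading of each 99-value row, consistent with the fkl discussion elsewhere on this page: 3 root-translation values followed by 32 exponential-map 3-vectors, several of which belong to end effectors and stay zero. Treat this as an interpretation, not official documentation:

    import numpy as np

    frame = np.zeros(99)                      # one row of an H3.6M export file
    root_translation = frame[:3]              # global position of the root joint
    joint_expmaps = frame[3:].reshape(32, 3)  # one exponential-map 3-vector per joint
                                              # (the root's rotation is the first of these)
    print(root_translation.shape, joint_expmaps.shape)   # (3,) (32, 3)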

needs .txt files but human 3.6m has .cdf files

Hi Julieta,

Thanks for the code to your paper.

I was trying to run your code to pre-process the data. Specifically, I run into an issue with the following lines of code in the load_data function (in data_utils.py):

filename = '{0}/S{1}/{2}_{3}.txt'.format( path_to_dataset, subj, action, subact )
action_sequence = readCSVasFloat(filename)

Can you please explain how you converted the .cdf files to .txt files for the above usage?

Thanks.

Sampling rate of the data

I would like to know: what is the actual sampling rate of the data?

I think you pointed out to the author of Structural-RNN in the following thread that it is 25 fps, but it is not very clear to me:
asheshjain399/RNNexp#6

I check your source code in

If the original data is 200fps, I think the sampling rate of your data should be 100 fps, not 25 fps as mentioned in src/translate.py#L34.

Thank you.

Problem in restoring the checkpoint

Thank you for posting the code. May I ask whether it is right to use "...--sample --load 100000" to get the prediction results of iteration 100000? When I try to restore the checkpoint at 100000, 95000, or others, they always seem to return the same results as the last checkpoint (100000). I find the problem occurs at line 108 in translate.py. Should it be changed to "model.saver.restore(session, ckpt_name)" so that each time it loads the requested iteration instead of the last checkpoint, or am I feeding in a wrong argument?

Start and end joint

Why don't the start and end joints in viz cover all 32 joints?

self.I   = np.array([1,2,3,1,7,8,1, 13,14,15,14,18,19,14,26,27])-1
self.J   = np.array([2,3,4,7,8,9,13,14,15,16,18,19,20,26,27,28])-1
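
The I and J arrays list the start and end joint indices of the bones that get drawn; joints that never appear in either array are simply not connected by a line (several entries are end effectors or duplicated joints). A hedged, illustrative sketch of how such index pairs can be used to draw one frame with matplotlib (the xyz array is a stand-in, and this is not viz.py itself):

    import numpy as np
    import matplotlib.pyplot as plt

    I = np.array([1, 2, 3, 1, 7, 8, 1, 13, 14, 15, 14, 18, 19, 14, 26, 27]) - 1   # bone start joints
    J = np.array([2, 3, 4, 7, 8, 9, 13, 14, 15, 16, 18, 19, 20, 26, 27, 28]) - 1  # bone end joints

    xyz = np.random.default_rng(0).standard_normal((32, 3))  # stand-in for one frame of joint positions

    ax = plt.figure().add_subplot(111, projection="3d")
    for a, b in zip(I, J):                     # draw only the listed bones
        ax.plot([xyz[a, 0], xyz[b, 0]],
                [xyz[a, 1], xyz[b, 1]],
                [xyz[a, 2], xyz[b, 2]])
    plt.show()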

Encoder inputs are less than source_seq_len by 1

Hi @una-dinosauria
I'm wondering why you chose encoder_inputs to be shorter than source_seq_len by 1 (line 453),
while you chose decoder_outputs and decoder_inputs to have the same length as target_seq_len.

def get_batch( self, data, actions ):
  """Get a random batch of data from the specified bucket, prepare for step.

  Args
    data: a list of sequences of size n-by-d to fit the model to.
    actions: a list of the actions we are using
  Returns
    The tuple (encoder_inputs, decoder_inputs, decoder_outputs);
    the constructed batches have the proper format to call step(...) later.
  """

  # Select entries at random
  all_keys    = list(data.keys())
  chosen_keys = np.random.choice( len(all_keys), self.batch_size )

  # How many frames in total do we need?
  total_frames = self.source_seq_len + self.target_seq_len

  encoder_inputs  = np.zeros((self.batch_size, self.source_seq_len-1, self.input_size), dtype=float)
  decoder_inputs  = np.zeros((self.batch_size, self.target_seq_len, self.input_size), dtype=float)
  decoder_outputs = np.zeros((self.batch_size, self.target_seq_len, self.input_size), dtype=float)

TF Saver has issues on Windows

Hi there,

Quick post below that I'll update later when I have some more time;

Thanks for this paper and for sharing your code. I'm trying to replicate your results on Windows 10, and the TensorFlow Saver class that saves the model as it is training seems to have an issue. Either the path name or the file name of the files is far too long for Windows or NTFS (I haven't determined which yet). To help me debug this, can you let me know what operating system and file system you were running this code on?

Thank you, I'll share more info later.

Calculating residual velocity

Hi Una-dinosauria,

According to my understanding, to eliminate the first-frame discontinuity a residual velocity should be calculated (please correct me if I am wrong). But in the rnn_cell_extensions.py file I cannot find the place where the residual velocity is calculated; I can only see the code related to the residual connection. Can you please point me to the correct place where the residual velocity is calculated?

Thank you,
Kavindu

Human3.6M D3 angles

Hi,

Somewhere in your paper, you mention that H3.6M has 54 independent joint angles in 3D space. Do these correspond to 18 joints? If possible, I would really appreciate it if you could give me a link to data that contains these 54 joint angles (with their corresponding joint indices) per sequence.

I was looking at H3.6M both on its website and at the link you mention in your repo, and the values differ between the sources. Moreover, in the dataset you used there are 99 values per frame, of which you select 54. However, I'm not sure what these 54 values per frame are. Do they correspond to 18 joints in 3D angle space? If yes, would you please let me know the (x, y, z) triplet indices corresponding to each joint?

Thanks

pytorch version

Hi!

I implemented a pytorch version of the code in this fork. The results are similar for the demos I tried so far.

TypeError: 'dict_keys' object does not support indexing when trying to train.

Hello,

I got a TypeError when trying to run the training demo from the readme.md.
I've already installed tensorflow and h5py.
Here's the traceback when running the training:

`File "translate.py", line 700, in
tf.app.run()

File "/home/lenovo-4/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))

File "translate.py", line 697, in main
train()

File "translate.py", line 153, in train
encoder_inputs, decoder_inputs, decoder_outputs = model.get_batch( train_set, not FLAGS.omit_one_hot )

File "/home/lenovo-4/Shiangyoung/human-motion-prediction-master/seq2seq_model.py", line 462, in get_batch
the_key = all_keys[ chosen_keys[i] ]
TypeError: 'dict_keys' object does not support indexing
`

Would be great if anyone could help out.

Thank you.

Edit: fixed by changing line 450 of seq2seq_model.py to all_keys = list(data.keys())

Doubt on quantitative result table.

Hi,

Firstly, I want to congratulate you on this paper being cited many times recently.

However, I am still confused about some details. I double-checked your paper, where you successfully reproduce previous work (SRNN, CVPR 2016) and publish its results (screenshot omitted).

Are those results from optimizing the models over 400 ms or over 1000 ms? I guess the result above is the one optimized over 400 ms, while the one I traced here on GitHub through the source code is optimized over 1000 ms, which means the model predicts long-term. I also attached your posted results (screenshot omitted).

Can you help out here?

Best regards,
CHELSEA234

Small bug in the code

Hello,
Thank you very much for the code of your paper (the paper was quite good also). I think there is a small bug in your code: while computing the error, you check the std of the estimated values and use it to select the dimensions to use. I think this is not the right way to select dimensions (when your model estimates very small numbers you can still get good results). The referenced code also finds the dimensions to use in the same way.

Bug line

PS: I fixed the code and tried to run the model; it seems it still converges quite well :)
Have a nice time in Hawaii.

About fkl function

Hello Julieta,
I am a graduate student who is interested in motion prediction.
I have questions about the fkl function in forward_kinematics.py.

  1. The angles vector received by the fkl function has dimension 99: dims 1, 2 and 3 represent the translation of the root joint, dims 4, 5 and 6 represent the exponential-map coordinates of the root joint, and dims 7-99 represent the exponential-map coordinates of the remaining joints. Is that correct?

  2. I do not know what rotInd and expmapInd represent. Looking at line 48 of forward_kinematics.py, the position is updated with thisPosition = np.array([xangle, yangle, zangle]), where xangle, yangle, and zangle are exponential-map coordinates. If they are exponential-map coordinates, why does this represent a position?
     It also does not make sense to me that line 52 updates xyz by adding the offset (bone length).

  3. I do not fully understand the forward-kinematics part at lines 54 and 55. I would be very grateful if you could explain it.

Slight unnecessary hardcode

# Reproducing SRNN's sequence subsequence selection as done in
# https://github.com/asheshjain399/RNNexp/blob/master/structural_rnn/CRFProblems/H3.6m/processdata.py#L343
for i in xrange( batch_size ):

  _, subsequence, idx = seeds[i]
  idx = idx + 50

  data_sel = data[ (subject, action, subsequence, 'even') ]
  data_sel = data_sel[(idx-source_seq_len):(idx+target_seq_len) ,:]

  encoder_inputs[i, :, :]  = data_sel[0:source_seq_len-1, :]
  decoder_inputs[i, :, :]  = data_sel[source_seq_len-1:(source_seq_len+target_seq_len-1), :]
  decoder_outputs[i, :, :] = data_sel[source_seq_len:, :]

In the above, at line 550, I believe the 50 should actually be source_seq_len?
