
Comments (5)

alexlee-gk commented on July 22, 2024

Hi Julia, thanks for your interest! Here are some answers to your questions.

  1. context_frames refers to the number of past frames that you condition on to predict the future. For example, if you condition on the past 2 frames and predict the next 10 frames (as is the case for the robot experiments in the paper), that would be context_frames=2 and sequence_length=12 (see the sketch after this list).
  2. sequence_length refers to the video length used for training. Since the actual length of each KTH video is greater than 20, a subsequence of length 20 is randomly sampled from the video. The subsequences are resampled (uniformly i.i.d.) at each epoch. Also, this length is not a parameter that we tuned; it was chosen to match the experimental conditions of prior work.
  3. No, you don't need to resize the frames. You can train a model at your original resolution with one minor code change: add another condition here to specify the layers of the main part of the generator. Here is an example. The only constraint is that the spatial dimensions should be a multiple of 2 ** i, where i is the number of downsample and upsample layers.
    If you are using images with resolutions higher than 64x64, I'd recommend using a fully convolutional network, flow vectors instead of CDNA/DNA kernels, and other options that improve memory usage. You can do all of that by overriding some of the model hparams. I have found that these model hparams work better than the ones we used in the paper:
    `transformation=flow,tv_weight=0.001,downsample_layer=conv2d,upsample_layer=deconv2d,where_add=middle`.
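
To make the interplay of these settings concrete, here is a minimal NumPy sketch (the function names are illustrative, not from the codebase) of sampling a training subsequence, splitting it into context and prediction frames, and checking the spatial-dimension constraint:

```python
# Minimal sketch (illustrative names, not the repo's actual pipeline) of how
# context_frames and sequence_length interact, plus the spatial-size check.
import numpy as np

def sample_subsequence(video, sequence_length, rng):
    """Randomly sample a training subsequence from a longer video.

    video: array of shape [num_frames, height, width, channels].
    """
    num_frames = video.shape[0]
    assert num_frames >= sequence_length
    start = rng.randint(num_frames - sequence_length + 1)  # uniform start index
    return video[start:start + sequence_length]

def split_context(subsequence, context_frames):
    """Split a subsequence into conditioning frames and prediction targets."""
    return subsequence[:context_frames], subsequence[context_frames:]

def check_spatial_dims(height, width, num_scale_layers):
    """Spatial dims must be a multiple of 2 ** i, where i is the number of
    downsample/upsample layers in the generator."""
    factor = 2 ** num_scale_layers
    assert height % factor == 0 and width % factor == 0, (
        "height and width must be multiples of %d" % factor)

rng = np.random.RandomState(0)
video = np.zeros([100, 256, 256, 3], dtype=np.float32)  # e.g. a 100-frame clip
subseq = sample_subsequence(video, sequence_length=12, rng=rng)
context, targets = split_context(subseq, context_frames=2)
print(context.shape, targets.shape)  # (2, 256, 256, 3) (10, 256, 256, 3)
check_spatial_dims(256, 256, num_scale_layers=3)  # 256 is a multiple of 8
```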


SummerHelen commented on July 22, 2024

Hi Alex,

Thanks for your prompt reply. Each of my video clips contains 100 frames, and I resized them to 256x256 because I found that option here. I left context_frames and sequence_length unchanged. When I was training the model, after the process printed "parameter_count = 63749227", my computer became very slow, almost freezing. Is that normal? Or is it because I should fine-tune the model hparams? I am currently using KTH's model_hparams.

Then I followed your suggestion and added "transformation=flow,tv_weight=0.001,downsample_layer=conv2d,upsample_layer=deconv2d,where_add=middle" to the model_hparams.json file. When I was training the model, a warning popped up saying "Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory". I guess I might have messed something up here. Could you give more advice on this situation?

Thanks.

Best,
Julia


alexlee-gk commented on July 22, 2024

Hi Julia,

It's normal for it to be slow at the beginning, since the model is being compiled, and that can take some time for large models. The first sess.run shouldn't take more than a few minutes, though. Also, make sure that you are using the GPU.
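
As a quick sanity check before a long run, you can confirm that TensorFlow (1.x, which this codebase targets) actually sees a GPU; this snippet is generic and not specific to this repo:

```python
# Quick sanity check that TensorFlow 1.x can see a GPU before training.
import tensorflow as tf
from tensorflow.python.client import device_lib

print("GPU available:", tf.test.is_gpu_available())
print("Devices:", [d.name for d in device_lib.list_local_devices()])
```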

Those model hparams are good, especially the transformation=flow part if you are using larger images. I also get that warning message regardless, and I haven't found it to impact performance in my case. You can safely ignore it.

Best,
Alex


shijiwensjw commented on July 22, 2024

Hi Alex,

I have read your paper and the discussion above, and I have a question: can I use your pretrained models to generate predicted videos on my own dataset? For example, my data has a different robot arm or different objects compared to yours.

So the question is whether the models in your paper depend on the training data.

Best,
Steven


alexlee-gk commented on July 22, 2024

Hi Steven,

If you use the action-conditioned models, you'll run into an error, since the dimensions of the states/actions might differ between the two datasets (and even if they don't, they might have different semantics). However, the action-free models should run on your data, though I'm not sure how well they'd generalize. I expect them to generalize to different objects (because there were multiple objects in the training set), but I don't expect that for a different robot arm.
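
To illustrate the shape issue, here is a toy TensorFlow 1.x sketch with made-up shapes (not the repo's actual input pipeline): a graph built for 4-dimensional actions rejects action sequences of a different dimensionality.

```python
# Toy illustration (hypothetical shapes) of the state/action dimension issue:
# a graph built for 4-D actions cannot be fed 5-D actions from another robot.
import numpy as np
import tensorflow as tf

# Placeholder as built for the original dataset: [batch, time, action_dim=4].
actions = tf.placeholder(tf.float32, [None, 10, 4], name="actions")
summed = tf.reduce_sum(actions)

with tf.Session() as sess:
    ok = np.zeros([1, 10, 4], dtype=np.float32)       # matches the graph
    print(sess.run(summed, feed_dict={actions: ok}))  # works: 0.0
    bad = np.zeros([1, 10, 5], dtype=np.float32)      # different robot arm
    try:
        sess.run(summed, feed_dict={actions: bad})
    except ValueError as e:
        print("shape mismatch:", e)                   # fails as expected
```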

Best,
Alex

