
Comments (5)

alexlee-gk commented on July 22, 2024

Hi Julia, thanks for your interest! Here are some answers to your questions.

  1. context_frames refers to the number of past frames that you condition on to predict the future. For example, if you condition on the past 2 frames and predict the next 10 frames (as is the case for the robot experiments in the paper), that would be context_frames=2 and sequence_length=12 (see the sketch after this list).
  2. sequence_length refers to the video length used for training. Since the actual length of each KTH video is greater than 20, a subsequence of length 20 is randomly sampled from the video. The subsequences are resampled (uniformly i.i.d.) at each epoch. Also, this length is not a parameter that we tuned; it was chosen to match the experimental conditions of prior work.
  3. No, you don't need to resize the frames. You can train a model at your original resolution with one minor code change: add another condition here to specify the layers of the main part of the generator. Here is an example. The only constraint is that the spatial dimensions should be a multiple of 2 ** i, where i is the number of downsample and upsample layers.
    If you are using images with resolutions higher than 64x64, I'd recommend using a fully convolutional network, flow vectors instead of CDNA/DNA kernels, and other options that improve memory usage. You can do all of that by overriding some of the model hparams. I have found that these model hparams work better than the ones we used in the paper:
    `transformation=flow,tv_weight=0.001,downsample_layer=conv2d,upsample_layer=deconv2d,where_add=middle`.
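
To make the interplay of these settings concrete, here is a minimal NumPy sketch (the function names are illustrative, not from the codebase) of sampling a training subsequence, splitting it into context and prediction frames, and checking the spatial-dimension constraint:

```python
# Minimal sketch (illustrative names, not the repo's actual pipeline) of how
# context_frames and sequence_length interact, plus the spatial-size check.
import numpy as np

def sample_subsequence(video, sequence_length, rng):
    """Randomly sample a training subsequence from a longer video.

    video: array of shape [num_frames, height, width, channels].
    """
    num_frames = video.shape[0]
    assert num_frames >= sequence_length
    start = rng.randint(num_frames - sequence_length + 1)  # uniform start index
    return video[start:start + sequence_length]

def split_context(subsequence, context_frames):
    """Split a subsequence into conditioning frames and prediction targets."""
    return subsequence[:context_frames], subsequence[context_frames:]

def check_spatial_dims(height, width, num_scale_layers):
    """Spatial dims must be a multiple of 2 ** i, where i is the number of
    downsample/upsample layers in the generator."""
    factor = 2 ** num_scale_layers
    assert height % factor == 0 and width % factor == 0, (
        "height and width must be multiples of %d" % factor)

rng = np.random.RandomState(0)
video = np.zeros([100, 256, 256, 3], dtype=np.float32)  # e.g. a 100-frame clip
subseq = sample_subsequence(video, sequence_length=12, rng=rng)
context, targets = split_context(subseq, context_frames=2)
print(context.shape, targets.shape)  # (2, 256, 256, 3) (10, 256, 256, 3)
check_spatial_dims(256, 256, num_scale_layers=3)  # 256 is a multiple of 8
```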


SummerHelen commented on July 22, 2024

Hi Alex,

Thanks for your prompt reply. Each of my video clips contains 100 frames, and I resized them to 256x256 because I found that option here. I left context_frames and sequence_length unchanged. When I was training the model, after the process printed "parameter_count = 63749227", my computer became very slow, almost freezing. Is that normal? Or is it because I should fine-tune the model hparams? I am currently using KTH's model_hparams.

Then I followed your suggestion and added "transformation=flow,tv_weight=0.001,downsample_layer=conv2d,upsample_layer=deconv2d,where_add=middle" to the model_hparams.json file. When I was training the model, a warning popped up saying "Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory". I guess I might have messed something up here. Could you give more advice on this situation?

Thanks.

Best,
Julia


alexlee-gk commented on July 22, 2024

Hi Julia,

It's normal for it to be slow at the beginning, since the model is being compiled, and that can take some time for large models. The first sess.run shouldn't take more than a few minutes, though. Also, make sure that you are using the GPU.
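
As a quick sanity check before a long run, you can confirm that TensorFlow (1.x, which this codebase targets) actually sees a GPU; this snippet is generic and not specific to this repo:

```python
# Quick sanity check that TensorFlow 1.x can see a GPU before training.
import tensorflow as tf
from tensorflow.python.client import device_lib

print("GPU available:", tf.test.is_gpu_available())
print("Devices:", [d.name for d in device_lib.list_local_devices()])
```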

Those model hparams are good, especially the transformation=flow part if you are using larger images. I also get that warning message regardless, and I haven't found it to impact performance in my case. You can safely ignore it.

Best,
Alex


shijiwensjw commented on July 22, 2024

Hi Alex,

I have read your paper and the discussion above, and I have a question: can I use your pretrained models to generate predicted videos on my own dataset? For example, my data has a different robot arm or different objects compared to yours.

So the question is whether the models in your paper depend on the training data.

Best,
Steven


alexlee-gk commented on July 22, 2024

Hi Steven,

If you use the action-conditioned models, you'll run into an error, since the dimensions of the states/actions might differ between the two datasets (and even if they don't, they might have different semantics). However, the action-free models should run on your data, though I'm not sure how well they'd generalize. I expect them to generalize to different objects (because there were multiple objects in the training set), but I don't expect that for a different robot arm.
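
To illustrate the shape issue, here is a toy TensorFlow 1.x sketch with made-up shapes (not the repo's actual input pipeline): a graph built for 4-dimensional actions rejects action sequences of a different dimensionality.

```python
# Toy illustration (hypothetical shapes) of the state/action dimension issue:
# a graph built for 4-D actions cannot be fed 5-D actions from another robot.
import numpy as np
import tensorflow as tf

# Placeholder as built for the original dataset: [batch, time, action_dim=4].
actions = tf.placeholder(tf.float32, [None, 10, 4], name="actions")
summed = tf.reduce_sum(actions)

with tf.Session() as sess:
    ok = np.zeros([1, 10, 4], dtype=np.float32)       # matches the graph
    print(sess.run(summed, feed_dict={actions: ok}))  # works: 0.0
    bad = np.zeros([1, 10, 5], dtype=np.float32)      # different robot arm
    try:
        sess.run(summed, feed_dict={actions: bad})
    except ValueError as e:
        print("shape mismatch:", e)                   # fails as expected
```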

Best,
Alex

