Comments

Teravus commented on August 19, 2024

> Pretty much zero experience in this rodeo @Teravus, but any chance you could share specifically how you set up the batching? I'm definitely hitting that same limit on my own samples. Thanks!

Here's my test.py in the zip: test.zip

The main thing is: create a stride, then loop over the strides, putting only the frames for the current stride in VRAM at a time. Then, on the next stride, replace those images in VRAM with new ones.

That way they're not all in VRAM at once.

For the frames and masks:

    # Split the full frame/mask lists into chunks of framestride frames each.
    x_frames = [rframes[i:i + framestride] for i in range(0, len(rframes), framestride)]
    x_masks = [rmasks[i:i + framestride] for i in range(0, len(rmasks), framestride)]
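
For context, here is a minimal sketch of how those chunks could then be fed through inference one at a time. It assumes PyTorch, and the names to_tensor, run_inference, model, and device are placeholders for whatever the script already does per batch; it is not the exact code in test.zip:

    import torch

    comp_frames = []  # results across all strides, as in the notes below
    for chunk_frames, chunk_masks in zip(x_frames, x_masks):
        # Only this chunk's tensors are moved to the GPU.
        frames_t = to_tensor(chunk_frames).to(device)
        masks_t = to_tensor(chunk_masks).to(device)
        with torch.no_grad():
            out = run_inference(model, frames_t, masks_t)  # existing per-batch logic
        comp_frames.extend(out)   # collect results on the CPU
        del frames_t, masks_t     # drop this chunk's tensors...
        torch.cuda.empty_cache()  # ...so the next chunk reuses that VRAM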

Look for the line that says framestride = 200.

You can raise or lower that depending on your VRAM: a lower value makes the process use less VRAM, and a higher value makes it use more.

A couple of things to note:

  • The framestride must be evenly divisible by neighbor_stride. neighbor_stride is 5 by default, but you can change it on the command line; with the default, any framestride that is a multiple of 5 works.
  • When processing, each stride's worth of images is treated as its own pass, so you'll see the progress bar go from start to end multiple times depending on your framestride. Say you have 750 frames: with a framestride of 200, you'll see it go from 0 to full 4 times, and the last stride will be shorter (see the arithmetic sketch after this list).
  • Across all of the strides, the results for each frame are written into comp_frames, which then gets turned into your video.
  • I didn't update the little video player UI that comes up after the video is complete; it will display only part of your completed video, but your output video file will be the complete video.
  • I've only tested this change with a directory of individual frames as input; I have not tested it on a video input file directly. It might work, but I haven't tried it. I use ffmpeg to extract the frames.
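
A quick check of that arithmetic under the example numbers above (750 frames, framestride = 200):

    import math

    total_frames = 750
    framestride = 200
    passes = math.ceil(total_frames / framestride)           # 4 passes over the progress bar
    last_stride = total_frames - (passes - 1) * framestride  # 150 frames in the final, shorter pass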

Hopefully this is helpful.

Paper99 commented on August 19, 2024

Could you tell me the spatial resolution of your video?

tchen0623 commented on August 19, 2024

Hi, it's 720p. I am able to run my own video by changing the --step setting in test.py, but the result is terrible.

Paper99 commented on August 19, 2024

We use a GPU with 48 GB of memory to process 720p video.
The --step setting does affect the performance.
To check the influence of this parameter, you could first keep the original settings and use E2FGVI to process a downscaled version of the video.
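
If it's useful, here is a generic way to produce that downscaled copy from the extracted frames (an OpenCV sketch with placeholder directory names, not part of the repo; the masks would need the same resizing so they stay aligned with the frames):

    import os
    import cv2

    src, dst = "frames_720p", "frames_360p"  # placeholder directories
    os.makedirs(dst, exist_ok=True)
    for name in sorted(os.listdir(src)):
        img = cv2.imread(os.path.join(src, name))
        # Halve the resolution; INTER_AREA is the usual choice for downscaling.
        half = cv2.resize(img, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)
        cv2.imwrite(os.path.join(dst, name), half)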

Teravus commented on August 19, 2024

At least for me, the problem was the number of frames, given that they were all loaded into VRAM at once. I solved it with some trivial code changes that batch the frames sent to inference, which works as long as the batch size is evenly divisible by neighbor_stride.

This way, results do not suffer and VRAM use is reduced.
Is this something that you would like as a pull request, @Paper99?
(Note to self, if you say yes: this edit is on my D drive under the E2FGVI folder.)

firebeasty commented on August 19, 2024

Pretty much zero experience in this rodeo @Teravus, but any chance you could share specifically how you set up the batching? I'm definitely hitting that same limit on my own samples. Thanks!

firebeasty commented on August 19, 2024

This was really helpful! I got it working great! I had to change your cuda:1 line to cuda since I only have one GPU, but otherwise it worked great with some minor parameter tweaking.

I am curious if there's a way to have it link up temporally between batches - I do notice some minor pops where the strides are broken up. Is there some way to carry the data from the last frame of one stride into the next so there's some temporal consistency between them? I understand completely that that's a huge wish-list item, but I have to ask for selfish reasons! 😅 Again, thank you so much for getting me this far! Your fix in the other thread on setting up the Windows environment from 3 weeks ago was insanely helpful after I failed to get this going a month ago!

Teravus commented on August 19, 2024

Glad it was helpful. Sorry about the cuda:1 thing; I have two GPUs and sent everything to the GPU that I wasn't already using for something else.

To answer your question about coherency between batches: probably, yes. However, we'd need to write a special neighbor routine for the [neighbor_stride] frames at the end and beginning of each batch.

In the test video I'm using, I didn't see any flashes, but it's a very limited use case, with a mask that stays roughly the same for each frame. I bet if I cloned one of the sample videos, reversed it, and appended it to the original, then I'd see flashes too.
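
For what it's worth, one way that neighbor routine could look, sketched as a variation on the chunking above: overlap consecutive strides by neighbor_stride frames, so each stride starts on frames the previous one already completed. This is just an idea in code form using the same variable names as before, not something from test.zip:

    # Hypothetical overlapped chunking: consecutive strides share `overlap` frames.
    overlap = neighbor_stride
    step = framestride - overlap
    x_frames = [rframes[i:i + framestride] for i in range(0, len(rframes), step)]
    x_masks = [rmasks[i:i + framestride] for i in range(0, len(rmasks), step)]
    # When collecting results, keep the previous stride's output for the
    # overlapping frames so the seam lands on already-inpainted content.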

Teravus commented on August 19, 2024

@firebeasty I've been playing with the major parameters and came to the conclusion that a higher neighbor_stride works better on a smaller number of frames.

If you're going to use strides 24 frames long, try --neighbor_stride 10 instead of the default of 5. I'd also set the stride to 20 instead of 24 so it is a multiple of neighbor_stride: pass --neighbor_stride 10 on the command line and set framestride = 20 in the code.

This seems to reduce the skipping/flaring for me.
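
As a quick sanity check on those numbers (plain arithmetic, using the variable names from the earlier edits):

    neighbor_stride = 10  # passed as --neighbor_stride 10
    framestride = 20      # edited in the code
    assert framestride % neighbor_stride == 0  # 20 % 10 == 0: valid
    # framestride = 24 would fail: 24 % 10 == 4, not evenly divisible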
