Comments (10)
At least for me, my problem was the number of frames, given that they were all loaded into VRAM. I solved it with some trivial code changes that batch the frames sent to inference, which works as long as the batch size is evenly divisible by neighbor_stride. This way, the results do not suffer and VRAM usage is reduced. Is this something that you would like as a pull request @Paper99? (Note to myself, if you say yes: this edit is on my D drive under the E2FGVI folder.)

Pretty much zero experience in this rodeo @Teravus, but any chance you could share specifically how you set up the batching? I'm definitely hitting that same limit on my own samples. Thanks!
Here's my test.py in the zip: test.zip
The main thing is: create a stride, then loop over the strides, putting only the frames for the current stride in VRAM at a time. On the next stride, replace those images in VRAM with new ones. That way they're not all in VRAM at once (see the sketch at the end of this comment).
With the frames and masks:
# Split the frame and mask lists into chunks of framestride frames each.
x_frames = [rframes[i:i + framestride] for i in range(0, len(rframes), framestride)]
x_masks = [rmasks[i:i + framestride] for i in range(0, len(rmasks), framestride)]
Look for the line that says framestride = 200. You can change it depending on your VRAM: lowering it makes the process use less VRAM, raising it makes it use more.
A couple of things to note:
- The framestride must be evenly divisible by neighbor_stride. neighbor_stride is 5 by default, but you can change it on the command line; with the default, any framestride that is a multiple of 5 works.
- Each stride's worth of images is processed as its own pass, so you'll see the progress bar go from start to end multiple times depending on your framestride. Say you have 750 frames: with a framestride of 200, the bar will run from 0 to full 4 times, and the last stride will be shorter (150 frames).
- Across all of the strides, the result for each frame is written into comp_frames, which then gets turned into your video.
- I didn't update the little video player UI that comes up after the video is complete. It will display only part of your completed video, but your output video file will be the complete video.
- I've only tested this change with a directory of individual frames; I have not tested it on a video input file directly. It might work, but I haven't tried it. I use ffmpeg to extract the frames.
Hopefully this is helpful.
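To make the shape of the change concrete, here is a minimal sketch of the striding loop, assuming the names above (rframes, rmasks, framestride, neighbor_stride) and a hypothetical run_inference helper standing in for the repo's existing neighbor-window inference code; it illustrates the idea, not the exact test.py edit in the zip.

# Minimal sketch of the striding idea. run_inference is a hypothetical
# stand-in for the per-window inference loop already in test.py.
assert framestride % neighbor_stride == 0, "framestride must be a multiple of neighbor_stride"

# Split frames/masks into stride-sized chunks.
x_frames = [rframes[i:i + framestride] for i in range(0, len(rframes), framestride)]
x_masks = [rmasks[i:i + framestride] for i in range(0, len(rmasks), framestride)]

comp_frames = []
for frames_chunk, masks_chunk in zip(x_frames, x_masks):
    # Only this chunk's tensors are moved to the GPU; when the loop advances,
    # the previous chunk goes out of scope and its VRAM can be freed.
    comp_frames.extend(run_inference(model, frames_chunk, masks_chunk))
# comp_frames now holds every completed frame, ready to be written to video.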
Could you tell me the spatial resolution of your video?
Hi, it's 720p. I am able to run my own video by changing the --step setting in test.py, but the result is terrible.
We use a GPU with 48 GB of memory to process 720p video. The --step setting does affect the performance. To check the influence of this parameter, you could first keep the original settings and use E2FGVI to process a downscaled version of the video.
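For reference, downscaling and frame extraction can be done in one ffmpeg step, which also matches the frames-directory input used above. The file name and target size here are placeholders (432x240 is just an example of a small test resolution), and the frames/ directory must already exist:

ffmpeg -i input_720p.mp4 -vf scale=432:240 frames/%05d.png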
This was really helpful! I got it working great! I had to change your cuda:1 device to cuda since I only have one GPU, but otherwise it worked great with some minor parameter tweaking.
I am curious whether there's a way to link batches up temporally - I do notice some minor pops where the strides are broken up. Is there some way to carry the data from the last frame into the next stride so there is some temporal consistency between them? I understand completely that that's a huge wish item, but I have to ask for selfish reasons! 😅 Again, thank you so much for getting me this far! Your fix in the other thread on setting up the Windows environment from 3 weeks ago was insanely helpful after I failed to get this going a month ago!
Glad it was helpful. Sorry about the cuda:1 thing; I have two GPUs and sent everything to the GPU I wasn't already using for something else.
To answer your question about coherency between batches: probably yes, but we'd need to write a special neighbor routine for the neighbor_stride frames at the end and beginning of each batch.
In the test video I'm using I didn't see any flashes, but it's a very limited use case, with a mask that stays roughly the same in each frame. I bet if I cloned one of the sample videos, reversed it, and appended it to the original, then I'd see the flashes too.
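One untested way to approximate that routine, sketched purely as an idea (none of this is in the zip): overlap adjacent strides by neighbor_stride frames so each stride sees the previous stride's tail as context, then drop the duplicated outputs so every frame is written exactly once.

# Untested sketch: overlap each chunk with the tail of the previous one.
# run_inference is the same hypothetical helper as in the sketch above.
overlap = neighbor_stride
starts = list(range(0, len(rframes), framestride))
x_frames = [rframes[max(0, s - overlap):s + framestride] for s in starts]
x_masks = [rmasks[max(0, s - overlap):s + framestride] for s in starts]

comp_frames = []
for k, (frames_chunk, masks_chunk) in enumerate(zip(x_frames, x_masks)):
    out = run_inference(model, frames_chunk, masks_chunk)
    if k > 0:
        out = out[overlap:]  # these frames were already produced by the previous chunk
    comp_frames.extend(out)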
@firebeasty I've been playing with the major parameters and came to the conclusion that a higher neighbor_stride works better on a smaller number of frames.
If you're going to use strides around 24 frames long, try --neighbor_stride 10 instead of the default 5, and set the framestride to 20 instead of 24 so it is a multiple of neighbor_stride: pass --neighbor_stride 10 on the command line and set framestride = 20 in the code. This seems to reduce the skipping/flaring for me.
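Put together, a run with those settings would look roughly like this, assuming the repo's usual test.py flags (--model, --video, --mask, --ckpt per the README); the paths and checkpoint name are placeholders for your own setup:

python test.py --model e2fgvi --video <your_frames_dir> --mask <your_masks_dir> --ckpt <checkpoint.pth> --neighbor_stride 10
# and, inside the patched test.py:
framestride = 20  # multiple of neighbor_stride (10)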