Comments (10)
At least for me, my problem was the number of frames, given that they were all loaded into VRAM. I solved it with some trivial code changes that batch the frames sent to inference, which works as long as the batch size is evenly divisible by neighbor_stride. This way, the results do not suffer and VRAM usage is reduced. Is this something that you would like as a pull request @Paper99? (Note to myself, if you say yes: this edit is on my D drive under the E2FGVI folder.)

Pretty much zero experience in this rodeo @Teravus, but any chance you could share specifically how you set up the batching? I'm definitely hitting that same limit on my own samples. Thanks!
Here's my test.py in the zip: test.zip
The main thing is: create a stride, then loop over the strides, putting only the frames for the current stride in VRAM at a time. On the next stride, replace those images in VRAM with new ones. That way they're not all in VRAM at once (see the sketch at the end of this comment).
With the frames and masks:
# Split the frame and mask lists into chunks of framestride frames each.
x_frames = [rframes[i:i + framestride] for i in range(0, len(rframes), framestride)]
x_masks = [rmasks[i:i + framestride] for i in range(0, len(rmasks), framestride)]
Look for the line that says framestride = 200. You can change it depending on your VRAM: lowering it makes the process use less VRAM, raising it makes it use more.
A couple of things to note:
- The framestride must be evenly divisible by neighbor_stride. neighbor_stride is 5 by default, but you can change it on the command line; with the default, any framestride that is a multiple of 5 works.
- Each stride's worth of images is processed as its own pass, so you'll see the progress bar go from start to end multiple times depending on your framestride. Say you have 750 frames: with a framestride of 200, the bar will run from 0 to full 4 times, and the last stride will be shorter (150 frames).
- Across all of the strides, the result for each frame is written into comp_frames, which then gets turned into your video.
- I didn't update the little video player UI that comes up after the video is complete. It will display only part of your completed video, but your output video file will be the complete video.
- I've only tested this change with a directory of individual frames; I have not tested it on a video input file directly. It might work, but I haven't tried it. I use ffmpeg to extract the frames.
Hopefully this is helpful.
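To make the shape of the change concrete, here is a minimal sketch of the striding loop, assuming the names above (rframes, rmasks, framestride, neighbor_stride) and a hypothetical run_inference helper standing in for the repo's existing neighbor-window inference code; it illustrates the idea, not the exact test.py edit in the zip.

# Minimal sketch of the striding idea. run_inference is a hypothetical
# stand-in for the per-window inference loop already in test.py.
assert framestride % neighbor_stride == 0, "framestride must be a multiple of neighbor_stride"

# Split frames/masks into stride-sized chunks.
x_frames = [rframes[i:i + framestride] for i in range(0, len(rframes), framestride)]
x_masks = [rmasks[i:i + framestride] for i in range(0, len(rmasks), framestride)]

comp_frames = []
for frames_chunk, masks_chunk in zip(x_frames, x_masks):
    # Only this chunk's tensors are moved to the GPU; when the loop advances,
    # the previous chunk goes out of scope and its VRAM can be freed.
    comp_frames.extend(run_inference(model, frames_chunk, masks_chunk))
# comp_frames now holds every completed frame, ready to be written to video.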
Could you tell me the spatial resolution of your video?
Hi, it's 720p. I am able to run my own video by changing the --step setting in test.py, but the result is terrible.
We use a GPU with 48 GB of memory to process 720p video. The --step setting does affect the performance. To check the influence of this parameter, you could first keep the original settings and use E2FGVI to process a downscaled version of the video.
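For reference, downscaling and frame extraction can be done in one ffmpeg step, which also matches the frames-directory input used above. The file name and target size here are placeholders (432x240 is just an example of a small test resolution), and the frames/ directory must already exist:

ffmpeg -i input_720p.mp4 -vf scale=432:240 frames/%05d.png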
This was really helpful! I got it working great! I had to change your cuda:1 device to cuda since I only have one GPU, but otherwise it worked great with some minor parameter tweaking.
I am curious whether there's a way to link batches up temporally - I do notice some minor pops where the strides are broken up. Is there some way to carry the data from the last frame into the next stride so there is some temporal consistency between them? I understand completely that that's a huge wish item, but I have to ask for selfish reasons! 😅 Again, thank you so much for getting me this far! Your fix in the other thread on setting up the Windows environment from 3 weeks ago was insanely helpful after I failed to get this going a month ago!
Glad it was helpful. Sorry about the cuda:1 thing; I have two GPUs and sent everything to the GPU I wasn't already using for something else.
To answer your question about coherency between batches: probably yes, but we'd need to write a special neighbor routine for the neighbor_stride frames at the end and beginning of each batch.
In the test video I'm using I didn't see any flashes, but it's a very limited use case, with a mask that stays roughly the same in each frame. I bet if I cloned one of the sample videos, reversed it, and appended it to the original, then I'd see the flashes too.
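One untested way to approximate that routine, sketched purely as an idea (none of this is in the zip): overlap adjacent strides by neighbor_stride frames so each stride sees the previous stride's tail as context, then drop the duplicated outputs so every frame is written exactly once.

# Untested sketch: overlap each chunk with the tail of the previous one.
# run_inference is the same hypothetical helper as in the sketch above.
overlap = neighbor_stride
starts = list(range(0, len(rframes), framestride))
x_frames = [rframes[max(0, s - overlap):s + framestride] for s in starts]
x_masks = [rmasks[max(0, s - overlap):s + framestride] for s in starts]

comp_frames = []
for k, (frames_chunk, masks_chunk) in enumerate(zip(x_frames, x_masks)):
    out = run_inference(model, frames_chunk, masks_chunk)
    if k > 0:
        out = out[overlap:]  # these frames were already produced by the previous chunk
    comp_frames.extend(out)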
@firebeasty I've been playing with the major parameters and came to the conclusion that a higher neighbor_stride works better on a smaller number of frames.
If you're going to use strides around 24 frames long, try --neighbor_stride 10 instead of the default 5, and set the framestride to 20 instead of 24 so it is a multiple of neighbor_stride: pass --neighbor_stride 10 on the command line and set framestride = 20 in the code. This seems to reduce the skipping/flaring for me.
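Put together, a run with those settings would look roughly like this, assuming the repo's usual test.py flags (--model, --video, --mask, --ckpt per the README); the paths and checkpoint name are placeholders for your own setup:

python test.py --model e2fgvi --video <your_frames_dir> --mask <your_masks_dir> --ckpt <checkpoint.pth> --neighbor_stride 10
# and, inside the patched test.py:
framestride = 20  # multiple of neighbor_stride (10)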