Comments (12)
That's really strange, as I have code specifically to prevent out-of-memory errors. Can you copy/paste or take a screenshot of the exact error you get? It's possible that what I'm checking for breaks in certain cases.
from esrgan.
I've put the script output below. I tried taking a quick look myself, but this is far outside my area of expertise.
Upscaling ---------------------------------------- 0% -:--:--
Traceback (most recent call last):
File "C:\Users\2haloes\Documents\esrgan\utils\dataops.py", line 44, in auto_split_upscale
result = upscale_function(lr_img)
File "C:\Users\2haloes\Documents\esrgan\upscale.py", line 532, in upscale
output = self.process(img)
File "C:\Users\2haloes\Documents\esrgan\upscale.py", line 289, in process
output = self.model(img_LR).data.squeeze(0).float().cpu().clamp_(0, 1).numpy()
File "C:\Python39\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\2haloes\Documents\esrgan\utils\architecture.py", line 118, in forward
x = self.model(x)
File "C:\Python39\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Python39\lib\site-packages\torch\nn\modules\container.py", line 119, in forward
input = module(input)
File "C:\Python39\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\2haloes\Documents\esrgan\utils\block.py", line 92, in forward
output = x + self.sub(x)
File "C:\Python39\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Python39\lib\site-packages\torch\nn\modules\container.py", line 119, in forward
input = module(input)
File "C:\Python39\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\2haloes\Documents\esrgan\utils\block.py", line 317, in forward
out = self.RDB2(out)
File "C:\Python39\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\2haloes\Documents\esrgan\utils\block.py", line 432, in forward
x5 = self.conv5(torch.cat((x, x1, x2, x3, x4), 1))
File "C:\Python39\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Python39\lib\site-packages\torch\nn\modules\container.py", line 119, in forward
input = module(input)
File "C:\Python39\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Python39\lib\site-packages\torch\nn\modules\conv.py", line 399, in forward
return self._conv_forward(input, self.weight, self.bias)
File "C:\Python39\lib\site-packages\torch\nn\modules\conv.py", line 395, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: CUDA error: out of memory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\2haloes\Documents\esrgan\upscale.py", line 658, in <module>
app()
File "C:\Python39\lib\site-packages\typer\main.py", line 214, in __call__
return get_command(self)(*args, **kwargs)
File "C:\Python39\lib\site-packages\click\core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "C:\Python39\lib\site-packages\click\core.py", line 1053, in main
rv = self.invoke(ctx)
File "C:\Python39\lib\site-packages\click\core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Python39\lib\site-packages\click\core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "C:\Python39\lib\site-packages\typer\main.py", line 500, in wrapper
return callback(**use_params) # type: ignore
File "C:\Users\2haloes\Documents\esrgan\upscale.py", line 654, in main
upscale.run()
File "C:\Users\2haloes\Documents\esrgan\upscale.py", line 237, in run
rlt, depth = ops.auto_split_upscale(
File "C:\Users\2haloes\Documents\esrgan\utils\dataops.py", line 54, in auto_split_upscale
raise RuntimeError(e)
RuntimeError: CUDA error: out of memory
from esrgan.
So my assumption was correct. The error message you're getting is different from the usual one, so my check fails. I can update it to better handle this alternate error.
Out of curiosity, what version of PyTorch are you running?
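For illustration only, here is a minimal sketch of a check that accepts both wordings. The helper name `is_oom_error` and the marker list are hypothetical, not the project's actual code; the first marker assumes the "usual" message is PyTorch's allocator error ("CUDA out of memory. Tried to allocate ..."), and the second is the variant from the traceback above.

```python
# Hypothetical helper, not taken from the esrgan code base.
OOM_MARKERS = (
    "CUDA out of memory",         # assumed "usual" allocator message
    "CUDA error: out of memory",  # variant reported in this thread
)


def is_oom_error(exc: BaseException) -> bool:
    """Return True if the exception text looks like a CUDA out-of-memory error."""
    message = str(exc).lower()
    return any(marker.lower() in message for marker in OOM_MARKERS)
```

A caller could use this inside an `except RuntimeError as e:` block to decide whether to split the image and retry or to re-raise.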
from esrgan.
Let me know if that fixes it for you
from esrgan.
Unfortunately it does not fix the issue. I've set up a debugger on my machine and found that the exception is properly caught, but execution jumps from line 51 to line 55 (from clearing the VRAM to re-raising the exception). I can take more of a look later though.
As for the PyTorch version, I'm currently running 1.8.1+cu111.
from esrgan.
So it seems like what's going on is it's crashing again when it's in the process of clearing the VRAM. Does it give a different error for the second one?
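One way to keep a failure during cleanup from aborting the retry logic is to guard the cache-clearing call itself. The sketch below is an assumption about how that could look, not the project's implementation: `try_empty_cache` is a hypothetical name, and the only real library call used is `torch.cuda.empty_cache()`. Whether it is safe to continue after such a failure is unclear, since the CUDA context may already be in a bad state.

```python
from typing import Callable, Optional


def try_empty_cache(empty_cache: Optional[Callable[[], None]] = None) -> bool:
    """Attempt to release cached GPU memory; report failure instead of raising."""
    if empty_cache is None:
        # Imported lazily so the helper can be exercised without a GPU.
        import torch
        empty_cache = torch.cuda.empty_cache
    try:
        empty_cache()
        return True
    except RuntimeError:
        # The cleanup itself failed; the caller can still decide whether
        # to split the image and retry, or to give up.
        return False
```

The optional `empty_cache` argument exists only so the behaviour can be checked with a stand-in function.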
from esrgan.
The stack trace is different, so it looks like the fix did work; the same exception message is now being thrown from a different place.
I also got a memory summary after the first out-of-memory exception was thrown, which I've attached below the exception message.
File "C:\Users\2haloes\Documents\esrgan\utils\dataops.py", line 45, in auto_split_upscale
result = upscale_function(lr_img)
File "C:\Users\2haloes\Documents\esrgan\upscale.py", line 532, in upscale
output = self.process(img)
File "C:\Users\2haloes\Documents\esrgan\upscale.py", line 289, in process
output = self.model(img_LR).data.squeeze(0).float().cpu().clamp_(0, 1).numpy()
File "C:\Python39\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\2haloes\Documents\esrgan\utils\architecture.py", line 118, in forward
x = self.model(x)
File "C:\Python39\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Python39\lib\site-packages\torch\nn\modules\container.py", line 119, in forward
input = module(input)
File "C:\Python39\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\2haloes\Documents\esrgan\utils\block.py", line 92, in forward
output = x + self.sub(x)
File "C:\Python39\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Python39\lib\site-packages\torch\nn\modules\container.py", line 119, in forward
input = module(input)
File "C:\Python39\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\2haloes\Documents\esrgan\utils\block.py", line 317, in forward
out = self.RDB2(out)
File "C:\Python39\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\2haloes\Documents\esrgan\utils\block.py", line 432, in forward
x5 = self.conv5(torch.cat((x, x1, x2, x3, x4), 1))
File "C:\Python39\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Python39\lib\site-packages\torch\nn\modules\container.py", line 119, in forward
input = module(input)
File "C:\Python39\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Python39\lib\site-packages\torch\nn\modules\conv.py", line 399, in forward
return self._conv_forward(input, self.weight, self.bias)
File "C:\Python39\lib\site-packages\torch\nn\modules\conv.py", line 395, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: CUDA error: out of memory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\2haloes\Documents\esrgan\upscale.py", line 658, in <module>
app()
File "C:\Python39\lib\site-packages\typer\main.py", line 214, in __call__
return get_command(self)(*args, **kwargs)
File "C:\Python39\lib\site-packages\click\core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "C:\Python39\lib\site-packages\click\core.py", line 1053, in main
rv = self.invoke(ctx)
File "C:\Python39\lib\site-packages\click\core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Python39\lib\site-packages\click\core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "C:\Python39\lib\site-packages\typer\main.py", line 500, in wrapper
return callback(**use_params) # type: ignore
File "C:\Users\2haloes\Documents\esrgan\upscale.py", line 654, in main
upscale.run()
File "C:\Users\2haloes\Documents\esrgan\upscale.py", line 237, in run
rlt, depth = ops.auto_split_upscale(
File "C:\Users\2haloes\Documents\esrgan\utils\dataops.py", line 51, in auto_split_upscale
torch.cuda.empty_cache()
File "C:\Python39\lib\site-packages\torch\cuda\memory.py", line 114, in empty_cache
torch._C._cuda_emptyCache()
RuntimeError: CUDA error: out of memory
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |    3633 MB |    5153 MB |   15281 MB |   11648 MB |
|       from large pool |    3569 MB |    5089 MB |   15218 MB |   11648 MB |
|       from small pool |      63 MB |      63 MB |      63 MB |       0 MB |
|---------------------------------------------------------------------------|
| Active memory         |    3633 MB |    5153 MB |   15281 MB |   11648 MB |
|       from large pool |    3569 MB |    5089 MB |   15218 MB |   11648 MB |
|       from small pool |      63 MB |      63 MB |      63 MB |       0 MB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |    7186 MB |    7186 MB |    7186 MB |        0 B |
|       from large pool |    7120 MB |    7120 MB |    7120 MB |        0 B |
|       from small pool |      66 MB |      66 MB |      66 MB |        0 B |
|---------------------------------------------------------------------------|
| Non-releasable memory |    1778 MB |    1778 MB |    5385 MB |    3606 MB |
|       from large pool |    1776 MB |    1776 MB |    5327 MB |    3550 MB |
|       from small pool |       2 MB |       3 MB |      57 MB |      55 MB |
|---------------------------------------------------------------------------|
| Allocations           |        712 |        713 |        735 |         23 |
|       from large pool |          8 |          9 |         23 |         15 |
|       from small pool |        704 |        705 |        712 |          8 |
|---------------------------------------------------------------------------|
| Active allocs         |        712 |        713 |        735 |         23 |
|       from large pool |          8 |          9 |         23 |         15 |
|       from small pool |        704 |        705 |        712 |          8 |
|---------------------------------------------------------------------------|
| GPU reserved segments |         42 |         42 |         42 |          0 |
|       from large pool |          9 |          9 |          9 |          0 |
|       from small pool |         33 |         33 |         33 |          0 |
|---------------------------------------------------------------------------|
| Non-releasable allocs |         33 |         33 |         46 |         13 |
|       from large pool |          5 |          5 |         13 |          8 |
|       from small pool |         28 |         28 |         33 |          5 |
|===========================================================================|
from esrgan.
Wtf, it's running out of memory when clearing the memory... Could you try updating pytorch?
from esrgan.
I've updated PyTorch and the issue still occurs. However, it now also comes with a suggestion to set an environment variable (CUDA_LAUNCH_BLOCKING=1), so I did that and got this exception, which looks a lot more useful:
Traceback (most recent call last):
File "C:\Users\2haloes\Documents\esrgan\utils\dataops.py", line 46, in auto_split_upscale
result = upscale_function(lr_img)
File "C:\Users\2haloes\Documents\esrgan\upscale.py", line 538, in upscale
output = self.process(img)
File "C:\Users\2haloes\Documents\esrgan\upscale.py", line 295, in process
output = self.model(img_LR).data.squeeze(0).float().cpu().clamp_(0, 1).numpy()
File "C:\Python39\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\2haloes\Documents\esrgan\utils\architecture.py", line 118, in forward
x = self.model(x)
File "C:\Python39\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Python39\lib\site-packages\torch\nn\modules\container.py", line 141, in forward
input = module(input)
File "C:\Python39\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\2haloes\Documents\esrgan\utils\block.py", line 92, in forward
output = x + self.sub(x)
File "C:\Python39\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Python39\lib\site-packages\torch\nn\modules\container.py", line 141, in forward
input = module(input)
File "C:\Python39\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\2haloes\Documents\esrgan\utils\block.py", line 317, in forward
out = self.RDB2(out)
File "C:\Python39\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\2haloes\Documents\esrgan\utils\block.py", line 432, in forward
x5 = self.conv5(torch.cat((x, x1, x2, x3, x4), 1))
File "C:\Python39\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Python39\lib\site-packages\torch\nn\modules\container.py", line 141, in forward
input = module(input)
File "C:\Python39\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Python39\lib\site-packages\torch\nn\modules\conv.py", line 446, in forward
return self._conv_forward(input, self.weight, self.bias)
File "C:\Python39\lib\site-packages\torch\nn\modules\conv.py", line 442, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.
import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([1, 192, 1080, 1920], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(192, 64, kernel_size=[3, 3], padding=[1, 1], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()
ConvolutionParams
data_type = CUDNN_DATA_FLOAT
padding = [1, 1, 0]
stride = [1, 1, 0]
dilation = [1, 1, 0]
groups = 1
deterministic = false
allow_tf32 = true
input: TensorDescriptor 0000015321CA4A30
type = CUDNN_DATA_FLOAT
nbDims = 4
dimA = 1, 192, 1080, 1920,
strideA = 398131200, 2073600, 1920, 1,
output: TensorDescriptor 0000015321CA58A0
type = CUDNN_DATA_FLOAT
nbDims = 4
dimA = 1, 64, 1080, 1920,
strideA = 132710400, 2073600, 1920, 1,
weight: FilterDescriptor 0000015369D70F00
type = CUDNN_DATA_FLOAT
tensor_format = CUDNN_TENSOR_NCHW
nbDims = 4
dimA = 64, 192, 3, 3,
Pointer addresses:
input: 0000000CBB400000
output: 0000000C3C960000
weight: 0000000B0CF6B000
Forward algorithm: 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\2haloes\Documents\esrgan\upscale.py", line 664, in <module>
app()
File "C:\Python39\lib\site-packages\typer\main.py", line 214, in __call__
return get_command(self)(*args, **kwargs)
File "C:\Python39\lib\site-packages\click\core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "C:\Python39\lib\site-packages\click\core.py", line 1053, in main
rv = self.invoke(ctx)
File "C:\Python39\lib\site-packages\click\core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Python39\lib\site-packages\click\core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "C:\Python39\lib\site-packages\typer\main.py", line 500, in wrapper
return callback(**use_params) # type: ignore
File "C:\Users\2haloes\Documents\esrgan\upscale.py", line 660, in main
upscale.run()
File "C:\Users\2haloes\Documents\esrgan\upscale.py", line 243, in run
rlt, depth = ops.auto_split_upscale(
File "C:\Users\2haloes\Documents\esrgan\utils\dataops.py", line 58, in auto_split_upscale
raise RuntimeError(e)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.
import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([1, 192, 1080, 1920], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(192, 64, kernel_size=[3, 3], padding=[1, 1], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()
ConvolutionParams
data_type = CUDNN_DATA_FLOAT
padding = [1, 1, 0]
stride = [1, 1, 0]
dilation = [1, 1, 0]
groups = 1
deterministic = false
allow_tf32 = true
input: TensorDescriptor 0000015321CA4A30
type = CUDNN_DATA_FLOAT
nbDims = 4
dimA = 1, 192, 1080, 1920,
strideA = 398131200, 2073600, 1920, 1,
output: TensorDescriptor 0000015321CA58A0
type = CUDNN_DATA_FLOAT
nbDims = 4
dimA = 1, 64, 1080, 1920,
strideA = 132710400, 2073600, 1920, 1,
weight: FilterDescriptor 0000015369D70F00
type = CUDNN_DATA_FLOAT
tensor_format = CUDNN_TENSOR_NCHW
nbDims = 4
dimA = 64, 192, 3, 3,
Pointer addresses:
input: 0000000CBB400000
output: 0000000C3C960000
weight: 0000000B0CF6B000
Forward algorithm: 1
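To put the descriptor dimensions above in perspective, the sizes of the failing convolution's input and output tensors can be worked out directly. This is a back-of-the-envelope sketch: `tensor_mib` is a hypothetical helper, and it counts only these two float32 tensors, not the earlier activations, the weights, or any cuDNN workspace.

```python
from math import prod

BYTES_PER_FLOAT32 = 4


def tensor_mib(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Size in MiB of a dense tensor with the given shape."""
    return prod(shape) * bytes_per_element / 2**20


# Shapes copied from the dimA lines of the cuDNN descriptors.
input_mib = tensor_mib((1, 192, 1080, 1920))   # 1518.75
output_mib = tensor_mib((1, 64, 1080, 1920))   # 506.25
print(f"input: {input_mib:.2f} MiB, output: {output_mib:.2f} MiB")
```

So the concatenated input to `conv5` alone is roughly 1.5 GiB when the full 1080p frame is processed in one piece.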
from esrgan.
This is super weird, I've never seen this happen for inference before.
My next suggestion would just be to update your drivers and restart your computer and see if it still happens. I've had a similar error when training before and I just needed to restart my PC.
from esrgan.
Looks like updating to CUDA 11.5 has resolved the issue.
While the error can still occur if things go too far over the top (64x on a 1080p image), it looks like enabling the env variable stops things from getting out of hand enough to make the script stop.
UPDATE: After a couple of hours, the 64x run with the env variable output a 16GB image file, so it may be worth adding a toggle that disables async execution.
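A toggle like that could be as small as the sketch below. The function name and the idea of driving it from a command-line flag are assumptions for illustration; the only fixed part is the `CUDA_LAUNCH_BLOCKING` variable itself, which needs to be set before CUDA is initialised to take effect.

```python
import os


def configure_cuda_launch_mode(blocking: bool, env=os.environ) -> None:
    """Set or clear CUDA_LAUNCH_BLOCKING; call before CUDA is initialised."""
    if blocking:
        # "1" makes CUDA kernel launches synchronous.
        env["CUDA_LAUNCH_BLOCKING"] = "1"
    else:
        env.pop("CUDA_LAUNCH_BLOCKING", None)
```

In a script this would be called near the top, before importing `torch`, for example from a hypothetical `--sync` option. Synchronous launches are slower, so leaving the default off seems reasonable.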
from esrgan.
64x on a 1080p image? That's a bit overkill lol. Glad it works now though
from esrgan.