/usr/local/lib/python3.8/dist-packages/diffusers/models/resnet.py:39: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert hidden_states.shape[1] == self.channels
/usr/local/lib/python3.8/dist-packages/diffusers/models/resnet.py:52: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if hidden_states.shape[0] >= 64:
/usr/local/lib/python3.8/dist-packages/diffusers/models/unet_2d_condition.py:349: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if not return_dict:
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
Generating optimizing model: onnx/unet_fp16.opt.onnx
[I] Folding Constants | Pass 1
[I] Total Nodes | Original: 8201, After Folding: 5947 | 2254 Nodes Folded
[I] Folding Constants | Pass 2
2022-12-27 21:32:57.943186900 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_7900
2022-12-27 21:32:57.943320900 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_7729
2022-12-27 21:32:57.943632400 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_7969
2022-12-27 21:32:57.943700800 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_7574
2022-12-27 21:32:57.943810200 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_7427
2022-12-27 21:32:57.944080600 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_7256
...
[I] Total Nodes | Original: 5947, After Folding: 4536 | 1411 Nodes Folded
[I] Folding Constants | Pass 3
[I] Total Nodes | Original: 4536, After Folding: 4536 | 0 Nodes Folded
Building TensorRT engine for onnx/unet_fp16.opt.onnx: engine/unet_fp16.plan
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1719718653
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1719718653
[W] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[I] Configuring with profiles: [Profile().add('sample', min=(2, 4, 64, 64), opt=(2, 4, 64, 64), max=(32, 4, 64, 64)).add('encoder_hidden_states', min=(2, 77, 768), opt=(2, 77, 768), max=(32, 77, 768)).add('timestep', min=[1], opt=[1], max=[1])]
[I] Building engine with configuration:
Flags | [FP16]
Engine Capability | EngineCapability.DEFAULT
Memory Pools | [WORKSPACE: 7725.39 MiB]
Tactic Sources | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
Profiling Verbosity | ProfilingVerbosity.DETAILED
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[W] Requested amount of GPU memory (8589934592 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[W] Skipping tactic 3 due to insufficient memory on requested size of 8589934592 detected for tactic 0x0000000000000004.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[W] Skipping tactic 9 due to insufficient memory on requested size of 8589934592 detected for tactic 0x000000000000003c.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[W] Skipping tactic 8 due to insufficient memory on requested size of 8589934592 detected for tactic 0x000000000000003c.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
...
[W] - 24 weights are affected by this issue: Detected subnormal FP16 values.
[I] Finished engine building in 498.222 seconds
[I] Saving engine to engine/unet_fp16.plan
Exception ignored in: <function Engine.__del__ at 0x7fcd429b1550>
Traceback (most recent call last):
File "/workspace/voltaML-fast-stable-diffusion/utilities.py", line 50, in __del__
[buf.free() for buf in self.buffers.values() if isinstance(buf, cuda.DeviceArray) ]
AttributeError: 'Engine' object has no attribute 'buffers'
Exporting model: onnx/vae.onnx
/usr/local/lib/python3.8/dist-packages/diffusers/models/vae.py:583: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if not return_dict:
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
Generating optimizing model: onnx/vae.opt.onnx
[I] Folding Constants | Pass 1
[I] Total Nodes | Original: 759, After Folding: 679 | 80 Nodes Folded
[I] Folding Constants | Pass 2
2022-12-27 21:43:47.511248100 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_149
2022-12-27 21:43:47.511309200 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_66
[I] Total Nodes | Original: 679, After Folding: 675 | 4 Nodes Folded
[I] Folding Constants | Pass 3
[I] Total Nodes | Original: 675, After Folding: 675 | 0 Nodes Folded
Building TensorRT engine for onnx/vae.opt.onnx: engine/vae.plan
[I] Configuring with profiles: [Profile().add('latent', min=(1, 4, 64, 64), opt=(1, 4, 64, 64), max=(16, 4, 64, 64))]
[I] Building engine with configuration:
Flags | [FP16]
Engine Capability | EngineCapability.DEFAULT
Memory Pools | [WORKSPACE: 7725.39 MiB]
Tactic Sources | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
Profiling Verbosity | ProfilingVerbosity.DETAILED
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[W] Skipping tactic 0 due to insufficient memory on requested size of 8589934592 detected for tactic 0x00000000000003e8.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[W] Skipping tactic 1 due to insufficient memory on requested size of 8589934592 detected for tactic 0x00000000000003ea.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[W] Skipping tactic 2 due to insufficient memory on requested size of 8589934592 detected for tactic 0x0000000000000000.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 1: [virtualMemoryBuffer.cpp::resizePhysical::132] Error Code 1: Cuda Driver (invalid argument)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 1: [virtualMemoryBuffer.cpp::resizePhysical::132] Error Code 1: Cuda Driver (invalid argument)
[W] Skipping tactic 12 due to insufficient memory on requested size of 17179869184 detected for tactic 0x994f5b723e2d80da.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 1: [virtualMemoryBuffer.cpp::resizePhysical::132] Error Code 1: Cuda Driver (invalid argument)
[W] Skipping tactic 13 due to insufficient memory on requested size of 17179869184 detected for tactic 0x65d82d184f452332.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 1: [virtualMemoryBuffer.cpp::resizePhysical::132] Error Code 1: Cuda Driver (invalid argument)
[W] Skipping tactic 14 due to insufficient memory on requested size of 17179869184 detected for tactic 0x8d5c64a52fab02c9.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 1: [virtualMemoryBuffer.cpp::resizePhysical::132] Error Code 1: Cuda Driver (invalid argument)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[I] Finished engine building in 221.174 seconds
[I] Saving engine to engine/vae.plan
Exception ignored in: <function Engine.__del__ at 0x7fcd429b1550>
Traceback (most recent call last):
File "/workspace/voltaML-fast-stable-diffusion/utilities.py", line 50, in __del__
[buf.free() for buf in self.buffers.values() if isinstance(buf, cuda.DeviceArray) ]
AttributeError: 'Engine' object has no attribute 'buffers'
Building TensorRT engine for onnx/clip.opt.onnx: engine/CompVis/stable-diffusion-v1-4/clip.plan
[I] Configuring with profiles: [Profile().add('input_ids', min=(1, 77), opt=(1, 77), max=(16, 77))]
[I] Building engine with configuration:
Flags | [FP16]
Engine Capability | EngineCapability.DEFAULT
Memory Pools | [WORKSPACE: 7725.39 MiB]
Tactic Sources | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
Profiling Verbosity | ProfilingVerbosity.DETAILED
[W] - 6 weights are affected by this issue: Detected subnormal FP16 values.
[I] Finished engine building in 49.716 seconds
[I] Saving engine to engine/CompVis/stable-diffusion-v1-4/clip.plan
Building TensorRT engine for onnx/unet_fp16.opt.onnx: engine/CompVis/stable-diffusion-v1-4/unet_fp16.plan
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1719718653
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1719718653
[I] Configuring with profiles: [Profile().add('sample', min=(2, 4, 64, 64), opt=(2, 4, 64, 64), max=(32, 4, 64, 64)).add('encoder_hidden_states', min=(2, 77, 768), opt=(2, 77, 768), max=(32, 77, 768)).add('timestep', min=[1], opt=[1], max=[1])]
[I] Building engine with configuration:
Flags | [FP16]
Engine Capability | EngineCapability.DEFAULT
Memory Pools | [WORKSPACE: 7725.39 MiB]
Tactic Sources | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
Profiling Verbosity | ProfilingVerbosity.DETAILED
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
...
[W] - 22 weights are affected by this issue: Detected subnormal FP16 values.
[I] Finished engine building in 426.725 seconds
[I] Saving engine to engine/CompVis/stable-diffusion-v1-4/unet_fp16.plan
Building TensorRT engine for onnx/vae.opt.onnx: engine/CompVis/stable-diffusion-v1-4/vae.plan
[I] Configuring with profiles: [Profile().add('latent', min=(1, 4, 64, 64), opt=(1, 4, 64, 64), max=(16, 4, 64, 64))]
[I] Building engine with configuration:
Flags | [FP16]
Engine Capability | EngineCapability.DEFAULT
Memory Pools | [WORKSPACE: 7725.39 MiB]
Tactic Sources | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
Profiling Verbosity | ProfilingVerbosity.DETAILED
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
...
[I] Finished engine building in 164.962 seconds
[I] Saving engine to engine/CompVis/stable-diffusion-v1-4/vae.plan
Loading TensorRT engine: engine/CompVis/stable-diffusion-v1-4/clip.plan
[I] Loading bytes from engine/CompVis/stable-diffusion-v1-4/clip.plan
Loading TensorRT engine: engine/CompVis/stable-diffusion-v1-4/unet_fp16.plan
[I] Loading bytes from engine/CompVis/stable-diffusion-v1-4/unet_fp16.plan
Loading TensorRT engine: engine/CompVis/stable-diffusion-v1-4/vae.plan
[I] Loading bytes from engine/CompVis/stable-diffusion-v1-4/vae.plan
[E] 1: [defaultAllocator.cpp::allocate::20] Error Code 1: Cuda Runtime (out of memory)
[W] Requested amount of GPU memory (5570036736 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[E] 2: [executionContext.cpp::ExecutionContext::409] Error Code 2: OutOfMemory (no further information)
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 961k/961k [00:00<00:00, 1.03MB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 525k/525k [00:00<00:00, 738kB/s]
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 389/389 [00:00<00:00, 36.6kB/s]
Downloading: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 905/905 [00:00<00:00, 582kB/s]
[I] Warming up ..
[I] Running StableDiffusion pipeline
Loading TensorRT engine: engine/CompVis/stable-diffusion-v1-4/clip.plan
[I] Loading bytes from engine/CompVis/stable-diffusion-v1-4/clip.plan
Loading TensorRT engine: engine/CompVis/stable-diffusion-v1-4/unet_fp16.plan
[I] Loading bytes from engine/CompVis/stable-diffusion-v1-4/unet_fp16.plan
Loading TensorRT engine: engine/CompVis/stable-diffusion-v1-4/vae.plan
[I] Loading bytes from engine/CompVis/stable-diffusion-v1-4/vae.plan
[E] 1: [wrapper.cpp::CublasWrapper::85] Error Code 1: Cublas (Could not initialize cublas. Please check CUDA installation.)
[E] 1: [engine.cpp::deserialize::867] Error Code 1: Serialization (Serialization assertion postDeserializationCheck() failed.Post deserialization check failure)
[E] 4: [runtime.cpp::deserializeCudaEngine::66] Error Code 4: Internal Error (Engine deserialization failed.)
[!] Could not deserialize engine. See log for details.
172.17.0.1 - - [27/Dec/2022 21:58:48] "POST /voltaml/job HTTP/1.1" 500 -
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2548, in __call__
return self.wsgi_app(environ, start_response)
File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2528, in wsgi_app
response = self.handle_exception(e)
File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2525, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1822, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1820, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1796, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "/workspace/voltaML-fast-stable-diffusion/app.py", line 88, in upload_file
pipeline_time = infer_trt(saving_path=saving_path,
File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 678, in infer_trt
load_trt(saving_path, model, prompt, img_height, img_width, num_inference_steps)
File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 599, in load_trt
trt_model.loadEngines(engine_dir, onnx_dir, args.onnx_opset,
File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 309, in loadEngines
self.engine[model_name].activate()
File "/workspace/voltaML-fast-stable-diffusion/utilities.py", line 78, in activate
self.engine = engine_from_bytes(bytes_from_path(self.engine_path))
File "<string>", line 3, in func_impl
File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/base/loader.py", line 42, in __call__
return self.call_impl(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/trt/loader.py", line 564, in call_impl
G_LOGGER.critical("Could not deserialize engine. See log for details.")
File "/usr/local/lib/python3.8/dist-packages/polygraphy/logger/logger.py", line 597, in critical
raise PolygraphyException(message) from None
polygraphy.exception.exception.PolygraphyException: Could not deserialize engine. See log for details.
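
The two "Exception ignored in: <function Engine.__del__ ...>" entries earlier in this log come from `__del__` reading `self.buffers` on an object that never had that attribute set. A minimal sketch of a defensive destructor follows. It is not the project's actual code: the class below is a hypothetical stand-in for `Engine` in utilities.py, and it checks for a callable `free` method rather than `isinstance(buf, cuda.DeviceArray)` so that it runs without CUDA installed.

```python
class Engine:
    """Hypothetical stand-in for the Engine class in utilities.py."""

    def __init__(self, engine_path):
        self.engine_path = engine_path
        # Define the attribute up front so __del__ can always find it,
        # even when no buffers are ever allocated.
        self.buffers = {}

    def __del__(self):
        # getattr covers the case where __init__ failed or was skipped
        # before self.buffers was assigned.
        for buf in getattr(self, "buffers", {}).values():
            # Assumption: duck-typed check used here in place of the
            # original isinstance(buf, cuda.DeviceArray) test.
            free = getattr(buf, "free", None)
            if callable(free):
                free()
```

This only silences the AttributeError during cleanup. It does not address the out-of-memory and engine deserialization errors shown above.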