Comments (22)
I just added TensorRT version detection on the main branch. Could you help test it? I'm having problems with version 8.4.3.1 here. You only need to load the model in main_dynamic_batch.cpp; the rest can be ignored.
OK, I tested 8.4.3.1 and it works fine.
I don't understand this either. I'm also reading tutorials and experimenting with changes; I'll let you know if I get anywhere.
Dynamic-batch inference is now supported, though image pre- and post-processing is still sequential. Loading the model only requires adding the code that specifies the batch:
for (int i = 0; i < nbBindings; i++) {
    string name = this->engine->getIOTensorName(i);
    int mode = int(this->engine->getTensorIOMode(name.c_str()));  // 1 = input, 2 = output
    nvinfer1::DataType dtype = this->engine->getTensorDataType(name.c_str());
    nvinfer1::Dims dims = this->context->getTensorShape(name.c_str());
    // ******************* add this *******************
    if ((dims.d[0] == -1) && (mode == 1)) {
        nvinfer1::Dims minDims = engine->getProfileShape(name.c_str(), 0, nvinfer1::OptProfileSelector::kMIN);
        nvinfer1::Dims optDims = engine->getProfileShape(name.c_str(), 0, nvinfer1::OptProfileSelector::kOPT);
        nvinfer1::Dims maxDims = engine->getProfileShape(name.c_str(), 0, nvinfer1::OptProfileSelector::kMAX);
        // the chosen batch must lie between the profile's minimum and maximum batch
        assert(this->dynamic_batch_size >= minDims.d[0] && this->dynamic_batch_size <= maxDims.d[0]);
        // set the batch explicitly
        context->setInputShape(name.c_str(), nvinfer1::Dims4(this->dynamic_batch_size, maxDims.d[1], maxDims.d[2], maxDims.d[3]));
        dims = context->getTensorShape(name.c_str());
    }
    // ******************* add this *******************
    int totalSize = volume(dims) * getElementSize(dtype);
    this->bufferSize[i] = totalSize;
    cudaMalloc(&this->cudaBuffers[i], totalSize);  // allocate device memory
    ...
}
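The loop above uses two small helpers, volume and getElementSize. The repo provides its own definitions; below is a minimal sketch of plausible implementations for readers following along (an assumption, not the repo's actual code):

#include <NvInfer.h>
#include <cstdint>
#include <functional>
#include <numeric>

// element count of a tensor: the product of all its dimensions
inline int64_t volume(const nvinfer1::Dims& dims) {
    return std::accumulate(dims.d, dims.d + dims.nbDims, int64_t{1}, std::multiplies<int64_t>());
}

// bytes per element of a TensorRT data type
inline int getElementSize(nvinfer1::DataType dtype) {
    switch (dtype) {
        case nvinfer1::DataType::kFLOAT: return 4;
        case nvinfer1::DataType::kINT32: return 4;
        case nvinfer1::DataType::kHALF:  return 2;
        case nvinfer1::DataType::kINT8:  return 1;
        case nvinfer1::DataType::kBOOL:  return 1;
        default: return 4;
    }
}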
Usage is as follows.
Prerequisite: the exported ONNX model's batch dimension must be dynamic.
- Method 1 (requires retraining or re-exporting the ONNX model)
Add one line to the export code at https://github.com/openvinotoolkit/anomalib/blob/main/src/anomalib/deploy/export.py#L155
torch.onnx.export(
    model.model,
    torch.zeros((1, 3, *input_size)).to(model.device),
    str(onnx_path),
    opset_version=11,
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},  # add this line to support dynamic batch
)
- Method 2 (no retraining required, but success is not guaranteed)
Use [onnx-modifier](https://github.com/ZhangGe6/onnx-modifier) to change the model's batch dimension to dynamic.
Export the engine:
# dynamic batch: model.onnx must have a dynamic batch dimension; `input` is the input tensor name; the batch sizes 1, 4 and 8 must be set manually; 256 is the input resolution
trtexec --onnx=model.onnx --saveEngine=model.engine --minShapes=input:1x3x256x256 --optShapes=input:4x3x256x256 --maxShapes=input:8x3x256x256
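If you would rather not shell out to trtexec, the same optimization profile can be attached when building the engine with the TensorRT C++ API. A minimal sketch under the same shape assumptions as the command above (simplified logger, no error handling; not part of this repo):

#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <cstdio>

class SimpleLogger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) printf("%s\n", msg);
    }
} gLogger;

nvinfer1::IHostMemory* buildDynamicEngine(const char* onnxPath) {
    auto builder = nvinfer1::createInferBuilder(gLogger);
    auto network = builder->createNetworkV2(
        1U << int(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
    auto parser = nvonnxparser::createParser(*network, gLogger);
    parser->parseFromFile(onnxPath, int(nvinfer1::ILogger::Severity::kWARNING));

    auto config = builder->createBuilderConfig();
    // same shapes as the trtexec command: min 1x3x256x256, opt 4x3x256x256, max 8x3x256x256
    auto profile = builder->createOptimizationProfile();
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMIN, nvinfer1::Dims4(1, 3, 256, 256));
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kOPT, nvinfer1::Dims4(4, 3, 256, 256));
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMAX, nvinfer1::Dims4(8, 3, 256, 256));
    config->addOptimizationProfile(profile);

    // serialized engine; write this blob to model.engine
    return builder->buildSerializedNetwork(*network, *config);
}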
Run: dynamic_batch_size must be specified explicitly, and it has to lie between the configured minShapes and maxShapes.
#include "inference.hpp"
#include <opencv2/opencv.hpp>
int main() {
// patchcore模型训练配置文件删除了center_crop
// trtexec --onnx=model.onnx --saveEngine=model.engine 转换模型
// 动态batch,model.onnx的batch为动态的 input为输入名字, 1, 4, 8要手动指定
// trtexec --onnx=model.onnx --saveEngine=model.engine --minShapes=input:1x3x256x256 --optShapes=input:4x3x256x256 --maxShapes=input:8x3x256x256
string model_path = "D:/ml/code/anomalib/results/efficient_ad/mvtec/bottle/run/weights/openvino/model.engine";
string meta_path = "D:/ml/code/anomalib/results/efficient_ad/mvtec/bottle/run/weights/openvino/metadata.json";
string image_dir = "D:/ml/code/anomalib/datasets/MVTec/bottle/test/broken_large";
bool efficient_ad = true; // 是否使用efficient_ad模型
int dynamic_batch_size = 4; // 显式指定batch,要在最小和最大batch之间
// 创建推理器
auto inference = Inference(model_path, meta_path, efficient_ad, dynamic_batch_size);
// 读取全部图片路径
vector<cv::String> paths = getImagePaths(image_dir);
// batch为几就输入几张图片
vector<cv::Mat> images;
for (int i = 0; i < dynamic_batch_size; i++) {
cv::Mat image = cv::imread(paths[i]);
cv::cvtColor(image, image, cv::ColorConversionCodes::COLOR_BGR2RGB);
images.push_back(image);
}
// 推理
vector<Result> results = inference.dynamicBatchInfer(images);
// 查看结果
for (int i = 0; i < dynamic_batch_size; i++) {
cout << results[i].score << endl;
cv::resize(results[i].anomaly_map, results[i].anomaly_map, { 1500, 500 });
cv::imshow(std::to_string(i), results[i].anomaly_map);
}
cv::waitKey(0);
return 0;
}
Thanks a lot for the reply!!!
I want to use fastflow, but its ONNX model does not support dynamic-batch export. How can I solve this?
I modified the official anomalib ONNX export code and, after retraining, the export succeeded; I then exported the engine with trtexec and ran inference with this library.
These are the package versions I used:
numpy 1.23.5
onnx 1.14.0
openvino 2023.0.1
openvino-dev 2023.0.1
openvino-telemetry 2023.1.0
pytorch-lightning 1.6.5
torch 1.13.1+cu117
torchaudio 0.13.1+cu117
torchmetrics 0.10.3
torchvision 0.14.1+cu117
Below is what the library prints:
loading filename from:D:/ml/code/anomalib/results/fastflow/mvtec/bottle/run/weights/openvino/model.engine
[08/22/2023-17:39:46] [I] [TRT] Loaded engine size: 56 MiB
[08/22/2023-17:39:46] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +55, now: CPU 0, GPU 55 (MiB)
deserialize done
[08/22/2023-17:39:46] [I] [TRT] [MS] Running engine with multi stream info
[08/22/2023-17:39:46] [I] [TRT] [MS] Number of aux streams is 2
[08/22/2023-17:39:46] [I] [TRT] [MS] Number of total worker streams is 3
[08/22/2023-17:39:46] [I] [TRT] [MS] The main stream provided by execute/enqueue calls is the first worker stream
[08/22/2023-17:39:46] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +67, now: CPU 0, GPU 122 (MiB)
[08/22/2023-17:39:46] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
name: input, mode: 1, dims: [4, 3, 256, 256], totalSize: 3145728
name: output, mode: 2, dims: [4, 1, 256, 256], totalSize: 1048576
warm up finish
0.681545
0.691419
0.669558
0.691138
When exporting the ONNX model I ran it through simplify, and after that it converted to a dynamic-batch engine without problems:
import onnx
import torch
from onnxsim import simplify

onnx_path = export_path / "model.onnx"
torch.onnx.export(
    model.model,
    torch.zeros((1, 3, *input_size)).to(model.device),
    str(onnx_path),
    opset_version=11,
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"}},  # add this line to support dynamic batch
)
# simplify the model
model_ = onnx.load(onnx_path)
model_simp, check = simplify(model_)
assert check, "Simplified ONNX model could not be validated"
onnx.save(model_simp, onnx_path)
But I hit another problem when creating the model in C++: when mode == 1, i.e. the input node "input", I set the smallest batch and dims is {4, 3, 256, 256}; but when mode == 2, i.e. the output node "output", dims is {-1, 1, 256, 256}. So should the output's memory be allocated the same way as the input's?
context has no setOutputShape method. Below is my test: although only the input shape is set, the output batch changes along with the input batch across different batch sizes:
# batch_size = 2
name: input, mode: 1, dims: [2, 3, 256, 256], totalSize: 1572864
name: onnx::Mul_276, mode: 2, dims: [2, 1, 256, 256], totalSize: 524288
name: onnx::Mul_281, mode: 2, dims: [2, 1, 256, 256], totalSize: 524288
name: output, mode: 2, dims: [2, 1, 256, 256], totalSize: 524288
# batch_size = 4
name: input, mode: 1, dims: [4, 3, 256, 256], totalSize: 3145728
name: onnx::Mul_276, mode: 2, dims: [4, 1, 256, 256], totalSize: 1048576
name: onnx::Mul_281, mode: 2, dims: [4, 1, 256, 256], totalSize: 1048576
name: output, mode: 2, dims: [4, 1, 256, 256], totalSize: 1048576
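This matches how the TensorRT >= 8.5 tensor API behaves: once every dynamic input shape has been set on the context, output shapes are resolved automatically and only need to be read back, never set. A minimal fragment, with tensor names as in the logs above and engine/context as in the earlier snippets:

// set only the input shape; outputs are never set explicitly
context->setInputShape("input", nvinfer1::Dims4(batch_size, 3, 256, 256));
assert(context->allInputDimensionsSpecified());
// the output batch is now concrete and follows the input batch
nvinfer1::Dims outDims = context->getTensorShape("output");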
This is probably related to the model I exported: when executing line 127, the first dimension of both the input and output nodes is -1.
In my fastflow test the input batch is -1, and the output then takes on the configured batch.
I'm using TensorRT 8.6.1.
I'm using TensorRT-8.4.3.1, where context has no setInputShape, so I substituted setBindingDimensions; I wonder whether that is why the output batch dimension does not change.
Then you may also need setBindingDimensions to set the output shape to [dynamic_batch_size, 1, 256, 256]; try adding the code below to bind the output batch as well. Note that set up this way patchcore will not work, because it has two outputs, one of shape [batch, 1, 256, 256] and one of shape [batch], which would need dedicated handling.
if ((dims.d[0] == -1) && (name != "input")) {
    int nIdx = this->engine->getBindingIndex(name.c_str());
    nvinfer1::Dims maxDims = this->engine->getProfileDimensions(nIdx, 0, nvinfer1::OptProfileSelector::kMAX);
    // set the batch explicitly
    this->context->setBindingDimensions(nIdx, nvinfer1::Dims4(this->dynamic_batch_size, 1, maxDims.d[2], maxDims.d[3]));
    dims = this->context->getBindingDimensions(nIdx);
}
Setting the output binding like that does not seem to be allowed; it reports an error:
[08/23/2023-15:17:06] [I] [TRT] [MemUsageChange] Init CUDA: CPU +499, GPU +0, now: CPU 18546, GPU 1205 (MiB)
[08/23/2023-15:17:06] [I] [TRT] Loaded engine size: 56 MiB
[08/23/2023-15:17:06] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
deserialize done
[08/23/2023-15:17:06] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[08/23/2023-15:17:24] [E] [TRT] 3: [executionContext.cpp::nvinfer1::rt::ExecutionContext::setBindingDimensions::954] Error Code 3: API Usage Error (Parameter check failed at: executionContext.cpp::nvinfer1::rt::ExecutionContext::setBindingDimensions::954, condition: mEngine.bindingIsInput(bindingIndex)
My C++ inference was adapted from Python examples. The Python TensorRT examples for versions before 8.5 also never set the output shape. This is my own example: https://github.com/NagatoYuki0943/dl-infer-learn/blob/main/python/tensorrt.ipynb , and this is the official one: https://github.com/NVIDIA/TensorRT/blob/main/samples/python/efficientdet/infer.py ; both only set the input shape. I'll download 8.4.3.1 and try it.
Remove these two lines, so that for outputs the dimensions are only read back rather than set:
//nvinfer1::Dims maxDims = this->engine->getProfileDimensions(nIdx, 0, nvinfer1::OptProfileSelector::kMAX);
// set the batch explicitly
//this->context->setBindingDimensions(nIdx, nvinfer1::Dims4(this->dynamic_batch_size, 1, dims.d[2], dims.d[3]));
dims = this->context->getBindingDimensions(nIdx);
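Putting the thread's conclusion together for the pre-8.5 binding API: dimensions can only be set on input bindings, and output bindings report concrete shapes once every dynamic input is specified. A sketch using the names from the snippets above (engine, context and dynamic_batch_size assumed in scope):

// pass 1: set the shape of every dynamic input binding
for (int i = 0; i < engine->getNbBindings(); ++i) {
    if (engine->bindingIsInput(i) && context->getBindingDimensions(i).d[0] == -1) {
        nvinfer1::Dims maxDims = engine->getProfileDimensions(i, 0, nvinfer1::OptProfileSelector::kMAX);
        context->setBindingDimensions(i, nvinfer1::Dims4(dynamic_batch_size, maxDims.d[1], maxDims.d[2], maxDims.d[3]));
    }
}
// pass 2: with all inputs specified, output bindings now report concrete shapes
for (int i = 0; i < engine->getNbBindings(); ++i) {
    if (!engine->bindingIsInput(i)) {
        nvinfer1::Dims outDims = context->getBindingDimensions(i);  // batch follows the input
    }
}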
Did it work in the end?
I created a new branch at https://github.com/NagatoYuki0943/anomalib-tensorrt-cpp/tree/tensorrt8.4.3.1 ; you can give it a try.
Yep, it works now. Thanks a lot for your help!!
One more question: when dynamic_batch_size is set to 8 but the input batch is only 4 images, inference throws an error. How can this be solved?
I haven't solved that yet either; for now the number of input images must equal the batch. The Python version can use a large batch size with only a few input images, there is an example at https://github.com/NagatoYuki0943/dl-infer-learn/blob/main/python/tensorrt.ipynb , but the C++ version cannot do it yet. I'll look into it.
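Presumably the notebook's trick boils down to shrinking the input shape to the actual number of images before each call, while the device buffers stay allocated at the maximum batch. A hedged sketch of that idea in C++ (names borrowed from the snippets above; not the repo's code at this point in the thread):

// images.size() may be smaller than the profile's max batch, e.g. 4 of 8
int n = static_cast<int>(images.size());
context->setBindingDimensions(engine->getBindingIndex("input"), nvinfer1::Dims4(n, 3, 256, 256));
// copy only n preprocessed images into the (max-batch-sized) input buffer,
// enqueue as usual, then read n * per-image bytes back from each output buffer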
This is supported now, though the parameters changed: Inference gained a dynamic_batch parameter and the dynamic_batch_size parameter was removed. The new parameter marks the model for dynamic inference and allocates buffer space for the maximum batch by default; now however many images you put into the vector is how many get inferred, so it is truly dynamic.
bool dynamic_batch = true; // use dynamic batch; allocate device memory for the maximum batch_size
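A hypothetical usage sketch after this API change, assuming the constructor mirrors the earlier example with dynamic_batch in place of dynamic_batch_size:

bool dynamic_batch = true;  // allocate device memory for the maximum batch once
auto inference = Inference(model_path, meta_path, efficient_ad, dynamic_batch);
// any image count up to the profile's max batch is now valid
vector<cv::Mat> images = { image0, image1, image2 };  // three hypothetical cv::Mat inputs
vector<Result> results = inference.dynamicBatchInfer(images);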
OK, thanks!
Related Issues (7)
- When converting my exported ONNX model to an engine I get a reshape error; has anyone run into this? HOT 4
- How should the code be changed to run inference on single-channel grayscale images? HOT 4
- patchcore model gives abnormal results HOT 2
- Why is efficient_ad model inference so slow? HOT 6
- Overflow when loading model.engine in VS2022 HOT 37
- Error when creating the context HOT 10