Comments (22)

fat-921 commented on May 11, 2024

I just added TensorRT version detection to the main branch. Could you help test it? It fails for me with 8.4.3.1; you only need to load a model in main_dynamic_batch.cpp, nothing else matters.

OK, I tested it with 8.4.3.1 and it works fine.


NagatoYuki0943 commented on May 11, 2024

I don't really understand this either; I'm also going through tutorials and trying changes. I'll let you know once I get it working.


NagatoYuki0943 commented on May 11, 2024

@fat-921

Dynamic-batch inference is now supported, although image pre- and post-processing is still done sequentially.

When loading the model, you only need to add the code that sets the batch explicitly:

        for (int i = 0; i < nbBindings; i++) {
            string name = this->engine->getIOTensorName(i);
            int mode = int(this->engine->getTensorIOMode(name.c_str()));
            nvinfer1::DataType dtype = this->engine->getTensorDataType(name.c_str());
            nvinfer1::Dims dims = this->context->getTensorShape(name.c_str());
            
            // ******************* add the following *******************
            if ((*dims.d == -1) && (mode == 1)) {
                nvinfer1::Dims minDims = engine->getProfileShape(name.c_str(), 0, nvinfer1::OptProfileSelector::kMIN);
                nvinfer1::Dims optDims = engine->getProfileShape(name.c_str(), 0, nvinfer1::OptProfileSelector::kOPT);
                nvinfer1::Dims maxDims = engine->getProfileShape(name.c_str(), 0, nvinfer1::OptProfileSelector::kMAX);
                // the batch you choose must lie between the minimum and maximum batch
                assert(this->dynamic_batch_size >= minDims.d[0] && this->dynamic_batch_size <= maxDims.d[0]);
                // set the batch explicitly
                context->setInputShape(name.c_str(), nvinfer1::Dims4(this->dynamic_batch_size, maxDims.d[1], maxDims.d[2], maxDims.d[3]));
                dims = context->getTensorShape(name.c_str());
            }
            // ******************* add the following *******************

            int totalSize = volume(dims) * getElementSize(dtype);
            this->bufferSize[i] = totalSize;
            cudaMalloc(&this->cudaBuffers[i], totalSize); // allocate device memory

            ...
        }
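
For reference, volume and getElementSize are helpers from this repository; the following is only a minimal sketch of what such helpers typically look like in TensorRT sample code, not necessarily this repo's exact implementation:

    #include <NvInfer.h>
    #include <functional>
    #include <numeric>

    // number of elements described by a Dims struct
    inline int64_t volume(const nvinfer1::Dims& dims) {
        return std::accumulate(dims.d, dims.d + dims.nbDims, int64_t{1}, std::multiplies<int64_t>());
    }

    // size in bytes of one element of the given data type
    inline int getElementSize(nvinfer1::DataType dtype) {
        switch (dtype) {
            case nvinfer1::DataType::kFLOAT: return 4;
            case nvinfer1::DataType::kINT32: return 4;
            case nvinfer1::DataType::kHALF:  return 2;
            case nvinfer1::DataType::kINT8:  return 1;
            default:                         return 4; // conservative fallback
        }
    }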

Usage is as follows.

Prerequisite: the exported ONNX model must have a dynamic batch dimension.

  1. Method 1 (requires re-training / re-exporting the ONNX model)

Add one line in the export code: https://github.com/openvinotoolkit/anomalib/blob/main/src/anomalib/deploy/export.py#L155

    torch.onnx.export(
        model.model,
        torch.zeros((1, 3, *input_size)).to(model.device),
        str(onnx_path),
        opset_version=11,
        input_names=["input"],
        output_names=["output"],
        dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},  # add this line to support dynamic batch
    )
  2. Method 2 (no re-training needed, but success is not guaranteed)

Use [onnx-modifier](https://github.com/ZhangGe6/onnx-modifier) to change the model's batch dimension to dynamic.

Export the engine:

# dynamic batch: model.onnx must have a dynamic batch; "input" is the input name, the batch sizes 1, 4, 8 must be set manually, and 256x256 is the image size
trtexec --onnx=model.onnx --saveEngine=model.engine --minShapes=input:1x3x256x256 --optShapes=input:4x3x256x256 --maxShapes=input:8x3x256x256

Run

dynamic_batch_size must be specified explicitly and has to lie between the minShapes and maxShapes values set above.

#include "inference.hpp"
#include <opencv2/opencv.hpp>


int main() {
    // for the patchcore model, center_crop was removed from the training config
    // convert the model: trtexec --onnx=model.onnx --saveEngine=model.engine
    // dynamic batch: model.onnx must have a dynamic batch; "input" is the input name, and 1, 4, 8 must be set manually
    // trtexec --onnx=model.onnx --saveEngine=model.engine --minShapes=input:1x3x256x256 --optShapes=input:4x3x256x256 --maxShapes=input:8x3x256x256
    string model_path = "D:/ml/code/anomalib/results/efficient_ad/mvtec/bottle/run/weights/openvino/model.engine";
    string meta_path  = "D:/ml/code/anomalib/results/efficient_ad/mvtec/bottle/run/weights/openvino/metadata.json";
    string image_dir = "D:/ml/code/anomalib/datasets/MVTec/bottle/test/broken_large";
    bool efficient_ad = true;   // whether the efficient_ad model is used
    int dynamic_batch_size = 4; // explicitly specified batch; must lie between the minimum and maximum batch

    // create the inferencer
    auto inference = Inference(model_path, meta_path, efficient_ad, dynamic_batch_size);

    // read all image paths
    vector<cv::String> paths = getImagePaths(image_dir);
    // feed exactly as many images as the batch size
    vector<cv::Mat> images;
    for (int i = 0; i < dynamic_batch_size; i++) {
        cv::Mat image = cv::imread(paths[i]);
        cv::cvtColor(image, image, cv::ColorConversionCodes::COLOR_BGR2RGB);
        images.push_back(image);
    }

    // run inference
    vector<Result> results = inference.dynamicBatchInfer(images);

    // inspect the results
    for (int i = 0; i < dynamic_batch_size; i++) {
        cout << results[i].score << endl;
        cv::resize(results[i].anomaly_map, results[i].anomaly_map, { 1500, 500 });
        cv::imshow(std::to_string(i), results[i].anomaly_map);
    }

    cv::waitKey(0);

    return 0;
}


fat-921 commented on May 11, 2024

Thanks a lot for the reply!!!
I'd like to use fastflow, but its ONNX model cannot be exported with a dynamic batch. How can I work around that?


NagatoYuki0943 commented on May 11, 2024

I modified the official anomalib ONNX export code, retrained, and the export then succeeded. I also exported the engine with trtexec and ran inference with this library.

These are the library versions I used:

numpy               1.23.5
onnx                1.14.0
openvino            2023.0.1
openvino-dev        2023.0.1
openvino-telemetry  2023.1.0
pytorch-lightning   1.6.5
torch               1.13.1+cu117
torchaudio          0.13.1+cu117
torchmetrics        0.10.3
torchvision         0.14.1+cu117

Below is what this library prints:

loading filename from:D:/ml/code/anomalib/results/fastflow/mvtec/bottle/run/weights/openvino/model.engine
[08/22/2023-17:39:46] [I] [TRT] Loaded engine size: 56 MiB
[08/22/2023-17:39:46] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +55, now: CPU 0, GPU 55 (MiB)
deserialize done
[08/22/2023-17:39:46] [I] [TRT] [MS] Running engine with multi stream info
[08/22/2023-17:39:46] [I] [TRT] [MS] Number of aux streams is 2
[08/22/2023-17:39:46] [I] [TRT] [MS] Number of total worker streams is 3
[08/22/2023-17:39:46] [I] [TRT] [MS] The main stream provided by execute/enqueue calls is the first worker stream
[08/22/2023-17:39:46] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +67, now: CPU 0, GPU 122 (MiB)
[08/22/2023-17:39:46] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
name: input, mode: 1, dims: [4, 3, 256, 256], totalSize: 3145728
name: output, mode: 2, dims: [4, 1, 256, 256], totalSize: 1048576
warm up finish
0.681545
0.691419
0.669558
0.691138


fat-921 commented on May 11, 2024

When I simplify the model during ONNX export, it converts to a dynamic-batch engine without problems:

    onnx_path = export_path / "model.onnx"
    torch.onnx.export(
        model.model,
        torch.zeros((1, 3, *input_size)).to(model.device),
        str(onnx_path),
        opset_version=11,
        input_names=["input"],
        output_names=["output"],
        dynamic_axes={"input": {0: "batch_size"}},  # add this line to support dynamic batch
    )

    # simplify the model (assumes: import onnx; from onnxsim import simplify)
    model_ = onnx.load(onnx_path)
    model_simp, check = simplify(model_)
    assert check, "Simplified ONNX model could not be validated"
    onnx.save(model_simp, onnx_path)

But I ran into another problem when creating the model in C++:
when mode == 1, i.e. the input node "input", the batch is set and dims is {4, 3, 256, 256};
when mode == 2, i.e. the output node "output", dims is {-1, 1, 256, 256}.
So should the memory allocation for this output be done the same way as for the input?
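
A hedged sketch of the idea in the question: if the output shape still reports -1 for the batch, one option (an assumption, not this repo's actual code) is to substitute the batch chosen for the input before computing the buffer size, so the output allocation mirrors the input:

            // sketch only: size the output buffer as if its batch equals the input batch
            if (dims.d[0] == -1) {
                dims.d[0] = this->dynamic_batch_size;   // output batch follows the input batch
            }
            int totalSize = volume(dims) * getElementSize(dtype);
            this->bufferSize[i] = totalSize;
            cudaMalloc(&this->cudaBuffers[i], totalSize);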


NagatoYuki0943 commented on May 11, 2024

context has no setOutputShape method.
Below is my test: for different batch sizes I only set the input shape, but the output batch changes along with the input batch.

# batch_size = 2
name: input, mode: 1, dims: [2, 3, 256, 256], totalSize: 1572864
name: onnx::Mul_276, mode: 2, dims: [2, 1, 256, 256], totalSize: 524288
name: onnx::Mul_281, mode: 2, dims: [2, 1, 256, 256], totalSize: 524288
name: output, mode: 2, dims: [2, 1, 256, 256], totalSize: 524288

# batch_size = 4
name: input, mode: 1, dims: [4, 3, 256, 256], totalSize: 3145728
name: onnx::Mul_276, mode: 2, dims: [4, 1, 256, 256], totalSize: 1048576
name: onnx::Mul_281, mode: 2, dims: [4, 1, 256, 256], totalSize: 1048576
name: output, mode: 2, dims: [4, 1, 256, 256], totalSize: 1048576
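
A minimal sketch (assuming the TensorRT 8.5+ tensor-name API used earlier in this thread, with iostream and NvInfer.h included) of how these shapes can be printed after setInputShape, to confirm that the output batch tracks the input:

    // after this->context->setInputShape(...) has been called for the input tensor
    for (int i = 0; i < this->engine->getNbIOTensors(); i++) {
        const char* name = this->engine->getIOTensorName(i);
        nvinfer1::Dims dims = this->context->getTensorShape(name);  // output tensors now report the same batch
        std::cout << "name: " << name << ", dims: [";
        for (int j = 0; j < dims.nbDims; j++) {
            std::cout << dims.d[j] << (j + 1 < dims.nbDims ? ", " : "");
        }
        std::cout << "]" << std::endl;
    }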


fat-921 commented on May 11, 2024

(screenshot)
This is probably related to the model I converted; when I execute line 127, the first dimension of both the input and output nodes is -1.


NagatoYuki0943 commented on May 11, 2024

I tested with fastflow: the input batch is -1, and the outputs then take the configured batch.
The TensorRT I'm using is 8.6.1.
(screenshot)


fat-921 commented on May 11, 2024

I'm using TensorRT 8.4.3.1, where context has no setInputShape, so I used setBindingDimensions instead. I'm not sure whether that is why the output batch dimension doesn't change.
(screenshot)


NagatoYuki0943 commented on May 11, 2024

Then you probably also need setBindingDimensions to set the output shape to [dynamic_batch_size, 1, 256, 256]. Try adding the code below to bind the output batch as well.
Note that this breaks patchcore, because it has two outputs: one is [batch, 1, 256, 256] and the other is [batch], so that one would need special handling.

            if ((*dims.d == -1) && (name != "input")) {
                int nInputIdx = this->engine->getBindingIndex(name.c_str());
                nvinfer1::Dims maxDims = this->engine->getProfileDimensions(nInputIdx, 0, nvinfer1::OptProfileSelector::kMAX);
                // set the batch explicitly
                this->context->setBindingDimensions(nInputIdx, nvinfer1::Dims4(this->dynamic_batch_size, 1, maxDims.d[2], maxDims.d[3]));
                dims = this->context->getBindingDimensions(nInputIdx);
            }


fat-921 commented on May 11, 2024

Then you probably also need setBindingDimensions to set the output shape to [dynamic_batch_size, 1, 256, 256]. Try adding the code below to bind the output batch as well.

            if ((*dims.d == -1) && (name != "input")) {
                int nInputIdx = this->engine->getBindingIndex(name.c_str());
                nvinfer1::Dims maxDims = this->engine->getProfileDimensions(nInputIdx, 0, nvinfer1::OptProfileSelector::kMAX);
                // set the batch explicitly
                this->context->setBindingDimensions(nInputIdx, nvinfer1::Dims4(this->dynamic_batch_size, 1, maxDims.d[2], maxDims.d[3]));
                dims = this->context->getBindingDimensions(nInputIdx);
            }

It seems that's not allowed; it reports an error:
[08/23/2023-15:17:06] [I] [TRT] [MemUsageChange] Init CUDA: CPU +499, GPU +0, now: CPU 18546, GPU 1205 (MiB)
[08/23/2023-15:17:06] [I] [TRT] Loaded engine size: 56 MiB
[08/23/2023-15:17:06] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
deserialize done
[08/23/2023-15:17:06] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[08/23/2023-15:17:24] [E] [TRT] 3: [executionContext.cpp::nvinfer1::rt::ExecutionContext::setBindingDimensions::954] Error Code 3: API Usage Error (Parameter check failed at: executionContext.cpp::nvinfer1::rt::ExecutionContext::setBindingDimensions::954, condition: mEngine.bindingIsInput(bindingIndex)


NagatoYuki0943 commented on May 11, 2024

I adapted the C++ inference from a Python example, and Python TensorRT examples for versions before 8.5 don't set the output shape either. This is my own example: https://github.com/NagatoYuki0943/dl-infer-learn/blob/main/python/tensorrt.ipynb , and this is the official one: https://github.com/NVIDIA/TensorRT/blob/main/samples/python/efficientdet/infer.py ; both only set the input shape. I'll download 8.4.3.1 and try it.
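
For the 8.4.x binding-index API, here is a hedged sketch of the equivalent flow (an assumption based on the calls already used in this thread, not code from either repository; iostream assumed): only input bindings are set, and every binding's shape is then read back, since setBindingDimensions rejects output indices.

    // sketch for TensorRT 8.4.x: set only the input bindings, then read all shapes back
    for (int i = 0; i < this->engine->getNbBindings(); i++) {
        if (this->engine->bindingIsInput(i)) {
            nvinfer1::Dims maxDims = this->engine->getProfileDimensions(i, 0, nvinfer1::OptProfileSelector::kMAX);
            this->context->setBindingDimensions(i, nvinfer1::Dims4(this->dynamic_batch_size, maxDims.d[1], maxDims.d[2], maxDims.d[3]));
        }
    }
    for (int i = 0; i < this->engine->getNbBindings(); i++) {
        nvinfer1::Dims dims = this->context->getBindingDimensions(i);  // output bindings should now report the chosen batch
        std::cout << this->engine->getBindingName(i) << ": batch = " << dims.d[0] << std::endl;
    }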


fat-921 commented on May 11, 2024

I removed these two lines and only read the output dimensions instead of setting them:

                //nvinfer1::Dims maxDims = this->engine->getProfileDimensions(nIdx, 0, nvinfer1::OptProfileSelector::kMAX);
                // set the batch explicitly
                //this->context->setBindingDimensions(nIdx, nvinfer1::Dims4(this->dynamic_batch_size, 1, dims.d[2], dims.d[3]));


NagatoYuki0943 commented on May 11, 2024

Did that work?


NagatoYuki0943 commented on May 11, 2024

I created a new branch https://github.com/NagatoYuki0943/anomalib-tensorrt-cpp/tree/tensorrt8.4.3.1 that you can try.


fat-921 commented on May 11, 2024

Did that work?

Yep, thanks a lot for your help!!


NagatoYuki0943 commented on May 11, 2024

I just added TensorRT version detection to the main branch. Could you help test it? It fails for me with 8.4.3.1; you only need to load a model in main_dynamic_batch.cpp, nothing else matters.


fat-921 commented on May 11, 2024

One more question: when dynamic_batch_size is set to 8 but I only feed a batch of 4 images, inference throws an error. How can I fix this?


NagatoYuki0943 commented on May 11, 2024

I haven't solved that yet; for now the number of input images must equal the batch size. The Python version can use a larger batch size with fewer input images; there is an example at https://github.com/NagatoYuki0943/dl-infer-learn/blob/main/python/tensorrt.ipynb , but the C++ version can't do it yet. I'll look into it.


NagatoYuki0943 commented on May 11, 2024

One more question: when dynamic_batch_size is set to 8 but I only feed a batch of 4 images, inference throws an error. How can I fix this?

This is supported now, but the parameters changed: Inference gained a dynamic_batch parameter and the dynamic_batch_size parameter was removed. The new parameter enables dynamic inference and allocates buffers for the maximum batch by default; whatever number of images you put into the vector is the number that gets inferred, so the batch is truly dynamic now.

bool dynamic_batch = true;   // use dynamic_batch; allocate device memory for the maximum batch_size
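
A hedged usage sketch of the updated interface (the constructor signature and the image variables below are assumed from the earlier example, with dynamic_batch replacing dynamic_batch_size; they are not confirmed against the repository):

    // assumed usage: dynamic_batch replaces the old dynamic_batch_size argument
    auto inference = Inference(model_path, meta_path, efficient_ad, dynamic_batch);

    // put any number of images (up to the engine's maxShapes batch) into the vector;
    // the vector's size is the batch that actually gets inferred
    vector<cv::Mat> images = { image0, image1, image2, image3 };
    vector<Result> results = inference.dynamicBatchInfer(images);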


fat-921 commented on May 11, 2024

OK, thank you!

