Comments (22)

fat-921 commented on May 11, 2024

I just added TensorRT version detection to the main branch. Could you help test it? It fails for me with 8.4.3.1; you only need to load a model in main_dynamic_batch.cpp, nothing else matters.

OK, I tested it with 8.4.3.1 and it works fine.


NagatoYuki0943 commented on May 11, 2024

I don't really understand this either; I'm also going through tutorials and trying changes. I'll let you know once I get it working.


NagatoYuki0943 commented on May 11, 2024

@fat-921

Dynamic-batch inference is now supported, although image pre- and post-processing is still done sequentially.

When loading the model, you only need to add the code that sets the batch explicitly:

        for (int i = 0; i < nbBindings; i++) {
            string name = this->engine->getIOTensorName(i);
            int mode = int(this->engine->getTensorIOMode(name.c_str()));
            nvinfer1::DataType dtype = this->engine->getTensorDataType(name.c_str());
            nvinfer1::Dims dims = this->context->getTensorShape(name.c_str());
            
            // ******************* add the following *******************
            if ((*dims.d == -1) && (mode == 1)) {
                nvinfer1::Dims minDims = engine->getProfileShape(name.c_str(), 0, nvinfer1::OptProfileSelector::kMIN);
                nvinfer1::Dims optDims = engine->getProfileShape(name.c_str(), 0, nvinfer1::OptProfileSelector::kOPT);
                nvinfer1::Dims maxDims = engine->getProfileShape(name.c_str(), 0, nvinfer1::OptProfileSelector::kMAX);
                // the batch you choose must lie between the minimum and maximum batch
                assert(this->dynamic_batch_size >= minDims.d[0] && this->dynamic_batch_size <= maxDims.d[0]);
                // set the batch explicitly
                context->setInputShape(name.c_str(), nvinfer1::Dims4(this->dynamic_batch_size, maxDims.d[1], maxDims.d[2], maxDims.d[3]));
                dims = context->getTensorShape(name.c_str());
            }
            // ******************* add the following *******************

            int totalSize = volume(dims) * getElementSize(dtype);
            this->bufferSize[i] = totalSize;
            cudaMalloc(&this->cudaBuffers[i], totalSize); // allocate device memory

            ...
        }
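
For reference, volume and getElementSize are helpers from this repository; the following is only a minimal sketch of what such helpers typically look like in TensorRT sample code, not necessarily this repo's exact implementation:

    #include <NvInfer.h>
    #include <functional>
    #include <numeric>

    // number of elements described by a Dims struct
    inline int64_t volume(const nvinfer1::Dims& dims) {
        return std::accumulate(dims.d, dims.d + dims.nbDims, int64_t{1}, std::multiplies<int64_t>());
    }

    // size in bytes of one element of the given data type
    inline int getElementSize(nvinfer1::DataType dtype) {
        switch (dtype) {
            case nvinfer1::DataType::kFLOAT: return 4;
            case nvinfer1::DataType::kINT32: return 4;
            case nvinfer1::DataType::kHALF:  return 2;
            case nvinfer1::DataType::kINT8:  return 1;
            default:                         return 4; // conservative fallback
        }
    }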

Usage is as follows.

Prerequisite: the exported ONNX model must have a dynamic batch dimension.

  1. Method 1 (requires re-training / re-exporting the ONNX model)

Add one line in the export code: https://github.com/openvinotoolkit/anomalib/blob/main/src/anomalib/deploy/export.py#L155

    torch.onnx.export(
        model.model,
        torch.zeros((1, 3, *input_size)).to(model.device),
        str(onnx_path),
        opset_version=11,
        input_names=["input"],
        output_names=["output"],
        dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},  # add this line to support dynamic batch
    )
  2. Method 2 (no re-training needed, but success is not guaranteed)

Use [onnx-modifier](https://github.com/ZhangGe6/onnx-modifier) to change the model's batch dimension to dynamic.

Export the engine:

# dynamic batch: model.onnx must have a dynamic batch; "input" is the input name, the batch sizes 1, 4, 8 must be set manually, and 256x256 is the image size
trtexec --onnx=model.onnx --saveEngine=model.engine --minShapes=input:1x3x256x256 --optShapes=input:4x3x256x256 --maxShapes=input:8x3x256x256

Run

dynamic_batch_size must be specified explicitly and has to lie between the minShapes and maxShapes values set above.

#include "inference.hpp"
#include <opencv2/opencv.hpp>


int main() {
    // for the patchcore model, center_crop was removed from the training config
    // convert the model: trtexec --onnx=model.onnx --saveEngine=model.engine
    // dynamic batch: model.onnx must have a dynamic batch; "input" is the input name, and 1, 4, 8 must be set manually
    // trtexec --onnx=model.onnx --saveEngine=model.engine --minShapes=input:1x3x256x256 --optShapes=input:4x3x256x256 --maxShapes=input:8x3x256x256
    string model_path = "D:/ml/code/anomalib/results/efficient_ad/mvtec/bottle/run/weights/openvino/model.engine";
    string meta_path  = "D:/ml/code/anomalib/results/efficient_ad/mvtec/bottle/run/weights/openvino/metadata.json";
    string image_dir = "D:/ml/code/anomalib/datasets/MVTec/bottle/test/broken_large";
    bool efficient_ad = true;   // whether the efficient_ad model is used
    int dynamic_batch_size = 4; // explicitly specified batch; must lie between the minimum and maximum batch

    // create the inferencer
    auto inference = Inference(model_path, meta_path, efficient_ad, dynamic_batch_size);

    // read all image paths
    vector<cv::String> paths = getImagePaths(image_dir);
    // feed exactly as many images as the batch size
    vector<cv::Mat> images;
    for (int i = 0; i < dynamic_batch_size; i++) {
        cv::Mat image = cv::imread(paths[i]);
        cv::cvtColor(image, image, cv::ColorConversionCodes::COLOR_BGR2RGB);
        images.push_back(image);
    }

    // run inference
    vector<Result> results = inference.dynamicBatchInfer(images);

    // inspect the results
    for (int i = 0; i < dynamic_batch_size; i++) {
        cout << results[i].score << endl;
        cv::resize(results[i].anomaly_map, results[i].anomaly_map, { 1500, 500 });
        cv::imshow(std::to_string(i), results[i].anomaly_map);
    }

    cv::waitKey(0);

    return 0;
}


fat-921 commented on May 11, 2024

Thanks a lot for the reply!!!
I'd like to use fastflow, but its ONNX model cannot be exported with a dynamic batch. How can I work around that?


NagatoYuki0943 commented on May 11, 2024

I modified the official anomalib ONNX export code, retrained, and the export then succeeded. I also exported the engine with trtexec and ran inference with this library.

These are the library versions I used:

numpy               1.23.5
onnx                1.14.0
openvino            2023.0.1
openvino-dev        2023.0.1
openvino-telemetry  2023.1.0
pytorch-lightning   1.6.5
torch               1.13.1+cu117
torchaudio          0.13.1+cu117
torchmetrics        0.10.3
torchvision         0.14.1+cu117

Below is what this library prints:

loading filename from:D:/ml/code/anomalib/results/fastflow/mvtec/bottle/run/weights/openvino/model.engine
[08/22/2023-17:39:46] [I] [TRT] Loaded engine size: 56 MiB
[08/22/2023-17:39:46] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +55, now: CPU 0, GPU 55 (MiB)
deserialize done
[08/22/2023-17:39:46] [I] [TRT] [MS] Running engine with multi stream info
[08/22/2023-17:39:46] [I] [TRT] [MS] Number of aux streams is 2
[08/22/2023-17:39:46] [I] [TRT] [MS] Number of total worker streams is 3
[08/22/2023-17:39:46] [I] [TRT] [MS] The main stream provided by execute/enqueue calls is the first worker stream
[08/22/2023-17:39:46] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +67, now: CPU 0, GPU 122 (MiB)
[08/22/2023-17:39:46] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
name: input, mode: 1, dims: [4, 3, 256, 256], totalSize: 3145728
name: output, mode: 2, dims: [4, 1, 256, 256], totalSize: 1048576
warm up finish
0.681545
0.691419
0.669558
0.691138


fat-921 commented on May 11, 2024

When I simplify the model during ONNX export, it converts to a dynamic-batch engine without problems:

    onnx_path = export_path / "model.onnx"
    torch.onnx.export(
        model.model,
        torch.zeros((1, 3, *input_size)).to(model.device),
        str(onnx_path),
        opset_version=11,
        input_names=["input"],
        output_names=["output"],
        dynamic_axes={"input": {0: "batch_size"}},  # add this line to support dynamic batch
    )

    # simplify the model (assumes: import onnx; from onnxsim import simplify)
    model_ = onnx.load(onnx_path)
    model_simp, check = simplify(model_)
    assert check, "Simplified ONNX model could not be validated"
    onnx.save(model_simp, onnx_path)

But I ran into another problem when creating the model in C++:
when mode == 1, i.e. the input node "input", the batch is set and dims is {4, 3, 256, 256};
when mode == 2, i.e. the output node "output", dims is {-1, 1, 256, 256}.
So should the memory allocation for this output be done the same way as for the input?
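
A hedged sketch of the idea in the question: if the output shape still reports -1 for the batch, one option (an assumption, not this repo's actual code) is to substitute the batch chosen for the input before computing the buffer size, so the output allocation mirrors the input:

            // sketch only: size the output buffer as if its batch equals the input batch
            if (dims.d[0] == -1) {
                dims.d[0] = this->dynamic_batch_size;   // output batch follows the input batch
            }
            int totalSize = volume(dims) * getElementSize(dtype);
            this->bufferSize[i] = totalSize;
            cudaMalloc(&this->cudaBuffers[i], totalSize);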


NagatoYuki0943 commented on May 11, 2024

context has no setOutputShape method.
Below is my test: for different batch sizes I only set the input shape, but the output batch changes along with the input batch.

# batch_size = 2
name: input, mode: 1, dims: [2, 3, 256, 256], totalSize: 1572864
name: onnx::Mul_276, mode: 2, dims: [2, 1, 256, 256], totalSize: 524288
name: onnx::Mul_281, mode: 2, dims: [2, 1, 256, 256], totalSize: 524288
name: output, mode: 2, dims: [2, 1, 256, 256], totalSize: 524288

# batch_size = 4
name: input, mode: 1, dims: [4, 3, 256, 256], totalSize: 3145728
name: onnx::Mul_276, mode: 2, dims: [4, 1, 256, 256], totalSize: 1048576
name: onnx::Mul_281, mode: 2, dims: [4, 1, 256, 256], totalSize: 1048576
name: output, mode: 2, dims: [4, 1, 256, 256], totalSize: 1048576
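
A minimal sketch (assuming the TensorRT 8.5+ tensor-name API used earlier in this thread, with iostream and NvInfer.h included) of how these shapes can be printed after setInputShape, to confirm that the output batch tracks the input:

    // after this->context->setInputShape(...) has been called for the input tensor
    for (int i = 0; i < this->engine->getNbIOTensors(); i++) {
        const char* name = this->engine->getIOTensorName(i);
        nvinfer1::Dims dims = this->context->getTensorShape(name);  // output tensors now report the same batch
        std::cout << "name: " << name << ", dims: [";
        for (int j = 0; j < dims.nbDims; j++) {
            std::cout << dims.d[j] << (j + 1 < dims.nbDims ? ", " : "");
        }
        std::cout << "]" << std::endl;
    }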


fat-921 commented on May 11, 2024

(screenshot)
This is probably related to the model I converted; when I execute line 127, the first dimension of both the input and output nodes is -1.


NagatoYuki0943 commented on May 11, 2024

I tested with fastflow: the input batch is -1, and the outputs then take the configured batch.
The TensorRT I'm using is 8.6.1.
(screenshot)


fat-921 commented on May 11, 2024

I'm using TensorRT 8.4.3.1, where context has no setInputShape, so I used setBindingDimensions instead. I'm not sure whether that is why the output batch dimension doesn't change.
(screenshot)


NagatoYuki0943 commented on May 11, 2024

Then you probably also need setBindingDimensions to set the output shape to [dynamic_batch_size, 1, 256, 256]. Try adding the code below to bind the output batch as well.
Note that this breaks patchcore, because it has two outputs: one is [batch, 1, 256, 256] and the other is [batch], so that one would need special handling.

            if ((*dims.d == -1) && (name != "input")) {
                int nInputIdx = this->engine->getBindingIndex(name.c_str());
                nvinfer1::Dims maxDims = this->engine->getProfileDimensions(nInputIdx, 0, nvinfer1::OptProfileSelector::kMAX);
                // set the batch explicitly
                this->context->setBindingDimensions(nInputIdx, nvinfer1::Dims4(this->dynamic_batch_size, 1, maxDims.d[2], maxDims.d[3]));
                dims = this->context->getBindingDimensions(nInputIdx);
            }


fat-921 commented on May 11, 2024

Then you probably also need setBindingDimensions to set the output shape to [dynamic_batch_size, 1, 256, 256]. Try adding the code below to bind the output batch as well.

            if ((*dims.d == -1) && (name != "input")) {
                int nInputIdx = this->engine->getBindingIndex(name.c_str());
                nvinfer1::Dims maxDims = this->engine->getProfileDimensions(nInputIdx, 0, nvinfer1::OptProfileSelector::kMAX);
                // set the batch explicitly
                this->context->setBindingDimensions(nInputIdx, nvinfer1::Dims4(this->dynamic_batch_size, 1, maxDims.d[2], maxDims.d[3]));
                dims = this->context->getBindingDimensions(nInputIdx);
            }

It seems that's not allowed; it reports an error:
[08/23/2023-15:17:06] [I] [TRT] [MemUsageChange] Init CUDA: CPU +499, GPU +0, now: CPU 18546, GPU 1205 (MiB)
[08/23/2023-15:17:06] [I] [TRT] Loaded engine size: 56 MiB
[08/23/2023-15:17:06] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
deserialize done
[08/23/2023-15:17:06] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[08/23/2023-15:17:24] [E] [TRT] 3: [executionContext.cpp::nvinfer1::rt::ExecutionContext::setBindingDimensions::954] Error Code 3: API Usage Error (Parameter check failed at: executionContext.cpp::nvinfer1::rt::ExecutionContext::setBindingDimensions::954, condition: mEngine.bindingIsInput(bindingIndex)


NagatoYuki0943 commented on May 11, 2024

I adapted the C++ inference from a Python example, and Python TensorRT examples for versions before 8.5 don't set the output shape either. This is my own example: https://github.com/NagatoYuki0943/dl-infer-learn/blob/main/python/tensorrt.ipynb , and this is the official one: https://github.com/NVIDIA/TensorRT/blob/main/samples/python/efficientdet/infer.py ; both only set the input shape. I'll download 8.4.3.1 and try it.
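
For the 8.4.x binding-index API, here is a hedged sketch of the equivalent flow (an assumption based on the calls already used in this thread, not code from either repository; iostream assumed): only input bindings are set, and every binding's shape is then read back, since setBindingDimensions rejects output indices.

    // sketch for TensorRT 8.4.x: set only the input bindings, then read all shapes back
    for (int i = 0; i < this->engine->getNbBindings(); i++) {
        if (this->engine->bindingIsInput(i)) {
            nvinfer1::Dims maxDims = this->engine->getProfileDimensions(i, 0, nvinfer1::OptProfileSelector::kMAX);
            this->context->setBindingDimensions(i, nvinfer1::Dims4(this->dynamic_batch_size, maxDims.d[1], maxDims.d[2], maxDims.d[3]));
        }
    }
    for (int i = 0; i < this->engine->getNbBindings(); i++) {
        nvinfer1::Dims dims = this->context->getBindingDimensions(i);  // output bindings should now report the chosen batch
        std::cout << this->engine->getBindingName(i) << ": batch = " << dims.d[0] << std::endl;
    }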


fat-921 commented on May 11, 2024

I removed these two lines and only read the output dimensions instead of setting them:

                //nvinfer1::Dims maxDims = this->engine->getProfileDimensions(nIdx, 0, nvinfer1::OptProfileSelector::kMAX);
                // set the batch explicitly
                //this->context->setBindingDimensions(nIdx, nvinfer1::Dims4(this->dynamic_batch_size, 1, dims.d[2], dims.d[3]));


NagatoYuki0943 commented on May 11, 2024

Did that work?


NagatoYuki0943 commented on May 11, 2024

I created a new branch https://github.com/NagatoYuki0943/anomalib-tensorrt-cpp/tree/tensorrt8.4.3.1 that you can try.


fat-921 commented on May 11, 2024

Did that work?

Yep, thanks a lot for your help!!


NagatoYuki0943 commented on May 11, 2024

I just added TensorRT version detection to the main branch. Could you help test it? It fails for me with 8.4.3.1; you only need to load a model in main_dynamic_batch.cpp, nothing else matters.


fat-921 commented on May 11, 2024

One more question: when dynamic_batch_size is set to 8 but I only feed a batch of 4 images, inference throws an error. How can I fix this?


NagatoYuki0943 commented on May 11, 2024

I haven't solved that yet; for now the number of input images must equal the batch size. The Python version can use a larger batch size with fewer input images; there is an example at https://github.com/NagatoYuki0943/dl-infer-learn/blob/main/python/tensorrt.ipynb , but the C++ version can't do it yet. I'll look into it.


NagatoYuki0943 commented on May 11, 2024

One more question: when dynamic_batch_size is set to 8 but I only feed a batch of 4 images, inference throws an error. How can I fix this?

This is supported now, but the parameters changed: Inference gained a dynamic_batch parameter and the dynamic_batch_size parameter was removed. The new parameter enables dynamic inference and allocates buffers for the maximum batch by default; whatever number of images you put into the vector is the number that gets inferred, so the batch is truly dynamic now.

bool dynamic_batch = true;   // use dynamic_batch; allocate device memory for the maximum batch_size
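
A hedged usage sketch of the updated interface (the constructor signature and the image variables below are assumed from the earlier example, with dynamic_batch replacing dynamic_batch_size; they are not confirmed against the repository):

    // assumed usage: dynamic_batch replaces the old dynamic_batch_size argument
    auto inference = Inference(model_path, meta_path, efficient_ad, dynamic_batch);

    // put any number of images (up to the engine's maxShapes batch) into the vector;
    // the vector's size is the batch that actually gets inferred
    vector<cv::Mat> images = { image0, image1, image2, image3 };
    vector<Result> results = inference.dynamicBatchInfer(images);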


fat-921 commented on May 11, 2024

OK, thank you!

