The depthanythingtensorrtdeploy from thinvy

TensorRT speedup is not significant

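Thank you for your excellent work!
I have recently been trying to accelerate Depth Anything with TensorRT on a Jetson Orin NX, but the converted TRT engine is not noticeably faster at inference than the ONNX model; it is actually slightly slower. The timings are: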

ONNX Inference Time: 2.7s per image

TRT Inference Time: 3.0s per image
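
The post does not say how these per-image times were measured. As a minimal sketch, assuming each backend can be wrapped in a zero-argument callable that blocks until its result is ready, a comparison might discard a few warm-up runs and then time several more. The `benchmark` helper and its parameters are hypothetical names, not part of the repository.

```python
import time
from statistics import mean, median


def benchmark(fn, warmup=3, runs=10):
    """Call fn() `warmup` times untimed, then time `runs` further calls.

    Returns per-call wall-clock statistics in seconds.
    """
    # Warm-up calls absorb one-off costs (lazy initialization, memory allocation).
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()  # fn must block until the result is ready
        samples.append(time.perf_counter() - start)

    return {
        "mean": mean(samples),
        "median": median(samples),
        "min": min(samples),
        "runs": runs,
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a call into the ONNX or TRT inference path.
    stats = benchmark(lambda: sum(range(100_000)))
    print(f"median {stats['median'] * 1e3:.3f} ms over {stats['runs']} runs")
```

For a GPU backend the callable should include the stream synchronization, otherwise only the kernel launch is timed.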

The library versions are:

- JetPack: 5.1
- CUDA: 11.4.315
- cuDNN: 8.6.0.166
- TensorRT: 8.5.2.2
- VPI: 2.2.4
- Vulkan: 1.3.204
- OpenCV: 4.5.4 - with CUDA: NO
- torch: 2.1.0
- torchvision: 0.16.0
- onnx: 1.16.1
- onnxruntime: 1.8.0

The code that converts the PyTorch checkpoint to an ONNX file is:

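import torch
# Assumed import paths (ZoeDepth package layout); adjust to your checkout
from zoedepth.models.builder import build_model
from zoedepth.utils.config import get_config

model_name = "zoedepth"
pretrained_resource = "local::./checkpoints/ZoeDepthIndoor_05-Jun_15-11-ebbebc6c1002_best.pt"
dataset = None
overwrite = {"pretrained_resource": pretrained_resource}
config = get_config(model_name, "eval", dataset, **overwrite)
model = build_model(config)
model.eval()

# Fixed-size dummy input (batch 1, RGB, 392 x 518) used for tracing
dummy_input = torch.randn(1, 3, 392, 518)
_ = model(dummy_input)  # sanity-check forward pass before export

# First export with default settings; verbose=True prints the traced graph
torch.onnx.export(model, dummy_input, "ZoeDepth_indoor.onnx", verbose=True)

# Second export with a pinned opset and named input/output tensors
torch.onnx.export(
    model,
    dummy_input,
    "./checkpoints/ZoeDepth_indoor_jetson.onnx",
    opset_version=11,
    input_names=["input"],
    output_names=["output"],
)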

The function that converts the ONNX file to a TRT engine is:

from pathlib import Path

import tensorrt as trt


def build_engine(onnx_file_path):
    onnx_file_path = Path(onnx_file_path)
    # ONNX to TensorRT
    logger = trt.Logger(trt.Logger.VERBOSE)
    builder = trt.Builder(logger)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    with open(onnx_file_path, "rb") as model:
        if not parser.parse(model.read()):
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            raise ValueError("Failed to parse the ONNX model.")

    # Set up the builder config
    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 kernels
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 2 << 30)  # 2 GiB

    # build_serialized_network returns None when the build fails
    serialized_engine = builder.build_serialized_network(network, config)
    if serialized_engine is None:
        raise RuntimeError("Failed to build the TensorRT engine.")

    # Write the engine next to the ONNX file, with a .trt suffix
    with open(onnx_file_path.with_suffix(".trt"), "wb") as f:
        f.write(serialized_engine)

The function that runs inference with the TRT engine is:

import numpy as np
import pycuda.autoinit  # noqa: F401  (assumed; any CUDA context initialization works)
import pycuda.driver as cuda
import tensorrt as trt
import torch


def infer_trt(engine, input_image):
    input_image = input_image.cpu().numpy().astype(np.float32)
    context = engine.create_execution_context()
    height, width = input_image.shape[2], input_image.shape[3]
    output_shape = (1, 1, height, width)
    # Allocate page-locked host memory
    h_input = cuda.pagelocked_empty(trt.volume((1, 3, height, width)), dtype=np.float32)
    h_output = cuda.pagelocked_empty(trt.volume(output_shape), dtype=np.float32)

    # Allocate device memory
    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)

    bindings = [int(d_input), int(d_output)]
    stream = cuda.Stream()

    # Copy to the device, run the engine, copy back, and wait for completion
    def perform_inference(images_np):
        np.copyto(h_input, images_np.ravel())
        cuda.memcpy_htod_async(d_input, h_input, stream)
        context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
        cuda.memcpy_dtoh_async(h_output, d_output, stream)
        stream.synchronize()
        # torch.tensor copies the data, so reusing h_output on the next call is safe
        return torch.tensor(h_output).view(output_shape)

    # Run inference on original images
    pred1 = perform_inference(input_image)

    # Run inference on flipped images
    flipped_images_np = np.flip(input_image, axis=3)
    pred2 = perform_inference(flipped_images_np)
    pred2 = torch.flip(pred2, [3])
    mean_pred = 0.5 * (pred1 + pred2)
    return mean_pred
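
The last part of `infer_trt` averages the prediction on the original image with the un-flipped prediction on its horizontal mirror, so the engine runs twice per image. The same averaging can be sketched with NumPy alone; `flip_tta` and `toy_predict` are hypothetical names, and the toy predictor only stands in for the real engine call.

```python
import numpy as np


def flip_tta(predict, image):
    """Average predict(image) with the mirrored-back prediction on the flipped image.

    image:   array of shape (N, C, H, W)
    predict: callable returning an array of shape (N, 1, H, W)
    """
    pred = predict(image)
    # np.flip returns a view with negative strides; make it contiguous before use.
    flipped = np.ascontiguousarray(np.flip(image, axis=3))
    pred_flipped = predict(flipped)
    # Flip the second prediction back so both are in the original orientation.
    return 0.5 * (pred + np.flip(pred_flipped, axis=3))


def toy_predict(image):
    # Hypothetical stand-in for the engine: the per-pixel channel mean.
    return image.mean(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((1, 3, 4, 6)).astype(np.float32)
    print(flip_tta(toy_predict, x).shape)
```

Because `toy_predict` treats every pixel independently, flipping the input simply flips its output, and the averaged result equals the plain prediction. A real depth network generally does not have this property, which is why the averaging changes its output.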

Apart from some warnings during the ONNX export, all of the code runs without errors. The final result is still unsatisfactory, though. I look forward to your reply!
