I have a YOLOv4 ONNX model, and because of its two dynamic axes (batch and number of boxes) I am unable to do batch inference, while single-example inference works.

Model input/output: [model screenshot omitted]

So is there a way to copy the whole input buffer to the device, run inference on slices of it in a loop, and then copy the outputs back, so as to maximize throughput?

E.g. array [40,512,512,3] -> CopyToGPU -> loop(inference on [1,512,512,3]) -> CopyOutputToCPU
Replies: 1 comment

Hey, maybe you can use the binding API: https://onnxruntime.ai/docs/api/c/struct_ort_api.html#a9a53edebf4ef062a41b0e74f9c6763ec