[python/webgpu] faster indexing along multiple dimensions (w/o unnecessary copies) #23217
Unanswered · sluijs asked this question in Performance Q&A · Replies: 0 comments
I'm working with large rank-5 tensors (e.g. with shape `[20, 60, 512, 512, 1]`). In the model outlined below, I'm simply trying to index the tensor along multiple dimensions. Executing this with PyTorch on CPU, or with onnxruntime in Python, takes around 10-30 microseconds. However, executing it with onnxruntime-web and the WebGPU execution provider takes ~120 ms, even when using pre-allocated GPU buffers.

The following graph seems to indicate that ONNX first indexes the first dimension (`x[index[0]]`) and then indexes the remaining dimensions sequentially. If I understand correctly, each of these steps would create a copy of the underlying data. Is there a way to prevent these intermediate copies, so that at most the final slice is copied?