[python/webgpu] faster indexing along multiple dimensions (w/o unnecessary copies) #23217
Unanswered · sluijs asked this question in Performance Q&A · Replies: 0 comments
I'm working with large rank-5 tensors (e.g. with shape `[20, 60, 512, 512, 1]`). In the model outlined below, I'm simply trying to index the tensor along multiple dimensions. Executing this with PyTorch on CPU, or with onnxruntime in Python, takes around 10-30 microseconds. However, executing it with onnxruntime-web and the WebGPU execution provider takes ~120 ms, even when using pre-allocated GPU buffers.

The following graph seems to indicate that ONNX first indexes the first dimension (`x[index[0]]`) and then indexes the remaining dimensions sequentially. If I understand correctly, each of these steps would create a copy of the underlying data. Is there a way to prevent these intermediate copies, so that at most the final slice is copied?