When writing code for GPUs, there are typically two compilation models: fully separate compilation, where the GPU parts of the code base are compiled separately through the build system, and "on-demand" compilation, where the GPU code is compiled as part of the same compiler invocation. Examples of the former category are languages like GLSL, HLSL, Metal, and OpenCL; the latter includes CUDA, HIP, and SYCL. One big advantage of the "on-demand" method is something that I would like to bring to Zig: being able to make kernels generic. When writing libraries for GPU-accelerated code, it's not uncommon to provide, for example, a generic GPU-accelerated sorting routine that lets the user supply a custom comparison operator. Because such a routine requires both GPU and CPU parts (the latter mainly driving the GPU), separate compilation makes these libraries very annoying to create.
I'm wondering if this is appropriate for Zig at all, given the general "no-magic" stance on these types of things, and what it could look like. I don't really like the language-assisted ways that CUDA/HIP/SYCL kernels are launched (special syntax that calls into a 3rd-party library). For most GPU runtimes, the code must still be built in some specialized way best expressed through the build system, and I'm not really sure how this would all fit together. I feel like there is some overlap with the function multi-versioning proposal (#1018).
I've previously written a proof-of-concept userspace implementation. It works by translating "kernel calls" into extern function calls, which are then examined and demangled to form a list of functions that must be compiled in the next step. That next step then checks, for each entry point "exported" (through a custom function), whether it should actually be exported and, if so, with which generic types (via build options). I guess that works to some extent, but I think it will be error-prone with more complex library setups. Perhaps one way is to generate such a list of entry points via the compiler itself, and then pass that to the next step via the build system?