Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Metal] deferred codegen does not lower thread id intrinsics #624

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

tgymnich
Copy link
Member

@tgymnich tgymnich commented Sep 3, 2024

We currently handle our fake intrinsics for thread id etc. (add_input_arguments!) on finish_module if we are dealing with a kernel (job.config.kernel = true). This all happens before deferred codegen and linking those deferred modules.

Now if our kernel contains deferred codegen jobs which are not kernels (job.config.kernel = false) (using Enzyme for example) we end up with julia.air.thread_position_in_grid.i32 in our main kernel after linking. Which will not be removed since finish_module already ran before. The deferred code is not a kernel and thus never ran add_input_arguments! during its finish_module.

This PR solves this issue by calling add_input_arguments! in finish_ir which is called after linking.

cc @wsmoses

Other Solutions / Discussion

  • Have Enzyme create deferred codegen jobs for kernels with the kernel flag set to true
  • Run add_input_arguments during IR post-processing

@tgymnich tgymnich changed the title [Metal] deferred codegen does not work [Metal] deferred codegen does not lower thread id intrinsics Sep 3, 2024
@tgymnich tgymnich marked this pull request as ready for review September 3, 2024 11:42
@tgymnich tgymnich requested a review from maleadt September 3, 2024 11:48
@maleadt
Copy link
Member

maleadt commented Sep 3, 2024

I don't like the pass being invoked twice. Only calling it in finish_ir! may regress code quality though, as a position index coming from an argument is quite different than one coming from a function call (i.e., in a way that would affect optimization). Which is now already the case for Enzyme.jl-generated code, apparently.

@tgymnich
Copy link
Member Author

tgymnich commented Sep 3, 2024

As an in-between solution we could call add_input_arguments for non-kernels in finish_module

Or alternatively: Conditionally perform add_input_arguments after linking if the parent is a kernel.

@tgymnich tgymnich marked this pull request as draft September 6, 2024 22:24
@vchuravy
Copy link
Member

Hopefully the PR series #582 #633 and #634 will also fix this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants