Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Enzyme Forward mode custom rule #1869

Merged
merged 3 commits into from
May 7, 2024
Merged

Conversation

wsmoses
Copy link
Contributor

@wsmoses wsmoses commented Apr 16, 2023

Wasn't sure how to integrate the sample test file into your infrastructure.

@maleadt
Copy link
Member

maleadt commented Apr 16, 2023

Putting this in ext makes me think this is an extension package, yet you're including the code all the time?

@wsmoses
Copy link
Contributor Author

wsmoses commented Apr 17, 2023

Yeah it should be an extension package, but I was having difficulties getting that set up. Help is welcome!

@maleadt maleadt added help wanted Extra attention is needed upstream Somebody else's problem. labels Apr 17, 2023
@maleadt maleadt marked this pull request as draft April 17, 2023 17:55
@maleadt
Copy link
Member

maleadt commented Apr 17, 2023

Don't have experience with that.
wrt. the test infrastructure, just dropping a file in test/ is enough.

@vchuravy vchuravy self-assigned this Apr 17, 2023
ext/EnzymeExt.jl Outdated Show resolved Hide resolved
@sethaxen
Copy link
Contributor

Okay, I set CUDAEnzymeCoreExt up as an extension that pre-v1.9 is conditionally loaded using Requires. I'll leave it to others to decide if its better to make EnzymeCore a full dependency pre-v1.9 instead.

@wsmoses
Copy link
Contributor Author

wsmoses commented Sep 22, 2023

@vchuravy @maleadt can this be reviewed?

@wsmoses
Copy link
Contributor Author

wsmoses commented Sep 22, 2023

need to rerun CI though now that Enzyme 11.8 is out which has mutual GPUcompiler compatibility

ext/CUDAEnzymeCoreExt.jl Outdated Show resolved Hide resolved
@maleadt
Copy link
Member

maleadt commented Sep 23, 2023

Most of the functionality in the extension package is Enzyme-specific, so wouldn't it make more sense to make this a CUDA extension package inside Enzyme.jl?

@wsmoses
Copy link
Contributor Author

wsmoses commented Sep 24, 2023

I think it makes more sense here since it just uses the lightweight interface-only package EnzymeCore, whereas it does need to know about some CUDA internals (and a lot more for reversemode).

@maleadt
Copy link
Member

maleadt commented Sep 25, 2023

The Manifest wasn't updated, and there were typo's in the extension package that prevented it from parsing... But even with those fixed, CI doesn't pass:

Error During Test at /home/tim/Julia/pkg/CUDA/test/setup.jl:59
  Got exception outside of a @test
  LoadError: MethodError: no method matching batch_size(::Type{Duplicated{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}})

  Closest candidates are:
    batch_size(::BatchDuplicatedNoNeed{T, N}) where {T, N}
     @ EnzymeCore ~/Julia/depot/packages/EnzymeCore/7SvAO/src/EnzymeCore.jl:153
    batch_size(::BatchDuplicatedFunc{T, N}) where {T, N}
     @ EnzymeCore ~/Julia/depot/packages/EnzymeCore/7SvAO/src/EnzymeCore.jl:152
    batch_size(::BatchDuplicated{T, N}) where {T, N}
     @ EnzymeCore ~/Julia/depot/packages/EnzymeCore/7SvAO/src/EnzymeCore.jl:151

Once this is fixed, it should be good to go, as I'm told other Enzyme rules live in their respective packages as well (instead of adding extension packages to Enzyme itself).

@maleadt maleadt marked this pull request as draft September 25, 2023 12:58
Copy link
Member

@maleadt maleadt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of nits, and tests needs to be fixed.

ext/CUDAEnzymeCoreExt.jl Outdated Show resolved Hide resolved
ext/CUDAEnzymeCoreExt.jl Outdated Show resolved Hide resolved
@wsmoses wsmoses marked this pull request as ready for review October 19, 2023 21:56
@wsmoses
Copy link
Contributor Author

wsmoses commented Oct 19, 2023

Thanks to @vchuravy this is now much simplified, and tests pass locally.

@michel2323
Copy link

michel2323 commented Jan 9, 2024

@maleadt Used the latest Enzyme#main and Enzyme.jl#main and I can't reproduce the error on two systems (test env, REPL, multiple calls...), neither before or after rebasing this branch. I can't push to @wsmoses's branch. It might be worthwhile just rerunning the CI with the latest Enzyme.

@maleadt
Copy link
Member

maleadt commented Jan 9, 2024

OK. I rebased the PR and re-triggered CI.

@maleadt
Copy link
Member

maleadt commented Jan 9, 2024

Fails again during Enzyme tests.

@michel2323
Copy link

This is a different error than last time and I can reproduce this on Enzyme.jl#0.11.12 with Enzyme#main. Using Enzyme.jl#main (0.11.13) the test passes. @wsmoses Can you bump Enzyme.jl?

@CarloLucibello
Copy link
Contributor

@wsmoses is something blocking this?

@wsmoses
Copy link
Contributor Author

wsmoses commented Mar 22, 2024

Tim saw that it (at least several Enzyme versions ago) had a nondetermism in CI, which I have been unable to reproduce.

I've also not had time to investigate it more.

But probalby rebasing/rerunning/giving it some love would be warranted if you have cycles.

@maleadt
Copy link
Member

maleadt commented Mar 27, 2024

Fails to compile now, because of embedding a CuArray in a kernel. Missing Adapt rule for Duplicated?

 LoadError: GPU compilation of MethodInstance for CUDA.CUDAEnzymeCoreExt.metaf(::typeof(square_kernel!), ::Duplicated{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}) failed
  KernelError: passing and using non-bitstype argument
  Argument 3 to your kernel function is of type Duplicated{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, which is not isbits:
    .val is of type CuArray{Float32, 1, CUDA.Mem.DeviceBuffer} which is not isbits.
      .data is of type GPUArrays.DataRef{CUDA.Mem.DeviceBuffer} which is not isbits.
        .rc is of type GPUArrays.RefCounted{CUDA.Mem.DeviceBuffer} which is not isbits.
          .obj is of type CUDA.Mem.DeviceBuffer which is not isbits.
            .ctx is of type CuContext which is not isbits.
              .unsafe is of type Any which is not isbits.
          .finalizer is of type Any which is not isbits.
          .count is of type Base.Threads.Atomic{Int64} which is not isbits.
    .dval is of type CuArray{Float32, 1, CUDA.Mem.DeviceBuffer} which is not isbits.
      .data is of type GPUArrays.DataRef{CUDA.Mem.DeviceBuffer} which is not isbits.
        .rc is of type GPUArrays.RefCounted{CUDA.Mem.DeviceBuffer} which is not isbits.
          .obj is of type CUDA.Mem.DeviceBuffer which is not isbits.
            .ctx is of type CuContext which is not isbits.
              .unsafe is of type Any which is not isbits.
          .finalizer is of type Any which is not isbits.
          .count is of type Base.Threads.Atomic{Int64} which is not isbits.

@vchuravy
Copy link
Member

vchuravy commented Mar 27, 2024

@vchuravy
Copy link
Member

We should also standardize the extension name #2281

@wsmoses
Copy link
Contributor Author

wsmoses commented Mar 27, 2024

Disabling <=1.8 CUDA+Enzyme test for now.

Does 1.9+ look good o you?

@maleadt
Copy link
Member

maleadt commented Mar 29, 2024

Disabling the Enzyme test on 1.8 is fine, I guess. There's other Enzyme-related test failures though, in some of the special tests we only run if the essential ones pass.

@wsmoses
Copy link
Contributor Author

wsmoses commented Apr 20, 2024

@maleadt the test failures here seem unrelated

@wsmoses wsmoses force-pushed the master branch 2 times, most recently from dc7ba69 to dd64c38 Compare April 28, 2024 01:44
Co-authored-by: Seth Axen <[email protected]>
Co-authored-by: "William S. Moses" <[email protected]>
@maleadt
Copy link
Member

maleadt commented Apr 28, 2024

CI failure seems related again.

@wsmoses
Copy link
Contributor Author

wsmoses commented Apr 28, 2024

Yeah rebase error, lets see if this is happy now

@wsmoses
Copy link
Contributor Author

wsmoses commented Apr 28, 2024

@maleadt tests now look reasonable to me?

@maleadt
Copy link
Member

maleadt commented Apr 29, 2024

There's still an Enzyme failure, now in the unified memory tests:

  LoadError: Enzyme execution failed.
  Mismatched activity for:   store volatile {} addrspace(10)* addrspacecast ({}* inttoptr (i64 139901706500816 to {}*) to {} addrspace(10)*), {} addrspace(10)** %phic, align 8, !dbg !351, !noalias !297 const val: {} addrspace(10)* addrspacecast ({}* inttoptr (i64 139901706500816 to {}*) to {} addrspace(10)*)
   value=ReentrantLock(nothing, 0x00000000, 0x00, Base.GenericCondition{Base.Threads.SpinLock}(Base.IntrusiveLinkedList{Task}(nothing, nothing), Base.Threads.SpinLock(0)), (2, 139974522777680, 139974522778384)) of type ReentrantLock
  You may be using a constant variable as temporary storage for active memory (https://enzyme.mit.edu/julia/stable/faq/#Activity-of-temporary-storage). If not, please open an issue, and either rewrite this variable to not be conditionally active or use Enzyme.API.runtimeActivity!(true) as a workaround for now

There's also a lot of log spam that I would like to see fixed:

      From worker 2:	┌ Warning: Type does not have a definite number of fields
      From worker 2:	│   T = Tuple{Vararg{Union{UInt64, String}}}
      From worker 2:	└ @ Enzyme ~/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/packages/GPUCompiler/kqxyC/src/utils.jl:59
      From worker 2:	┌ Warning: Type does not have a definite number of fields
      From worker 2:	│   T = Tuple{Vararg{Union{UInt64, String}}}
      From worker 2:	└ @ Enzyme ~/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/packages/GPUCompiler/kqxyC/src/utils.jl:59

and

      From worker 2:	WARNING: Wrapping `Vararg` directly in UnionAll is deprecated (wrap the tuple instead).
      From worker 2:	You may need to write `f(x::Vararg{T})` rather than `f(x::Vararg{<:T})` or `f(x::Vararg{T}) where T` instead of `f(x::Vararg{T} where T)`.
      From worker 2:	WARNING: Wrapping `Vararg` directly in UnionAll is deprecated (wrap the tuple instead).
      From worker 2:	You may need to write `f(x::Vararg{T})` rather than `f(x::Vararg{<:T})` or `f(x::Vararg{T}) where T` instead of `f(x::Vararg{T} where T)`.
      From worker 2:	WARNING: Wrapping `Vararg` directly in UnionAll is deprecated (wrap the tuple instead).
      From worker 2:	You may need to write `f(x::Vararg{T})` rather than `f(x::Vararg{<:T})` or `f(x::Vararg{T}) where T` instead of `f(x::Vararg{T} where T)`.

These are all generated by the Enzyme tests:

libraries/enzyme                           (2) |         failed at 2024-04-28T18:38:02.863

@wsmoses
Copy link
Contributor Author

wsmoses commented May 7, 2024

@maleadt the unified memory tests now pass.

LGTM?

@maleadt
Copy link
Member

maleadt commented May 7, 2024

Yep, thanks!

@maleadt maleadt merged commit c423ce3 into JuliaGPU:master May 7, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
upstream Somebody else's problem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants