
Rework how local toolkits are selected. #2058

Merged: 1 commit merged into master on Aug 25, 2023

maleadt (Member) commented Aug 24, 2023:

Copying from JuliaPackaging/Yggdrasil#7246:

Make using a local toolkit require a separate preference.

That way, the 'version' preference is always set to a valid version number, never to the "local" string, so CUDA.jl knows the version of the local toolkit. The exception is of course still the "none" entry, but that's there to work around BinaryBuilder limitations.

To make this feature more powerful, add some libcudart-based detection of the local toolkit's version. This should make setting local = "true" behave similarly to using artifacts, i.e., if the driver/runtime are available during precompilation, the version will be auto-detected. Only in environments without CUDA (such as containers, log-in nodes, etc.) will the user need to set the version preference to inform the system which version of CUDA is being used.

This PR also adds / changes artifact comparison strategies to always bail out when using a local toolkit. This should prevent downloading useless artifacts when using system libraries.

TLDR: Users of local toolkits now have to specify the version when precompiling in an environment where CUDA is not available (e.g., containers, log-in nodes), by calling CUDA.set_runtime_version!(v"11.8"; local_toolkit=true) or by provisioning a LocalPreferences.toml that contains both the version and local preference. When precompiling on a system where CUDA is available, just setting local = "true" or calling CUDA.set_runtime_version!(local_toolkit=true) is sufficient, and CUDA_Runtime_jll will auto-detect the CUDA version by calling into libcudart.
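For the container/log-in-node case described above, a provisioned LocalPreferences.toml could set both preferences like this (a sketch; the "11.8" value is just an example, substitute your actual toolkit version):

```toml
[CUDA_Runtime_jll]
version = "11.8"
local = "true"
```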

I'm doing this so that CUDA.jl knows, during precompilation, which CUDA version will be used. Right now, it only knows that when we're using artifacts, as for local toolkits the version is simply "local" (and we might be precompiling on a system without CUDA, so we can't just check which version we'll be using). Once we're guaranteed to know the CUDA version, we'll be able to do conditional things like versioning the header wrappers, or doing @static conditionals in hot code paths (currently these are runtime checks, memoized for performance).
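The kind of compile-time conditional this enables could look as follows (a hypothetical sketch, not code from this PR; it assumes the runtime version reported by CUDA.runtime_version() is already known when the dependent package is precompiled):

```julia
using CUDA

# Hypothetical sketch: with the CUDA version known during precompilation,
# a package could select a code path statically instead of checking at runtime.
@static if CUDA.runtime_version() >= v"12.0"
    # CUDA 12-and-later code path
    const HAS_NEW_API = true
else
    # CUDA 11 fallback
    const HAS_NEW_API = false
end
```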

HPC folks: This is the minor breaking change I mentioned at JuliaCon. It slightly changes the workflow, but shouldn't change anything significantly. It should even improve certain local-toolkit aspects, as artifacts will no longer be downloaded when local = "true". On the other hand, it will complain when the local CUDA installation is updated and you do not recompile CUDA.jl (which we cannot easily automate).

maleadt added the installation ("CUDA is easy to install, right?") label on Aug 24, 2023
maleadt force-pushed the tb/cuda_local_revamp branch from 90fe32a to 569ccf7 on August 24, 2023 14:49
codecov bot commented Aug 24, 2023:

Codecov Report

Patch coverage: 41.86% and project coverage change: -0.02% ⚠️

Comparison is base (f89e1ab) 71.09% compared to head (569ccf7) 71.08%.
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2058      +/-   ##
==========================================
- Coverage   71.09%   71.08%   -0.02%     
==========================================
  Files         157      157              
  Lines       13911    13916       +5     
==========================================
+ Hits         9890     9892       +2     
- Misses       4021     4024       +3     
Files Changed Coverage Δ
lib/cudadrv/version.jl 56.66% <0.00%> (-1.96%) ⬇️
src/CUDA.jl 100.00% <ø> (ø)
lib/cudnn/src/cuDNN.jl 39.74% <42.85%> (ø)
src/initialization.jl 54.94% <50.00%> (-0.23%) ⬇️
lib/custatevec/src/cuStateVec.jl 54.83% <60.00%> (ø)
lib/cutensor/src/cuTENSOR.jl 50.00% <60.00%> (ø)
lib/cutensornet/src/cuTensorNet.jl 59.01% <60.00%> (ø)
src/utilities.jl 78.30% <100.00%> (ø)


maleadt merged commit faff26c into master on Aug 25, 2023
maleadt deleted the tb/cuda_local_revamp branch on August 25, 2023 06:18
simonbyrne (Contributor) commented:

@maleadt what version will this change land in?

maleadt (Member, Author) commented Sep 14, 2023:

> @maleadt what version will this change land in?

5.0, which I plan to release early next week (with an accompanying blog post). For all intents and purposes, the current state of the master branch is what will be in the release, so now would be a great time for some last-minute testing.

simonbyrne (Contributor) commented:

is there a way we could do this so the preferences would be backward compatible?

maleadt (Member, Author) commented Sep 14, 2023:

Not without some effort; that's why I asked about it beforehand. I guess we could make it so that version = "local" is still accepted when there's also a transitional actual_version preference, or something along those lines. Or what do you suggest?

simonbyrne (Contributor) commented:

Yes? Basically, I have some overrides we apply system-wide, and it would be nice if they applied to all versions of CUDA.jl.

maleadt (Member, Author) commented Sep 15, 2023:

simonbyrne (Contributor) commented:

FYI, I don't see a warning if I don't set the version properly; e.g., the following Project.toml doesn't trigger anything:

```toml
[deps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
CUDA_Runtime_Discovery = "1af6417a-86b4-443c-805f-a4643ffb695f"
CUDA_Runtime_jll = "76a88914-d11a-5bdc-97e0-2f5a05c973a2"

[preferences.CUDA_Runtime_jll]
version = "local"
```

maleadt (Member, Author) commented Oct 23, 2023:

Not sure about that, but it works if I add version = "local" to the LocalPreferences.toml in a CUDA.jl checkout, so the core mechanism seems to work.
