
Rework how local toolkits are selected. #2058

Merged: 1 commit merged into master on Aug 25, 2023

maleadt (Member) commented Aug 24, 2023:

Copying from JuliaPackaging/Yggdrasil#7246:

Make using a local toolkit require a separate preference.

That way, the 'version' preference is always set to a valid version number, never to the "local" string, so CUDA.jl knows the version of the local toolkit. The exception is of course still the "none" entry, but that's there to work around BinaryBuilder limitations.

To make this feature more powerful, add some libcudart-based detection of the local toolkit's version. This should make setting local = "true" behave similarly to using artifacts, i.e., if the driver/runtime are available during precompilation, the version will be auto-detected. Only in environments without CUDA (such as containers, log-in nodes, etc.) will the user need to set the version preference to inform the system which version of CUDA is being used.

This PR also adds / changes artifact comparison strategies to always bail out when using a local toolkit. This should prevent downloading useless artifacts when using system libraries.

TLDR: Users of local toolkits now have to specify the version when precompiling in an environment where CUDA is not available (e.g., containers, log-in nodes), by calling CUDA.set_runtime_version!(v"11.8"; local_toolkit=true) or by provisioning a LocalPreferences.toml that contains both the version and local preference. When precompiling on a system where CUDA is available, just setting local = "true" or calling CUDA.set_runtime_version!(local_toolkit=true) is sufficient, and CUDA_Runtime_jll will auto-detect the CUDA version by calling into libcudart.
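For the container/log-in-node case described above, a provisioned LocalPreferences.toml could set both preferences like this (a sketch; the "11.8" value is just an example, substitute your actual toolkit version):

```toml
[CUDA_Runtime_jll]
version = "11.8"
local = "true"
```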

I'm doing this so that CUDA.jl knows, during precompilation, which CUDA version will be used. Right now, it only knows that when we're using artifacts, as for local toolkits the version is simply "local" (and we might be precompiling on a system without CUDA, so we can't just check which version we'll be using). Once we're guaranteed to know the CUDA version, we'll be able to do conditional things like versioning the header wrappers, or doing @static conditionals in hot code paths (currently these are runtime checks, memoized for performance).
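The kind of compile-time conditional this enables could look as follows (a hypothetical sketch, not code from this PR; it assumes the runtime version reported by CUDA.runtime_version() is already known when the dependent package is precompiled):

```julia
using CUDA

# Hypothetical sketch: with the CUDA version known during precompilation,
# a package could select a code path statically instead of checking at runtime.
@static if CUDA.runtime_version() >= v"12.0"
    # CUDA 12-and-later code path
    const HAS_NEW_API = true
else
    # CUDA 11 fallback
    const HAS_NEW_API = false
end
```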

HPC folks: This is the minor breaking change I mentioned at JuliaCon. It slightly changes the workflow, but shouldn't change anything significantly. It should even improve certain local-toolkit aspects, as artifacts will no longer be downloaded when local = "true". On the other hand, it will complain when the local CUDA installation is updated and you do not recompile CUDA.jl (which we cannot easily automate).

maleadt added the installation ("CUDA is easy to install, right?") label on Aug 24, 2023
maleadt force-pushed the tb/cuda_local_revamp branch from 90fe32a to 569ccf7 on August 24, 2023 14:49
codecov bot commented Aug 24, 2023:

Codecov Report

Patch coverage: 41.86% and project coverage change: -0.02% ⚠️

Comparison is base (f89e1ab) 71.09% compared to head (569ccf7) 71.08%.
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2058      +/-   ##
==========================================
- Coverage   71.09%   71.08%   -0.02%     
==========================================
  Files         157      157              
  Lines       13911    13916       +5     
==========================================
+ Hits         9890     9892       +2     
- Misses       4021     4024       +3     
Files Changed Coverage Δ
lib/cudadrv/version.jl 56.66% <0.00%> (-1.96%) ⬇️
src/CUDA.jl 100.00% <ø> (ø)
lib/cudnn/src/cuDNN.jl 39.74% <42.85%> (ø)
src/initialization.jl 54.94% <50.00%> (-0.23%) ⬇️
lib/custatevec/src/cuStateVec.jl 54.83% <60.00%> (ø)
lib/cutensor/src/cuTENSOR.jl 50.00% <60.00%> (ø)
lib/cutensornet/src/cuTensorNet.jl 59.01% <60.00%> (ø)
src/utilities.jl 78.30% <100.00%> (ø)


maleadt merged commit faff26c into master on Aug 25, 2023
maleadt deleted the tb/cuda_local_revamp branch on August 25, 2023 06:18
simonbyrne (Contributor) commented:

@maleadt what version will this change land in?

maleadt (Member, Author) commented Sep 14, 2023:

> @maleadt what version will this change land in?

5.0, which I plan to release early next week (with an accompanying blog post). For all intents and purposes, the current state of the master branch is what will be in the release, so now would be a great time for some last-minute testing.

simonbyrne (Contributor) commented:

is there a way we could do this so the preferences would be backward compatible?

maleadt (Member, Author) commented Sep 14, 2023:

Not without some effort; that's why I asked about it beforehand. I guess we could make it so that version = "local" is still accepted when there's also a transitional actual_version preference, or something along those lines. Or what do you suggest?

simonbyrne (Contributor) commented:

Yes? Basically, I have some overrides we apply system-wide, and it would be nice if they applied to all versions of CUDA.jl.

maleadt (Member, Author) commented Sep 15, 2023:

simonbyrne (Contributor) commented:

FYI, I don't see a warning if I don't set the version properly; e.g., the following Project.toml doesn't trigger anything:

```toml
[deps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
CUDA_Runtime_Discovery = "1af6417a-86b4-443c-805f-a4643ffb695f"
CUDA_Runtime_jll = "76a88914-d11a-5bdc-97e0-2f5a05c973a2"

[preferences.CUDA_Runtime_jll]
version = "local"
```

maleadt (Member, Author) commented Oct 23, 2023:

Not sure about that, but it works if I add version = "local" to the LocalPreferences.toml in a CUDA.jl checkout, so the core mechanism seems to work.
