As things are, everything that depends on ClimaComms inherits its dependencies: MPI and CUDA. These are not lightweight, especially CUDA. I think that extensions might work well here (this is what #4 proposed, and extensions are now a good way to implement it).
The only downside is that downstream dependencies will have to explicitly depend on and load MPI/CUDA. This would be quite painful to implement because we rely heavily on automatic selection of the context and the device.
These functions read an environment variable and, if it is not set, try to guess a reasonable device. This is nice because for the most part we don't have to worry about setting anything and the code just runs (but also see #67: sometimes the guesses are incorrect). Unfortunately, I think that guessing the correct device/context is incompatible with using extensions, because specific packages have to be loaded (e.g., CUDA). With the current implementation of PR #75, one would have to load CUDA to use a GPU on a GPU-capable device; otherwise the device would be set to CPU. This is particularly annoying in all CI runs, where we would have to add logic to handle the different cases and import the relevant packages.
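To make the behavior described above concrete, here is a minimal, self-contained sketch of "read the environment variable, otherwise guess". It is not the actual ClimaComms code: the device types are local stand-ins, and `guess_device`, `cpu_device`, and the `gpu_usable` keyword are hypothetical names, with `gpu_usable` standing in for "CUDA is loaded and functional".

```julia
# Stand-in device types so the sketch runs without ClimaComms or CUDA.
abstract type AbstractDevice end
struct CPUSingleThreaded <: AbstractDevice end
struct CPUMultiThreaded <: AbstractDevice end
struct CUDADevice <: AbstractDevice end

# Pick a CPU device based on the number of Julia threads.
cpu_device() = Threads.nthreads() > 1 ? CPUMultiThreaded() : CPUSingleThreaded()

# Hypothetical helper: honor CLIMACOMMS_DEVICE if set, otherwise guess.
# `gpu_usable` is an assumption standing in for "CUDA is loaded and functional".
function guess_device(env = ENV; gpu_usable::Bool = false)
    name = get(env, "CLIMACOMMS_DEVICE", nothing)
    if isnothing(name)
        # Nothing requested: fall back to a guess.
        return gpu_usable ? CUDADevice() : cpu_device()
    elseif name == "CPU"
        return cpu_device()
    elseif name == "CPUSingleThreaded"
        return CPUSingleThreaded()
    elseif name == "CPUMultiThreaded"
        return CPUMultiThreaded()
    elseif name == "CUDA"
        return CUDADevice()
    else
        error("Invalid CLIMACOMMS_DEVICE: $name")
    end
end
```

With extensions, the guess can only report a GPU if CUDA has already been loaded by the caller, so on a GPU machine where nobody ran `using CUDA` the sketch above falls through to the CPU branch.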
Instead, I propose to deprecate the automatic selection of the device and let ClimaComms load the relevant module when asked.
This might look like:
    function device()
        env_var = get(ENV, "CLIMACOMMS_DEVICE", nothing)
        if !isnothing(env_var)
            if env_var == "CPU"
                return Threads.nthreads() > 1 ? CPUMultiThreaded() : CPUSingleThreaded()
            elseif env_var == "CPUSingleThreaded"
                return CPUSingleThreaded()
            elseif env_var == "CPUMultiThreaded"
                return CPUMultiThreaded()
            elseif env_var == "CUDA"
                pkgid = Base.PkgId(Base.UUID("052768ef-5323-5732-b1bb-66c8b64840ba"), "CUDA")
                # Load CUDA only if it has not been loaded already
                if !haskey(Base.loaded_modules, pkgid)
                    try
                        Base.eval(Main, :(using CUDA))
                    catch err
                        error("Cannot load CUDA.jl. Make sure it is included in your environment stack.")
                    end
                end
                return CUDADevice()
            else
                error("Invalid CLIMACOMMS_DEVICE: $env_var")
            end
        end
    end
This loads CUDA when CLIMACOMMS_DEVICE = CUDA. Now, the only responsibility of the user is to ensure that they have CUDA in the environment they are using. The function will fail when the env variable CLIMACOMMS_DEVICE is not set. (This solution currently loads CUDA into Main, but maybe it's best to load it within ClimaComms.)
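As a possible alternative to evaluating `using CUDA` in Main, the package can be loaded by its `PkgId` without touching any module's namespace. The sketch below is only an illustration under assumptions: `load_package` is a hypothetical helper, not part of ClimaComms, and `Base.identify_package` and `Base.require` are Julia internals rather than documented public API. It looks the name up in the environment stack instead of hard-coding a UUID, and it is demonstrated with the `Dates` standard library so that it runs without CUDA installed.

```julia
# Hypothetical helper: load a package found in the environment stack
# without evaluating `using` in Main.
function load_package(name::AbstractString)
    # Returns a Base.PkgId, or `nothing` if the name cannot be resolved.
    pkgid = Base.identify_package(name)
    if isnothing(pkgid)
        error("Cannot load $name. Make sure it is included in your environment stack.")
    end
    # Loads the package (or returns it if already loaded) and yields its Module.
    return Base.require(pkgid)
end

# Demonstration with a standard library instead of CUDA.
dates_module = load_package("Dates")
```

One thing to keep in mind with this design: methods defined by a package loaded at runtime are newer than the running function's world age, so calling them from the same function may require `Base.invokelatest`.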
Sbozzolo changed the title from "Turn MPI and CUDA to extensions" to "Turn MPI and CUDA to extensions and standardize how to context and device and picked" on Apr 24, 2024
The key functions are in ClimaComms.jl/src/context.jl (lines 13 to 32 in 13f332d) and ClimaComms.jl/src/devices.jl (lines 52 to 74 in 13f332d).