CUDA.jl 1.3 - Multi-device programming
Tim Besard
Today we're releasing CUDA.jl 1.3, with several new features. The most prominent change is support for multiple GPUs within a single process.
Multi-GPU programming
With CUDA.jl 1.3, you can finally use multiple CUDA GPUs within a single process. You can switch devices with device!, query the current device with device(), and reset it using device_reset!():
julia> collect(devices())
9-element Array{CuDevice,1}:
CuDevice(0): Tesla V100-PCIE-32GB
CuDevice(1): Tesla V100-PCIE-32GB
CuDevice(2): Tesla V100-PCIE-32GB
CuDevice(3): Tesla V100-PCIE-32GB
CuDevice(4): Tesla V100-PCIE-16GB
CuDevice(5): Tesla P100-PCIE-16GB
CuDevice(6): Tesla P100-PCIE-16GB
CuDevice(7): GeForce GTX 1080 Ti
CuDevice(8): GeForce GTX 1080 Ti
julia> device!(5)
julia> device()
CuDevice(5): Tesla P100-PCIE-16GB
Let's define a kernel to show this really works:
julia> function kernel()
           dev = Ref{Cint}()
           CUDA.cudaGetDevice(dev)
           @cuprintln("Running on device $(dev[])")
           return
       end
julia> @cuda kernel()
Running on device 5
julia> device!(0)
julia> device()
CuDevice(0): Tesla V100-PCIE-32GB
julia> @cuda kernel()
Running on device 0
Memory allocations, like CuArrays, are implicitly bound to the device they were allocated on. That means you should take care to only use an array when the owning device is active, or you will run into errors:
julia> device()
CuDevice(0): Tesla V100-PCIE-32GB
julia> a = CUDA.rand(1)
1-element CuArray{Float32,1}:
0.6322775
julia> device!(1)
julia> a
ERROR: CUDA error: an illegal memory access was encountered
Future improvements might make the array type device-aware.
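In the meantime, one way to use an array's data on another device is to stage it through CPU memory. A minimal sketch (the device numbers are just for illustration):

using CUDA

device!(0)
a = CUDA.rand(1024)     # allocated on device 0

host = Array(a)         # copy device 0 -> CPU memory

device!(1)
b = CuArray(host)       # upload to device 1; b is now safe to use here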
Multitasking and multithreading
Dovetailing with the support for multiple GPUs is the ability to use these GPUs from separate Julia tasks and threads:
julia> device!(0)
julia> @sync begin
           @async begin
               device!(1)
               println("Working with $(device()) on $(current_task())")
               yield()
               println("Back to device $(device()) on $(current_task())")
           end
           @async begin
               device!(2)
               println("Working with $(device()) on $(current_task())")
           end
       end
Working with CuDevice(1) on Task @0x00007fc9e6a48010
Working with CuDevice(2) on Task @0x00007fc9e6a484f0
Back to device CuDevice(1) on Task @0x00007fc9e6a48010
julia> device()
CuDevice(0): Tesla V100-PCIE-32GB
Each task has its own local GPU state, such as the device it is bound to and handles to libraries like CUBLAS or CUDNN, which means that each task can configure these libraries independently.
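This makes it straightforward to divide work across GPUs by spawning one task per device. A minimal sketch (assuming at least two GPUs; the array size is arbitrary):

using CUDA

gpus = collect(devices())[1:2]              # assumes at least two GPUs
results = Vector{Float32}(undef, length(gpus))
@sync for (i, dev) in enumerate(gpus)
    @async begin
        device!(dev)        # task-local: does not affect the other tasks
        a = CUDA.rand(2^20) # allocated on this task's device
        results[i] = sum(a) # reduction executes on the same device
    end
end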
Minor features
CUDA.jl 1.3 also features some minor changes:
Reinstated compatibility with Julia 1.3
Support for CUDA 11.0 Update 1
Support for CUDNN 8.0.2
Known issues
Several operations on sparse arrays have been broken since CUDA.jl 1.2, due to the deprecations that were part of CUDA 11. The next version of CUDA.jl will drop support for CUDA 10.0 or older, which will make it possible to use new cuSPARSE APIs and add back missing functionality.