PyTorch CUDA Semantics
PyTorch organizes access to GPU resources based on the principle of ease of use.
Current Stream
In PyTorch, the current stream
refers to the CUDA stream bound to the current thread. PyTorch provides the following APIs to bind a CUDA stream to the current thread and to obtain the CUDA stream bound to the current thread:
- Python
- C++
torch.cuda.set_stream(stream)
torch.cuda.current_stream(device=None)
c10::cuda::setCurrentCUDAStream(CUDAStream stream);
c10::cuda::getCurrentCUDAStream(DeviceIndex device_index = -1) ;
By default, all threads are bound to the default stream (stream 0).
PyTorch's GPU operations are submitted to the current stream
bound to the current thread. PyTorch tries to make users unaware of this:
- Generally, the
current stream
is the default stream, and tasks submitted to the same stream are executed serially according to submission time. - For operations involving copying GPU data to the CPU or another GPU device, PyTorch defaults to inserting a
current stream
synchronization operation in the operation.
To fully utilize GPU resources in a multi-threaded environment, we can use the following features of PyTorch:
- Bind computational backend threads to independent CUDA streams.
- Synchronize streams when necessary.