Skip to main content

PyTorch CUDA Semantics

PyTorch organizes access to GPU resources based on the principle of ease of use.

Current Stream

In PyTorch, the current stream refers to the CUDA stream bound to the current thread. PyTorch provides the following APIs to bind a CUDA stream to the current thread and to obtain the CUDA stream bound to the current thread:

torch.cuda.set_stream(stream)
torch.cuda.current_stream(device=None)

By default, all threads are bound to the default stream (stream 0).

PyTorch's GPU operations are submitted to the current stream bound to the current thread. PyTorch tries to make users unaware of this:

  • Generally, the current stream is the default stream, and tasks submitted to the same stream are executed serially according to submission time.
  • For operations involving copying GPU data to the CPU or another GPU device, PyTorch defaults to inserting a current stream synchronization operation in the operation.

To fully utilize GPU resources in a multi-threaded environment, we can use the following features of PyTorch:

  • Bind computational backend threads to independent CUDA streams.
  • Synchronize streams when necessary.

Reference