📄️ CUDA: Streams and Concurrency
CUDA streams provide a consistent abstraction for controlling concurrent execution, allowing users to overlap kernels and memory transfers and fully utilize the resources of a single GPU device.
📄️ PyTorch CUDA Semantics
PyTorch organizes access to GPU resources around the principle of ease of use: operations are issued on a per-device current stream, so most code never has to manage devices or streams explicitly.
📄️ Threads, Processes, and the GIL
When deploying deep learning services with multiple threads in a Python environment, a major challenge is the GIL (global interpreter lock), which allows only one thread to execute Python bytecode at a time. There have been many attempts to work around this limitation.
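Whether threads help a Python service depends on whether the work releases the GIL. A minimal sketch (using `time.sleep` as a stand-in for blocking I/O or a native inference call, both of which release the GIL while waiting):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_call(delay: float) -> float:
    # time.sleep releases the GIL while waiting, like most blocking
    # I/O and many native (C/C++) inference kernels do.
    time.sleep(delay)
    return delay

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # Four 0.2 s waits overlap instead of serializing to 0.8 s,
    # because each thread drops the GIL while blocked.
    results = list(pool.map(blocking_call, [0.2] * 4))
elapsed = time.perf_counter() - start

print(f"elapsed: {elapsed:.2f}s")
```

Pure-Python CPU-bound bytecode gets no such overlap under the GIL, which is why CPU-heavy services typically fall back to multiple processes instead of threads.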
📄️ Performance Indicators for Services
When evaluating a service, several key indicators help characterize its behavior in terms of latency, throughput, error rate, and related aspects. Here are some commonly used key performance indicators:
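To make these indicators concrete, here is a minimal sketch that computes tail latency, error rate, and throughput from per-request records; the sample data and the one-second measurement window are assumptions for illustration only:

```python
# Hypothetical sample: per-request latencies in ms and success flags.
latencies_ms = [12, 15, 11, 230, 14, 13, 16, 12, 11, 500]
succeeded = [True] * 9 + [False]

def percentile(values, p):
    """Nearest-rank percentile: smallest value covering p% of samples."""
    ordered = sorted(values)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

p50 = percentile(latencies_ms, 50)   # median latency
p99 = percentile(latencies_ms, 99)   # tail latency, dominated by outliers
error_rate = 1 - sum(succeeded) / len(succeeded)
# Throughput over an assumed 1-second measurement window.
qps = len(latencies_ms) / 1.0

print(p50, p99, error_rate, qps)
```

Note how the p99 latency (500 ms) tells a very different story from the median (13 ms), which is why tail percentiles, not averages, are the standard latency indicator for services.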