Skip to main content

Installation

Dependencies

The basic environment is as follows:

DependencyVersionRemarks
cuda>= 11.0- The CUDA version must be consistent with the version that PyTorch depends on (for the convenience of torch.utils.cpp_extension to compile code on the fly).
- Compatibility testing with CUDA 10.2(no support for c++17) is no longer performed.
pytorch>= 1.10.2(cuda11)- Compatibility testing is no longer performed for c++14/cuda-10.2/pytorch==1.10.2, but you may still be able to run it with simple modifications.
opencv4.xAt least the core, imgproc, imgcodecs, and highgui modules are included.
tensorrt>= 8.0
<= 10.3
- There is a memory leak in TensorRT 7.0 with dynamic inputs.
note

All dependencies mentioned above come from a specific default backend. The construction of C++ core does not rely on any of the above dependencies.

Using NGC base image

The easiest way is to choose NGC mirror for source code compilation (official mirror may still be able to run low version drivers through Forward Compatibility or Minor Version Compatibility).

  • Minimum support nvcr.io/nvidia/pytorch:21.11-py3
  • Maximum support nvcr.io/nvidia/pytorch:23.08-py3
  • Latest test version: nvcr.io/nvidia/pytorch:22.12-py3

If you encounter any compilation or runtime issues inside the Docker container, please submit an issue.

First, clone the code:

git clone https://github.com/torchpipe/torchpipe.git
# git clone -b main https://github.com/torchpipe/torchpipe.git
cd torchpipe/ && git submodule update --init --recursive

Then start the container and if your machine supports a higher version of the image, you can use the updated version of the Pytorch image.

img_name=nvcr.io/nvidia/pytorch:23.05-py3 #  for tensort8.6.1, LayerNorm
# img_name=nvcr.io/nvidia/pytorch:22.12-py3 # For driver version lower than 510
docker run --rm --gpus=all --ipc=host --network=host -v `pwd`:/workspace --shm-size 1G --ulimit memlock=-1 --ulimit stack=67108864 --privileged=true -w/workspace -it $img_name /bin/bash
note

If you are using a transformer-like model, it is strongly recommended to use TensorRT >= 8.6.1 (nvcr.io/nvidia/pytorch:23.05-py3) for supporting opset 17 for LayerNormalization and opset 18 GroupNormalization, as well as deeper support for such models. However, the NGC image has certain requirements for the GPU driver version as mentioned here:

*Release 23.05 is based on CUDA 12.1.1, which requires NVIDIA Driver release 530 or later. However, if you are running on a data center GPU (for example, T4 or any other data center GPU), you can use NVIDIA driver release 450.51 (or later R450), 470.57 (or later R470), 510.47 (or later R510), 515.65 (or later R515), 525.85 (or later R525), or 530.30 (or later R530)*

Next, you can compile the source code:

python setup.py install

Then you can run all the example code:

cd examples/resnet18 && python resnet18.py
# or
cd examples/yolox && python yolox.py
Parallel inference of resnet18.

Obtain the appropriate model file (currently supporting ONNX, TRT engine, etc.).

import torchvision.models as models
resnet18 = models.resnet18(pretrained=True).eval().cuda()

import tempfile, os, torch
model_path = os.path.join(tempfile.gettempdir(), "./resnet18.onnx")
resnet18 = models.resnet18(pretrained=True).eval().cuda()
data_bchw = torch.rand((1, 3, 224, 224)).cuda()
print("export: ", model_path)
torch.onnx.export(resnet18, data_bchw, model_path,
opset_version=17,
do_constant_folding=True,
input_names=["in"], output_names=["out"],dynamic_axes={"in": {0: "x"},"out": {0: "x"}})

# os.system(f"onnxsim {model_path} {model_path}")

Now you can make concurrent calls to a single model.

import torch, torchpipe
model = torchpipe.pipe({'model': model_path,
'backend': "Sequential[CvtColorTensor,TensorrtTensor,SyncTensor]", # Back-end engine, as explained in the "Overview" section.
'instance_num': 2, 'batching_timeout': '5', # Number of instances and timeout duration.
'max': 4, # Maximum value for model optimization scope, can also be '4x3x224x224'.
'mean': '123.675, 116.28, 103.53',#255*"0.485, 0.456, 0.406",
'std': '58.395, 57.120, 57.375', # Merge into TensorRT network.
'color': 'rgb'}) # cvtColorTensor backend parameter: target color space order
data = torch.zeros((1, 3, 224, 224)).cuda() # or torch.from_numpy(...)
input = {"data": data, 'color': 'bgr'}
model(input) # Concurrency can be utilized
# Use "result" as the data output identifier, although other key values can be customized as well.
print(input["result"].shape) # If the inference fails, the "result" key will not exist, even if it was present in the input.

For more examples, see Showcase.

Customizing Dockerfile

Refer to the example Dockerfile.

 
docker build --network=host -f ./docker/Dockerfile -t torchpipe thirdparty/

docker run --rm --network=host --gpus=all --ulimit memlock=-1 --ulimit stack=67108864 --privileged=true -v `pwd`:/workspace -it torchpipe:latest /bin/bash

cd /workspace/ && python setup.py install

cd examples/resnet18 && python resnet18.py


Base images compiled in this way have smaller sizes than NGC PyTorch images. Please note that _GLIBCXX_USE_CXX11_ABI==0.

Description

Compilation Options

OptionDefaultDescription
DEBUG0Whether to compile in debug mode
WITH_CVCUDA0Whether to compile the C++ backend that depends on CVCUDA(>=0.4.0)
BUILD_PPLCV0Whether to compile the C++ backend that depends on ppl.cv(>=0.3.3b2)
IPIPE_KEY(optinal)Nonestr(length>=8). If using encryption and decryption functions, this key needs to be set.

ABI

The compiler used for both compiling the code and the dependent libraries needs to be ABI-compatible with the compiler used by the installed PyTorch. In PyTorch, you can check the compatibility as follows:

python -c "import torch; print(torch._C._GLIBCXX_USE_CXX11_ABI)"
Installation Source_GLIBCXX_USE_CXX11_ABI
PyPI-PyTorch0
NGC-PyTorch1
libtorch1

Compiling with integrated third-party libraries (using Ubuntu-OpenCV as an example)

By default, installing OpenCV using apt-get install libopencv-dev satisfies _GLIBCXX_USE_CXX11_ABI=1, which is not compatible with PyTorch from PyPI. However, you can compile PyTorch-compatible OpenCV by adding the -D_GLIBCXX_USE_CXX11_ABI=0 configuration to CMake.

Compilation Command
wget https://codeload.github.com/opencv/opencv/zip/refs/tags/4.5.4 -O opencv-4.5.4.zip

unzip opencv-4.5.4.zip && pip3 install cmake

cd opencv-4.5.4/ && \
sed -i '1a add_definitions(-D_GLIBCXX_USE_CXX11_ABI=0)' CMakeLists.txt && \
mkdir build && cd build && \
cmake -D CMAKE_BUILD_TYPE=Release \
-D BUILD_WITH_DEBUG_INFO=OFF \
-D CMAKE_INSTALL_PREFIX=/usr/local/ \
-D INSTALL_C_EXAMPLES=OFF \
-D INSTALL_PYTHON_EXAMPLES=OFF \
-DENABLE_NEON=OFF \
-D WITH_TBB=ON \
-DBUILD_TBB=ON \
-DBUILD_WEBP=OFF \
-D BUILD_ITT=OFF -D WITH_IPP=ON \
-D WITH_V4L=OFF \
-D WITH_QT=OFF \
-D WITH_OPENGL=OFF \
-D BUILD_opencv_dnn=OFF \
-DBUILD_opencv_java=OFF \
-DBUILD_opencv_python2=OFF \
-DBUILD_opencv_python3=ON \
-D BUILD_NEW_PYTHON_SUPPORT=ON \
-D BUILD_PYTHON_SUPPORT=ON \
-D PYTHON_DEFAULT_EXECUTABLE=/usr/bin/python3 \
-DBUILD_opencv_java_bindings_generator=OFF \
-DBUILD_opencv_python_bindings_generator=ON \
-D BUILD_EXAMPLES=OFF \
-D WITH_OPENEXR=OFF \
-DWITH_JPEG=ON \
-DBUILD_JPEG=ON\
-D BUILD_JPEG_TURBO_DISABLE=OFF \
-D BUILD_DOCS=OFF \
-D BUILD_PERF_TESTS=OFF \
-D BUILD_TESTS=OFF \
-D BUILD_opencv_apps=OFF \
-D BUILD_opencv_calib3d=OFF \
-D BUILD_opencv_contrib=OFF \
-D BUILD_opencv_features2d=OFF \
-D BUILD_opencv_flann=OFF \
-DBUILD_opencv_gapi=OFF \
-D WITH_CUDA=OFF \
-D WITH_CUDNN=OFF \
-D OPENCV_DNN_CUDA=OFF \
-D ENABLE_FAST_MATH=1 \
-D WITH_CUBLAS=0 \
-D BUILD_opencv_gpu=OFF \
-D BUILD_opencv_ml=OFF \
-D BUILD_opencv_nonfree=OFF \
-D BUILD_opencv_objdetect=OFF \
-D BUILD_opencv_photo=OFF \
-D BUILD_opencv_stitching=OFF \
-D BUILD_opencv_superres=OFF \
-D BUILD_opencv_ts=OFF \
-D BUILD_opencv_video=OFF \
-D BUILD_videoio_plugins=OFF \
-D BUILD_opencv_videostab=OFF \
-DBUILD_EXAMPLES=OFF \
-DBUILD_opencv_calib3d=OFF \
-DBUILD_opencv_features2d=OFF\
-DBUILD_opencv_flann=OFF\
-DBUILD_opencv_ml=OFF\
-DBUILD_opencv_videoio=OFF\
.. && make -j4 && make install