How to Convert Torch Models to ONNX
In all of our practice so far, TorchPipe can completely replace torch2trt through static ONNX composition, dynamic ONNX, pre-generated TensorRT models, and other methods.
Torch to ONNX Conversion
The framework prioritizes dynamic batch size, or static batch size with `batchsize==1`. In practice, some models cannot be converted to dynamic batch size, or are error-prone when converted. We also support loading multiple models with different static batch sizes at the same time to simulate dynamic batching. The following instructions mainly apply to exporting models with a dynamic batch size.
- The following operations make dynamic batch size unavailable: `x.view(int(x.size(0)), -1)`. Check whether the model file hardcodes the batch dimension, such as `x.view(int(x.size(0)), -1, 1, 1)`, `x.reshape(int(x.size(0)), -1, 1, 1)`, etc., which may cause problems with dynamic batch size after converting to ONNX. Note that in Transformer-like networks, the batch dimension is not necessarily the 0th dimension.
- When the batch dimension is specified as dynamic, older TensorRT versions handle it less well and generate more redundant operators. For example, for `x.view(x.size(0), -1)`, Gather and other operators will be introduced into the ONNX graph to compute the first dimension of x. It can be rewritten as `x = x.view(-1, int(x.size(1)*x.size(2)*x.size(3)))` or `x = torch.flatten(x, 1)` (see the sketch after this list). This change is optional.
- For some models (for example, LSTM and Transformer networks on TensorRT 8.5.1), making both the batch dimension and a non-batch dimension dynamic may consume more resources:
  - For LayerNorm layers and Transformer-like networks with dynamic batch size, opset>=17 and TensorRT>=8.6.1 are recommended.
```bash
# When both batch and non-batch dimensions are dynamic, inference takes 9 ms
# (inference input size: optShapes=input:1x1000x80,mask:1x1x1000):
/opt/tensorrt/bin/trtexec --onnx=test_fp32.onnx --shapes=input:1x1000x80,mask:1x1x1000 --workspace=64000 \
    --minShapes=input:1x20x80,mask:1x1x20 \
    --optShapes=input:1x1000x80,mask:1x1x1000 \
    --maxShapes=input:4x2000x80,mask:4x1x2000

# When batchsize==1 is fixed, it takes only 4.6 ms:
/opt/tensorrt/bin/trtexec --onnx=test_fp32.onnx --shapes=input:1x1000x80,mask:1x1x1000 --workspace=64000 \
    --minShapes=input:1x20x80,mask:1x1x20 \
    --optShapes=input:1x1000x80,mask:1x1x1000 \
    --maxShapes=input:1x2000x80,mask:1x1x2000
```
In this case, it is recommended to discretize only one of the dimensions.
- Whenever possible, keep the batch dimension as the 0th dimension and leave its length in the default state (that is, -1) to remove redundant operators.
- Use onnx-simplifier for optimization.
- Smaller optimization ranges usually mean faster speed and less resource consumption.
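As an illustration of the reshape guidance above, here is a minimal sketch (the module and shapes are invented for this example) contrasting a batch-hardcoding `view` with dynamic-batch-friendly alternatives:

```python
import torch
import torch.nn as nn

class FlattenHead(nn.Module):
    def forward(self, x):  # x: (N, C, H, W)
        # Hardcoding the batch dimension can break dynamic batch after export:
        #   return x.view(int(x.size(0)), -1)
        # Dynamic-batch friendly: leave the batch dimension as -1 ...
        return x.view(-1, int(x.size(1) * x.size(2) * x.size(3)))
        # ... or, equivalently:
        #   return torch.flatten(x, 1)
```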
After modifying the network, you can use the following code to convert the PyTorch model to an ONNX model:
```python
import torch
import onnx
from onnxsim import onnx_simplifier

# torch_model: the PyTorch model to export; onnx_save_path: destination .onnx file
x = torch.randn(1, 3, 224, 224).cuda()
out_size = 1

# Mark the 0th (batch) dimension of every input and output as dynamic
in_out = {"input": {0: "batch_size"}}
for i in range(out_size):
    in_out[f"output_{i}"] = {0: "batch_size"}

torch_model.cuda().eval()
torch.onnx.export(torch_model,
                  x,
                  onnx_save_path,
                  opset_version=17,
                  do_constant_folding=True,
                  input_names=["input"],  # input name
                  output_names=[f"output_{i}" for i in range(out_size)],  # output names
                  dynamic_axes=in_out)

# Simplify the exported model in place
onnx_model = onnx.load(onnx_save_path)
onnx_model = onnx.shape_inference.infer_shapes(onnx_model)
model_simp, check = onnx_simplifier.simplify(onnx_model, check_n=0)
onnx.save(model_simp, onnx_save_path)
```
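To confirm that the exported model really accepts a dynamic batch, a quick check with ONNX Runtime (assuming `onnxruntime` is installed; the input name `"input"` matches the export above) might look like:

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(onnx_save_path, providers=["CPUExecutionProvider"])
for batch in (1, 4):  # try two different batch sizes
    dummy = np.random.randn(batch, 3, 224, 224).astype(np.float32)
    outputs = sess.run(None, {"input": dummy})
    print(batch, outputs[0].shape)  # the 0th output dimension should track the batch size
```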
`torchpipe.utils.models.onnx_export` (available since 0.3.2b1):
- This tool converts a PyTorch model to an ONNX model and saves it locally. It only supports a single input.
- It supports dynamic batch and applies onnx-simplifier optimization.
```python
def onnx_export(model: Union[torch.nn.Module, torch.jit.ScriptModule, torch.jit.ScriptFunction], onnx_path, input = None, opset = 17):
```
- model - PyTorch model.
- onnx_path - Path to save the ONNX model.
- input - Model input. Defaults to torch.randn(1,3,224,224) if not set.
- opset - ONNX opset.
Example Code
```python
import os, tempfile
from torchvision import models
import torch
import torchpipe

## export onnx
m = models.resnet50(weights=None).eval()
onnx_path = os.path.join(tempfile.gettempdir(), "resnet50.onnx")
torchpipe.utils.models.onnx_export(m, onnx_path, torch.randn(1, 3, 224, 224), opset=17)
```
Reasons for Conversion Failure
When converting from Torch to ONNX, it is common to encounter conversion failures. The following methods can help:
- Keep dynamic dimensions dynamic. For example, for YOLOX:

```python
x = x.view(int(x.size(0)), -1, 1, 1)
# Change to:
x = x.flatten(1).unsqueeze(2).unsqueeze(2)

x = x.view(int(x.size(0)), -1)
# Change to:
x = x.view(-1, int(x.size(1)*x.size(2)*x.size(3)))
```
- Change boolean values to float (a minimal export sketch follows this list):

```python
tgt_padding_mask = (tgt_in == self.eos_id)
# Change to:
tgt_padding_mask = (tgt_in == self.eos_id).float()
```
- Simplify the model using onnx-simplifier.
- Rule out version issues by trying different versions:
  - Use the latest versions where possible, such as ONNX opset >= 14 and TensorRT >= 8.2.
  - For TensorRT 7, onnx 1.9.0 and ONNX opset 11 are recommended.
- Try using trtexec for model conversion:

```bash
/opt/tensorrt/bin/trtexec --onnx=folded.onnx \
    --workspace=64000 \
    --minShapes=input:1x20x80,mask:1x1x20 \
    --optShapes=input:1x1000x80,mask:1x1x1000 \
    --maxShapes=input:1x2000x80,mask:1x1x2000 \
    --saveEngine=test_fp32.trt
```
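As a minimal sketch of the boolean-to-float fix above (the module, `eos_id`, and shapes are invented for illustration), an export of a comparison mask might look like:

```python
import torch
import torch.nn as nn

class MaskHead(nn.Module):
    eos_id = 0

    def forward(self, tgt_in):
        # A raw boolean tensor can fail to export or misbehave in some
        # opset/TensorRT combinations; cast the mask to float instead:
        return (tgt_in == self.eos_id).float()

torch.onnx.export(MaskHead().eval(),
                  torch.randint(0, 10, (1, 16)),  # dummy token ids
                  "mask_head.onnx",
                  opset_version=17,
                  input_names=["tgt_in"],
                  output_names=["mask"])
```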
ONNX Related Tools
onnx-simplifier
A tool for simplifying the model structure:
```bash
pip install onnx onnxsim
onnxsim input.onnx output.onnx
```
netron
A tool for visualizing ONNX models.
Run `pip install netron`, then `netron [FILE]` from the command line or `netron.start('[FILE]')` from Python.
ONNX GraphSurgeon
ONNX GraphSurgeon is a tool released with TensorRT for creating and modifying ONNX graphs.
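A minimal sketch of the ONNX GraphSurgeon workflow (assuming `onnx-graphsurgeon` is installed and a `model.onnx` file exists) for inspecting and cleaning a graph:

```python
import onnx
import onnx_graphsurgeon as gs

# Load an ONNX model into a mutable GraphSurgeon graph
graph = gs.import_onnx(onnx.load("model.onnx"))

# Inspect the nodes
for node in graph.nodes:
    print(node.op, node.name)

# Drop dangling nodes/tensors, re-sort, and save the modified graph
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model_modified.onnx")
```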
Polygraphy
Polygraphy is a tool provided by NVIDIA for testing TensorRT and ONNX models. It provides model conversion functionality, allows debugging of FP16 precision loss, and supports specifying layers that should not run in FP16.
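As one hedged example of Polygraphy's Python API (assuming a `model.onnx` with a single input named `input`; the exact API can vary between Polygraphy versions), building and running a TensorRT engine from ONNX might look like:

```python
import numpy as np
from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath, TrtRunner

# Lazily build a TensorRT engine from the ONNX file, then run one inference
build_engine = EngineFromNetwork(NetworkFromOnnxPath("model.onnx"))
with TrtRunner(build_engine) as runner:
    outputs = runner.infer(feed_dict={"input": np.ones((1, 3, 224, 224), dtype=np.float32)})
    print({name: out.shape for name, out in outputs.items()})
```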