How to Convert Torch Models to ONNX

note

In every workflow we have encountered so far, TorchPipe can fully replace torch2trt, whether through composing static ONNX models, dynamic ONNX, pre-generated TensorRT engines, or other methods.

Torch to ONNX Conversion

The framework prioritizes dynamic batch size, or static batch size with batchsize==1. In practice, some models cannot be converted to dynamic batch size, or the conversion is error-prone. For such cases, we also support loading multiple copies of a model with different static batch sizes at the same time to simulate dynamic batching. The following instructions mainly apply to exporting models with a dynamic batch size.

Exporting Dynamic Batch Size Models
  • Operations like x.view(int(x.size(0)), -1) make dynamic batch size unavailable: the int() cast is evaluated at export time and freezes the sample batch size into the graph. Check whether the model file hardcodes the batch dimension, e.g. x.view(int(x.size(0)), -1, 1, 1) or x.reshape(int(x.size(0)), -1, 1, 1), which may cause problems with dynamic batch size after converting to ONNX; a minimal sketch at the end of this subsection demonstrates the pitfall. Note that in Transformer-like networks, the batch dimension is not necessarily dimension 0.
  • When the batch dimension is marked as dynamic, older TensorRT versions handle it less capably and generate more redundant operators. For example, for x.view(x.size(0), -1), Gather and other operators are introduced into the ONNX graph just to compute the first dimension of x. This can be rewritten as x = x.view(-1, int(x.size(1)*x.size(2)*x.size(3))) or x = torch.flatten(x, 1), although doing so is optional.
  • For some models (e.g. LSTM and Transformer networks, observed with TensorRT 8.5.1), making both the batch dimension and a non-batch dimension dynamic may consume significantly more resources, as the trtexec comparison below shows:
  • For LayerNorm layers and Transformer-like networks with dynamic batch size, opset>=17 and TensorRT>=8.6.1 are recommended.
# When both batch and non-batch dimensions are dynamic, it takes 9ms (inference input size is optShapes=input:1x1000x80,mask:1x1x1000):
/opt/tensorrt/bin/trtexec --onnx=test_fp32.onnx --shapes=input:1x1000x80,mask:1x1x1000 --workspace=64000 \
--minShapes=input:1x20x80,mask:1x1x20 \
--optShapes=input:1x1000x80,mask:1x1x1000 \
--maxShapes=input:4x2000x80,mask:4x1x2000


# When batchsize==1 is fixed, it only takes 4.6ms:
/opt/tensorrt/bin/trtexec --onnx=test_fp32.onnx --shapes=input:1x1000x80,mask:1x1x1000 --workspace=64000 \
--minShapes=input:1x20x80,mask:1x1x20 \
--optShapes=input:1x1000x80,mask:1x1x1000 \
--maxShapes=input:1x2000x80,mask:1x1x2000

In such cases, it is recommended to make only one dimension dynamic.
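
A minimal sketch of the hardcoded-batch pitfall mentioned above (the module names here are illustrative): int(x.size(0)) is evaluated at export time and bakes the sample batch size into the graph, while torch.flatten keeps it symbolic.

import torch

class Hardcoded(torch.nn.Module):
    def forward(self, x):
        # int() runs at export time: the sample batch size (1) becomes
        # a constant in the exported Reshape node.
        return x.view(int(x.size(0)), -1)

class Dynamic(torch.nn.Module):
    def forward(self, x):
        # flatten keeps the batch dimension symbolic in the graph.
        return torch.flatten(x, 1)

x = torch.randn(1, 3, 224, 224)
for path, m in [("hardcoded.onnx", Hardcoded()), ("dynamic.onnx", Dynamic())]:
    torch.onnx.export(m.eval(), x, path,
                      input_names=["input"], output_names=["output"],
                      dynamic_axes={"input": {0: "batch_size"},
                                    "output": {0: "batch_size"}})
# Viewing hardcoded.onnx in netron shows a Reshape whose first target
# dimension is the constant 1, so other batch sizes are reshaped incorrectly.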

Best Practices

After modifying the network, you can use the following code to convert the PyTorch model to an ONNX model:

x = torch.randn(1, 3, 224, 224).cuda()
out_size = 1
in_out = {"input": {0: "batch_size"}}
for i in range(out_size):
    in_out[f"output_{i}"] = {0: "batch_size"}

torch_model.cuda().eval()
torch.onnx.export(torch_model,
                  x,
                  onnx_save_path,
                  opset_version=17,
                  do_constant_folding=True,
                  input_names=["input"],  # input name
                  output_names=[f"output_{i}" for i in range(out_size)],  # output names
                  dynamic_axes=in_out)

import onnx
from onnxsim import onnx_simplifier

onnx_model = onnx.load(onnx_save_path)
onnx_model = onnx.shape_inference.infer_shapes(onnx_model)
model_simp, check = onnx_simplifier.simplify(onnx_model, check_n=0)
onnx.save(model_simp, onnx_save_path)
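
After export and simplification, it is worth sanity-checking the ONNX model against the PyTorch outputs. A minimal sketch using onnxruntime (not part of the original snippet; it reuses x, torch_model, and onnx_save_path from above):

import numpy as np
import onnxruntime as ort

# Run the exported model and compare against the PyTorch result.
sess = ort.InferenceSession(onnx_save_path, providers=["CPUExecutionProvider"])
onnx_out = sess.run(None, {"input": x.cpu().numpy()})[0]
with torch.no_grad():
    torch_out = torch_model(x).cpu().numpy()
np.testing.assert_allclose(torch_out, onnx_out, rtol=1e-3, atol=1e-5)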
`torchpipe.utils.models.onnx_export` (available since 0.3.2b1):
  • This tool converts a PyTorch model to an ONNX model and saves it locally. It only supports a single input.
  • It supports dynamic batch size and applies onnx-simplifier optimization automatically.
def onnx_export(model: Union[torch.nn.Module, torch.jit.ScriptModule, torch.jit.ScriptFunction], onnx_path, input=None, opset=17):
Parameters
  • model - PyTorch model.
  • onnx_path - Path to save the ONNX model.
  • input - Model input. Defaults to torch.randn(1,3,224,224) if not set.
  • opset - ONNX opset.
Example Code
import os, tempfile
from torchvision import models
import torch
import torchpipe

## export onnx
m = models.resnet50(weights=None).eval()
onnx_path = os.path.join(tempfile.gettempdir(), "resnet50.onnx")
torchpipe.utils.models.onnx_export(m, onnx_path, torch.randn(1, 3, 224, 224), opset=17)

Reasons for Conversion Failure

Conversion failures are common when exporting from torch to ONNX. The following methods can help:

  • Keep dynamic dimensions dynamic. For example, for YOLOX:
x = x.view(int(x.size(0)), -1, 1, 1)
# Change to:
x = x.flatten(1).unsqueeze(2).unsqueeze(2)

x = x.view(int(x.size(0)), -1)
# Change to:
x = x.view(-1, int(x.size(1)*x.size(2)*x.size(3)))
  • Change boolean values to float:
tgt_padding_mask = (tgt_in == self.eos_id)
# Change to:
tgt_padding_mask = (tgt_in == self.eos_id).float()
  • Simplify the model with onnx-simplifier (see the onnx-simplify section below).
  • Conversion failures are sometimes version-related; try different versions (a snippet for checking a model's opset follows this list):
    • Use the latest versions where possible, such as ONNX opset >= 14 and TensorRT >= 8.2.
    • For TensorRT 7, onnx version 1.9.0 and ONNX opset 11 are recommended.
  • Try using trtexec for model conversion:
    
/opt/tensorrt/bin/trtexec --onnx=folded.onnx \
--workspace=64000 \
--minShapes=input:1x20x80,mask:1x1x20 \
--optShapes=input:1x1000x80,mask:1x1x1000 \
--maxShapes=input:1x2000x80,mask:1x1x2000 \
--saveEngine=test_fp32.trt
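
When chasing version-related failures, first confirm which opset the exported model actually targets. A minimal sketch (reusing folded.onnx from the command above):

import onnx

model = onnx.load("folded.onnx")
# Each opset_import entry is a (domain, version) pair; an empty domain
# means the default ai.onnx operator set.
print([(imp.domain or "ai.onnx", imp.version) for imp in model.opset_import])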

onnx-simplify

A tool for simplifying the model structure:

pip install onnx onnxsim
onnxsim input.onnx output.onnx

netron

A tool for visualizing ONNX models.

Install it with pip install netron, then run netron [FILE] from the command line or netron.start('[FILE]') from Python.

ONNX GraphSurgeon

ONNX GraphSurgeon is a tool released with TensorRT for modifying ONNX graph structures.
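
A minimal sketch of a typical edit (file names are placeholders; the package installs as onnx_graphsurgeon):

import onnx
import onnx_graphsurgeon as gs

# Import the ONNX model into a mutable graph representation.
graph = gs.import_onnx(onnx.load("input.onnx"))

# Example edit: mark the batch dimension of the first input as dynamic.
graph.inputs[0].shape = ["batch_size"] + list(graph.inputs[0].shape[1:])

# Remove dangling nodes and re-sort the graph after editing.
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "output.onnx")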

Polygraphy

Polygraphy is a tool provided by NVIDIA for testing TensorRT and ONNX models. It provides model conversion functionality, helps debug FP16 precision loss, and allows specifying layers that should not run in FP16.
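
For example, a typical invocation to locate FP16 precision loss (flags per the Polygraphy documentation; adjust to your version):

# Build the model with FP16 in TensorRT and compare every layer output
# against ONNX Runtime to locate where precision is lost.
polygraphy run model.onnx --trt --fp16 --onnxrt \
    --trt-outputs mark all --onnxrt-outputs mark all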