How to Handle Training-Inference Consistency

This section describes how to ensure that deployment results are consistent with training results, and shares some debugging experience.

Common Inconsistencies

  • GPU decoding and preprocessing may use different implementations than the libraries used during training, leading to inconsistencies.

  • For CPU decoding and preprocessing with OpenCV, training and inference results are usually consistent. However, in some environments, differences in CPU versions, underlying libraries (libjpeg vs. libjpeg-turbo), compilation options (e.g., WITH_IPP=ON/OFF), and other factors may still cause differences in results.

  • Differences in the processing flow: for example, resizing the same image as uint8 versus float produces different results (see the first sketch after this list).

  • TensorRT fp16 introduces some differences; even fp32 may differ around the fourth decimal place. Some models cannot run all layers in fp16 and require tools for partial quantization (see the tolerance-comparison sketch after this list).

  • Differences between the C++ logic and the Python logic.
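
To illustrate the uint8-versus-float difference above, here is a minimal Python sketch using OpenCV; the image path, target size, and interpolation mode are placeholders.

import cv2
import numpy as np

img = cv2.imread("test.jpg")  # hypothetical input image

# Resize as uint8, then convert to float.
out_u8 = cv2.resize(img, (224, 224), interpolation=cv2.INTER_LINEAR).astype(np.float32)

# Convert to float first, then resize.
out_f32 = cv2.resize(img.astype(np.float32), (224, 224), interpolation=cv2.INTER_LINEAR)

# The two results generally differ: interpolation on uint8 data rounds
# intermediate values, while interpolation on float data does not.
print("max abs diff:", np.abs(out_u8 - out_f32).max())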
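
For the fp16/fp32 differences above, exact equality checks are too strict; a tolerance-based comparison works better. A minimal sketch follows, where out_fp32 and out_fp16 are placeholders for the original model's output and the TensorRT engine's output.

import torch

out_fp32 = torch.randn(1, 1000)                    # placeholder for the fp32 output
out_fp16 = out_fp32 + 1e-3 * torch.randn(1, 1000)  # placeholder for the fp16 output

# fp32 typically agrees to about four decimal places; fp16 needs a
# looser tolerance.
print("allclose:", torch.allclose(out_fp32, out_fp16, rtol=1e-2, atol=1e-2))
print("max abs diff:", (out_fp32 - out_fp16).abs().max().item())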

Debugging Techniques

To locate the source of inconsistencies between training and inference outputs, the following methods can be used:

  • Compare inputs and outputs line by line / node by node from start to end. Outputs can be exported to the Python side for comparison (see the sketch after this list), or written to files using the following backends: SaveMat, SaveTensor, Mat2Tensor, Tensor2Mat.

  • Replace local inputs: nodes with differences can be skipped so that the subsequent stages can be compared in isolation. The LoadTensor backend can be used for this.

  • Save tensors to files using the save function from torch_utils.hpp:

// torch_utils.hpp
// Save an at::Tensor to a file. Equivalent to torch.save(tensor, name) in Python.
//
// Parameters:
//   name:  file path
//   input: tensor to save
void ipipe::save(std::string name, at::Tensor input);
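
On the Python side, a tensor written by ipipe::save can then be loaded and compared against the corresponding training-side tensor. A minimal sketch, assuming (per the equivalence noted above) that such files can be read with torch.load; the file names are placeholders.

import torch

cpp_out = torch.load("node_output.pt")   # saved by ipipe::save on the C++ side
ref_out = torch.load("train_output.pt")  # saved by torch.save during training

diff = (cpp_out.float() - ref_out.float()).abs()
print("max abs diff:", diff.max().item())
print("mean abs diff:", diff.mean().item())
print("allclose:", torch.allclose(cpp_out.float(), ref_out.float(), rtol=1e-3, atol=1e-4))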

Strategies for Resolving Training-Inference Consistency Issues

  • It is recommended to locate small differences by replacing local inputs and skipping the affected nodes, then continue locating subsequent differences. Some small differences can be tolerated in the final deployment.

  • It is recommended to perform extensive data testing before going online (see the sketch after this list) to catch low-probability errors and ensure the stability of the final deployment.

  • Training data can be preprocessed, online or offline, with the same preprocessing flow used at deployment time, so that the model is trained on inputs matching what it will see in production.
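
As a minimal sketch of such pre-launch testing, the following compares training-side and deployment-side preprocessing over a set of images and reports the worst-case difference; the two preprocess functions and the image directory are placeholders for the actual pipelines.

import glob
import cv2
import numpy as np

def preprocess_train(img):
    # placeholder for the training preprocessing
    return cv2.resize(img, (224, 224)).astype(np.float32) / 255.0

def preprocess_deploy(img):
    # placeholder for the deployment preprocessing
    return cv2.resize(img.astype(np.float32), (224, 224)) / 255.0

max_diff = 0.0
for path in glob.glob("val_images/*.jpg"):  # hypothetical test set
    img = cv2.imread(path)
    diff = np.abs(preprocess_train(img) - preprocess_deploy(img)).max()
    max_diff = max(max_diff, float(diff))

print("worst-case preprocessing diff:", max_diff)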

C++ First vs. Python First

For engineering and practical reasons (achieving high-performance goals with less effort), the current stage mainly relies on static graphs that are independent of Python to reach the desired acceleration, which makes debugging less convenient.

The good news is that torchpipe can be adopted gradually, in stages, which reduces the pain of debugging.