RuntimeWarning “divide by zero encountered in true_divide” when converting INT8 model to Relay graph

Issue description

When converting a pretrained quantized GluonCV model to a TVM Relay graph, a quantization range parameter is set to zero, causing a RuntimeWarning about division by zero.

Steps to reproduce the issue

  1. Prepare hardware and environment that meet the requirements for TVM
  2. Install MXNet 1.5.1 or 1.6.0, GluonCV 0.7.0, and the latest MKL-DNN library
  3. Build TVM with USE_MKLDNN ON
  4. Download a pretrained INT8 model from GluonCV with gluoncv.model_zoo.get_model()
  5. Convert the model to a TVM Relay graph with tvm.relay.frontend.from_mxnet() (see the reproduction sketch after this list)
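
A minimal reproduction sketch of steps 4 and 5, assuming the model ssd_512_vgg16_atrous_voc_int8 and a 1x3x512x512 input (any GluonCV INT8 model should reproduce the warning):

    from gluoncv import model_zoo
    from tvm import relay

    # Step 4: download a pretrained INT8 model from the GluonCV model zoo
    # (model name and input shape are assumptions for illustration)
    model = model_zoo.get_model("ssd_512_vgg16_atrous_voc_int8", pretrained=True)

    # Step 5: convert the quantized model to a TVM Relay graph; the
    # RuntimeWarning is emitted during this call
    shape_dict = {"data": (1, 3, 512, 512)}
    mod, params = relay.frontend.from_mxnet(model, shape=shape_dict)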

What’s the expected result?

  • Graph conversion yields no RuntimeWarning

What’s the actual result?

  • Graph conversion yields the following RuntimeWarning:

    /usr/tvm/python/tvm/relay/frontend/mxnet_qnn_op_utils.py:53: RuntimeWarning: divide by zero encountered in true_divide
    scale = np.divide(quantized_range, real_range)
    

Additional details

  • Debugging the code pointed to by the warning message reveals that the argument real_range is zero and that the RuntimeWarning is raised by the NumPy function divide().

  • The argument real_range is zero because the parameters min_calib_range and max_calib_range in _qnn_quantize() in relay/frontend/mxnet.py are set to their default values of 0.0.

  • The parameters min_calib_range and max_calib_range are set to their default values because the relevant operator, e.g., _contrib_quantize_v2, carries no min_calib_range and max_calib_range attributes.

  • When compiling the model ssd_512_vgg16_atrous_voc_int8 with tvm.relay.build() at opt_level=3 after graph conversion completes, compilation fails with the following error, as described in “TVMError Check failed Divide by zero when compiling INT8 model”:

    TVMError: Check failed: fb->value != 0 (0 vs. 0) : Divide by zero
    
  • Hard-coding the default value of max_calib_range to 20.0 (arbitrarily picked) in _qnn_quantize() removes both the RuntimeWarning and the above compilation error, and inference completes successfully with the model ssd_512_vgg16_atrous_voc_int8 (see the sketch of the scale computation and of this workaround below).
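
To illustrate the effect, here is a rough, self-contained reconstruction of the scale computation at the warning site (the exact helper in mxnet_qnn_op_utils.py may differ slightly) and of the hard-coded workaround:

    import numpy as np

    # Rough reconstruction of the INT8 scale computation around line 53 of
    # mxnet_qnn_op_utils.py; the exact helper may differ slightly.
    def int8_scale(min_calib_range, max_calib_range):
        quantized_range = 127.0
        real_range = np.max([np.abs(min_calib_range), np.abs(max_calib_range)])
        return np.divide(quantized_range, real_range)

    # Defaults used by _qnn_quantize() when the operator carries no
    # calibration attributes: real_range is 0, hence the RuntimeWarning
    print(int8_scale(0.0, 0.0))   # inf, with "divide by zero encountered"

    # Hard-coded workaround: an arbitrary non-zero max_calib_range yields a
    # finite (but uncalibrated, hence potentially inaccurate) scale
    print(int8_scale(0.0, 20.0))  # 6.35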

Suggested solutions

  • Fix TVM to properly handle the case where an operator of a quantized model has no min_calib_range or max_calib_range attributes (one possible approach is sketched below).
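
As an illustration only (not a patch against the actual TVM source; attribute accessors in relay/frontend/mxnet.py may differ), the frontend could fail with an informative error instead of silently defaulting the ranges to 0.0:

    # Hypothetical sketch of _qnn_quantize() handling missing calibration
    # attributes; names and accessors are illustrative.
    def _qnn_quantize(inputs, attrs):
        min_calib_range = attrs.get_float("min_calib_range", None)
        max_calib_range = attrs.get_float("max_calib_range", None)
        if min_calib_range is None or max_calib_range is None:
            # Fail loudly instead of defaulting both ranges to 0.0, which
            # later produces the divide-by-zero scale and the TVMError.
            raise ValueError(
                "quantize op has no min_calib_range/max_calib_range; the "
                "model lacks calibration information for this operator")
        # ... continue with the normal quantize lowering ...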

hi, Do we have any update about this issue? I have currently met the “same” issue (it reports exactly the same error message as yours) while trying to compile the GPT-2 ONNX model. I got the GPT-2 model from PyTorch and converted it to the ONNX format. Since GPT-2 is a large model and I am still trying to debug the root cause on my side, I would very much appreciate any input you may have.

Thanks!

To my knowledge there is no solution yet, but I have not tested with more recent TVM releases. In the end I abandoned my attempts with GluonCV prequantised models and instead used TVM’s quantize framework to create INT8 models from a non-quantised GluonCV model. After tuning the resulting INT8 model with AutoTVM I got some nice speedups compared to the original (non-quantised) GluonCV model.
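
For reference, a minimal sketch of that flow, assuming an FP32 GluonCV model; the model name, input shape, and qconfig values are illustrative, and calibration is omitted:

    from gluoncv import model_zoo
    from tvm import relay

    # Start from a non-quantised (FP32) GluonCV model
    model = model_zoo.get_model("ssd_512_vgg16_atrous_voc", pretrained=True)
    shape_dict = {"data": (1, 3, 512, 512)}
    mod, params = relay.frontend.from_mxnet(model, shape=shape_dict)

    # Quantise to INT8 with TVM's own quantize framework instead of using a
    # prequantised model; qconfig values here are placeholders
    with relay.quantize.qconfig(calibrate_mode="global_scale", global_scale=8.0):
        mod = relay.quantize.quantize(mod, params)

    # The INT8 module can then be tuned with AutoTVM and compiled with
    # relay.build() as usual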

You say that you get the exact same error message as I reported in my post. It is not clear whether you are referring to the RuntimeWarning, to the TVMError that occurs when compiling ssd_512_vgg16_atrous_voc_int8, or to both. Either way, I’d suggest you try:

  1. using a more recent official release of TVM
  2. using from_onnx() to convert your model to a Relay graph (if your RuntimeWarning includes an “mxnet” file name, you’re probably using the wrong graph conversion function, e.g., from_mxnet())
  3. using TVM’s quantize framework to create the INT8 model from a (Relay-graph-converted) FP32 ONNX model, instead of quantising with onnxruntime and converting an INT8 ONNX model to a Relay graph (see the sketch after this list)
  4. as a last resort, the hard-coding workaround I reported in my post, i.e., set the default value of max_calib_range to something non-zero in the TVM source code
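
For points 2 and 3, a minimal sketch of the ONNX flow (the file name, input name, and shape are placeholders for your GPT-2 export):

    import onnx
    from tvm import relay

    # Load the FP32 ONNX export of the model
    onnx_model = onnx.load("gpt2.onnx")      # placeholder path
    shape_dict = {"input_ids": (1, 64)}      # placeholder input name/shape

    # Convert with the ONNX frontend rather than from_mxnet()
    mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

    # The FP32 Relay module can then be quantised with the relay.quantize
    # flow sketched above, instead of converting an INT8 ONNX model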

hi, Cecilia, I really appreciate your detailed information as well as the suggested steps for finding the root cause of this “divide by 0” issue. I have already taken your first two steps. Besides, in my case I did not apply quantization to my model before compiling it with TVM. As a result, I am trying step 4 of your suggestions. Once again, thanks for your detailed guidance!