Segmentation Fault in relay.quantize.quantize with target skylake-avx512/cascadelake


I’m using TVM to quantize ResNet-50.

After importing the model from a TensorFlow SavedModel, I simply use

mod = relay.quantize.quantize(mod, params=params) to quantize the model.
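(For context, relay.quantize performs post-training quantization, mapping float32 tensors to int8. A minimal NumPy sketch of symmetric affine quantization, purely illustrative — the function names here are not TVM's internal API:)

```python
import numpy as np

def quantize_int8(x):
    # Symmetric quantization: map the float32 range to int8 [-127, 127].
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

np.random.seed(0)
x = np.random.randn(64).astype(np.float32)
q, s = quantize_int8(x)
x_hat = dequantize(q, s)
# Reconstruction error is at most half a quantization step (scale / 2).
```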

I was able to build and run with the target “llvm”, but got a segmentation fault with the targets “llvm -mcpu=skylake-avx512” and “llvm -mcpu=cascadelake” during quantization.

Any help is greatly appreciated.


You can build TVM with the debug flag and locate the root cause.

I’m testing on a machine with an Intel® Xeon® Platinum 8269 (Cascade Lake) CPU, and trying to evaluate the performance of TVM quantization with VNNI instruction acceleration.
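(For reference, the VNNI instruction vpdpbusd fuses a u8×s8 multiply with an int32 accumulate, which is what speeds up int8 conv2d on Cascade Lake. A NumPy sketch of one lane's arithmetic, purely illustrative:)

```python
import numpy as np

def vpdpbusd_lane(acc, a_u8, b_s8):
    # One VNNI lane: multiply four unsigned 8-bit values by four signed
    # 8-bit values, sum the products, and accumulate into a signed int32.
    prods = a_u8.astype(np.int32) * b_s8.astype(np.int32)
    return acc + int(prods.sum())

a = np.array([1, 2, 3, 4], dtype=np.uint8)
b = np.array([-1, 2, -3, 4], dtype=np.int8)
print(vpdpbusd_lane(0, a, b))  # 1*-1 + 2*2 + 3*-3 + 4*4 = 10
```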

I also tried changing the target to this CPU without an autotvm configuration to evaluate ResNet-50. It failed in the same way as the model imported from TensorFlow.

How do I debug this? I could only find documentation about the graph_runtime debugger.

Since your segmentation fault very likely takes place in the intrinsics, I think it’s better to build TVM with CMAKE_BUILD_TYPE=Debug and start your Python script with:

$ gdb --fullname python3
(gdb) run
<your segmentation fault takes place here>
(gdb) bt

The backtrace command will help you locate the error.

You might be interested in looking at PR #4196.

This looks like a Legalize or AlterOpLayout problem. Try with opt_level=0 and see if it passes.

I am travelling right now, so I will not be able to reproduce the problem on my end quickly. You can also try `return None` in Legalize and AlterOpLayout for conv2d and see if it passes. If it does, then the problem is definitely in one of those two passes.

Thanks for your reply. The segmentation fault won’t show up in debug mode.

Thanks for your reply. I tried with opt_level=0 and it passed, but the performance is fairly poor. I also added LOG statements to the code; it crashed while performing the Legalize or FoldScaleAxis pass.

Do you have any suggestions?

This can be reproduced by:

import tensorflow as tf
import numpy as np

raw_images = tf.placeholder("float32", shape=[1, 224, 224, 3], name="raw_images")
filter = tf.constant(np.ones([3, 3, 3, 1], dtype=np.float32))
output = tf.nn.conv2d(raw_images, filter, strides=[1, 1, 1, 1], padding='SAME')
filter = tf.constant(np.ones([3, 3, 1, 1], dtype=np.float32))

for i in range(60):
    output = tf.nn.conv2d(output, filter, strides=[1, 1, 1, 1], padding='SAME')

with tf.Session() as sess:
    tf.saved_model.simple_save(sess, export_dir="./simple",
                               inputs={"raw_images": raw_images},
                               outputs={"output": output})

But if I decrease the number of conv2d nodes to 50, the segmentation fault disappears.
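(Since the crash depends on the number of conv2d nodes, a simple bisection over the depth can pin down the exact threshold. In the sketch below, `fails_at` is a hypothetical stand-in for a function that rebuilds the graph at a given depth and reports whether quantization segfaults:)

```python
def find_threshold(fails_at, lo=50, hi=60):
    # Smallest depth in (lo, hi] at which the failure appears,
    # assuming depth lo passes, depth hi fails, and the behaviour is monotone.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails_at(mid):
            hi = mid
        else:
            lo = mid
    return hi

# Example with a stand-in predicate that "fails" from depth 57 onward:
print(find_threshold(lambda n: n >= 57))  # 57
```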

Hi @dingli

I am back from vacation now. Can you please provide me the complete testcase so that I can quickly reproduce the error? Thanks!