Segmentation Fault in relay.quantize.quantize

Hello,

I started playing with quantization. I was able to compile and run a MobileNet model, but when I try ssd_512_mobilenet1.0_voc from the GluonCV model zoo I get a segmentation fault:

Segmentation fault: 11

Stack trace returned 10 entries:

[bt] (0) 0 libmxnet.so 0x000000012cfcbc90 std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*> > >::destroy(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, void*>*) + 2736

[bt] (1) 1 libmxnet.so 0x000000012ec611b6 mxnet::Storage::Get() + 4358

[bt] (2) 2 libsystem_platform.dylib 0x00007fff7a53ab5d _sigtramp + 29

[bt] (3) 3 ??? 0x0000000000000028 0x0 + 40

[bt] (4) 4 libtvm.dylib 0x00000001192c659f void tvm::runtime::detail::unpack_call_dispatcher<bool, 0, 4, bool ()(tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&)>::run<tvm::runtime::TVMArgValue, tvm::runtime::TVMArgValue, tvm::runtime::TVMArgValue, tvm::runtime::TVMArgValue>(bool ( const&)(tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&), tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*, tvm::runtime::TVMArgValue&&, tvm::runtime::TVMArgValue&&, tvm::runtime::TVMArgValue&&, tvm::runtime::TVMArgValue&&) + 95

[bt] (5) 5 libtvm.dylib 0x00000001192c64f9 std::__1::__function::__func<void tvm::runtime::TypedPackedFunc<bool (tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&)>::AssignTypedLambda<bool ()(tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&)>(bool ()(tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&))::‘lambda’(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*), std::__1::allocator<void tvm::runtime::TypedPackedFunc<bool (tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&)>::AssignTypedLambda<bool ()(tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&)>(bool ()(tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&))::‘lambda’(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)>, void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&) + 137

[bt] (6) 6 libtvm.dylib 0x0000000119602e78 tvm::TypedEnvFunc<bool (tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&)>::operator()(tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&) const + 328

[bt] (7) 7 libtvm.dylib 0x00000001196027f3 tvm::relay::TypeSolver::Solve() + 1155

[bt] (8) 8 libtvm.dylib 0x00000001195e2bab tvm::relay::TypeInferencer::Infer(tvm::relay::Expr) + 107

[bt] (9) 9 libtvm.dylib 0x00000001195e3a5a tvm::relay::InferType(tvm::relay::Function const&, tvm::relay::Module const&, tvm::relay::GlobalVar const&) + 570

Here is the relevant code:

import tvm
from tvm import relay
import tvm.relay.testing
from gluoncv import model_zoo

supported_model = [
    "mobilenetv2_1.0",
    "ssd_512_resnet50_v1_voc",
    "ssd_512_resnet50_v1_coco",
    "ssd_512_resnet101_v2_voc",
    "ssd_512_mobilenet1.0_voc",
    "ssd_512_mobilenet1.0_coco",
    "ssd_300_vgg16_atrous_voc",
    "ssd_512_vgg16_atrous_coco",
    "ssd_512_mobilenet1.0_voc_int8"
]

model_name = supported_model[4]

Model parameters

input_name = "input"
input_dtype = "float32"
out_dtype = "int8"
out_numbits = 8
dshape = (1, 3, 512, 512)

Declare the CPU type

myTarget = "llvm -mcpu=haswell"
ctx = tvm.cpu()

Read the model

block = model_zoo.get_model(model_name, pretrained=True)

Ingest the model into the compiler, then build.

def build(target):
    print("\nBuilding quantized code\n")
    mod, params = relay.frontend.from_mxnet(block,
                                            {input_name: dshape},
                                            dtype=input_dtype)

    with relay.quantize.qconfig(nbit_input=out_numbits,
                                nbit_weight=out_numbits,
                                nbit_activation=out_numbits,
                                dtype_input=out_dtype,
                                dtype_weight=out_dtype,
                                dtype_activation=out_dtype,
                                global_scale=8.0,
                                skip_conv_layers=[0],
                                round_for_shift=True,
                                store_lowbit_output=False,
                                debug_enabled_ops=None):
        mod["main"] = relay.quantize.quantize(mod["main"], params)

Any help is greatly appreciated.

Thanks!

Also, I have tried changing the activation type and number of bits to float32 and 32; I still get a similar segmentation fault.

I’m facing a similar issue, but in my case it’s model related, meaning that I’m able to quantize some models but I get a segmentation fault with others (without any error trace).

Is it because some ops are still not supported for quantization?

It could be; the full explanation can be found here:

I haven’t tried going from TF->TVM recently, as I am taking the quantized-model TFLite->TVM route. It works, but inference performance is worse than FP32 (the original TF model). I think the AutoTVM optimization is still being worked on, but once it’s ready I should expect better performance from INT-only models.
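
For context, the TFLite -> TVM import path I am using looks roughly like this. It is just a sketch: the file name, input name, and input shape are placeholders rather than my exact values, and it assumes the tflite Python package is installed.

import tflite
from tvm import relay

# Load the pre-quantized TFLite flatbuffer (placeholder file name).
with open("mobilenet_v2_1.0_224_quant.tflite", "rb") as f:
    tflite_model = tflite.Model.GetRootAsModel(f.read(), 0)

# TFLite graphs are NHWC, so the input shape is declared accordingly.
mod, params = relay.frontend.from_tflite(
    tflite_model,
    shape_dict={"input": (1, 224, 224, 3)},
    dtype_dict={"input": "uint8"},
)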

@alopez_13 Which TFLite model do you use, and which Intel server machine? For Cascade Lake, you should see a performance improvement. Let me know if you can share the details; I might be able to help.

I was working with a quantized MobileNet_v2 from the TensorFlow model zoo. The problem I had was that during graph optimization (auto-tuning) I was getting an error about "only support NCHW/HWCN currently" when calling:

tasks = autotvm.task.extract_from_program(sym["main"],
                                          target=target,
                                          params=params,
                                          ops=(relay.op.nn.conv2d,))

I thought it was because the convolution operator did not support quantization, but now that I think of it, I may need to call a specific relay operator? I’ll dig around in the forums…

By the way, I also looked at:

and, if I understood correctly, his proposed solution was to change func_create = "topi_x86_conv2d_NCHWc" to func_create = "topi_x86_conv2d_NCHWc_int8" in the tuning function. Unfortunately, I still get the same error. I'll keep digging; most likely I am not setting the tuning parameters right.

An update: changing the tuner function will not help, since the error occurs in autotvm.task.extract_from_program itself. So I think it has to do with the data layout that TFLite produces compared to what TVM expects. Now I remember reading about this in the forums.

I think you are right about the error. This might also be because of the depthwise convolutions in MobileNetV2. To narrow it down further, there are two things that can be done:

  • Run w/o auto-tuning and see if the model passes.
  • Try a model w/o depthwise convolution (a quick way to check whether depthwise convolutions are present is sketched below).
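
To check whether depthwise convolutions are actually present in the imported module, something along these lines should work. This is only a sketch: has_depthwise_conv is a hypothetical helper, and it assumes a TVM version that exposes relay.analysis.post_order_visit.

from tvm import relay

def has_depthwise_conv(mod):
    # Depthwise convolution shows up in Relay as nn.conv2d with groups > 1.
    found = []

    def visit(node):
        if (isinstance(node, relay.expr.Call)
                and hasattr(node.op, "name")
                and node.op.name == "nn.conv2d"
                and int(node.attrs.groups) > 1):
            found.append(node)

    relay.analysis.post_order_visit(mod["main"], visit)
    return len(found) > 0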

Thanks. I know it's the auto-tuning because I can run inference on that quantized model without it. I have not run an exhaustive sweep to check accuracy, but it gives me the same classes as the original TensorFlow and TFLite (without the compiler) models. Let me explore without depthwise convolution.

Which machine are you using?

I have run on two x86 boxes, an old macOS machine (Haswell) and a newer Linux-based server (Skylake). But I think the error is related to the data layout: TFLite does not use NCHW by default, and that is why autotvm.task.extract_from_program complains when you call it. The error happens before you actually do any tuning.
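
If it is indeed the NHWC layout coming from TFLite, one option in newer TVM releases is to convert the imported graph to NCHW before extracting the tuning tasks, using relay.transform.ConvertLayout. A rough sketch (the exact argument form of ConvertLayout has changed across versions; the dict form below is from more recent releases):

import tvm
from tvm import relay

# Convert the NHWC TFLite graph to NCHW before calling
# autotvm.task.extract_from_program (assumes `mod` came from from_tflite).
desired_layouts = {"nn.conv2d": ["NCHW", "default"]}
seq = tvm.transform.Sequential([
    relay.transform.RemoveUnusedFunctions(),
    relay.transform.ConvertLayout(desired_layouts),
])
with tvm.transform.PassContext(opt_level=3):
    mod = seq(mod)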