autotvm.task.extract_from_program with TFLite

I was trying to optimize a TFLite graph using autotvm.task.extract_from_program, but the method returns an empty list. To be precise:

tasks = autotvm.task.extract_from_program(mod["main"], target=target,
                                          params=params, ops=target_op)

I have tried removing ops=target_op so that the method defaults to all operators, but I still get an empty task list. The output of that command indicates that the graph is being traversed; I won’t bother you with the entire output, but here is a small sample:

INFO:compile_engine:Use implementation injective.cpu for op nn.bias_add
INFO:compile_engine:Use implementation injective.cpu for op clip
WARNING:strategy:For x86 target, NCHW layout is recommended for conv2d.
INFO:compile_engine:Use implementation conv2d_nhwc.x86 for op nn.conv2d
INFO:compile_engine:Use implementation injective.cpu for op nn.bias_add
WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86.
INFO:compile_engine:Use implementation depthwise_conv2d_nhwc.generic for op nn.conv2d
INFO:compile_engine:Use implementation injective.cpu for op nn.bias_add
INFO:compile_engine:Use implementation injective.cpu for op clip
WARNING:strategy:For x86 target, NCHW layout is recommended for conv2d.
INFO:compile_engine:Use implementation conv2d_nhwc.x86 for op nn.conv2d
INFO:compile_engine:Use implementation injective.cpu for op nn.bias_add
INFO:compile_engine:Use implementation injective.cpu for op clip
WARNING:strategy:For x86 target, NCHW layout is recommended for conv2d.
INFO:compile_engine:Use implementation conv2d_nhwc.x86 for op nn.conv2d
INFO:compile_engine:Use implementation injective.cpu for op nn.bias_add
WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86.

I have used this method before with TF models with no problems, so I am wondering if there is a particular issue with TFLite models. My test model is MobileNet_V2 from the Google model zoo. I can compile said model and run inference on the resulting binaries without tuning the graph.

Most likely I am missing something, but I could not find the definition of autotvm.task.extract_from_program in the source code.

Any hints as to why the task list is empty would be greatly appreciated. Thanks!

Looks like your model is in NHWC layout, but TVM currently supports NCHW layout better, and AFAIK TVM doesn’t have a tunable template for NHWC layout on x86. You may need to use the ConvertLayout pass to transform your model to NCHW layout and then extract tasks.


Thanks! I forgot to add that I had already applied the ConvertLayout pass to transform to NCHW:

mod, params =\
    relay.frontend.from_tflite(tflite_model,
                               shape_dict={input_name: dshape},
                               dtype_dict={input_name: input_type})
if layout == "NCHW":
    logging.warning("Applying NHWC to NCHW transformation")
    # This is the transformation pass
    seq = tvm.transform.Sequential([
        relay.transform.RemoveUnusedFunctions(),
        relay.transform.ConvertLayout(layout)
    ])
    mod = seq(mod)
    logging.warning("NHWC to NCHW transformation done")

So your model is already in NCHW layout? From the log it seems the model is still in NHWC: both the selected implementations (e.g., conv2d_nhwc.x86) and the warnings (e.g., “NHWC layout is not optimized for x86”) refer to the NHWC layout. You may want to check whether the layout conversion actually succeeded.
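One quick way to verify (a minimal sketch; it just inspects the printed Relay text after your Sequential pass has run):

# Heuristic check: after ConvertLayout, the conv2d calls should carry
# data_layout="NCHW"; any remaining data_layout="NHWC" means the
# conversion did not take effect.
relay_text = str(mod["main"])
assert 'data_layout="NHWC"' not in relay_text, "model is still in NHWC"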

Thank you; now that you mention it, it does make sense.

OK, I found the error: the model I was ingesting was not the correct one. With the correct model the problem above does not show up.

Sorry for that!

After trying multiple quantized models, the schedules are finally produced. For testing purposes I am using the quantized MobileNetV2 models from https://www.tensorflow.org/lite/guide/hosted_models. However, I now get at least two kinds of errors when generating the binary:

an internal invariant was violated while typechecking your program [09:34:19] /Users/alopez/Documents/Code/tvm/src/relay/qnn/op/convolution.cc:50: Check failed: data->dtype == DataType::Int(8) || data->dtype == DataType::UInt(8): Expected qnn conv2d type(int8, uint8) for input but was float32

This is part of the Relay code:

  %0 = layout_transform(%input, src_layout="NHWC", dst_layout="NCHW");
  %1 = layout_transform(%v_param_1, src_layout="HWIO", dst_layout="OIHW");
  %2 = qnn.conv2d(%0, %1, 128, 122, 0.0078125f, 0.0339689f, strides=[2, 2], padding=[0, 0, 1, 1], channels=32, kernel_size=[3, 3], out_dtype="int32") an internal invariant was violated while typechecking your program [09:34:19] /Users/alopez/Documents/Code/tvm/src/relay/qnn/op/convolution.cc:50: Check failed: data->dtype == DataType::Int(8) || data->dtype == DataType::UInt(8): Expected qnn conv2d type(int8, uint8) for input but was float32
  %3 = expand_dims(%v_param_2, axis=0, num_newaxis=3);
  %4 = layout_transform(%3, src_layout="NHWC", dst_layout="NCHW");
  %5 = add(%2, %4);
  %6 = qnn.requantize(%5, 0.000265382f, 0, 0.0235285f, 0, axis=1, out_dtype="uint8") an internal invariant was violated while typechecking your program [09:34:19] /Users/alopez/Documents/Code/tvm/src/relay/qnn/op/requantize.cc:250: Check failed: data != nullptr: 

I see two problems:

  1. qnn.conv2d lists two float values as inputs, which, considering that the model is quantized, seems strange. I think it’s getting confused by the scale/zero point in the quantization parameters?
  2. qnn.requantize is returning a null pointer?

I’ll continue looking, but wanted to ask if anyone has seen similar problems with optimizing quantized models.

Regards…

I’m not familiar with the QNN module, so I’m calling @anijain2305 for help. I would suggest opening another topic with a proper title for a new problem next time; otherwise it can easily be overlooked.

Are you giving the right input dtype to the model? TFLite quantized models need the uint8 dtype.

@anijain2305 Thanks for the prompt reply. Yes, I am setting dtype_input = "uint8". Also, I just verified that optimization of a non-quantized TFLite model does work. In summary, the same optimization script works for the FP32 version but not for the quantized version. Both models come from https://www.tensorflow.org/lite/guide/hosted_models.

Also, the same models go through TVM when no graph optimization is done, which means the models work as intended.

IIUC, simple compilation (no auto-tuning) of both FP32 and quantized models works.

But the auto-tuning + compilation fails for the quantized model (while the same script works for FP32), right?

Just to confirm, can you please double-check your script?

We specify the input shape and dtype for the model while parsing (from_tflite).

So, even though most of the AutoTVM script can stay the same, there needs to be a small change when passing the input shape and dtype for the FP32 and quantized models.

If you see the error that the qnn.conv2d input dtype is float32, it means that the primary input is float32, which in turn means that the input dtype is set incorrectly.
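For illustration, a minimal sketch of that change (the quantized flag and variable names here are assumptions, not from your script):

# Hypothetical: the only difference between the FP32 and the quantized
# AutoTVM scripts should be the dtype handed to the TFLite frontend.
dtype = "uint8" if quantized else "float32"  # quantized TFLite models take uint8
mod, params = relay.frontend.from_tflite(tflite_model,
                                         shape_dict={input_name: dshape},
                                         dtype_dict={input_name: dtype})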

I have double-checked the type and made sure the NHWC -> NCHW transformation is applied:

    assert input_type == "uint8", "Quantized models use uint8 input_type"

    mod, params =\
        relay.frontend.from_tflite(tflite_model,
                                   shape_dict={input_name: dshape},
                                   dtype_dict={input_name: input_type})
    #
    # Assume model from TFLite is NHWC, convert to NCHW
    #
    logging.warning("Applying NHWC to NCHW transformation")
    #
    # Dump Relay code
    #
    print(mod["main"].body, file=open('relay_NHWC.txt', 'w'))
    # This is the transformation pass
    with relay.build_config(opt_level=3):
        seq = tvm.transform.Sequential([
            relay.transform.RemoveUnusedFunctions(),
            relay.transform.ConvertLayout(layout)
        ])
        mod = seq(mod)
    #
    # Dump Relay code
    #
    print(mod["main"].body, file=open('relay_NCHW.txt', 'w'))

Note that I added with relay.build_config(opt_level=3): around the transformation. I think I was missing that in my original post.

Also:

But the auto-tuning + compilation fails for the quantized model (while the same script works for FP32), right?

The answer is yes. I am also able to tune the TensorFlow FP32 model, but that obviously uses a different front end.

Hmm, this is weird. My script seems to work well. Is it possible for you to share the script? If not, does your script reach the point where relay_NHWC.txt is printed for the quantized model, or does it fail before that?

Right now, I think the process fails in relay.build_module.build(mod, target=target, params=params), that is, after the code I showed above. I just verified that the layout transformation takes place by comparing relay_NHWC.txt and relay_NCHW.txt.

Let me create a minimal script so you can reproduce the error. I’ll post it in a few; in the meantime, below is the content of relay_NHWC.txt as requested (I had to take some text out due to limitations of the post tool).

free_var %input: Tensor[(1, 224, 224, 3), uint8]
free_var %v_param_1: Tensor[(3, 3, 3, 32), uint8]
%0 = qnn.conv2d(%input, %v_param_1, 128 /* ty=int32 */, 122 /* ty=int32 */, 0.0078125f /* ty=float32 */, 0.0339689f /* ty=float32 */, strides=[2, 2], padding=[0, 0, 1, 1], channels=32, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 112, 112, 32), int32] */;
free_var %v_param_2: Tensor[(32), int32]
%1 = nn.bias_add(%0, %v_param_2, axis=3) /* ty=Tensor[(1, 112, 112, 32), int32] */;
%2 = qnn.requantize(%1, 0.000265382f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 112, 112, 32), uint8] */;
free_var %v_param_3: Tensor[(3, 3, 32, 1), uint8]
%3 = qnn.conv2d(%2, %v_param_3, 0 /* ty=int32 */, 165 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.343696f /* ty=float32 */, padding=[1, 1, 1, 1], groups=32, channels=32, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32") /* ty=Tensor[(1, 112, 112, 32), int32] */;
free_var %v_param_4: Tensor[(32), int32]
%4 = nn.bias_add(%3, %v_param_4, axis=3) /* ty=Tensor[(1, 112, 112, 32), int32] */;
%5 = qnn.requantize(%4, 0.00808663f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 112, 112, 32), uint8] */;
free_var %v_param_5: Tensor[(1, 1, 32, 16), uint8]
%6 = qnn.conv2d(%5, %v_param_5, 0 /* ty=int32 */, 140 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.0373718f /* ty=float32 */, padding=[0, 0, 0, 0], channels=16, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 112, 112, 16), int32] */;
free_var %v_param_6: Tensor[(16), int32]
%7 = nn.bias_add(%6, %v_param_6, axis=3) /* ty=Tensor[(1, 112, 112, 16), int32] */;
%8 = qnn.requantize(%7, 0.0008793f /* ty=float32 */, 0 /* ty=int32 */, 0.354413f /* ty=float32 */, 129 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 112, 112, 16), uint8] */;
free_var %v_param_7: Tensor[(1, 1, 16, 96), uint8]
%9 = qnn.conv2d(%8, %v_param_7, 129 /* ty=int32 */, 127 /* ty=int32 */, 0.354413f /* ty=float32 */, 0.00975851f /* ty=float32 */, padding=[0, 0, 0, 0], channels=96, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 112, 112, 96), int32] */;
free_var %v_param_8: Tensor[(96), int32]
%10 = nn.bias_add(%9, %v_param_8, axis=3) /* ty=Tensor[(1, 112, 112, 96), int32] */;
%11 = qnn.requantize(%10, 0.00345855f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 112, 112, 96), uint8] */;
free_var %v_param_9: Tensor[(3, 3, 96, 1), uint8]
%12 = qnn.conv2d(%11, %v_param_9, 0 /* ty=int32 */, 109 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.0209691f /* ty=float32 */, strides=[2, 2], padding=[0, 0, 1, 1], groups=96, channels=96, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32") /* ty=Tensor[(1, 56, 56, 96), int32] */;
free_var %v_param_10: Tensor[(96), int32]
%13 = nn.bias_add(%12, %v_param_10, axis=3) /* ty=Tensor[(1, 56, 56, 96), int32] */;
%14 = qnn.requantize(%13, 0.000493371f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 56, 56, 96), uint8] */;
free_var %v_param_11: Tensor[(1, 1, 96, 24), uint8]
%15 = qnn.conv2d(%14, %v_param_11, 0 /* ty=int32 */, 156 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.022536f /* ty=float32 */, padding=[0, 0, 0, 0], channels=24, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 56, 56, 24), int32] */;
free_var %v_param_12: Tensor[(24), int32]
%16 = nn.bias_add(%15, %v_param_12, axis=3) /* ty=Tensor[(1, 56, 56, 24), int32] */;
%17 = qnn.requantize(%16, 0.000530238f /* ty=float32 */, 0 /* ty=int32 */, 0.275834f /* ty=float32 */, 119 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 56, 56, 24), uint8] */;
free_var %v_param_13: Tensor[(1, 1, 24, 144), uint8]
%18 = qnn.conv2d(%17, %v_param_13, 119 /* ty=int32 */, 144 /* ty=int32 */, 0.275834f /* ty=float32 */, 0.0036557f /* ty=float32 */, padding=[0, 0, 0, 0], channels=144, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 56, 56, 144), int32] */;
free_var %v_param_14: Tensor[(144), int32]
%19 = nn.bias_add(%18, %v_param_14, axis=3) /* ty=Tensor[(1, 56, 56, 144), int32] */;
%20 = qnn.requantize(%19, 0.00100837f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 56, 56, 144), uint8] */;
free_var %v_param_15: Tensor[(3, 3, 144, 1), uint8]
%21 = qnn.conv2d(%20, %v_param_15, 0 /* ty=int32 */, 52 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.169819f /* ty=float32 */, padding=[1, 1, 1, 1], groups=144, channels=144, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32") /* ty=Tensor[(1, 56, 56, 144), int32] */;
free_var %v_param_16: Tensor[(144), int32]
%22 = nn.bias_add(%21, %v_param_16, axis=3) /* ty=Tensor[(1, 56, 56, 144), int32] */;
%23 = qnn.requantize(%22, 0.00399559f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 56, 56, 144), uint8] */;
free_var %v_param_17: Tensor[(1, 1, 144, 24), uint8]
%24 = qnn.conv2d(%23, %v_param_17, 0 /* ty=int32 */, 122 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.0274089f /* ty=float32 */, padding=[0, 0, 0, 0], channels=24, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 56, 56, 24), int32] */;
free_var %v_param_18: Tensor[(24), int32]
%25 = nn.bias_add(%24, %v_param_18, axis=3) /* ty=Tensor[(1, 56, 56, 24), int32] */;
%26 = qnn.requantize(%25, 0.000644889f /* ty=float32 */, 0 /* ty=int32 */, 0.401493f /* ty=float32 */, 136 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 56, 56, 24), uint8] */;
%27 = qnn.add(%26, %17, 0.401493f /* ty=float32 */, 136 /* ty=int32 */, 0.275834f /* ty=float32 */, 119 /* ty=int32 */, 0.432169f /* ty=float32 */, 133 /* ty=int32 */) /* ty=Tensor[(1, 56, 56, 24), uint8] */;
free_var %v_param_19: Tensor[(1, 1, 24, 144), uint8]
%28 = qnn.conv2d(%27, %v_param_19, 133 /* ty=int32 */, 104 /* ty=int32 */, 0.432169f /* ty=float32 */, 0.00299887f /* ty=float32 */, padding=[0, 0, 0, 0], channels=144, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 56, 56, 144), int32] */;
free_var %v_param_20: Tensor[(144), int32]
%29 = nn.bias_add(%28, %v_param_20, axis=3) /* ty=Tensor[(1, 56, 56, 144), int32] */;
%30 = qnn.requantize(%29, 0.00129602f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 56, 56, 144), uint8] */;
free_var %v_param_21: Tensor[(3, 3, 144, 1), uint8]
%31 = qnn.conv2d(%30, %v_param_21, 0 /* ty=int32 */, 143 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.0172029f /* ty=float32 */, strides=[2, 2], padding=[0, 0, 1, 1], groups=144, channels=144, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32") /* ty=Tensor[(1, 28, 28, 144), int32] */;
free_var %v_param_22: Tensor[(144), int32]
%32 = nn.bias_add(%31, %v_param_22, axis=3) /* ty=Tensor[(1, 28, 28, 144), int32] */;
%33 = qnn.requantize(%32, 0.000404757f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 28, 28, 144), uint8] */;
free_var %v_param_23: Tensor[(1, 1, 144, 32), uint8]
%34 = qnn.conv2d(%33, %v_param_23, 0 /* ty=int32 */, 111 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.0168447f /* ty=float32 */, padding=[0, 0, 0, 0], channels=32, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 28, 28, 32), int32] */;
free_var %v_param_24: Tensor[(32), int32]
%35 = nn.bias_add(%34, %v_param_24, axis=3) /* ty=Tensor[(1, 28, 28, 32), int32] */;
%36 = qnn.requantize(%35, 0.00039633f /* ty=float32 */, 0 /* ty=int32 */, 0.218362f /* ty=float32 */, 127 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 28, 28, 32), uint8] */;
free_var %v_param_25: Tensor[(1, 1, 32, 192), uint8]
%37 = qnn.conv2d(%36, %v_param_25, 127 /* ty=int32 */, 128 /* ty=int32 */, 0.218362f /* ty=float32 */, 0.00192442f /* ty=float32 */, padding=[0, 0, 0, 0], channels=192, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 28, 28, 192), int32] */;
free_var %v_param_26: Tensor[(192), int32]
%38 = nn.bias_add(%37, %v_param_26, axis=3) /* ty=Tensor[(1, 28, 28, 192), int32] */;
%39 = qnn.requantize(%38, 0.000420222f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 28, 28, 192), uint8] */;
free_var %v_param_27: Tensor[(3, 3, 192, 1), uint8]
%40 = qnn.conv2d(%39, %v_param_27, 0 /* ty=int32 */, 118 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.0652507f /* ty=float32 */, padding=[1, 1, 1, 1], groups=192, channels=192, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32") /* ty=Tensor[(1, 28, 28, 192), int32] */;
free_var %v_param_28: Tensor[(192), int32]
%41 = nn.bias_add(%40, %v_param_28, axis=3) /* ty=Tensor[(1, 28, 28, 192), int32] */;
%42 = qnn.requantize(%41, 0.00153525f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 28, 28, 192), uint8] */;
free_var %v_param_29: Tensor[(1, 1, 192, 32), uint8]
%43 = qnn.conv2d(%42, %v_param_29, 0 /* ty=int32 */, 146 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.0190629f /* ty=float32 */, padding=[0, 0, 0, 0], channels=32, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 28, 28, 32), int32] */;
free_var %v_param_30: Tensor[(32), int32]
%44 = nn.bias_add(%43, %v_param_30, axis=3) /* ty=Tensor[(1, 28, 28, 32), int32] */;
%45 = qnn.requantize(%44, 0.000448521f /* ty=float32 */, 0 /* ty=int32 */, 0.227942f /* ty=float32 */, 121 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 28, 28, 32), uint8] */;
%46 = qnn.add(%45, %36, 0.227942f /* ty=float32 */, 121 /* ty=int32 */, 0.218362f /* ty=float32 */, 127 /* ty=int32 */, 0.25969f /* ty=float32 */, 130 /* ty=int32 */) /* ty=Tensor[(1, 28, 28, 32), uint8] */;
free_var %v_param_31: Tensor[(1, 1, 32, 192), uint8]
%47 = qnn.conv2d(%46, %v_param_31, 130 /* ty=int32 */, 135 /* ty=int32 */, 0.25969f /* ty=float32 */, 0.00136492f /* ty=float32 */, padding=[0, 0, 0, 0], channels=192, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 28, 28, 192), int32] */;
free_var %v_param_32: Tensor[(192), int32]
%48 = nn.bias_add(%47, %v_param_32, axis=3) /* ty=Tensor[(1, 28, 28, 192), int32] */;
%49 = qnn.requantize(%48, 0.000354455f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 28, 28, 192), uint8] */;
free_var %v_param_33: Tensor[(3, 3, 192, 1), uint8]
%50 = qnn.conv2d(%49, %v_param_33, 0 /* ty=int32 */, 95 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.0790978f /* ty=float32 */, padding=[1, 1, 1, 1], groups=192, channels=192, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32") /* ty=Tensor[(1, 28, 28, 192), int32] */;
free_var %v_param_34: Tensor[(192), int32]
%51 = nn.bias_add(%50, %v_param_34, axis=3) /* ty=Tensor[(1, 28, 28, 192), int32] */;
%52 = qnn.requantize(%51, 0.00186105f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 28, 28, 192), uint8] */;
free_var %v_param_35: Tensor[(1, 1, 192, 32), uint8]
%53 = qnn.conv2d(%52, %v_param_35, 0 /* ty=int32 */, 128 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.0182931f /* ty=float32 */, padding=[0, 0, 0, 0], channels=32, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 28, 28, 32), int32] */;
free_var %v_param_36: Tensor[(32), int32]
%54 = nn.bias_add(%53, %v_param_36, axis=3) /* ty=Tensor[(1, 28, 28, 32), int32] */;
%55 = qnn.requantize(%54, 0.000430409f /* ty=float32 */, 0 /* ty=int32 */, 0.257749f /* ty=float32 */, 124 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 28, 28, 32), uint8] */;
%56 = qnn.add(%55, %46, 0.257749f /* ty=float32 */, 124 /* ty=int32 */, 0.25969f /* ty=float32 */, 130 /* ty=int32 */, 0.331715f /* ty=float32 */, 124 /* ty=int32 */) /* ty=Tensor[(1, 28, 28, 32), uint8] */;
free_var %v_param_37: Tensor[(1, 1, 32, 192), uint8]
%57 = qnn.conv2d(%56, %v_param_37, 124 /* ty=int32 */, 127 /* ty=int32 */, 0.331715f /* ty=float32 */, 0.00191704f /* ty=float32 */, padding=[0, 0, 0, 0], channels=192, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 28, 28, 192), int32] */;
free_var %v_param_38: Tensor[(192), int32]
%58 = nn.bias_add(%57, %v_param_38, axis=3) /* ty=Tensor[(1, 28, 28, 192), int32] */;
%59 = qnn.requantize(%58, 0.000635912f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 28, 28, 192), uint8] */;
free_var %v_param_39: Tensor[(3, 3, 192, 1), uint8]
%60 = qnn.conv2d(%59, %v_param_39, 0 /* ty=int32 */, 127 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.0100879f /* ty=float32 */, strides=[2, 2], padding=[0, 0, 1, 1], groups=192, channels=192, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32") /* ty=Tensor[(1, 14, 14, 192), int32] */;
free_var %v_param_40: Tensor[(192), int32]
%61 = nn.bias_add(%60, %v_param_40, axis=3) /* ty=Tensor[(1, 14, 14, 192), int32] */;
%62 = qnn.requantize(%61, 0.000237353f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 14, 14, 192), uint8] */;
free_var %v_param_41: Tensor[(1, 1, 192, 64), uint8]
%63 = qnn.conv2d(%62, %v_param_41, 0 /* ty=int32 */, 147 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.0146013f /* ty=float32 */, padding=[0, 0, 0, 0], channels=64, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 14, 14, 64), int32] */;
free_var %v_param_42: Tensor[(64), int32]
%64 = nn.bias_add(%63, %v_param_42, axis=3) /* ty=Tensor[(1, 14, 14, 64), int32] */;
%65 = qnn.requantize(%64, 0.000343546f /* ty=float32 */, 0 /* ty=int32 */, 0.185405f /* ty=float32 */, 126 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 14, 14, 64), uint8] */;
free_var %v_param_43: Tensor[(1, 1, 64, 384), uint8]
%66 = qnn.conv2d(%65, %v_param_43, 126 /* ty=int32 */, 125 /* ty=int32 */, 0.185405f /* ty=float32 */, 0.00155389f /* ty=float32 */, padding=[0, 0, 0, 0], channels=384, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 14, 14, 384), int32] */;
free_var %v_param_44: Tensor[(384), int32]
%67 = nn.bias_add(%66, %v_param_44, axis=3) /* ty=Tensor[(1, 14, 14, 384), int32] */;
%68 = qnn.requantize(%67, 0.0002881f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 14, 14, 384), uint8] */;
free_var %v_param_45: Tensor[(3, 3, 384, 1), uint8]
%69 = qnn.conv2d(%68, %v_param_45, 0 /* ty=int32 */, 110 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.0609271f /* ty=float32 */, padding=[1, 1, 1, 1], groups=384, channels=384, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32") /* ty=Tensor[(1, 14, 14, 384), int32] */;
free_var %v_param_46: Tensor[(384), int32]
%70 = nn.bias_add(%69, %v_param_46, axis=3) /* ty=Tensor[(1, 14, 14, 384), int32] */;
%71 = qnn.requantize(%70, 0.00143352f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 14, 14, 384), uint8] */;
free_var %v_param_47: Tensor[(1, 1, 384, 64), uint8]
%72 = qnn.conv2d(%71, %v_param_47, 0 /* ty=int32 */, 124 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.0167829f /* ty=float32 */, padding=[0, 0, 0, 0], channels=64, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 14, 14, 64), int32] */;
free_var %v_param_48: Tensor[(64), int32]
%73 = nn.bias_add(%72, %v_param_48, axis=3) /* ty=Tensor[(1, 14, 14, 64), int32] */;
%74 = qnn.requantize(%73, 0.000394877f /* ty=float32 */, 0 /* ty=int32 */, 0.172635f /* ty=float32 */, 109 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 14, 14, 64), uint8] */;
%75 = qnn.add(%74, %65, 0.172635f /* ty=float32 */, 109 /* ty=int32 */, 0.185405f /* ty=float32 */, 126 /* ty=int32 */, 0.18911f /* ty=float32 */, 122 /* ty=int32 */) /* ty=Tensor[(1, 14, 14, 64), uint8] */;
free_var %v_param_49: Tensor[(1, 1, 64, 384), uint8]
%76 = qnn.conv2d(%75, %v_param_49, 122 /* ty=int32 */, 134 /* ty=int32 */, 0.18911f /* ty=float32 */, 0.0014703f /* ty=float32 */, padding=[0, 0, 0, 0], channels=384, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 14, 14, 384), int32] */;
free_var %v_param_50: Tensor[(384), int32]
%77 = nn.bias_add(%76, %v_param_50, axis=3) /* ty=Tensor[(1, 14, 14, 384), int32] */;
%78 = qnn.requantize(%77, 0.000278048f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 14, 14, 384), uint8] */;
free_var %v_param_51: Tensor[(3, 3, 384, 1), uint8]
%79 = qnn.conv2d(%78, %v_param_51, 0 /* ty=int32 */, 133 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.0524078f /* ty=float32 */, padding=[1, 1, 1, 1], groups=384, channels=384, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32") /* ty=Tensor[(1, 14, 14, 384), int32] */;
free_var %v_param_52: Tensor[(384), int32]
%80 = nn.bias_add(%79, %v_param_52, axis=3) /* ty=Tensor[(1, 14, 14, 384), int32] */;
%81 = qnn.requantize(%80, 0.00123308f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 14, 14, 384), uint8] */;
free_var %v_param_53: Tensor[(1, 1, 384, 64), uint8]
%82 = qnn.conv2d(%81, %v_param_53, 0 /* ty=int32 */, 125 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.0128983f /* ty=float32 */, padding=[0, 0, 0, 0], channels=64, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 14, 14, 64), int32] */;
free_var %v_param_54: Tensor[(64), int32]
%83 = nn.bias_add(%82, %v_param_54, axis=3) /* ty=Tensor[(1, 14, 14, 64), int32] */;
%84 = qnn.requantize(%83, 0.000303476f /* ty=float32 */, 0 /* ty=int32 */, 0.147155f /* ty=float32 */, 123 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 14, 14, 64), uint8] */;
%85 = qnn.add(%84, %75, 0.147155f /* ty=float32 */, 123 /* ty=int32 */, 0.18911f /* ty=float32 */, 122 /* ty=int32 */, 0.199681f /* ty=float32 */, 124 /* ty=int32 */) /* ty=Tensor[(1, 14, 14, 64), uint8] */;
free_var %v_param_55: Tensor[(1, 1, 64, 384), uint8]
%86 = qnn.conv2d(%85, %v_param_55, 124 /* ty=int32 */, 127 /* ty=int32 */, 0.199681f /* ty=float32 */, 0.00137335f /* ty=float32 */, padding=[0, 0, 0, 0], channels=384, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 14, 14, 384), int32] */;
free_var %v_param_56: Tensor[(384), int32]
%87 = nn.bias_add(%86, %v_param_56, axis=3) /* ty=Tensor[(1, 14, 14, 384), int32] */;
%88 = qnn.requantize(%87, 0.000274232f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 14, 14, 384), uint8] */;
free_var %v_param_57: Tensor[(3, 3, 384, 1), uint8]
%89 = qnn.conv2d(%88, %v_param_57, 0 /* ty=int32 */, 155 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.0407789f /* ty=float32 */, padding=[1, 1, 1, 1], groups=384, channels=384, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32") /* ty=Tensor[(1, 14, 14, 384), int32] */;
free_var %v_param_58: Tensor[(384), int32]
%90 = nn.bias_add(%89, %v_param_58, axis=3) /* ty=Tensor[(1, 14, 14, 384), int32] */;
%91 = qnn.requantize(%90, 0.000959465f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 14, 14, 384), uint8] */;
free_var %v_param_59: Tensor[(1, 1, 384, 64), uint8]
%92 = qnn.conv2d(%91, %v_param_59, 0 /* ty=int32 */, 144 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.0195615f /* ty=float32 */, padding=[0, 0, 0, 0], channels=64, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 14, 14, 64), int32] */;
free_var %v_param_60: Tensor[(64), int32]
%93 = nn.bias_add(%92, %v_param_60, axis=3) /* ty=Tensor[(1, 14, 14, 64), int32] */;
%94 = qnn.requantize(%93, 0.000460252f /* ty=float32 */, 0 /* ty=int32 */, 0.156276f /* ty=float32 */, 122 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 14, 14, 64), uint8] */;
%95 = qnn.add(%94, %85, 0.156276f /* ty=float32 */, 122 /* ty=int32 */, 0.199681f /* ty=float32 */, 124 /* ty=int32 */, 0.220273f /* ty=float32 */, 120 /* ty=int32 */) /* ty=Tensor[(1, 14, 14, 64), uint8] */;
free_var %v_param_61: Tensor[(1, 1, 64, 384), uint8]
%96 = qnn.conv2d(%95, %v_param_61, 120 /* ty=int32 */, 131 /* ty=int32 */, 0.220273f /* ty=float32 */, 0.00162825f /* ty=float32 */, padding=[0, 0, 0, 0], channels=384, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 14, 14, 384), int32] */;
free_var %v_param_62: Tensor[(384), int32]
%97 = nn.bias_add(%96, %v_param_62, axis=3) /* ty=Tensor[(1, 14, 14, 384), int32] */;
%98 = qnn.requantize(%97, 0.00035866f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 14, 14, 384), uint8] */;
free_var %v_param_63: Tensor[(3, 3, 384, 1), uint8]
%99 = qnn.conv2d(%98, %v_param_63, 0 /* ty=int32 */, 143 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.0311078f /* ty=float32 */, padding=[1, 1, 1, 1], groups=384, channels=384, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32") /* ty=Tensor[(1, 14, 14, 384), int32] */;
free_var %v_param_64: Tensor[(384), int32]
%100 = nn.bias_add(%99, %v_param_64, axis=3) /* ty=Tensor[(1, 14, 14, 384), int32] */;
%101 = qnn.requantize(%100, 0.00073192f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 14, 14, 384), uint8] */;
free_var %v_param_65: Tensor[(1, 1, 384, 96), uint8]
%102 = qnn.conv2d(%101, %v_param_65, 0 /* ty=int32 */, 129 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.00743631f /* ty=float32 */, padding=[0, 0, 0, 0], channels=96, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 14, 14, 96), int32] */;
free_var %v_param_66: Tensor[(96), int32]
%103 = nn.bias_add(%102, %v_param_66, axis=3) /* ty=Tensor[(1, 14, 14, 96), int32] */;
%104 = qnn.requantize(%103, 0.000174965f /* ty=float32 */, 0 /* ty=int32 */, 0.170611f /* ty=float32 */, 129 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 14, 14, 96), uint8] */;
free_var %v_param_67: Tensor[(1, 1, 96, 576), uint8]
%105 = qnn.conv2d(%104, %v_param_67, 129 /* ty=int32 */, 134 /* ty=int32 */, 0.170611f /* ty=float32 */, 0.00163099f /* ty=float32 */, padding=[0, 0, 0, 0], channels=576, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 14, 14, 576), int32] */;
free_var %v_param_68: Tensor[(576), int32]
%106 = nn.bias_add(%105, %v_param_68, axis=3) /* ty=Tensor[(1, 14, 14, 576), int32] */;
%107 = qnn.requantize(%106, 0.000278264f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 14, 14, 576), uint8] */;
free_var %v_param_69: Tensor[(3, 3, 576, 1), uint8]
%108 = qnn.conv2d(%107, %v_param_69, 0 /* ty=int32 */, 66 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.0708081f /* ty=float32 */, padding=[1, 1, 1, 1], groups=576, channels=576, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32") /* ty=Tensor[(1, 14, 14, 576), int32] */;
free_var %v_param_70: Tensor[(576), int32]
%109 = nn.bias_add(%108, %v_param_70, axis=3) /* ty=Tensor[(1, 14, 14, 576), int32] */;
%110 = qnn.requantize(%109, 0.00166601f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 14, 14, 576), uint8] */;
free_var %v_param_71: Tensor[(1, 1, 576, 96), uint8]
%111 = qnn.conv2d(%110, %v_param_71, 0 /* ty=int32 */, 136 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.00838223f /* ty=float32 */, padding=[0, 0, 0, 0], channels=96, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 14, 14, 96), int32] */;
free_var %v_param_72: Tensor[(96), int32]
%112 = nn.bias_add(%111, %v_param_72, axis=3) /* ty=Tensor[(1, 14, 14, 96), int32] */;
%113 = qnn.requantize(%112, 0.000197221f /* ty=float32 */, 0 /* ty=int32 */, 0.123328f /* ty=float32 */, 127 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 14, 14, 96), uint8] */;
%114 = qnn.add(%113, %104, 0.123328f /* ty=float32 */, 127 /* ty=int32 */, 0.170611f /* ty=float32 */, 129 /* ty=int32 */, 0.176158f /* ty=float32 */, 127 /* ty=int32 */) /* ty=Tensor[(1, 14, 14, 96), uint8] */;
free_var %v_param_73: Tensor[(1, 1, 96, 576), uint8]
%115 = qnn.conv2d(%114, %v_param_73, 127 /* ty=int32 */, 138 /* ty=int32 */, 0.176158f /* ty=float32 */, 0.00182588f /* ty=float32 */, padding=[0, 0, 0, 0], channels=576, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 14, 14, 576), int32] */;
free_var %v_param_74: Tensor[(576), int32]

...

%161 = nn.bias_add(%160, %v_param_102, axis=3) /* ty=Tensor[(1, 7, 7, 320), int32] */;
%162 = qnn.requantize(%161, 0.000188446f /* ty=float32 */, 0 /* ty=int32 */, 0.116945f /* ty=float32 */, 130 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 7, 7, 320), uint8] */;
free_var %v_param_103: Tensor[(1, 1, 320, 1280), uint8]
%163 = qnn.conv2d(%162, %v_param_103, 130 /* ty=int32 */, 125 /* ty=int32 */, 0.116945f /* ty=float32 */, 0.00516707f /* ty=float32 */, padding=[0, 0, 0, 0], channels=1280, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 7, 7, 1280), int32] */;
free_var %v_param_104: Tensor[(1280), int32]
%164 = nn.bias_add(%163, %v_param_104, axis=3) /* ty=Tensor[(1, 7, 7, 1280), int32] */;
%165 = qnn.requantize(%164, 0.000604263f /* ty=float32 */, 0 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 7, 7, 1280), uint8] */;
%166 = cast(%165, dtype="int32") /* ty=Tensor[(1, 7, 7, 1280), int32] */;
%167 = nn.avg_pool2d(%166, pool_size=[7, 7], layout="NHWC") /* ty=Tensor[(1, 1, 1, 1280), int32] */;
%168 = cast(%167, dtype="uint8") /* ty=Tensor[(1, 1, 1, 1280), uint8] */;
free_var %v_param_105: Tensor[(1, 1, 1280, 1001), uint8]
%169 = qnn.conv2d(%168, %v_param_105, 0 /* ty=int32 */, 113 /* ty=int32 */, 0.0235285f /* ty=float32 */, 0.00169108f /* ty=float32 */, padding=[0, 0, 0, 0], channels=1001, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 1, 1, 1001), int32] */;
free_var %v_param_106: Tensor[(1001), int32]
%170 = nn.bias_add(%169, %v_param_106, axis=3) /* ty=Tensor[(1, 1, 1, 1001), int32] */;
%171 = qnn.requantize(%170, 3.97886e-05f /* ty=float32 */, 0 /* ty=int32 */, 0.0988925f /* ty=float32 */, 58 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 1, 1, 1001), uint8] */;
reshape(%171, newshape=[1, 1001]) /* ty=Tensor[(1, 1001), uint8] */

Here is the script that reproduces the issue:

import sys
import os
import logging

import tvm
from tvm import relay
from tvm import autotvm
from tvm.autotvm.tuner import XGBTuner, GATuner, RandomTuner, GridSearchTuner
from tvm.autotvm.graph_tuner import DPTuner, PBQPTuner

import tflite.Model


#
# This function loads the model
#
def get_model(input_type):
    #
    # Models come from https://www.tensorflow.org/lite/guide/hosted_models
    # The path is hard coded, change to match your setup
    #
    base = "./my_models/mobilenet_v2/"
    model_name = base + "mobilenet_v2_1.0_224_quant.tflite"
    #
    # Model parameters
    #
    input_name = "input"
    dshape = (1, 224, 224, 3)
    
    with open(model_name, "rb") as f:
        tflite_model_buf = f.read()

    assert input_type == "uint8", "Quantized models use uint8 input_type"
    
    tflite_model = tflite.Model.Model.GetRootAsModel(tflite_model_buf, 0)

    mod, params =\
        relay.frontend.from_tflite(tflite_model,
                                   shape_dict={input_name: dshape},
                                   dtype_dict={input_name: input_type})
    #
    # Assume model from TFLite is NHWC, convert to NCHW
    #
    # print(mod["main"].body, file=open('relay_NHWC.txt', 'w'))
    with relay.build_config(opt_level=3):
        seq = tvm.transform.Sequential([
            relay.transform.RemoveUnusedFunctions(),
            relay.transform.ConvertLayout(layout)
        ])
        mod = seq(mod)
    # print(mod["main"].body, file=open('relay_NCHW.txt', 'w'))

    return mod, params, input_name, dshape


#
# Kernel tuning function
#
def tune_kernels(tasks,
                 n_trial,
                 measure_option,
                 tuner='random',
                 early_stopping=None,
                 log_filename='tuning.log',
                 graph_filename=None,
                 layout=None,
                 target=None,
                 context=None,
                 use_existing=None):

    for i, task in enumerate(tasks):
        prefix = "[Task %2d/%2d] " % (i+1, len(tasks))

        # create tuner
        if tuner == 'xgb' or tuner == 'xgb-rank':
            tuner_obj = XGBTuner(task, loss_type='rank')
        elif tuner == 'ga':
            tuner_obj = GATuner(task, pop_size=50)
        elif tuner == 'random':
            tuner_obj = RandomTuner(task)
        elif tuner == 'gridsearch':
            tuner_obj = GridSearchTuner(task)
        else:
            raise ValueError("Invalid tuner: " + tuner)

        # do tuning
        n_trial = min(n_trial, len(task.config_space))

        tuner_obj.tune(n_trial=n_trial,
                       early_stopping=early_stopping,
                       measure_option=measure_option,
                       callbacks=[
                           autotvm.callback.progress_bar(n_trial,
                                                         prefix=prefix),
                           autotvm.callback.log_to_file(log_filename)])


#
# Use graph tuner to achieve graph level optimal schedules
# Set use_DP=False if it takes too long to finish.
#
def tune_graph(graph, input_name, dshape, records, opt_sch_file, use_DP=True):
    target_op = [relay.op.get("nn.conv2d"),
                  relay.op.get("nn.contrib_depthwise_conv2d_NCHWc"),
                  relay.op.get("nn.contrib_conv2d_NCHWc"), ]
    Tuner = DPTuner if use_DP else PBQPTuner

    executor = Tuner(graph,
                     {input_name: dshape},
                     records,
                     target_op,
                     target)
    executor.benchmark_layout_transform(min_exec_num=2000)
    executor.run()
    executor.write_opt_sch2record_file(opt_sch_file)


#
# Tune the graph
#
def tune(tuning_opt, input_type):
    #
    # Read the model and get the relevant parameters, then extract workload
    #
    mod, params, input_name, data_shape = get_model(input_type)
    #
    # Tune the tasks
    #
    target_op = [relay.op.get("nn.conv2d"),
                 relay.op.get("nn.contrib_depthwise_conv2d_NCHWc"),
                 relay.op.get("nn.contrib_conv2d_NCHWc"), ]
    tasks = autotvm.task.extract_from_program(mod["main"], target=target,
                                              params=params,
                                              ops=target_op)
    
    tune_kernels(tasks, **tuning_opt)
    
    tune_graph(mod["main"],
               input_name,
               data_shape,
               sch_log,
               graph_log,
               True)

    with autotvm.apply_graph_best(graph_log):
        logging.info("Compiling the schedule")
        with relay.build_config(opt_level=3):
            graph, lib, params = relay.build_module.build(
                mod, target=target, params=params)
        #
        # Export the model
        #
        base = "./out/"
        lib.export_library(base + "binary.so")
        with open(base + "graph.json", "w") as fo:
            fo.write(graph)
        with open(base + "params.params", "wb") as fo:
            fo.write(relay.save_param_dict(params))


if __name__ == "__main__":
    #
    # Global variables that define the model
    #
    target = "llvm -mcpu=skylake"
    ctx = tvm.cpu()
    model_name = "mobilenetv2"
    dtype_input = "uint8"
    layout = "NCHW"
    batch_size = 1
    
    num_threads = 4
    os.environ["TVM_NUM_THREADS"] = str(num_threads)
    #
    # Set the log filenames
    #
    graph_log = "%s_%s_graph_opt.log" % (model_name, dtype_input)
    sch_log = "%s_%s.log" % (model_name, dtype_input)
    #
    # Tuning parameters
    #
    tuning_option = {
        "log_filename": sch_log,
        "graph_filename": graph_log,
        "layout": layout,
        "target": target,
        "context": ctx,
        "tuner": "random",
        "n_trial": 20,
        "early_stopping": None,
        "measure_option": autotvm.measure_option(
            builder=autotvm.LocalBuilder(timeout=100),
            runner=autotvm.LocalRunner(number=10, repeat=1,
                                       min_repeat_ms=1000),
            ),
        }

    tune(tuning_option, dtype_input)

The failure looks like this (just showing the first few lines):

  %0 = layout_transform(%input, src_layout="NHWC", dst_layout="NCHW");
  %1 = layout_transform(%v_param_1, src_layout="HWIO", dst_layout="OIHW");
  %2 = qnn.conv2d(%0, %1, 128, 122, 0.0078125f, 0.0339689f, strides=[2, 2], padding=[0, 0, 1, 1], channels=32, kernel_size=[3, 3], out_dtype="int32") an internal invariant was violated while typechecking your program [13:38:14] /Users/alopez/Documents/Code/tvm/src/relay/qnn/op/convolution.cc:50: Check failed: data->dtype == DataType::Int(8) || data->dtype == DataType::UInt(8): Expected qnn conv2d type(int8, uint8) for input but was float32
; ;
  %3 = expand_dims(%v_param_2, axis=0, num_newaxis=3);
  %4 = layout_transform(%3, src_layout="NHWC", dst_layout="NCHW");
  %5 = add(%2, %4);
  %6 = qnn.requantize(%5, 0.000265382f, 0, 0.0235285f, 0, axis=1, out_dtype="uint8") an internal invariant was violated while typechecking your program [13:38:14] /Users/alopez/Documents/Code/tvm/src/relay/qnn/op/requantize.cc:250: Check failed: data != nullptr: 
....

Let me know if you need more information, and thanks for looking into this!


Thanks for sharing. The failure is while calling tune_graph. The graph tuning assumes the data to be float32. Additionally, last time I tried, graph tuning can’t work with QNN ops. One way to handle this is to call QnnCanonicalize (python/tvm/relay/qnn/transform.py) before calling graph tuning. But, ideally, graph tuning should be changed to support QNN ops as well.
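For reference, a minimal sketch of that workaround (assuming the pass meant here is CanonicalizeOps from python/tvm/relay/qnn/transform.py, which lowers qnn.* ops into core Relay ops):

from tvm import relay

# Lower QNN ops (qnn.conv2d, qnn.requantize, ...) into core Relay ops so the
# graph tuner only sees operators it understands.
mod = relay.qnn.transform.CanonicalizeOps()(mod)
# ...then call tune_graph(mod["main"], ...) as before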

I would suggest skipping tune_graph for now.
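If you skip it, the kernel-tuning log can be applied directly (a minimal sketch; autotvm.apply_history_best is the usual context for applying a kernel log):

# Apply the best kernel configs found by tune_kernels, then compile.
with autotvm.apply_history_best(sch_log):
    with relay.build_config(opt_level=3):
        graph, lib, params = relay.build_module.build(
            mod, target=target, params=params)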

Thanks! I’ll give that a try.

OK, so I commented out the tune_graph() call and used sch_log, which is the kernel schedule log from tune_kernels():

with autotvm.apply_graph_best(sch_log):
    logging.info("Compiling the schedule")
    with relay.build_config(opt_level=3):
        graph, lib, params = relay.build_module.build(
            mod, target=target, params=params)

However, now I get some interesting errors:

  %268 = layout_transform(%267, src_layout="NCHW77c", dst_layout="NCHW11c");
  %269 = add(%259, %268) Incompatible broadcast type TensorType([1, 2, (int64)28, (int64)28, 11], int32) and TensorType([1, 0, (int64)28, (int64)28, 11], int32); ;
  %270 = subtract(%269, 130);
  %271 = clip(%270, a_min=0f, a_max=255f);
  %272 = cast(%271, dtype="uint8");
  %273 = cast(%272, dtype="int16");
  %274 = subtract(%273, meta[relay.Constant][36]);
  %275 = layout_transform(%274, src_layout="NCHW11c", dst_layout="NCHW40c") an internal invariant was violated while typechecking your program [17:15:34] /Users/alopez/Documents/Code/tvm/src/relay/op/tensor/transform.cc:2328: Check failed: data != nullptr:
...
  %289 = subtract(%288, meta[relay.Constant][3]);
  %290 = layout_transform(%289, src_layout="NCHW11c", dst_layout="NCHW8c") an internal invariant was violated while typechecking your program [17:15:34] /Users/alopez/Documents/Code/tvm/src/relay/op/tensor/transform.cc:2328: Check failed: data != nullptr: 

The internal invariant violation (Check failed: data != nullptr:) happens a lot, on almost every layout transformation. I recalled reading in the forums that there was some issue with the optimization level on those transformations, so setting opt_level=2 finally got the process to work.
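Concretely, the change is just the build level (a sketch; presumably this helps because the layout-altering optimizations run at opt_level 3):

# Workaround: compile at opt_level=2 so the layout transformations that
# trigger the "Check failed: data != nullptr" errors are skipped.
with relay.build_config(opt_level=2):
    graph, lib, params = relay.build_module.build(
        mod, target=target, params=params)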

So heads up, there is something else going on that may cause problems in the future. I’ll try to debug what is going on, but I’ll have to understand the code base a bit more before I can be effective here.