[Quantization] Problems with recent refactoring changes in the quantization pass


After the last changes to the quantization pass in the following commit, I am facing some issues:

  1. I get the following error:
  File "/home/tvm/tvm/python/tvm/relay/quantize/_partition.py", line 136, in add_partition_function
    if 'cuda' in _target.current_target().keys:
AttributeError: 'NoneType' object has no attribute 'keys'
  2. If I remove the problematic code from item 1, the accuracy of a previously working quantized model drops to the point that its output is no longer valid.
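For what it's worth, the `AttributeError` in item 1 indicates that `current_target()` returned `None` (it does so when no target scope is active). A defensive pattern would look like the sketch below; the `Target` class here is a stub standing in for the real TVM class, just for illustration:

```python
class Target:
    """Stub standing in for tvm.target.Target, only for illustration."""
    def __init__(self, keys):
        self.keys = keys

def is_cuda_target(target):
    # current_target() can return None when no target scope is active,
    # which is exactly what the AttributeError above indicates,
    # so guard before touching .keys
    return target is not None and 'cuda' in target.keys

print(is_cuda_target(None))                      # False, no AttributeError
print(is_cuda_target(Target(['cuda', 'gpu'])))   # True
```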

@vinx13 could you have a look at this? I saw that the mentioned commit touches some accuracy-related code.


@tico I can confirm this issue, cc @ziheng

should be fixed by https://github.com/dmlc/tvm/pull/3792

Which model shows the accuracy drop?

I tested calibration for resnet18 v1. Using a non-power-of-2 scale, top-1 accuracy is 0.51; it raised an error when using a power-of-2 scale.
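For readers unfamiliar with the power-of-2 scale setting mentioned above: snapping a quantization scale to a power of two lets the requantization multiply become a bit shift. A minimal sketch (not TVM's implementation, just the idea):

```python
import math

def round_scale_to_power_of_2(scale):
    # Snap a positive quantization scale to the nearest power of two,
    # so requantization can use shifts instead of multiplies.
    return 2.0 ** round(math.log2(scale))

print(round_scale_to_power_of_2(0.3))   # 0.25
print(round_scale_to_power_of_2(0.02))  # 0.015625 (2 ** -6)
```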

resnet18_v1 should be fine with the configuration here: https://github.com/dmlc/tvm/blob/master/tests/python/nightly/quantization/test_quantization_accuracy.py#L141

There is likely overflow in the int8 additions when custom scales are used.
After I commented out


resnet18 v1 works fine (acc 0.69)
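The int8 overflow suspicion is easy to reproduce with a minimal NumPy example:

```python
import numpy as np

a = np.array([100], dtype=np.int8)
b = np.array([100], dtype=np.int8)

# int8 addition wraps around: 100 + 100 = 200 exceeds the int8 max of 127
print(a + b)                                    # [-56]

# Casting to int32 before the addition avoids the overflow
print(a.astype(np.int32) + b.astype(np.int32))  # [200]
```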

@ziheng shall we add an option to choose whether to use int8 addition, to prevent overflow?

The changes made by #3543 will indeed affect the accuracy, but by adjusting the scale we should be able to achieve matching accuracy. Let me check it.

Hi @vinx13, could you check the accuracy of resnet18_v2 with your change? If it makes sense, let’s add an option for whether to use int8 addition.

You can also check the configuration I used here: https://github.com/dmlc/tvm/blob/8dab80c86e26d093bc1d10b9e56d9ef9925295c3/tests/python/nightly/quantization/test_quantization_accuracy.py#L166


resnet18_v2 0.51 (there is still some accuracy drop; it was around zero before my changes)
resnet50_v2 0.766


@vinx13 @ziheng is there any update on the accuracy issues of the quantization pass?


@tico @ziheng Sorry for the delayed response.

diff --git a/python/tvm/relay/quantize/_partition.py b/python/tvm/relay/quantize/_partition.py
index 1180d836..d794b4e3 100644
--- a/python/tvm/relay/quantize/_partition.py
+++ b/python/tvm/relay/quantize/_partition.py
@@ -85,8 +85,8 @@ def add_partition_generic(ref_call, new_args, ctx):
         #     %10 = add(%9, %meta[relay.Constant])
         #     %11 = add(%3, %10)  <- need to insert annotations for %3, %10
         #     ...
-        lhs = new_args[0].realize()
-        rhs = new_args[1].realize()
+        #lhs = new_args[0].realize()
+        #rhs = new_args[1].realize()
         return _forward_op(ref_call, [lhs, rhs])
     elif not lhs_cond and rhs_cond:
         # - introduced by residual connection in ResNet
@@ -102,7 +102,7 @@ def add_partition_generic(ref_call, new_args, ctx):
         #     %25 = add(%24, %meta[relay.Constant])
         #     %26 = add(%18, %25)  <- need to insert annotations for %25
         #     ...
-        rhs = new_args[1].realize()
+        #rhs = new_args[1].realize()
         return _forward_op(ref_call, [lhs, rhs])
     elif lhs_cond and not rhs_cond:
         if _analysis.check_constant(rhs):
diff --git a/python/tvm/relay/quantize/quantize.py b/python/tvm/relay/quantize/quantize.py
index adde2058..ca267ec3 100644
--- a/python/tvm/relay/quantize/quantize.py
+++ b/python/tvm/relay/quantize/quantize.py
@@ -401,8 +401,8 @@ def prerequisite_optimize(graph, params=None):
         graph = _bind_params(graph, params)
     mod = _module.Module.from_expr(graph)
-    with _transform.PassContext(opt_level=3):
-        mod = optimize(mod)
+    #with _transform.PassContext(opt_level=3):
+    mod = optimize(mod)
     return mod["main"]

This is a temporary fix for the issue.
After applying this patch, the accuracy of resnet18/resnet50 v1/v2 is normal.

Summary of the issues:

  1. Because of the realize call in the partition pass, some of the additions in the quantized model become int8 additions (instead of casting from int8 to int32 before the addition). In this case, if the scale is not chosen carefully, overflow is likely to happen.
  2. prerequisite_optimize uses opt_level=3. In my past experiments I observed accuracy issues caused by FoldScaleAxis. This pass is also applied to the model used for calibration, so the collected outputs from the calibration set may contain more outliers. Simply taking the maximum of the output as the scale might not work in this case. (Instead we may want to remove the outliers, e.g. by using the 99th-percentile maximum.)
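The outlier-robust calibration idea in point 2 can be sketched as follows; `calibration_scale` is a hypothetical helper for illustration, not a TVM API:

```python
import numpy as np

def calibration_scale(activations, percentile=99.0):
    # Hypothetical helper: derive an int8 quantization scale from
    # calibration outputs. Taking the absolute maximum is sensitive to
    # the outliers that passes like FoldScaleAxis can amplify; clipping
    # at a high percentile ignores them.
    threshold = np.percentile(np.abs(activations), percentile)
    return threshold / 127.0  # map the clipped range onto int8

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=10_000)
acts[0] = 50.0  # inject a single extreme outlier

print(calibration_scale(acts, 100.0))  # dominated by the outlier (~0.39)
print(calibration_scale(acts, 99.0))   # robust to it (~0.02)
```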

Hi @vinx13!

Thanks for the update! Let us know once this patch has been merged into master so we can give it a try.

Hi @vinx13

I was wondering if there is any update on this issue. Has it been solved?