[Quantization] Problems with recent refactoring changes in the quantization pass

tico · August 16, 2019, 11:31am

Hi,

After the latest changes to the quantization pass in the following commit, I am facing some issues:

I get the following error:

  File "/home/tvm/tvm/python/tvm/relay/quantize/_partition.py", line 136, in add_partition_function
    if 'cuda' in _target.current_target().keys:
AttributeError: 'NoneType' object has no attribute 'keys'

If I remove the problematic code in [1], the accuracy of a quantized model that previously worked drops so much that the model's output is no longer valid.

@vinx13, could you take a look at this? I saw that the commit mentioned above touches some aspects of accuracy.

Thanks

vinx13 · August 16, 2019, 5:18pm

@tico I can confirm this issue, cc @ziheng

ziheng · August 16, 2019, 5:37pm

should be fixed by https://github.com/dmlc/tvm/pull/3792

ziheng · August 16, 2019, 5:38pm

Which model shows the accuracy drop?

vinx13 · August 16, 2019, 5:57pm

I tested calibration for resnet18 v1: with a non-power-of-2 scale, top-1 accuracy is 0.51; with a power-of-2 scale, it raised an error.
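For context on the power-of-2 versus non-power-of-2 comparison: a power-of-2 scale lets requantization use a bit shift instead of a multiply, at the cost of a coarser quantization grid. The sketch below is plain Python, not TVM code; `int8_scale` and `quantize` are hypothetical helpers assuming symmetric int8 quantization with a max-abs scale, and the data is made up. TVM's `qconfig` appears to expose a similar switch (`round_for_shift`); treat that name as an assumption.

```python
import math

def int8_scale(values, power_of_two=False):
    """Symmetric int8 scale chosen so that max|v| maps to 127.

    With power_of_two=True the scale is rounded *up* to the next power of
    two, so nothing extra gets clipped but the quantization grid is coarser.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return 1.0  # all-zero input: any positive scale works
    scale = max_abs / 127.0
    if power_of_two:
        scale = 2.0 ** math.ceil(math.log2(scale))
    return scale

def quantize(v, scale):
    """Round v/scale to the nearest integer (ties to even) and clip to int8 range."""
    return max(-127, min(127, round(v / scale)))

data = [0.3, -1.7, 2.5, 0.05]  # made-up example activations
for p2 in (False, True):
    s = int8_scale(data, power_of_two=p2)
    q = [quantize(v, s) for v in data]
    err = max(abs(v - qi * s) for v, qi in zip(data, q))
    print(f"power_of_two={p2}: scale={s:.5f} q={q} max_err={err:.5f}")
```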

ziheng · August 16, 2019, 8:15pm

resnet18_v1 should be fine with the configuration here: https://github.com/dmlc/tvm/blob/master/tests/python/nightly/quantization/test_quantization_accuracy.py#L141

vinx13 · August 16, 2019, 8:56pm

It is likely that int8 additions overflow when custom scales are used.
After I commented out … and the following lines in dmlc/tvm/blob/master/python/tvm/relay/quantize/_partition.py#L105:


    #     %14 = add(%13, %meta[relay.Constant])
    #     %15 = annotation.cast_hint(%14, 'int8')
    #     %16 = annotation.stop_fusion(%15)
    #     %17 = add(%5, %16)
    #     %18 = nn.relu(%17)
    #     ...
    #     %24 = nn.conv2d(%23, %meta[relay.Constant])
    #     %25 = add(%24, %meta[relay.Constant])
    #     %26 = add(%18, %25)  <- need to insert annotations for %25
    #     ...
    rhs = new_args[1].realize()
    return _forward_op(ref_call, [lhs, rhs])
elif lhs_cond and not rhs_cond:
    if _analysis.check_constant(rhs):
        # - introduced by batch_norm: add(out, bias)
        return QPartitionExpr(_forward_op(ref_call, [lhs, rhs]))
    # - introduced by residual connection in MobileNetV2
    #     ...
    #     %81 = add(%80, meta[relay.Constant])
    #     %82 = annotation.cast_hint(%81, 'int8')
    #     %83 = annotation.stop_fusion(%82)

resnet18 v1 works fine (acc 0.69)

vinx13 · August 22, 2019, 3:21am

@ziheng, shall we add an option controlling whether to use int8 addition, to prevent overflow?
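A quick way to see why int8 addition is risky: two operands that are each valid int8 values can produce a sum outside [-128, 127]. The sketch below emulates two's-complement int8 wraparound in plain Python and assumes both operands already share the same scale. `quantized_add` and its `use_int8_add` flag are hypothetical, meant only to illustrate the option being proposed, not an existing TVM setting.

```python
def wrap_int8(x):
    """Emulate two's-complement int8 wraparound for a Python int."""
    return ((x + 128) % 256) - 128

def quantized_add(a, b, use_int8_add):
    """Add two int8 values that share the same scale.

    use_int8_add is a hypothetical flag: True adds directly in int8 (the
    result wraps on overflow); False widens to int32 first, so the sum is exact.
    """
    if use_int8_add:
        return wrap_int8(a + b)
    return a + b  # int32 accumulation: any int8 + int8 sum fits easily

a, b = 100, 60  # both valid int8 values
print(quantized_add(a, b, use_int8_add=True))   # -96: wrapped around
print(quantized_add(a, b, use_int8_add=False))  # 160: correct
```

Widening to int32 costs an extra cast per addition, which is presumably why int8 addition is attractive when the scales are chosen so that sums stay in range.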

ziheng · August 23, 2019, 6:24am

The changes made in #3543 do affect accuracy, but by adjusting the scale we should be able to match the previous accuracy. Let me check.

ziheng · August 27, 2019, 8:27am

Hi @vinx13, could you check the accuracy of resnet18_v2 with your change? If it makes sense, let’s add an option controlling whether to use int8 addition.

You can also check the configurations I used here: https://github.com/dmlc/tvm/blob/8dab80c86e26d093bc1d10b9e56d9ef9925295c3/tests/python/nightly/quantization/test_quantization_accuracy.py#L166

vinx13 · August 28, 2019, 1:25am

resnet18_v2: 0.51 (there is still some accuracy drop; it was around zero before my changes)
resnet50_v2: 0.766

tico · September 18, 2019, 12:25pm

Hi,

@vinx13 @ziheng is there any update on the accuracy issues of the quantization pass?

Thanks

vinx13 · September 29, 2019, 4:34am

@tico @ziheng Sorry for the delayed response.

diff --git a/python/tvm/relay/quantize/_partition.py b/python/tvm/relay/quantize/_partition.py
index 1180d836..d794b4e3 100644
--- a/python/tvm/relay/quantize/_partition.py
+++ b/python/tvm/relay/quantize/_partition.py
@@ -85,8 +85,8 @@ def add_partition_generic(ref_call, new_args, ctx):
         #     %10 = add(%9, %meta[relay.Constant])
         #     %11 = add(%3, %10)  <- need to insert annotations for %3, %10
         #     ...
-        lhs = new_args[0].realize()
-        rhs = new_args[1].realize()
+        #lhs = new_args[0].realize()
+        #rhs = new_args[1].realize()
         return _forward_op(ref_call, [lhs, rhs])
     elif not lhs_cond and rhs_cond:
         # - introduced by residual connection in ResNet
@@ -102,7 +102,7 @@ def add_partition_generic(ref_call, new_args, ctx):
         #     %25 = add(%24, %meta[relay.Constant])
         #     %26 = add(%18, %25)  <- need to insert annotations for %25
         #     ...
-        rhs = new_args[1].realize()
+        #rhs = new_args[1].realize()
         return _forward_op(ref_call, [lhs, rhs])
     elif lhs_cond and not rhs_cond:
         if _analysis.check_constant(rhs):

diff --git a/python/tvm/relay/quantize/quantize.py b/python/tvm/relay/quantize/quantize.py
index adde2058..ca267ec3 100644
--- a/python/tvm/relay/quantize/quantize.py
+++ b/python/tvm/relay/quantize/quantize.py
@@ -401,8 +401,8 @@ def prerequisite_optimize(graph, params=None):
         graph = _bind_params(graph, params)
 
     mod = _module.Module.from_expr(graph)
-    with _transform.PassContext(opt_level=3):
-        mod = optimize(mod)
+    #with _transform.PassContext(opt_level=3):
+    mod = optimize(mod)
     return mod["main"]

This is a temporary fix for the issue.
After applying this patch, the accuracy of resnet18/resnet50 v1/v2 is normal.

Summary of the issues:

1. Because of the realize() calls in the partition pass, some additions in the quantized model are performed in int8 (instead of casting from int8 to int32 before the addition). In this case, overflow is likely unless the scale is chosen carefully.
2. prerequisite_optimize uses opt_level=3. In past experiments I observed accuracy issues caused by FoldScaleAxis. Since this pass is also applied to the model used for calibration, the outputs collected from the calibration set may contain more outliers, and simply taking the maximum of the outputs as the scale might not work. Instead, we may want to remove the outliers, for example by using the 99th percentile instead of the maximum.
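The outlier-clipping idea in the second point can be sketched as follows, assuming symmetric int8 quantization and a nearest-rank percentile. `calibrate_scale` and the 99.0 value are illustrative only, not TVM's calibration API, and the calibration data is made up.

```python
import math

def calibrate_scale(samples, percentile=None, qmax=127):
    """Choose a symmetric int8 scale from calibration outputs.

    percentile=None: use max|x| (sensitive to a single outlier).
    percentile=p: use the nearest-rank p-th percentile of |x| instead,
    so the rarest large values are clipped rather than stretching the scale.
    """
    mags = sorted(abs(x) for x in samples)
    if percentile is None:
        bound = mags[-1]
    else:
        rank = max(1, math.ceil(percentile / 100.0 * len(mags)))
        bound = mags[rank - 1]
    return bound / qmax

# Made-up calibration data: 100 small activations plus one outlier.
samples = [i / 100 for i in range(100)] + [50.0]
for p in (None, 99.0):
    s = calibrate_scale(samples, percentile=p)
    levels = len({max(-127, min(127, round(x / s))) for x in samples})
    print(f"percentile={p}: scale={s:.5f}, distinct int8 levels used={levels}")
```

With the raw maximum, the single outlier squeezes almost all values into a handful of int8 levels; the percentile scale clips the outlier but keeps resolution for the bulk of the distribution.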

tico · October 2, 2019, 7:27am

Hi @vinx13!

Thanks for the update! Let us know once this patch has been merged into master so we can give it a try.

tico · October 28, 2019, 9:24am

Hi @vinx13

I was wondering whether there is any update on this issue. Has it been solved?

Thanks!