@tico @ziheng Sorry for the delayed response.
diff --git a/python/tvm/relay/quantize/_partition.py b/python/tvm/relay/quantize/_partition.py
index 1180d836..d794b4e3 100644
--- a/python/tvm/relay/quantize/_partition.py
+++ b/python/tvm/relay/quantize/_partition.py
@@ -85,8 +85,8 @@ def add_partition_generic(ref_call, new_args, ctx):
# %10 = add(%9, %meta[relay.Constant])
# %11 = add(%3, %10) <- need to insert annotations for %3, %10
# ...
- lhs = new_args[0].realize()
- rhs = new_args[1].realize()
+ #lhs = new_args[0].realize()
+ #rhs = new_args[1].realize()
return _forward_op(ref_call, [lhs, rhs])
elif not lhs_cond and rhs_cond:
# - introduced by residual connection in ResNet
@@ -102,7 +102,7 @@ def add_partition_generic(ref_call, new_args, ctx):
# %25 = add(%24, %meta[relay.Constant])
# %26 = add(%18, %25) <- need to insert annotations for %25
# ...
- rhs = new_args[1].realize()
+ #rhs = new_args[1].realize()
return _forward_op(ref_call, [lhs, rhs])
elif lhs_cond and not rhs_cond:
if _analysis.check_constant(rhs):
diff --git a/python/tvm/relay/quantize/quantize.py b/python/tvm/relay/quantize/quantize.py
index adde2058..ca267ec3 100644
--- a/python/tvm/relay/quantize/quantize.py
+++ b/python/tvm/relay/quantize/quantize.py
@@ -401,8 +401,8 @@ def prerequisite_optimize(graph, params=None):
graph = _bind_params(graph, params)
mod = _module.Module.from_expr(graph)
- with _transform.PassContext(opt_level=3):
- mod = optimize(mod)
+ #with _transform.PassContext(opt_level=3):
+ mod = optimize(mod)
return mod["main"]
This is a temporary fix for the issue.
After applying this patch, the accuracy of resnet18/resnet50 v1/v2 is normal.
Summary of the issues:
- Because of the realize call in the partition pass, some of the additions in the quantized model will be int8 additions (instead of casting from int8 to int32 before the addition). In this case, if the scale is not chosen carefully, overflow is likely to happen.
- prerequisite_optimize uses opt_level=3. In my past experiments I observed some accuracy issues caused by FoldScaleAxis. This pass is also applied to the model that is used for calibration. As a result, the outputs collected from the calibration set might contain more outliers. Simply taking the maximum of the output as the scale might not work in this case. (Instead, we may want to discard the outliers, for example by using the 99th percentile rather than the maximum.)
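
To make the two points above concrete, here is a minimal pure-Python sketch. It is not TVM code: the helper names (wrap_int8, add_int8, add_int32, max_scale, percentile_scale) are hypothetical and only illustrate the arithmetic, assuming two's-complement wraparound for int8 and a nearest-rank percentile for the calibration scale.

```python
import math


def wrap_int8(value):
    """Wrap an integer the way a two's-complement int8 value would wrap."""
    return (value + 128) % 256 - 128


def add_int8(a, b):
    """Add two int8 values and keep the result in int8 (may wrap around)."""
    return wrap_int8(a + b)


def add_int32(a, b):
    """Add two int8 values after widening; the sum of two int8 values always fits in int32."""
    return a + b


def max_scale(samples):
    """Calibration scale taken as the largest absolute value (sensitive to outliers)."""
    return max(abs(x) for x in samples)


def percentile_scale(samples, fraction=0.99):
    """Calibration scale taken as a nearest-rank percentile of the absolute values."""
    ordered = sorted(abs(x) for x in samples)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[min(rank, len(ordered)) - 1]


if __name__ == "__main__":
    # Issue 1: the same addition, with and without widening to int32.
    print(add_int8(100, 100))   # wraps around to -56
    print(add_int32(100, 100))  # 200

    # Issue 2: synthetic "calibration outputs" in [-1, 1) plus a single outlier.
    outputs = [((i % 200) - 100) / 100 for i in range(999)] + [50.0]
    print(max_scale(outputs))         # dominated by the outlier
    print(percentile_scale(outputs))  # ignores the top 1% of magnitudes
```

With the maximum as the scale, one outlier stretches the quantization range so that most values collapse into a few int8 levels. A percentile-based scale clips the outlier instead, which is the trade-off suggested above.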