There are some issues in quantization due to changes in #3135.
This assertion seems not necessary.
A use case is that, lhs is quantized (int8), and rhs is constant.
For example, in resent we have add(max_pool(…), bias). Since max_pool is quantized, lhs of add is int8. In this case, during realize stage we need to convert both lhs and rhs to int32 and then perform the addition.
After removing this assertion we can run the annotate pass successfully. But other changes are still needed in realize.
In UnifyDTypeScale:
Conditions here do not cover every case. In the example above, lhs is int8 and rhs is float.
Thank you for the catch @zhiics. I believe this stresses the importance of extended unit test coverage for quantization passes. Do you want to go ahead and issue a PR fix? If VTA quantization is broken, it will be caught when we build the sphinx gallery.
The use case is add(conv, conv), where rhs is a shortcut. These lines transform rhs to cast(i32, cast(i8, qconv)). I found VTA is doing similar thing using ForceCast, maybe we can unify this? @ziheng@thierry
Unifying makes sense; this would mean that we wouldn’t need to rely on ForceCast anymore for VTA quantization? It makes sense to keep annotations limited.
Yes we can have a unified way to do this.
In add_vta_rewrite realize is explicitly called which produces ForceCast + StopFusion. This is the same as what we did in UnifyDTypeScale that used simulated_quantize annotation.