[Quantization] How to quantize transpose and nn.pad operators?



I am trying to quantize a model that is originally in NHWC, so to be able to quantize it I set the target data layout to NCHW. However, as discussed in other threads, the change in data layout means that transpose operators are inserted. The problem is that transpose and also nn.pad operators are added in between the chain of convolutions, and since the transpose operator is not quantized in TVM, many float-to-int cast operators appear along the chain of convolutions.

What can be done to fix this behavior? How difficult would it be to quantize the transpose operator?

@vinx13 @ziheng Could you please give me a hint here?



transpose is easy, as it only needs an identity rewrite
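To see why an identity rewrite suffices: per-tensor quantization is elementwise, so it commutes with pure data-movement ops like transpose. A minimal numpy sketch (the `quantize` helper and the scale value are simplified assumptions for illustration, not TVM's actual implementation):

```python
import numpy as np

def quantize(x, scale):
    # Simplified per-tensor symmetric quantization to int8
    # (an illustrative sketch, not TVM's exact rewrite).
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 4, 4, 3)).astype(np.float32)  # NHWC input
scale = 0.05  # assumed example scale

# Transposing to NCHW commutes with elementwise quantization,
# so transpose can simply pass quantized values through untouched.
a = quantize(x, scale).transpose(0, 3, 1, 2)
b = quantize(x.transpose(0, 3, 1, 2), scale)
assert np.array_equal(a, b)
```

Because the two orders give identical integer tensors, transpose never needs a cast back to float around it.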

pad needs a custom rule in realize so that it pads with a value of the quantized type (i.e. int)
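To illustrate the pad case: if nn.pad stays in float, casts get inserted around it; padding instead with the quantized representation of the float pad value keeps the whole chain in the integer domain. A numpy sketch (simplified symmetric quantization and an assumed example scale, for illustration only):

```python
import numpy as np

def quantize(x, scale):
    # Simplified per-tensor symmetric quantization to int8.
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

x = np.array([[1.0, -2.0], [0.5, 3.0]], dtype=np.float32)
scale = 0.1       # assumed example scale
pad_value = 0.0   # nn.pad's float pad value

# Quantize the pad value itself, then pad in the integer domain:
q_pad = quantize(np.array([pad_value]), scale)[0]
padded_int = np.pad(quantize(x, scale), 1, constant_values=q_pad)

# This matches quantizing the float-padded tensor:
padded_float = np.pad(x, 1, constant_values=pad_value)
assert np.array_equal(padded_int, quantize(padded_float, scale))
```

This is the essence of the custom realize rule: rewrite the pad value into the quantized domain so no float cast is needed.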


Hi @vinx13, thanks for the pointers!

I have a couple of further questions:

  1. Could you give further hints on the custom rule for the pad operator?

  2. What about quantizing dense layers? I think I saw some code related to this. Is it already supported?



BTW, can reshape also be implemented with an identity realize?


For pad, you need to implement PadRealize.
Quantizing dense is supported.
Reshape can be implemented with identity realize.