I am trying to quantize a model that is originally in NHWC, so to make it quantizable I set the target data layout to NCHW. However, as discussed in other threads, changing the data layout causes transpose operators to be inserted. The problem is that transpose (and also nn.pad) operators end up in between the chain of convolutions, and since the transpose operator is not quantized in TVM, many float-to-int casting operators appear along that chain.
What can be done to fix this behavior? How difficult would it be to quantize the transpose operator?