[QNN] Channel wise quantization - Quantize and Requantize

Background

As of now, QNN supports only per-tensor quantization, i.e., there is a single scale and zero point for the whole tensor. Prior research has shown that per-channel (aka per-axis) quantization can lead to better accuracy while having minimal impact on performance. In channel quantization, there is a separate scale and zero point for each channel in the tensor.
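To make the accuracy argument concrete, here is a minimal NumPy sketch (not QNN code) that quantizes a small weight tensor to int8 and back, once with a single scale and once with one scale per channel. The helper name `fake_quantize`, the symmetric max-abs scale choice, and the toy weights are assumptions made for illustration only.

```python
import numpy as np

def fake_quantize(x, scale, axis=None, qmin=-128, qmax=127):
    """Quantize x to the int8 range with zero point 0, then map it back to float."""
    scale = np.asarray(scale, dtype=np.float64)
    if axis is not None:
        # Reshape the per-channel scale vector so it broadcasts along `axis`.
        shape = [1] * x.ndim
        shape[axis] = -1
        scale = scale.reshape(shape)
    q = np.clip(np.round(x / scale), qmin, qmax)
    return q * scale

# Two channels (rows) with very different value ranges.
w = np.array([[0.01, -0.02, 0.015],
              [1.0, -2.0, 1.5]])

per_tensor_scale = np.abs(w).max() / 127          # one scale for everything
per_channel_scale = np.abs(w).max(axis=1) / 127   # one scale per row

# Worst-case round-trip error per channel for each scheme.
err_tensor = np.abs(fake_quantize(w, per_tensor_scale) - w).max(axis=1)
err_channel = np.abs(fake_quantize(w, per_channel_scale, axis=0) - w).max(axis=1)

print("per-tensor error per channel: ", err_tensor)
print("per-channel error per channel:", err_channel)
```

With a single scale, the small-valued channel is forced onto a grid sized for the large-valued one, so its round-trip error is much larger than with its own scale.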

Channel quantization in frameworks

Both MXNet-MKLDNN and TensorFlow support channel-wise quantization. However, the support is limited to only a few operators and comes with some restrictions. These restrictions ensure that the performance degradation is not severe. For example, only weights are considered for channel quantization. Activations/intermediate feature maps are still per-tensor quantized. Additionally, whenever a weight tensor is channel quantized, the zero point is 0 for the whole tensor (only the scales are per-axis). These restrictions hold for both frameworks. More details can be found at

QNN operators

The above restrictions translate nicely to changes in only 2 QNN operators: quantize and requantize. Both operators take the scale as an input expr. The lowering can check whether the input_scale is a vector and, if so, do the channel-wise lowering. Otherwise, we fall back to per-tensor lowering. We also need an axis argument to tell along which axis the tensor is quantized.
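The scalar-versus-vector dispatch described above can be sketched in NumPy as a floating-point reference. This is not the actual QNN lowering, which may well use integer fixed-point arithmetic; the function signatures, the default `axis=-1`, and the int8 output range are assumptions for illustration. Only the ideas of checking whether the scale is a vector and passing an axis come from the RFC.

```python
import numpy as np

def _along_axis(vec, ndim, axis):
    """Reshape a 1-D per-channel vector so it broadcasts along `axis`."""
    shape = [1] * ndim
    shape[axis] = -1
    return vec.reshape(shape)

def _prepare_scale(scale, data, axis):
    """Return a scale that broadcasts against `data` (scalar or per-channel)."""
    scale = np.asarray(scale, dtype=np.float64)
    if scale.ndim == 0:
        return scale  # per-tensor: a single scale for the whole tensor
    # Per-channel: one scale per entry along `axis`.
    assert scale.ndim == 1 and scale.size == data.shape[axis]
    return _along_axis(scale, data.ndim, axis)

def quantize(data, scale, zero_point, axis=-1, qmin=-128, qmax=127):
    """float -> int8, with a scalar or per-channel scale."""
    s = _prepare_scale(scale, data, axis)
    q = np.round(data / s) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int8)

def requantize(data, input_scale, input_zero_point,
               output_scale, output_zero_point,
               axis=-1, qmin=-128, qmax=127):
    """int -> int8; only the input scale may be per-channel in this sketch."""
    s_in = _prepare_scale(input_scale, data, axis)
    real = (data.astype(np.int64) - input_zero_point) * s_in
    q = np.round(real / output_scale) + output_zero_point
    return np.clip(q, qmin, qmax).astype(np.int8)

# Per-channel quantize along axis 1 (zero point 0, as in the restrictions above).
x = np.array([[0.5, 10.0], [-0.25, -20.0]])
q_channel = quantize(x, np.array([0.01, 0.2]), 0, axis=1)

# Per-tensor fallback: the scale is a scalar.
q_tensor = quantize(np.array([1.0, -1.0]), 0.5, 3)

# Requantize an int32 accumulator with per-channel input scales.
acc = np.array([[100, 100], [-50, 10]], dtype=np.int32)
r = requantize(acc, np.array([0.01, 0.2]), 0, 0.1, 0, axis=1)
```

Keeping the dispatch in one helper means the per-tensor path is untouched when the scale is a scalar, which matches the fallback behavior described in the RFC.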

Supporting channel-wise quantization does not require many changes. This RFC is to get feedback from the community.

@jackwish @FrozenGene @vinx13 @yzhliu @shoubhik @ramana-arm

PR - https://github.com/apache/incubator-tvm/pull/4629

Thank you for the RFC @janimesh! I have read the code, and I think it would be better to make it clear that this is enabling per-channel quantization for quantize/requantize :slight_smile:

Thanks @jackwish. I have updated the RFC and PR headings to say so.

As far as code goes, I have some comments for quantize and requantize that might be just enough for now. For deeper investigation, an avid reader should be able to find this RFC from the pull request.