[Quantization] Understanding the quantization passes

kloud1989 · November 1, 2019, 2:25am

The quantization pass is somewhat difficult for me to understand. Could anyone illustrate the sequence of passes that quantization goes through?

xwang186 · November 4, 2019, 6:57pm

I don’t know whether you are interested in the backend C++ implementation or just in how the process works.
The basic logic is as follows:

At the start, nothing is quantized yet.
Go through the layers in order:

Determine whether the current layer/operator supports quantization (in realize.cc).
If it does:
    If quantization has already started: quantize the current layer (_annotation.py).
    If not: start quantization, then quantize the current layer.
If it does not:
    If quantization has already started: de-quantize before the current layer, then stop quantization.
    If not: do nothing.
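
The steps above can be sketched as a small state machine. This is only an illustration of the logic as described, not TVM's actual code: the function name `plan_quantization`, the string labels, and the operator names are all made up for the example, and the set of quantizable operators is passed in by hand rather than looked up the way realize.cc would do it.

```python
from typing import List, Set


def plan_quantization(layers: List[str], quantizable: Set[str]) -> List[str]:
    """Walk the layers in order and record where quantization starts and stops.

    Hypothetical helper for illustration only; it is not part of TVM.
    """
    plan: List[str] = []
    quantizing = False  # at the start, nothing is quantized yet

    for op in layers:
        if op in quantizable:
            if not quantizing:
                # Entering a quantized region: start quantization first.
                plan.append("quantize_start")
                quantizing = True
            plan.append(f"q:{op}")
        else:
            if quantizing:
                # Leaving a quantized region: de-quantize before this layer.
                plan.append("dequantize")
                quantizing = False
            # Unsupported layer stays in floating point.
            plan.append(f"fp:{op}")

    return plan


if __name__ == "__main__":
    # Operator names here are arbitrary examples.
    for step in plan_quantization(
        ["conv2d", "relu", "softmax", "dense"],
        {"conv2d", "relu", "dense"},
    ):
        print(step)
```

Note that the description above does not say what happens if the last layer is still quantized, so the sketch leaves that case alone rather than guessing.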

kloud1989 · November 6, 2019, 8:36am

Thanks for your reply!
In fact, I’m interested in the implementation logic of the quantization pass sequence: partition ==> annotation ==> calibration ==> realization. Without a big picture of the logic, it’s difficult to read the source code. There is an RFC in the TVM issue list (https://github.com/dmlc/tvm/issues/2259) that describes the workflow. I’ll check it for help.