The quantization pass is somewhat difficult for me to understand. Could anyone illustrate the sequence of passes that quantization goes through?
I don't know whether you are interested in the backend C++ implementation or just in how the process works.
The basic logic is as follows:
At the start, nothing has been quantized yet;
Go through the layers:
- Determine whether the current layer/operator supports quantization (in realize.cc).
- If it does:
  - if quantization has already started: quantize the current layer (_annotation.py);
  - if not: start quantization, then quantize the current layer.
- If it does not:
  - if quantization has started: insert de-quantization before the current layer and stop quantization;
  - if quantization has not started: do nothing.
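The logic above can be sketched roughly in plain Python. This is only an illustration and not TVM code: `plan_quantization`, `supports_quantize`, and the set of quantizable operators are made-up names, and the sketch assumes a simple linear chain of layers, whereas the real passes work on a Relay graph.

```python
def plan_quantization(layers, supports_quantize):
    """Return the list of (action, layer) steps for a linear chain of layers.

    `supports_quantize` is a hypothetical predicate standing in for the
    per-operator check described above.
    """
    steps = []
    quantizing = False  # at the start, nothing has been quantized yet

    for layer in layers:
        if supports_quantize(layer):
            if not quantizing:
                # Quantization has not started: start it here.
                steps.append(("start_quantize", layer))
                quantizing = True
            steps.append(("quantize", layer))
        else:
            if quantizing:
                # Leave the quantized region before an unsupported layer.
                steps.append(("dequantize_before", layer))
                quantizing = False
            # Otherwise nothing is inserted; the layer simply stays in float.
            steps.append(("keep_float", layer))
    return steps


# Hypothetical example: softmax is treated as not quantizable.
QUANTIZABLE = {"conv2d", "relu", "dense"}
plan = plan_quantization(
    ["conv2d", "relu", "softmax", "dense"],
    lambda op: op in QUANTIZABLE,
)
for action, layer in plan:
    print(action, layer)
```

In this toy run, quantization starts at conv2d, continues through relu, is undone just before softmax, and starts again at dense.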
Thanks for your reply!
In fact, I'm interested in the implementation logic of the quantization pass sequence: partition ==> annotation ==> calibration ==> realization. Without a big picture of the logic, it's rather difficult to read the source code. There is an RFC in the TVM issue list (https://github.com/dmlc/tvm/issues/2259) that discusses the workflow. I'll check it for help.