Will TVM’s current quantization support the following situation? If so, where can I find more info?
- Input model is float32
- All weights must be transformed into an int8 fixed-point format
- Fixed-point parameters can be shared within a tensor, but do not need to be the same from tensor to tensor
- Data-aware calibration needs to find the min/max values of accumulators and pass this information from Relay down to code generation (what is the mechanism for this?)
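To be concrete about what I mean by the second and third bullets, here is a minimal NumPy sketch of symmetric per-tensor int8 quantization: one scale shared within each tensor, chosen independently per tensor. The function names are purely illustrative, not TVM API:

```python
import numpy as np

def quantize_per_tensor(w, num_bits=8):
    """Symmetric fixed-point quantization with one scale shared by the whole tensor."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for int8
    scale = np.max(np.abs(w)) / qmax            # shared within this tensor only
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map quantized values back to float32 for comparison."""
    return q.astype(np.float32) * scale

# Each tensor gets its own scale; scales need not match across tensors.
w1 = np.array([0.5, -1.0, 0.25], dtype=np.float32)
w2 = np.array([10.0, -20.0], dtype=np.float32)
q1, s1 = quantize_per_tensor(w1)
q2, s2 = quantize_per_tensor(w2)
```

The calibration question is then: given sample inputs, where in the pipeline are the accumulator min/max ranges collected, and how do they flow from Relay into codegen?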