Tutorials on data-aware quantization calibration?

Will TVM’s current quantization support the following situation? If so, where can I find more info?

  • Input model is float32
  • All weights must be transformed into int8 fixed point format
  • Fixed point parameters can be shared within a tensor, but do not need to be the same from tensor-to-tensor
  • Data-aware calibration needs to find min/max values of accumulators and pass this info from relay down to code generation (what is the mechanism here?)

I’m working on the tutorial and it will be available in one or two weeks. Of your questions, the first and second points are supported. Fixed point parameters are shared within a tensor (we use a shared-parameter + int tensor representation).
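To illustrate the "shared parameter within a tensor" representation mentioned above, here is a minimal sketch of symmetric per-tensor int8 quantization: one scale per tensor, independent across tensors. This is an illustrative example only, not TVM's actual quantization code; the function names are hypothetical.

```python
import numpy as np

def quantize_per_tensor(x: np.ndarray):
    """Symmetric per-tensor int8 quantization.

    A single scale is shared by every element of the tensor,
    but each tensor gets its own scale."""
    max_abs = np.max(np.abs(x))
    # Map the largest magnitude onto the int8 limit 127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original float32 values.
    return q.astype(np.float32) * scale

weights = np.array([0.5, -1.0, 0.25, 0.9], dtype=np.float32)
q, s = quantize_per_tensor(weights)
recon = dequantize(q, s)
```

The reconstruction error is bounded by half a quantization step (`scale / 2`), which is why data-aware calibration of the value range matters: a tighter min/max gives a smaller scale and less error.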


@vinx13 Great! Thanks for taking the time to write a tutorial on this.

Hi @vinx13, what is the status of the quantize tutorial?