Example of running inference on an int8 quantized model?


#1

Hi all,

I notice that the latest TVM release https://github.com/dmlc/tvm/releases/tag/v0.5 suggests that backend support for int8 quantization has been added, along with a network quantizer for simple networks.

I can’t find any documentation on https://docs.tvm.ai/ about this feature or how to use it.

Where can I find clues on using TVM to run inference on int8 models?

Thanks,
Skanda


#2

You can see the blog post https://tvm.ai/2018/12/18/lowprecision-conv.html for lower-precision compute.
But it seems that post targets very low precision, which needs quantization-aware training. I also found the int8 quantization in Relay, and I would like more details about it too.
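For contrast with the quantization-aware training case, here is a minimal sketch of plain post-training int8 quantization in pure Python. It is only a conceptual illustration, not TVM's Relay quantizer or its API: the helper names (`int8_scale`, `quantize`, `int8_dot`) are hypothetical, and the scheme (one symmetric scale per tensor, chosen so the largest absolute value maps to 127) is an assumption. The point is that an int8 dot product can be computed entirely with integers and rescaled to float once at the end, which stays close to the float result without any retraining.

```python
def int8_scale(values):
    """Assumed scheme: symmetric per-tensor scale so max |x| maps to 127."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values, scale):
    """Round to the nearest integer and clamp into the signed int8 range."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def int8_dot(qa, qb, scale_a, scale_b):
    """Integer-only multiply-accumulate (what int32 accumulation would hold),
    followed by a single rescale back to floating point."""
    acc = sum(a * b for a, b in zip(qa, qb))
    return acc * scale_a * scale_b


if __name__ == "__main__":
    # Hypothetical toy data standing in for one row of weights and an input.
    weights = [0.5, -1.27, 0.03, 0.9]
    inputs = [1.0, 0.2, -0.4, 0.7]

    sw, si = int8_scale(weights), int8_scale(inputs)
    qw, qi = quantize(weights, sw), quantize(inputs, si)

    exact = sum(w * x for w, x in zip(weights, inputs))
    approx = int8_dot(qw, qi, sw, si)
    print("int8 weights:", qw)
    print("int8 inputs: ", qi)
    print("float dot:", exact, "int8 dot:", approx)
```

Real int8 pipelines add details this sketch ignores, such as calibrating scales on sample data, per-channel scales, and requantizing int32 outputs for the next layer.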