Hi,
I am trying to quantize and tune some TF models on x86. However, the performance is much worse than that of the non-quantized version. The numbers are as follows:
- First model
  TVM FP32: 35.05ms
  TVM int8 quantization: 80.ms
  TVM int8 quantization + AutoTVM: 46.87ms
- Second model
  TVM FP32: 72.85ms
  TVM int8 quantization: 159.33ms
  TVM int8 quantization + AutoTVM: 112.39ms
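To put the gap in relative terms, here is a small Python sketch that divides each int8 latency by the FP32 baseline. The dictionary layout and the helper name `slowdown_vs_fp32` are only for illustration and are not part of TVM. The untuned int8 number for the first model is left out, since it is only given as "80.ms".

```python
# Latencies in milliseconds, copied from the measurements listed above.
# The untuned int8 figure for the first model is omitted on purpose
# because its exact value is not fully given.
latencies = {
    "first": {"fp32": 35.05, "int8_autotvm": 46.87},
    "second": {"fp32": 72.85, "int8": 159.33, "int8_autotvm": 112.39},
}


def slowdown_vs_fp32(results):
    """Return each non-FP32 latency divided by the FP32 latency."""
    base = results["fp32"]
    return {name: ms / base for name, ms in results.items() if name != "fp32"}


for model, results in latencies.items():
    for name, ratio in slowdown_vs_fp32(results).items():
        print(f"{model} model, {name}: {ratio:.2f}x the FP32 latency")
```

So even after AutoTVM tuning, the int8 builds take roughly 1.3x (first model) and 1.5x (second model) as long as FP32, and the untuned int8 build of the second model takes about 2.2x as long.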
What is the reason for such poor performance? What can be done to improve it?
@vinx13 Any ideas?