I have been testing quantization + AVX2 with the TensorFlow Inception model from the tutorial (https://docs.tvm.ai/tutorials/frontend/from_tensorflow.html) on an x86 machine.
I found that performance is actually worse with quantization: without quantization I get an inference time of 71.09 ms, but with quantization it increases to 166.18 ms.
Why is this the case? Does TVM properly support quantized inference with AVX2 on x86, or is some extra configuration needed to get vectorized int8 kernels?
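For reference, here is roughly the flow I used. This is a sketch: the `mod`/`params` names come from the tutorial's `relay.frontend.from_tensorflow` step, and the `global_scale` value and the exact `-mcpu` flag are my choices, not anything the tutorial prescribes.

```python
import tvm
from tvm import relay

# Assumption: `mod` and `params` were produced by
# relay.frontend.from_tensorflow(...) as in the linked tutorial.

# Target string requesting AVX2 codegen on x86 (illustrative -mcpu choice).
target = "llvm -mcpu=core-avx2"

# Apply Relay's automatic quantization with its global-scale calibration.
with relay.quantize.qconfig(calibrate_mode="global_scale", global_scale=8.0):
    mod = relay.quantize.quantize(mod, params)

# Build the quantized module; opt_level=3 enables the usual graph optimizations.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)
```

I then benchmarked the resulting module the same way as the float32 build, so the only difference between the two numbers above should be the quantization pass.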