I would like to deploy a TensorFlow model on x86 and also use quantization. In some preliminary benchmarking I noticed that the NHWC layout gives better performance than NCHW. However, quantization requires NCHW, and switching to it causes a performance degradation. I also noticed that using the NCHW layout inserts a lot of transpose ops, which might be the reason for the slowdown.
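To illustrate what each of those inserted transpose ops actually does, here is a minimal NumPy sketch (shapes are made up for illustration): every NHWC-to-NCHW layout conversion is a permutation of axes plus a physical copy to make the data contiguous again, which is pure memory traffic with no compute.

```python
import numpy as np

# A batch of activations in NHWC layout: (batch, height, width, channels).
nhwc = np.arange(2 * 8 * 8 * 16, dtype=np.float32).reshape(2, 8, 8, 16)

# Each transpose / layout_transform op performs a permutation like this,
# followed by a copy to restore contiguity -- pure data movement.
nchw = np.ascontiguousarray(nhwc.transpose(0, 3, 1, 2))  # NHWC -> NCHW

# Converting back round-trips exactly; nothing is computed along the way.
back = np.ascontiguousarray(nchw.transpose(0, 2, 3, 1))  # NCHW -> NHWC
assert np.array_equal(back, nhwc)
print(nchw.shape)  # (2, 16, 8, 8)
```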
When I tune with AutoTVM, the performance of the quantized model improves and gets close to, but still remains below, the FP32 performance.
I would appreciate it if someone could clarify this situation for me.