Relay, TVM and running quantized models

Hi TVM community,

I am looking at how Relay handles quantized models. Are there tutorials and models I can start looking at? Any pointers would be much appreciated :).

Thanks!


If you mean taking a non-quantized model and then applying quantization with TVM, you can have a look at the following example:
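In outline, that flow uses Relay's built-in post-training quantization pass. A rough sketch is below; `mod`, `params`, and the global-scale calibration settings are placeholders (assumed to come from a frontend importer such as `relay.frontend.from_onnx`), not taken from the linked example:

```python
from tvm import relay

# Rough sketch: post-training quantization of a float32 Relay module with
# relay.quantize. `mod` and `params` are assumed to come from a frontend
# importer; the calibration settings here are illustrative, not tuned.
def quantize_module(mod, params):
    with relay.quantize.qconfig(calibrate_mode="global_scale",
                                global_scale=8.0):
        qmod = relay.quantize.quantize(mod, params=params)
    return qmod
```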

If you mean taking an already quantized model as an input, as far as I understand this is not yet supported, but the TVM community can correct me if I am wrong.


Thanks! That is good to know.
I was talking about consuming a quantized model, but knowing this helps!
Is there a similar example for quantizing and running a model on x86?

You can check the run_tvm.py script and change the target to llvm, which will compile for your machine by default. Assuming that machine is x86, that would do it. However, in my experience the performance will probably be worse than FP32.
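In case it helps, here is a minimal sketch of that change, assuming `mod` and `params` are the Relay module and parameters that run_tvm.py already prepares; the input name "data" and the shape are placeholders:

```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Compile for the host CPU (x86 here) instead of a GPU/ARM target.
target = "llvm"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

# Run the compiled module with the graph executor on the local CPU.
dev = tvm.cpu(0)
runtime = graph_executor.GraphModule(lib["default"](dev))
runtime.set_input("data", np.random.rand(1, 3, 224, 224).astype("float32"))
runtime.run()
out = runtime.get_output(0).numpy()
```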