Relay, TVM and running quantized models

Hi TVM community,

I am looking at how Relay handles quantized models. Are there tutorials and models I can start looking at? Any pointers would be much appreciated :).

Thanks!


If you mean taking a non-quantized model and then applying quantization with TVM, you can have a look at the following example:
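In outline, that flow uses Relay's built-in post-training quantization pass. A rough sketch is below; `mod`, `params`, and the global-scale calibration settings are placeholders (assumed to come from a frontend importer such as `relay.frontend.from_onnx`), not taken from the linked example:

```python
from tvm import relay

# Rough sketch: post-training quantization of a float32 Relay module with
# relay.quantize. `mod` and `params` are assumed to come from a frontend
# importer; the calibration settings here are illustrative, not tuned.
def quantize_module(mod, params):
    with relay.quantize.qconfig(calibrate_mode="global_scale",
                                global_scale=8.0):
        qmod = relay.quantize.quantize(mod, params=params)
    return qmod
```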

If you mean taking an already quantized model as an input, as far as I understand this is not yet supported, but the TVM community can correct me if I am wrong.


Thanks! That is good to know.
I was talking about consuming a quantized model, but knowing this helps!
Is there a similar example for quantizing and running a model on x86?

You can check the run_tvm.py script and change the target to llvm, which will compile for your machine by default. Assuming that machine is x86, that would do it. However, in my experience the performance will probably be worse than FP32.
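In case it helps, here is a minimal sketch of that change, assuming `mod` and `params` are the Relay module and parameters that run_tvm.py already prepares; the input name "data" and the shape are placeholders:

```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Compile for the host CPU (x86 here) instead of a GPU/ARM target.
target = "llvm"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

# Run the compiled module with the graph executor on the local CPU.
dev = tvm.cpu(0)
runtime = graph_executor.GraphModule(lib["default"](dev))
runtime.set_input("data", np.random.rand(1, 3, 224, 224).astype("float32"))
runtime.run()
out = runtime.get_output(0).numpy()
```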