Question about "qconfig" options for quantization

Hi everyone,

I would like to know whether int4/int16 quantization is possible using `relay.quantize.quantize`. So far, I have gone through the documentation.

But I have a few questions:

  1. What is the difference between `nbit_weight` and `dtype_weight`? I expected the data types in my workloads to change by setting only `nbit_weight`, but I also had to set `dtype_weight = "int16"` to achieve that.

The above also applies to `nbit_input` and `dtype_input`.
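For context, here is a minimal sketch of the two configurations I compared (assuming `mod` and `dict_params` are an already-loaded Relay module and its parameters):

```python
from tvm import relay

# Changing only the bit widths did NOT change the workload dtypes for me:
with relay.quantize.qconfig(nbit_input=16, nbit_weight=16):
    mod_a = relay.quantize.quantize(mod, params=dict_params)

# Only after also setting dtype_input/dtype_weight did the types change:
with relay.quantize.qconfig(nbit_input=16, nbit_weight=16,
                            dtype_input="int16", dtype_weight="int16"):
    mod_b = relay.quantize.quantize(mod, params=dict_params)
```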

  1. What parameters do you have to modify to get int16 quantization? So far, my code is:

```python
with relay.quantize.qconfig(calibrate_mode='global_scale',
                            nbit_input=16, nbit_weight=16,
                            dtype_input="int16", dtype_weight="int16",
                            global_scale=8.0):
    mod = relay.quantize.quantize(mod, params=dict_params)
```

Would this be enough?

  1. In the literature, quantization often takes the form:

`x_int = x_float / scale + offset`

Is there any offset (zero point) parameter available in `relay.quantize.qconfig`?
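To make concrete which scheme I mean, here is a plain-Python sketch of affine quantization with an offset; the names `quantize_affine`, `scale`, and `offset` are mine for illustration, not TVM API names:

```python
def quantize_affine(x_float, scale, offset, bits=8):
    """Affine quantization: x_int = round(x_float / scale) + offset,
    clipped to the signed integer range of `bits` bits."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    x_int = round(x_float / scale) + offset
    return max(qmin, min(qmax, x_int))

def dequantize_affine(x_int, scale, offset):
    """Inverse mapping: x_float is approximately (x_int - offset) * scale."""
    return (x_int - offset) * scale

# With offset = 0 this degenerates to the symmetric scheme.
q = quantize_affine(0.5, scale=0.25, offset=3)   # round(2.0) + 3 = 5
x = dequantize_affine(q, scale=0.25, offset=3)   # (5 - 3) * 0.25 = 0.5
```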