Hi everyone,
I would like to know whether int4/int16 quantization is possible using `relay.quantize.quantize`. So far, I have gone through the documentation, but I have a few questions:
- What is the difference between `nbit_weight` and `dtype_weight`? I was expecting that the dtype of the workloads for my tasks would change by only changing `nbit_weight`, but I also had to set `dtype_weight = "int16"` to achieve that. The same applies to `nbit_input` and `dtype_input`.
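If it helps, here is how I currently picture the distinction (a plain-Python sketch of my understanding, not TVM code; `fake_quantize` is a made-up helper): the `nbit` settings would fix the quantized value range, while the `dtype` settings fix the container type the values are stored in, which would explain why changing only `nbit_weight` did not change the workload dtype.

```python
def fake_quantize(xs, nbit):
    # nbit fixes the representable signed range [-2^(nbit-1), 2^(nbit-1)-1];
    # the storage dtype (e.g. int8 vs int16) is a separate, wider container.
    qmin, qmax = -(2 ** (nbit - 1)), 2 ** (nbit - 1) - 1
    return [max(qmin, min(qmax, round(x))) for x in xs]

# 4-bit quantization: values land in [-8, 7] even if later stored as int16
print(fake_quantize([-200.0, -7.6, 3.2, 130.0], nbit=4))  # [-8, -8, 3, 7]
```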
- What parameters do you have to modify to get int16 quantization? So far, my code is:

  ```python
  with relay.quantize.qconfig(calibrate_mode='global_scale',
                              nbit_input=16, nbit_weight=16,
                              dtype_input="int16", dtype_weight="int16",
                              global_scale=8.0):
      mod = relay.quantize.quantize(mod, params=dict_params)
  ```

  Would this be enough?
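For comparison, this is what I understand symmetric int16 quantization with a single global scale to do (a plain-Python sketch, not TVM's actual implementation; deriving the step size from `global_scale = 8.0` as shown is my assumption):

```python
INT16_MIN, INT16_MAX = -(2 ** 15), 2 ** 15 - 1

def quantize_int16(x, scale):
    # symmetric quantization: scale, round, clamp -- no offset term
    return max(INT16_MIN, min(INT16_MAX, round(x / scale)))

def dequantize_int16(q, scale):
    return q * scale

# assumed step size: real range [-8, 8] spread over the int16 range
scale = 8.0 / 2 ** 15
q = quantize_int16(0.37, scale)
print(q, dequantize_int16(q, scale))
```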
- In the literature, quantization often takes the form:

  `x_int = x_float / scale + offset`

  Is any `offset` parameter available in the `relay.quantize.qconfig` function?
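For context, the `offset` in that formula is usually called a zero point, and quantization that uses it is called affine or asymmetric. A minimal plain-Python sketch of that formula (not TVM code; all names here are made up):

```python
def affine_quantize(x, scale, zero_point, qmin=-128, qmax=127):
    # x_int = round(x_float / scale) + zero_point, clamped to the integer range
    return max(qmin, min(qmax, round(x / scale) + zero_point))

def affine_dequantize(q, scale, zero_point):
    # x_float is approximately (x_int - zero_point) * scale
    return (q - zero_point) * scale

# a zero point lets an asymmetric real range, e.g. [0.0, 2.55], use the
# full signed integer range instead of only its non-negative half
scale, zp = 0.01, -128
q = affine_quantize(1.0, scale, zp)
print(q, affine_dequantize(q, scale, zp))
```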