Hi everyone,
I would like to know whether int4/int16 quantization is possible using `relay.quantize.quantize`. So far, I have gone through the documentation, but I have a few questions:
- What is the difference between `nbit_weight` and `dtype_weight`? I was expecting that the dtype of the workloads for my tasks would change by only changing `nbit_weight`, but I also had to set `dtype_weight = "int16"` to achieve that. The same applies to `nbit_input` and `dtype_input`.
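If it helps, here is how I currently picture the distinction (a plain-Python sketch of my understanding, not TVM code; `fake_quantize` is a made-up helper): the `nbit` settings would fix the quantized value range, while the `dtype` settings fix the container type the values are stored in, which would explain why changing only `nbit_weight` did not change the workload dtype.

```python
def fake_quantize(xs, nbit):
    # nbit fixes the representable signed range [-2^(nbit-1), 2^(nbit-1)-1];
    # the storage dtype (e.g. int8 vs int16) is a separate, wider container.
    qmin, qmax = -(2 ** (nbit - 1)), 2 ** (nbit - 1) - 1
    return [max(qmin, min(qmax, round(x))) for x in xs]

# 4-bit quantization: values land in [-8, 7] even if later stored as int16
print(fake_quantize([-200.0, -7.6, 3.2, 130.0], nbit=4))  # [-8, -8, 3, 7]
```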
- What parameters do you have to modify to get int16 quantization? So far, my code is:

  ```python
  with relay.quantize.qconfig(calibrate_mode='global_scale',
                              nbit_input=16, nbit_weight=16,
                              dtype_input="int16", dtype_weight="int16",
                              global_scale=8.0):
      mod = relay.quantize.quantize(mod, params=dict_params)
  ```

  Would this be enough?
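For comparison, this is what I understand symmetric int16 quantization with a single global scale to do (a plain-Python sketch, not TVM's actual implementation; deriving the step size from `global_scale = 8.0` as shown is my assumption):

```python
INT16_MIN, INT16_MAX = -(2 ** 15), 2 ** 15 - 1

def quantize_int16(x, scale):
    # symmetric quantization: scale, round, clamp -- no offset term
    return max(INT16_MIN, min(INT16_MAX, round(x / scale)))

def dequantize_int16(q, scale):
    return q * scale

# assumed step size: real range [-8, 8] spread over the int16 range
scale = 8.0 / 2 ** 15
q = quantize_int16(0.37, scale)
print(q, dequantize_int16(q, scale))
```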
- In the literature, quantization often takes the form:

  `x_int = x_float / scale + offset`

  Is any `offset` parameter available in the `relay.quantize.qconfig` function?
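For context, the `offset` in that formula is usually called a zero point, and quantization that uses it is called affine or asymmetric. A minimal plain-Python sketch of that formula (not TVM code; all names here are made up):

```python
def affine_quantize(x, scale, zero_point, qmin=-128, qmax=127):
    # x_int = round(x_float / scale) + zero_point, clamped to the integer range
    return max(qmin, min(qmax, round(x / scale) + zero_point))

def affine_dequantize(q, scale, zero_point):
    # x_float is approximately (x_int - zero_point) * scale
    return (q - zero_point) * scale

# a zero point lets an asymmetric real range, e.g. [0.0, 2.55], use the
# full signed integer range instead of only its non-negative half
scale, zp = 0.01, -128
q = affine_quantize(1.0, scale, zp)
print(q, affine_dequantize(q, scale, zp))
```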