Cannot compile quantized TFLite model with INT32 bias


#1

Quantized TFLite conv2d operator has 3 inputs
input_tensor - UINT8
weight_tensor - UINT8
bias_tensor - INT32

Model visualization

The model: ssd_mobilenet_v1_quantized_300x300_nopp.tflite

tflite.py knows that bias type should be INT32 - see the comment in tflite.py line 526

However, when I tried to compile the model for x86 (llvm target), I got an error: Unable to unify parent types uint8 and int32 in nn.bias_add

fn (%normalized_input_image_tensor: Tensor[(1, 300, 300, 3), uint8]) {
  %0 = nn.pad(%normalized_input_image_tensor, pad_width=[[0, 0], [0, 1], [0, 1], [0, 0]])
  %1 = nn.conv2d(%0, meta[relay.Constant][0], strides=[2, 2], channels=32, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWIO")
  %2 = nn.bias_add(%1, meta[relay.Constant][1], axis=3)an internal invariant was violated while typechecking your program [03:54:18] /root/tvm/src/relay/pass/type_solver.cc:119: Check failed: resolved.defined(): Unable to unify parent types: TensorType([32], uint8) and TensorType([32], int32)
; an internal invariant was violated while typechecking your program [03:54:18] /root/tvm/src/relay/pass/type_solver.cc:119: Check failed: resolved.defined(): Unable to unify parent types: TensorType([32], uint8) and TensorType([32], int32)
; 

I checked the bias values returned by get_tensor_value(bias_tensor).
The values are bigger than 255 (and some values are negative), so yes, they need to be stored as INT32 in the tflite file.

4 DEPTHWISE_CONV_2D
int32 [  981  -424  1536   -52  1069  2496   286   150   190   593   765   110
    88    34     8    52   844  1536    63    13    61   421   139   138
  -744  1955   121  1238   200   -22   -55    -5  1646   774    59   205
    81    50   529    31    24   131  1771   872    55   812  -591   934
  1008   829    17   690   177  2388    -3   -17  -188  -292  2052    15
   756   780  -181   141  1425   930  1548  2210   -72  2303   -45   946
  1977  1026  -140  -121   816   298    57   655  -326   873 -1895    37
  1265   -31    15   787  1599    11 -2487  -352  1027   -28   -25   660
    74  1716    70  -316   708  2455  1486   835  1110   -89   598   657
   219   -13  1137   -12   799    27    97   689     5   988   664  1018
  2300  1204  1122     4   133    34  -410  -843]
3 CONV_2D
int32 [ 3114  1766  2686  -274  3200  1228   561 -4822  1530  2507  3195 -2809
 -2841  2425  -587  1246   931    50  2384   196 -1037  3485   681  2363
  2286  2213   312  1083  1433  -804  1712  1220  4018   443  1986   639
  1779  4287  2746  1970  1580   811  2888   570   624  -135  2562   815
  1047   964  -937  3350  2584   757 -4328  -272   470   959   551  2961
  3047  1341  1390   858 -1372  1213  2441  1600  2009  3565  2640  5641
   687  2651  -880 -2308   847  2818   -59   -84  -184  1461  -213  1876
   527  2070  1829  4862  1479   566  1298   289   789  -541  1980  3745
  1049  2330 -3020  -639   987   169   869  2092  1503  2488   741  3914
  -636  3055   991 -1020 -2638   924   832  2447  1818   850   663  1683
 -2004  2613   -75 -2852  -947  1219  2165  1074  1597  2022  -490  -334
  4397  -756   679  2508  1860  -687  -167  3348 -1132  1637 -1748   919
  2519  1829  1718  5223  -882  1220  4478  3437  1097 -3872  2730  1573
  1058  1466  -683  1495  -701  2616   550 -2124  2402   195  1947  1953
  4767  3350  2414     1  2355   850  2764  1048  2405  2351  2043  -490
  2348 -2791  -647  1786   269  3793  2343 -2347  3976   529   622  3193
  1144   914  2619 -3740  -784  1521  3068   777 -1482  1070  1524 -2071
  -693  2138   153  -452  1339   369 -1000   342  1942   408  2532  7974
   952 -2546  1043  2284 -1173  2370  -307  -664  -697  4653  1374   364
  2083  2047  1301  3512 -2190  1016  2116  2289  1023  1527  2633   746
  -737   665  -109  1225  2594  1558  2165  -489  2551  6170  1869  4141
  2847  5387  3979 -1645]
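
As a quick sanity check of the claim that these biases cannot be stored as UINT8, the sketch below takes the first four values from the DEPTHWISE_CONV_2D dump and compares them against the uint8 range using NumPy. The variable names are illustrative and are not part of tflite.py.

```python
import numpy as np

# First four bias values copied from the DEPTHWISE_CONV_2D dump above.
bias = np.array([981, -424, 1536, -52], dtype=np.int32)

# Representable range of uint8 is [0, 255].
u8 = np.iinfo(np.uint8)
fits_uint8 = bool(np.all((bias >= u8.min) & (bias <= u8.max)))
print("fits in uint8:", fits_uint8)

# Forcing the cast wraps the values modulo 256, which silently corrupts the bias.
wrapped = bias.astype(np.uint8)
print("after cast to uint8:", wrapped)
```

So narrowing the bias to uint8 is not an option; if the types are to match, it is the uint8 side that would have to be widened.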

Script to compile the model:

import tflite.Model
from tvm import relay

tflite_model_file = "ssd_mobilenet_v1_quantized_300x300_nopp.tflite"
tflite_model_buf = open(tflite_model_file, "rb").read()
tflite_model = tflite.Model.Model.GetRootAsModel(tflite_model_buf, 0)
input_tensor = "normalized_input_image_tensor"
input_shape = (1, 300, 300, 3)
input_dtype = "uint8"
func, params = relay.frontend.from_tflite(tflite_model,
                                          shape_dict={input_tensor: input_shape},
                                          dtype_dict={input_tensor: input_dtype})

with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(func, 'llvm', params=params)

#2

@FrozenGene Could you comment on the issue?
Should we cast UINT8 tensors to INT32?
Maybe we can change TypeSolver to unify UINT8 and INT32 to INT32?


#3

Hi @apivovarov, the quantized TFLite model intake is not implemented yet. This is not just a TypeSolver problem.

This requires creating new quantized ops for each operator and defining their compute and schedules. For example, the quantized conv2d computation is different from the FP32 conv2d implementation (it is not just a matter of changing the dtype from FP32 to INT8).

You can find more details at https://github.com/dmlc/tvm/issues/2351
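
To illustrate why the quantized computation differs from FP32, here is a minimal NumPy sketch of the usual TFLite-style affine quantization arithmetic (real = scale * (q - zero_point)), reduced to a 1x1 convolution, i.e. a matrix multiply. This is not TVM code: the function name, the scales and the zero points are all hypothetical values chosen for illustration. It shows the three steps that an FP32 kernel does not have: subtracting zero points, accumulating in int32 (where the INT32 bias is added), and requantizing back to uint8.

```python
import numpy as np

def quantized_dense_uint8(q_in, q_w, bias_i32,
                          s_in, zp_in, s_w, zp_w, s_out, zp_out):
    """Hypothetical helper: uint8 x uint8 matmul with int32 bias, uint8 output."""
    # Widen to int32 and remove the zero points before multiplying.
    acc = (q_in.astype(np.int32) - zp_in) @ (q_w.astype(np.int32) - zp_w)
    # The bias lives in the accumulator domain: int32, scale = s_in * s_w.
    acc = acc + bias_i32
    # Requantize: rescale the accumulator to the output scale and zero point.
    scaled = np.round(acc * (s_in * s_w / s_out)).astype(np.int32) + zp_out
    return np.clip(scaled, 0, 255).astype(np.uint8)

# Illustrative inputs (all quantization parameters below are assumptions).
q_in = np.array([[10, 200, 128, 5]], dtype=np.uint8)
q_w = np.array([[130, 120], [140, 100], [128, 128], [90, 160]], dtype=np.uint8)
bias = np.array([981, -424], dtype=np.int32)
s_in, zp_in = 0.02, 128
s_w, zp_w = 0.01, 128
s_out, zp_out = 0.05, 100

out = quantized_dense_uint8(q_in, q_w, bias, s_in, zp_in, s_w, zp_w, s_out, zp_out)
print(out)
```

In this scheme the bias is added to an int32 accumulator, never to a uint8 tensor, which is why simply relaxing the type check on nn.bias_add would not give a correct result.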


#4

Hi,

I am facing a similar issue

unable to unify: `Tensor[(256), uint8] and Tensor[(256), int32] `

This happens in the nn.bias_add() operator.

@FrozenGene Does TVM already take in quantized TFLite models?

Thanks


#5

This is a work in progress. @janimesh has implemented some of it.