The calculation of INT8

hhhh · August 6, 2019, 3:32am

How is INT8 computation invoked in TVM? In which part of the code is the INT8 arithmetic actually performed?
For instance:
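import numpy as np
import tvm
from tvm import relay
import tvm.relay.testing
from tvm.contrib import graph_runtime

def bench(name, batch):
    # get_network is the poster's own helper (not shown in the post)
    sym, data_shape = get_network(name, batch)
    data_shape = data_shape[0][1]
    sym, params = relay.frontend.from_mxnet(sym, {'data': data_shape})
    # Replace the parameters with randomly initialized ones for benchmarking
    sym, params = tvm.relay.testing.create_workload(sym)
    # Quantize the Relay program under the given quantization config
    with relay.quantize.qconfig(skip_k_conv=0, round_for_shift=True):
        sym = relay.quantize.quantize(sym, params)

    # Compile the quantized program for CUDA, with LLVM as the host target
    with relay.build_module.build_config(opt_level=3):
        graph, lib, params = relay.build(sym, 'cuda', 'llvm', params=params)

    ctx = tvm.gpu(0)  # GPU context matching the 'cuda' target
    m = graph_runtime.create(graph, lib, ctx)
    x = np.random.uniform(size=data_shape)
    data_tvm = tvm.nd.array(x.astype('float32'))
    m.set_input("data", data_tvm)
    m.set_input(**{k: tvm.nd.array(v, ctx) for k, v in params.items()})
    m.run()
    # Time 3 repeats of 2000 runs each; results are reported in seconds
    e = m.module.time_evaluator("run", ctx, number=2000, repeat=3)
    t = e().results
    t = np.array(t) * 1000  # convert to milliseconds
    return t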

In this code, I can't tell whether the INT8 computation is defined during the quantization step or by the compiled module.

thierry · August 6, 2019, 10:45pm

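relay.quantize.quantize, called inside the relay.quantize.qconfig(skip_k_conv=0, round_for_shift=True) scope, applies quantization to the Relay program. This changes the operators the network uses, so that supported layers such as convolutions compute in INT8 instead of FP32.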
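To make "INT8 computation" more concrete, here is a minimal sketch of the arithmetic a quantized dense or conv layer roughly reduces to. It assumes symmetric per-tensor quantization with power-of-two scales, so requantization is a right shift. The `round_for_shift` argument here only mimics what the qconfig option of the same name is meant to control (adding a rounding bias before the shift); none of these helpers are TVM APIs, and the real kernels are generated by TVM for the chosen target.

```python
# Toy illustration of quantized (INT8) arithmetic. Assumptions: symmetric
# per-tensor quantization, power-of-two scales. Not TVM's kernel code.

def quantize(values, shift):
    """Map floats to int8 using scale 2**-shift, clipping to [-127, 127]."""
    out = []
    for v in values:
        q = round(v * (1 << shift))
        out.append(max(-127, min(127, q)))
    return out


def int8_dot(a, b):
    """Multiply int8 values and accumulate in a wider (int32-style) sum."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def requantize(acc, shift, round_for_shift=True):
    """Scale an accumulator down by 2**shift with an arithmetic right shift.

    With round_for_shift, half of the divisor is added first so the shift
    rounds to nearest instead of rounding toward minus infinity.
    """
    if round_for_shift and shift > 0:
        acc += 1 << (shift - 1)
    q = acc >> shift
    return max(-127, min(127, q))
```

For example, with `shift=4` the inputs `[0.5, -0.25, 0.75]` and `[0.5, 0.5, -0.25]` quantize to `[8, -4, 12]` and `[8, 8, -4]`. Their integer dot product is -16, and dividing by 2**8 (the product of the two scales) recovers the float result -0.0625.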

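relay.build compiles your quantized workload into a graph, a library, and parameters; graph_runtime.create wraps them into the module m, and the INT8 kernels are actually executed when you invoke m.run().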
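The split between the two phases can be modeled with a small, hypothetical sketch: a rewrite pass that only changes which operators the program contains, a build step that binds each operator to a kernel, and a run step where the kernels actually execute. The names `quantize_pass`, `build`, and `run` are invented for illustration and do not correspond to TVM's implementation; the kernels are placeholders that just record that they ran.

```python
# Hypothetical model of "rewrite at compile time, compute at run time".
# None of these names are TVM APIs.

LOG = []  # records which kernels actually executed


def quantize_pass(program):
    """Compile-time rewrite: swap fp32 ops for int8 ones. Computes nothing."""
    out = []
    for op in program:
        if op in ("conv2d", "dense"):
            out += ["quantize", op + "_int8", "dequantize"]
        else:
            out.append(op)
    return out


def build(program):
    """Bind each op name to a callable kernel. Still computes nothing."""
    def make_kernel(name):
        def kernel(x):
            LOG.append(name)  # placeholder kernel: log execution, pass x through
            return x
        return kernel
    return [make_kernel(op) for op in program]


def run(module, x):
    """Execution phase: only here are the (int8) kernels invoked."""
    for kernel in module:
        x = kernel(x)
    return x
```

After `build(quantize_pass(["conv2d", "relu", "dense"]))` the log is still empty; only calling `run` on the result fills it with the rewritten op sequence.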