How could I accelerate model inference?

During compilation, I got warnings like this:

WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 512, 96, 96, 'float32'), (512, 512, 1, 1, 'float32'), (2, 2), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.

And the inference is slower than onnxruntime. My code is as follows:

import onnx
import numpy as np
import tvm
import tvm.relay as relay
import cv2
import time

input_shape = (1, 3, 768, 768)
shape_dict = {'0': input_shape}
impth = 'segm_pic.png'
mdpth = './tmp/model_final_ema.onnx'
opt_level = 3
target = 'cuda'
ctx = tvm.gpu()

im = cv2.imread(impth)
im = cv2.resize(im, (input_shape[2:][::-1]))
im = (im - np.array([123, 117, 104])) / np.array([58.4, 57.1, 57.4])
im = im.transpose((2, 0, 1))[np.newaxis, :].astype('float32')

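# Convert the ONNX model to Relay and time 100 runs with the TVM graph executor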
model = onnx.load(mdpth)
mod, params = relay.frontend.from_onnx(model, shape_dict, dtype='float32', opset=11)
with relay.build_config(opt_level=opt_level):
    intrp = relay.build_module.create_executor('graph', mod, ctx, target)
in_im = tvm.nd.array(im)
t1 = time.time()
for i in range(100):
    out = intrp.evaluate()(in_im, **params)[0].asnumpy()
t2 = time.time()

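# Time the same 100 runs with onnxruntime for comparison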
import onnxruntime as ort
import onnxruntime.backend as backend
model = onnx.load(mdpth)
sess = backend.prepare(model, device='GPU')
t3 = time.time()
for i in range(100):
    out_rt = backend.run(sess, im, device='GPU')[0]
t4 = time.time()
print(t2 - t1)
print(t4 - t3)

The result shows that TVM is even slower than onnxruntime. Is this expected, or what is the correct way to use TVM here?

The warning means that AutoTVM has no tuned schedule for that conv2d workload on your target, so TVM falls back to a default configuration, which can cause a large performance regression. To get good performance you should run AutoTVM to search for the best conv2d configurations on your hardware, log the results, and then compile the model with that log applied.
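Below is a minimal tuning sketch, assuming roughly the same model path, input name, shape, and target as in your script; the log file name tuning.log is arbitrary, XGBTuner needs xgboost installed, and the ops argument to extract_from_program may need adjusting on older TVM versions:

import onnx
import tvm
from tvm import autotvm, relay
from tvm.autotvm.tuner import XGBTuner

mdpth = './tmp/model_final_ema.onnx'
shape_dict = {'0': (1, 3, 768, 768)}
target = 'cuda'
log_file = 'tuning.log'  # arbitrary file to collect tuning records

model = onnx.load(mdpth)
mod, params = relay.frontend.from_onnx(model, shape_dict, dtype='float32', opset=11)

# Extract the tunable conv2d tasks from the network
tasks = autotvm.task.extract_from_program(
    mod['main'], target=target, params=params,
    ops=(relay.op.get('nn.conv2d'),))

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(timeout=10),
    runner=autotvm.LocalRunner(number=10, repeat=1, min_repeat_ms=100))

# Tune each task and append the results to the log file
for task in tasks:
    tuner = XGBTuner(task, loss_type='rank')
    n_trial = min(1000, len(task.config_space))
    tuner.tune(
        n_trial=n_trial,
        measure_option=measure_option,
        callbacks=[autotvm.callback.progress_bar(n_trial),
                   autotvm.callback.log_to_file(log_file)])

# Compile with the best configurations found during tuning
with autotvm.apply_history_best(log_file):
    with relay.build_config(opt_level=3):
        intrp = relay.build_module.create_executor('graph', mod, tvm.gpu(), target)

Tuning can take a while (up to ~1000 trials per conv2d task here), but the module built under apply_history_best should be much faster than one using the fallback schedules from the warning.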