How could I accelerate model inference?

While compiling, I got warnings like this:

WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 512, 96, 96, 'float32'), (512, 512, 1, 1, 'float32'), (2, 2), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.

Inference is also slower than onnxruntime. My code looks like this:

import onnx
import numpy as np
import tvm
import tvm.relay as relay
import cv2
import time

input_shape = (1, 3, 768, 768)
shape_dict = {'0': input_shape}
impth = 'segm_pic.png'
mdpth = './tmp/model_final_ema.onnx'
opt_level = 3
target = 'cuda'
ctx = tvm.gpu()

im = cv2.imread(impth)
im = cv2.resize(im, (input_shape[2:][::-1]))
im = (im - np.array([123, 117, 104])) / np.array([58.4, 57.1, 57.4])
im = im.transpose((2, 0, 1))[np.newaxis, :].astype('float32')

model = onnx.load(mdpth)
mod, params = relay.frontend.from_onnx(model, shape_dict, dtype='float32', opset=11)
with relay.build_config(opt_level=opt_level):
    intrp = relay.build_module.create_executor('graph', mod, ctx, target)
in_im = tvm.nd.array(im)
t1 = time.time()
for i in range(100):
    out = intrp.evaluate()(in_im, **params)[0].asnumpy()
t2 = time.time()

import onnxruntime as ort
import onnxruntime.backend as backend
model = onnx.load(mdpth)
sess = backend.prepare(model, device='GPU')
t3 = time.time()
for i in range(100):
    out_rt = backend.run(sess, im, device='GPU')[0]
t4 = time.time()
print(t2 - t1)
print(t4 - t3)

The result shows that TVM is even slower than onnxruntime. Is this expected, or what is the correct way to use TVM here?

The warning says that TVM has no tuned configuration for that conv2d workload, so it is using a fallback one. You should use AutoTVM to find the best configuration for conv2d on your hardware.
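For reference, the tuning step could look roughly like the sketch below. It follows the pattern of the official AutoTVM tutorial for NVIDIA GPUs and assumes a TVM 0.7-style API and that xgboost is installed; the helper name `tune_conv2d_tasks`, the log file name, and the trial count are made up for illustration and are not from the original post.

```python
def tune_conv2d_tasks(mod, params, target="cuda",
                      log_file="conv2d_tuning.log", n_trial=1000):
    """Tune every conv2d task found in a Relay module; return the log path."""
    # Imported inside the function so the helper can be defined (and its
    # signature inspected) on a machine where TVM is not installed.
    from tvm import autotvm, relay
    from tvm.autotvm.tuner import XGBTuner

    # Collect one tuning task per distinct conv2d workload in the model.
    # Assumption: TVM 0.7-style op lookup. On TVM 0.6 the tutorial passed
    # ops=(relay.op.nn.conv2d,) instead.
    tasks = autotvm.task.extract_from_program(
        mod["main"], target=target, params=params,
        ops=(relay.op.get("nn.conv2d"),))

    # Candidate schedules are compiled and timed on the local GPU.
    measure_option = autotvm.measure_option(
        builder=autotvm.LocalBuilder(timeout=10),
        runner=autotvm.LocalRunner(number=20, repeat=3, timeout=4,
                                   min_repeat_ms=150))

    for task in tasks:
        tuner = XGBTuner(task, loss_type="rank")
        tuner.tune(
            # Never request more trials than the search space contains.
            n_trial=min(n_trial, len(task.config_space)),
            measure_option=measure_option,
            # Every measured configuration is appended to the log file.
            callbacks=[autotvm.callback.log_to_file(log_file)])
    return log_file
```

The log is then applied at compile time by running the build inside `with autotvm.apply_history_best(log_file):`, wrapped around the existing `relay.build_config` block. Tuning a model of this size can take hours, so a smaller `n_trial` is a reasonable first try.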

@jonso Hi, my ONNX model contains conv2d and depthwise conv2d, plus operators such as reduce_mean and depth_to_space. I found that when I tune only conv2d, I no longer receive the warning, even though I did not add reduce_mean to the tuning settings. Does this mean that I do not need to tune these operators, and that the configuration is optimal since I receive no warnings?

Yes, you only need to tune conv2d. The other ops do not have AutoTVM tuning templates, so they cannot be tuned.

Got it, thanks for replying!
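A side note on the benchmark in the question: as far as I can tell, `intrp.evaluate()` builds the executor again on every call, so the timed TVM loop includes compilation as well as inference. A minimal sketch of a fairer measurement, assuming the function is built once before timing; `mean_latency`, `warmup`, and `repeat` are names invented here for illustration:

```python
import time


def mean_latency(run_once, warmup=10, repeat=100):
    """Return the mean wall-clock seconds per call of run_once()."""
    # Warm-up calls absorb one-off costs (lazy initialization, caches).
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(repeat):
        run_once()
    return (time.perf_counter() - start) / repeat


# Hypothetical usage with the names from the script in the question:
#   func = intrp.evaluate()   # build once, outside the timed loop
#   tvm_s = mean_latency(lambda: func(in_im, **params)[0].asnumpy())
```

The onnxruntime call can be wrapped in the same helper so both runtimes are timed identically.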