Compiling the official MobileNetV2 ONNX model gives very slow performance

Hi,
I use the official ONNX MobileNetV2 model ([mobilenetv2-1.0.onnx](https://s3.amazonaws.com/onnx-model-zoo/mobilenet/mobilenetv2-1.0/mobilenetv2-1.0.onnx)) and follow the tutorial at https://docs.tvm.ai/tutorials/frontend/from_onnx.html, but I get very slow performance: each inference takes 8.9 s on an Nvidia P40.
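Roughly what I am doing, following the tutorial (a minimal sketch; the input name 'data' and the 1x3x224x224 shape are assumptions for this model, and the exact API may differ slightly between TVM versions):

```python
import numpy as np
import onnx
import tvm
from tvm import relay

onnx_model = onnx.load('mobilenetv2-1.0.onnx')
target = 'cuda'
ctx = tvm.gpu(0)

# input name and shape are assumptions for this model
shape_dict = {'data': (1, 3, 224, 224)}
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

# build and run through create_executor, as in the tutorial
executor = relay.build_module.create_executor('graph', mod, ctx, target)
x = np.random.uniform(size=(1, 3, 224, 224)).astype('float32')
tvm_output = executor.evaluate()(tvm.nd.array(x, ctx), **params).asnumpy()
```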
When I compile the model, I get lots of warnings like:

WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d', (1, 96, 56, 56, 'float32'), (24, 96, 1, 1, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('depthwise_conv2d_nchw', (1, 96, 112, 112, 'float32'), (96, 1, 3, 3, 'float32'), (2, 2), (1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.

Has anyone run into this?

Those warnings come from AutoTVM; you can ignore them. They are just hints that no tuned schedule was found for your workloads and a fallback is being used. If you tune the model with AutoTVM, you can get much better performance.
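A rough sketch of tuning with AutoTVM and then rebuilding with the tuned log (the log file name is a placeholder, and the exact extract_from_program / tuner API varies between TVM versions):

```python
from tvm import autotvm, relay
from tvm.autotvm.tuner import XGBTuner

# assumes `mod`, `params` and `target` from the compile step above
tasks = autotvm.task.extract_from_program(mod['main'], target=target, params=params,
                                          ops=(relay.op.get('nn.conv2d'),))

log_file = 'mobilenetv2_cuda.log'  # placeholder path for the tuning log
measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=10, repeat=1, timeout=4))

for task in tasks:
    tuner = XGBTuner(task)
    tuner.tune(n_trial=min(1000, len(task.config_space)),
               measure_option=measure_option,
               callbacks=[autotvm.callback.log_to_file(log_file)])

# rebuild with the tuned schedules; the fallback warnings should go away
with autotvm.apply_history_best(log_file):
    with relay.build_config(opt_level=3):
        graph, lib, params = relay.build(mod, target=target, params=params)
```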

I found that whether I set the target to 'llvm' or 'cuda', the inference time is the same. Has anyone seen this?

You can try setting target = 'cuda -libs=cudnn'; it is very fast.
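Something like this (a sketch, assuming mod and params from relay.frontend.from_onnx; TVM has to be built with cuDNN support for this to work):

```python
from tvm import relay

# assumes `mod` and `params` from relay.frontend.from_onnx
target = 'cuda -libs=cudnn'
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(mod, target=target, params=params)
```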

It still takes 0.027 s. If I don't add '-libs=cudnn' it costs 0.04 s, and with 'llvm' it costs 0.049 s. I don't know why.
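For reference, this is roughly how I time it (a sketch; `run` here stands for the callable returned by create_executor(...).evaluate(), `params` comes from from_onnx, and ctx matches the cuda target):

```python
import time
import numpy as np
import tvm

ctx = tvm.gpu(0)
x = np.random.uniform(size=(1, 3, 224, 224)).astype('float32')

run(tvm.nd.array(x, ctx), **params)   # warm-up; the first call is not representative
start = time.time()
for _ in range(100):
    out = run(tvm.nd.array(x, ctx), **params)
ctx.sync()                            # wait for the GPU to finish before stopping the clock
print('mean inference time: %.4f s' % ((time.time() - start) / 100))
```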

I found that if I build the model and run it through graph_runtime.create() instead of create_executor(), it is very fast.
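Roughly the change (a sketch; the input name 'data' is assumed, and relay.build returning (graph, lib, params) matches the older API from the tutorial's TVM version):

```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_runtime

# assumes `mod` and `params` from relay.frontend.from_onnx
target = 'cuda'
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(mod, target=target, params=params)

ctx = tvm.gpu(0)
module = graph_runtime.create(graph, lib, ctx)
module.set_input(**params)

x = np.random.uniform(size=(1, 3, 224, 224)).astype('float32')
module.set_input('data', tvm.nd.array(x))  # 'data' is the assumed input name
module.run()
out = module.get_output(0).asnumpy()
```

With this, compilation happens once in relay.build(), and each module.run() only executes the pre-built graph.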

It would indeed be interesting if someone could clarify the difference between graph_runtime.create() and create_executor() in terms of performance.

Here there is some discussion about the differences between build (graph_runtime.create) and create_executor, but no explanation of why one would be faster than the other.