Compiling the official MobileNet ONNX model gives very slow performance


#1

Hi,
I use the official ONNX MobileNetV2 model ([mobilenetv2-1.0.onnx](https://s3.amazonaws.com/onnx-model-zoo/mobilenet/mobilenetv2-1.0/mobilenetv2-1.0.onnx)) and follow the tutorial at https://docs.tvm.ai/tutorials/frontend/from_onnx.html. But performance is very slow: each inference takes 8.9 s on an Nvidia P40.
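Roughly what I am running (a sketch of the tutorial flow; the input name `"data"` and the 1x3x224x224 shape are my assumptions for this model, and the exact Relay API differs a bit between TVM versions):

```python
import onnx
import numpy as np
import tvm
from tvm import relay

# Load the ONNX model and convert it to Relay. The input name "data"
# and shape (1, 3, 224, 224) are assumptions for mobilenetv2-1.0.
onnx_model = onnx.load("mobilenetv2-1.0.onnx")
shape_dict = {"data": (1, 3, 224, 224)}
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

# Build an executor the way the from_onnx tutorial does.
target = "cuda"
ctx = tvm.gpu(0)
executor = relay.build_module.create_executor("graph", mod, ctx, target)

# Run one inference on random data; this is the call that is slow for me.
data = np.random.uniform(size=(1, 3, 224, 224)).astype("float32")
out = executor.evaluate()(tvm.nd.array(data, ctx), **params).asnumpy()
```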
When I compile the model, I get lots of warnings like:

```
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d', (1, 96, 56, 56, 'float32'), (24, 96, 1, 1, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('depthwise_conv2d_nchw', (1, 96, 112, 112, 'float32'), (96, 1, 3, 3, 'float32'), (2, 2), (1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
```

Has anyone run into this?


#2

The warnings come from autotvm; you can ignore them. They are just hints that no tuned config was found for those workloads, so a fallback schedule is used. If you tune the model with autotvm, you can get better performance; see the sketch below.
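Something like this rough sketch (the exact autotvm API differs a bit between TVM versions, and the log file name here is arbitrary; `mod`/`params` are the ones returned by `relay.frontend.from_onnx`):

```python
from tvm import autotvm, relay
from tvm.autotvm.tuner import XGBTuner

# Extract the tunable tasks (e.g. conv2d) from the Relay program.
tasks = autotvm.task.extract_from_program(
    mod["main"], target="cuda", params=params,
    ops=(relay.op.get("nn.conv2d"),))

# Tune each task and append the best configs to a log file.
measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=10, repeat=1, timeout=4))
for task in tasks:
    tuner = XGBTuner(task)
    tuner.tune(n_trial=min(1000, len(task.config_space)),
               measure_option=measure_option,
               callbacks=[autotvm.callback.log_to_file("mobilenetv2.log")])

# Re-compile with the tuned configs applied; the fallback warnings
# disappear for every workload that appears in the log.
with autotvm.apply_history_best("mobilenetv2.log"):
    with relay.build_config(opt_level=3):
        graph, lib, params = relay.build(mod, "cuda", params=params)
```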


#3

I found that whether I set the target to `llvm` or `cuda`, the inference time is the same. Has anyone seen this situation?


#4

You can try setting target = `cuda -libs=cudnn`; it is very fast.
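Something like this (assumes your TVM build has cuDNN support enabled, and reuses `mod`/`params` from the earlier snippet):

```python
from tvm import relay

# Offload convolutions to cuDNN instead of TVM's untuned fallback schedules.
# This requires TVM built with USE_CUDNN; otherwise the build will fail.
target = "cuda -libs=cudnn"
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(mod, target, params=params)
```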


#5

It still takes 0.027 s. If I don't add `-libs=cudnn`, it takes 0.04 s, and if I use `llvm`, it takes 0.049 s. I don't know why.
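Here is roughly how I am timing it (a sketch reusing the `executor`, `data`, and `ctx` from the first post; I call `evaluate()` once and warm up once so first-call compilation is not counted, and force `asnumpy()` so the GPU work actually finishes inside the timed region):

```python
import time
import tvm

# evaluate() returns a callable; reuse it rather than re-creating it per run,
# since creating it repeatedly can re-trigger compilation.
run_one = executor.evaluate()
run_one(tvm.nd.array(data, ctx), **params).asnumpy()  # warm-up

n = 100
start = time.time()
for _ in range(n):
    run_one(tvm.nd.array(data, ctx), **params).asnumpy()
print("mean inference time: %.4f s" % ((time.time() - start) / n))
```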


#6

I found that if I use `graph_runtime.create()` to build and run the model instead of `create_executor()`, the speed is very fast.
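Roughly what I ended up with (assuming the `mod`/`params` from the first post; the input name `"data"` is again my assumption for this model):

```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_runtime

# Compile once, then run the precompiled graph directly.
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(mod, "cuda", params=params)

ctx = tvm.gpu(0)
m = graph_runtime.create(graph, lib, ctx)
data = np.random.uniform(size=(1, 3, 224, 224)).astype("float32")
m.set_input("data", data)
m.set_input(**params)
m.run()
out = m.get_output(0).asnumpy()

# time_evaluator reports the device-side time of run() alone.
ftimer = m.module.time_evaluator("run", ctx, number=100)
print("mean inference time: %.4f ms" % (ftimer().mean * 1000))
```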


#7

It would indeed be interesting if someone clarified the difference between `graph_runtime.create()` and `create_executor()` in terms of performance.


#8

There is some discussion here about the differences between build (`graph_runtime.create`) and `create_executor`, but no explanation of why one could be faster than the other.