TVM model warmup takes much more time than MXNet



I modified the scripts in tvm/app/benchmark/ and tutorial/ to run ResNet-18 inference in a loop of 1000 iterations, in order to measure the speedup.

The average time per iteration, excluding the first few iterations, looks good, but the first few iterations are very slow.
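
A minimal sketch of how the warmup iterations can be timed separately from the steady-state average. The names `time_iterations` and `run_once` are hypothetical and not part of TVM or MXNet; `run_once` is assumed to wrap one forward pass and to block until the GPU result is ready, since otherwise asynchronous execution would distort the per-iteration times.

```python
import time


def time_iterations(run_once, n_iters=1000, n_warmup=2):
    """Time every call to run_once and split warmup calls from the rest.

    Returns (warmup_times, steady_average), both in seconds.
    """
    if n_iters <= n_warmup:
        raise ValueError("n_iters must be larger than n_warmup")
    times = []
    for _ in range(n_iters):
        start = time.perf_counter()
        run_once()  # assumed to block until the output is available
        times.append(time.perf_counter() - start)
    warmup = times[:n_warmup]
    steady = times[n_warmup:]
    return warmup, sum(steady) / len(steady)


if __name__ == "__main__":
    # Stand-in workload: the first call is slow, later calls are fast.
    state = {"first": True}

    def fake_run():
        if state["first"]:
            time.sleep(0.05)
            state["first"] = False

    warmup, avg = time_iterations(fake_run, n_iters=100, n_warmup=2)
    print("warmup (s):", warmup)
    print("steady-state average (s):", avg)
```

Reporting the first iterations individually, rather than folding them into the mean, is what makes a one-off initialization cost visible.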


Environment:
i7 + 1080Ti
TVM built with CUDA + cuDNN + cuBLAS
CUDA version: 8.0



Average time per iteration (warmup excluded):

benchmark: 1.39 ms
from_mxnet: 1.4 ms
mxnet 1.4 + cudnn: 10.49 ms

First two iterations:

from_mxnet: 7.53 s, 18.7 ms
mxnet 1.4 + cudnn: 0.097 s, 12 ms
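
To put these numbers in perspective, here is a rough sketch of how many iterations it takes before the lower steady-state time pays back the warmup cost. It assumes a simple model (not measured directly): the two warmup iterations cost what is listed above, and every later iteration costs the reported average. The helper names are hypothetical.

```python
import math

# Measurements from this post, converted to seconds.
TVM = {"first": 7.53, "second": 18.7e-3, "steady": 1.4e-3}
MXNET = {"first": 0.097, "second": 12e-3, "steady": 10.49e-3}


def total_time(m, n):
    """Estimated wall time for n >= 2 iterations under the simple model:
    two measured warmup iterations, then the steady-state average."""
    return m["first"] + m["second"] + (n - 2) * m["steady"]


def break_even(a, b):
    """Smallest n >= 2 at which a's total time is no larger than b's.

    Assumes a has the lower steady-state time per iteration.
    """
    extra_warmup = (a["first"] + a["second"]) - (b["first"] + b["second"])
    saving_per_iter = b["steady"] - a["steady"]
    if saving_per_iter <= 0:
        raise ValueError("a must be faster than b in steady state")
    return 2 + max(0, math.ceil(extra_warmup / saving_per_iter))


print(break_even(TVM, MXNET))
```

Under this model the break-even point is 821 iterations, so the warmup cost is only just amortized within a 1000-iteration run and would dominate any shorter one.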


The first two warmup iterations in TVM take a very long time, while MXNet's look fine. Is this normal? If so, how can I reduce the warmup time, and is there anything in TVM itself that could be optimized here? Thanks a lot!