Update: I figured out where the massive performance difference comes from: I included the call to executor.evaluate() in the timed region, so my measurements counted that one-time cost as inference time. This issue is solved. Feel free to delete this post; I'd do it myself, but I don't seem to have permission to delete my own topic.
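For anyone who hits the same problem, here is a minimal framework-agnostic sketch of the mistake and the fix. The names compile_model and predict are hypothetical stand-ins for the expensive one-time setup (e.g. what executor.evaluate() triggers) and the actual per-prediction call; only the timing pattern is the point.

```python
import time

def compile_model():
    # Hypothetical stand-in for one-time setup/compilation cost
    # (e.g. building the graph executor). Not a real TVM API.
    time.sleep(0.5)
    return lambda x: x * 2  # stand-in for the actual inference call

# Wrong: the one-time setup sits inside the timed region, so it
# dominates the measurement and inflates "inference time".
start = time.perf_counter()
predict = compile_model()
predict(1)
wrong_ms = (time.perf_counter() - start) * 1000

# Right: set up once outside the timer, then average many runs
# of only the inference call.
predict = compile_model()
runs = 10
start = time.perf_counter()
for _ in range(runs):
    predict(1)
right_ms = (time.perf_counter() - start) * 1000 / runs

print(f"wrong: {wrong_ms:.1f} ms, right: {right_ms:.4f} ms")
```

The "wrong" number is hundreds of milliseconds per prediction, roughly the magnitudes in the table below, while the "right" number is microseconds, which is why the tuned and untuned figures looked almost identical: the constant setup cost swamped any kernel improvements.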
Hello,
I’m trying to use TVM to auto-tune neural networks that I import from .onnx files.
I followed this tutorial on how to load ONNX files and this tutorial on how to extract and tune tasks. I set target='cuda', n_trial=2000 and early_stopping=600, just as in the tutorial.
I ran some benchmarks and noticed that even after tuning with n_trial=2000 (early_stopping=600), TVM is a lot slower than TensorFlow GPU, and the times are not much better than without any auto-tuning at all. Here are some numbers (average time per prediction in ms, over 10 runs):

| Model | TensorFlow GPU | TVM (no tuning) | TVM (tuning) |
|---|---|---|---|
| AlexNet (CIFAR-10) | 4.832 | 2778.060 | 1865.881 |
| ResNet50 (CIFAR-10) | 6.190 | 3001.779 | 2540.002 |
| ResNet50 (ImageNet) | 8.779 | 2832.737 | 2587.481 |
| WideResNet (CIFAR-10) | 4.130 | 2690.224 | 2584.722 |
| WideResNet (ImageNet) | 6.433 | 2737.025 | 2548.646 |
Does anyone have an idea why TVM is so much slower than TensorFlow here, and why auto-tuning barely improves the performance?