As has been demonstrated, AutoTVM delivers large performance improvements and is one of TVM's killer features. However, it comes with a significant drawback: its tuning time is very long, especially on GPUs and remote embedded devices.
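For context, here is roughly where the time goes: AutoTVM tunes every extracted task for hundreds or thousands of trials, and each trial compiles one candidate schedule and measures it on the target device. A minimal sketch (API as of TVM ~0.6, when this was written; the ResNet-18 workload, `n_trial=1000`, and the `tuning.log` filename are illustrative, not prescriptive):

```python
import tvm
from tvm import autotvm, relay
from tvm.relay import testing

# Stand-in workload; in practice this is the model under evaluation.
mod, params = testing.resnet.get_workload(num_layers=18, batch_size=1)
target = "cuda"

# Extract the tunable tasks (here just conv2d) from the Relay program.
tasks = autotvm.task.extract_from_program(
    mod["main"], target=target, params=params,
    ops=(relay.op.get("nn.conv2d"),),
)

# Every trial builds one candidate schedule and times it on the device.
measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=10, repeat=1, min_repeat_ms=150),
)

for task in tasks:
    tuner = autotvm.tuner.XGBTuner(task)
    # ~1000 trials per task is typical, and a full network has many tasks,
    # so end-to-end tuning easily runs for hours on GPU / remote boards.
    tuner.tune(
        n_trial=1000,
        measure_option=measure_option,
        callbacks=[autotvm.callback.log_to_file("tuning.log")],
    )
```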
If we have a single, fixed model, this cost may be acceptable. However, if we want to evaluate performance quickly, it does not work. Imagine we want to prune / compress a model and compare several variants to see which one is better: we cannot get a quick runtime measurement the way other inference frameworks (like TFLite) provide, because each variant needs its own tuning, which may take hours.
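To make the gap concrete, this is what the two paths look like today. The untuned build comes back in seconds but uses fallback schedules, while the well-performing build first requires the tuning log. (A sketch under the same assumptions as above; `relay.build_config` is the ~0.6-era API, and `tuning.log` is the hypothetical log produced by the tuning loop.)

```python
import os
import tvm
from tvm import autotvm, relay
from tvm.relay import testing

mod, params = testing.resnet.get_workload(num_layers=18, batch_size=1)
target = "cuda"

# Fast path: build with TVM's default/fallback schedules.
# Ready in seconds, but can be far from AutoTVM-level performance.
with relay.build_config(opt_level=3):
    graph, lib, out_params = relay.build(mod, target=target, params=params)

# Slow path: apply an AutoTVM log that took hours of tuning to collect.
if os.path.exists("tuning.log"):
    with autotvm.apply_history_best("tuning.log"):
        with relay.build_config(opt_level=3):
            graph, lib, out_params = relay.build(
                mod, target=target, params=params)
```

So for every pruned / compressed variant, getting a trustworthy number means re-running the slow path, which is exactly the pain point.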
I would like to open this thread to discuss how we could improve the experience given that AutoTVM's tuning time is so long. Sometimes it even prevents us from running experiments like the one described above.