Bringing auto-tuner to TVM


Hi all,

So far, TVM is very powerful at generating kernels for different backends, but there are still some drawbacks in the TVM stack. For example,

  • Current kernels in TOPI are only optimized for specific shapes and devices
  • Writing high performance schedule is not easy

An auto-tuner can help solve these problems. Here, we wrote a technical report to show our exploration of bringing an auto-tuner to TVM.
Our auto-tuner is backed by machine learning techniques. It is quite interesting that we use machine learning to optimize machine learning itself.

By using the auto-tuner, we no longer need to write and tune many `if ... else ...` branches in schedules to set parameters for different shapes. The auto-tuner can also find good kernels for your specific devices.
Furthermore, we can even derive schedule code from tvm.compute directly. Deriving a schedule from compute (i.e., auto-scheduling) has been tested on CUDA for some operators. We still need more experiments to improve it.
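To make the contrast concrete, here is a minimal sketch (not TVM's actual API; all names are illustrative) of the difference between hand-tuning parameters with shape-specific branches and letting a tuner search the parameter space:

```python
def hand_written_tile(height, width):
    # Shape-specific branches a human must write and tune by hand.
    # Every new shape needs another branch.
    if (height, width) == (224, 224):
        return (8, 16)
    elif (height, width) == (56, 56):
        return (4, 8)
    else:
        return (2, 2)  # untuned fallback for every other shape

def auto_tuned_tile(height, width, candidates=(2, 4, 8, 16)):
    # A tuner instead evaluates candidate parameters and keeps the best.
    # The "cost model" here is a toy stand-in: prefer the largest tile
    # that evenly divides both dimensions.
    best = (1, 1)
    for th in candidates:
        for tw in candidates:
            divides = height % th == 0 and width % tw == 0
            if divides and th * tw > best[0] * best[1]:
                best = (th, tw)
    return best
```

In a real system the toy cost model would be replaced by actual on-device measurements or a learned cost model, but the structure is the same: a search over a declared parameter space instead of hard-coded branches per shape.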

We are cleaning up the APIs and plan to publish the code in several weeks. Any comments are welcome.


Do we still need to write WORKLOADS for convolution for specific models (mobilenet/resnet…)? Or can the auto-tuner also do this?


Ideally, our auto-tuner can take an NNVM graph as input. It will extract the shape configurations of all tunable operators (conv2d, dense…) and tune them automatically.

This is not implemented yet, but it is in our plan. Currently, we use a log file to save the parameters for different shape configurations, and NNVM queries the parameters from this log file.
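The log-file mechanism described above might look roughly like the following sketch. The record format, key scheme, and function names are invented for illustration and are not the actual TVM format:

```python
import json

def save_tuned_params(path, records):
    # Append one JSON record per tuned workload, keyed by a string that
    # encodes the operator and its shape configuration (hypothetical key
    # scheme, e.g. "conv2d_1x3x224x224").
    with open(path, "w") as f:
        for key, params in records.items():
            f.write(json.dumps({"workload": key, "params": params}) + "\n")

def query_tuned_params(path, workload):
    # At compile time, the graph compiler looks up the tuned parameters
    # for each operator's shape configuration.
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            if rec["workload"] == workload:
                return rec["params"]
    return None  # fall back to a default schedule if no record exists
```

For example, after tuning a 224x224 conv2d one would save its tile sizes under its workload key and query them back when compiling a graph containing that shape; unseen shapes return `None` and fall back to a default.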


I think it would be nicer if the published code included the WORKLOADS implementation, because if we want to use it in production, that would save users a lot of effort.


Yes, it is in our plan, since our system should be 'end2end'.


What’s the latest update of auto-tuner for Mali GPU?

#7 This PR updated the GPU backend, with a tutorial.