Issue with GPU autotune script | CPU working fine


I am following this tutorial:

I am having trouble implementing the get_network method.
For CPU I am using the following to load the model from MXNet, but for GPU I am confused.

import mxnet as mx
import nnvm

def get_network(name, batch_size):
    # Load the pretrained checkpoint saved by MXNet
    prefix, epoch = "/opt/models/model", 0
    sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, epoch)
    # Convert the MXNet symbol and parameters into an NNVM graph
    nnvm_sym, nnvm_params = nnvm.frontend.from_mxnet(sym, arg_params, aux_params)
    input_shape = (batch_size, 3, 224, 224)
    output_shape = (batch_size, 1024)
    return nnvm_sym, nnvm_params, input_shape, output_shape

What is the equivalent code for GPU?


Maybe you can try loading the model from Relay instead? Following this tutorial, you can easily specify the target for your tuning tasks.
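For reference, here is a minimal sketch of what a Relay-based loader might look like (assuming the same checkpoint path as the CPU snippet and an input named "data" — check your model's actual input name). `relay.frontend.from_mxnet` plays the role of `nnvm.frontend.from_mxnet`; the device is then chosen via the target string at tuning/build time, not inside the loader:

```python
def get_network(name, batch_size):
    # Imports kept local so this sketch stays self-contained
    import mxnet as mx
    from tvm import relay

    # Same checkpoint as the CPU version
    prefix, epoch = "/opt/models/model", 0
    sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, epoch)

    input_shape = (batch_size, 3, 224, 224)
    # "data" is an assumption: replace it with your model's input name
    mod, params = relay.frontend.from_mxnet(
        sym, shape={"data": input_shape},
        arg_params=arg_params, aux_params=aux_params)
    output_shape = (batch_size, 1024)
    return mod, params, input_shape, output_shape

# The device is selected by the target passed to task extraction / relay.build:
#   target = "llvm"   # CPU
#   target = "cuda"   # GPU
```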


Thanks for replying.

I was able to run the autotuning, but why are the auto-tuned models faster on CPU than on GPU?
For the same model:

XGB tuner (CPU): 3.5 ms
XGB tuner (GPU, 1080 Ti): 19.5 ms

Is the GPU tuning not optimized yet, or am I missing something?


It's possible if you set the same tuning time for both platforms, as the tuning space for GPU is usually much larger than for CPU. If you want, you could dive into the schedule and tuning-space implementation in TOPI and refine the tuning space to make it more efficient. For example, here is the conv2d implementation in TOPI.

In addition, another straightforward way to achieve high performance on GPU is to enable cuDNN. To do so, first make sure you set USE_CUDNN to ON when building TVM, and then specify the target as "cuda -libs=cudnn". The details can be found here.
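To illustrate how that target string is composed (the `make_target` helper below is hypothetical, not a TVM API; only the "cuda -libs=cudnn" convention comes from TVM):

```python
# Hypothetical helper, for illustration only: shows how a TVM target
# string with library offloading is put together.
def make_target(device, libs=None):
    """Return a target string such as "cuda -libs=cudnn"."""
    target = device
    if libs:
        # -libs tells TVM to offload supported ops (e.g. conv2d)
        # to the listed vendor libraries
        target += " -libs=" + ",".join(libs)
    return target

print(make_target("llvm"))             # CPU baseline: "llvm"
print(make_target("cuda", ["cudnn"]))  # GPU with cuDNN: "cuda -libs=cudnn"
```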


cuDNN is already enabled in my TVM build.
I am just optimizing ResNet-100.

How fast should the GPU be? Would it be able to beat the CPU numbers by a large margin?