Autotune not working with x86 AVX2 architecture

Compute Specifications

  1. Ubuntu 16.04
  2. Intel® Xeon® CPU E3-1275 v6 @ 3.80GHz
  3. llvm - 9.0.1
  4. tvm - ‘0.7.dev0’

Issue Description

I am trying to run the script tune_relay_x86.py with all the default configurations except target = "llvm -mcpu=core-avx2". However, I am still observing the following warning for every convolution and dense layer.

Cannot find config for target=llvm -device=tracing, workload=('conv2d', (1, 128, 28, 28, 'float32'), (128, 128, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.

Even though these are merely warnings and the script executes successfully, the newly compiled models have slower inference times, so it seems that autotuning isn't taking effect at all.
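One way to narrow this down is to dump the workloads recorded in the tuning log and compare them against the workload printed in the warning. A minimal sketch, assuming the tutorial's default log file name tuning.log:

from tvm import autotvm

# Dump every (target, workload) pair recorded in the tuning log, so they can
# be compared against the workload in the "Cannot find config" warning.
for inp, res in autotvm.record.load_from_file("tuning.log"):
    print(inp.target, inp.task.workload)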

Please let me know where I am going wrong, or how to resolve the issue.

Thanks


I am currently facing the same problem with that example (my CPU: Intel® Xeon® CPU E5-2620 v4 @ 2.10GHz). What I find interesting is that optimizing only one convolutional layer in a separate script, using the same options and task (topi_x86_conv2d_NCHWc, opt_lvl = 3, and llvm -mcpu=core-avx2) does not produce this error for me. I am wondering if this is related to the conversion done in the tutorial in the function “tune_kernels”:

def tune_kernels(tasks,
                 measure_option,
                 tuner='gridsearch',
                 early_stopping=None,
                 log_filename='tuning.log'):

    for i, tsk in enumerate(tasks):
        prefix = "[Task %2d/%2d] " % (i+1, len(tasks))

        # converting conv2d tasks to conv2d_NCHWc tasks
        op_name = tsk.workload[0]
        if op_name == 'conv2d':
            func_create = 'topi_x86_conv2d_NCHWc'
        elif op_name == 'depthwise_conv2d_nchw':
            func_create = 'topi_x86_depthwise_conv2d_NCHWc_from_nchw'
        else:
            raise ValueError("Tuning {} is not supported on x86".format(op_name))

        task = autotvm.task.create(func_create, args=tsk.args,
                                   target=target, template_key='direct')
        task.workload = tsk.workload
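If the conversion is the culprit, one quick check is to compare the workload of the original task with that of the newly created one. A rough sketch, reusing tsk, func_create, and target from the snippet above:

# The freshly created NCHWc task has its own workload; the tutorial then
# overwrites it with the original NCHW one (task.workload = tsk.workload).
# Since relay.build looks configs up by workload, a mismatch between what
# the log records and what the build queries would explain the warnings.
converted = autotvm.task.create(func_create, args=tsk.args,
                                target=target, template_key='direct')
print("original task workload :", tsk.workload)
print("converted task workload:", converted.workload)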

There have been reports of conversion issues in the past.

I am tagging @eqy since he may be familiar with this type of issue.


Tagging additional members who were involved in similar discussions on x86 autotuning. @kevinthesun @comaniac @apivovarov

I am currently facing the same problem with that example.

I ran the script tune_relay_x86.py with all the default configurations except target = "llvm -mcpu=core-avx2", and modified the get_network function to use inceptionv3. I see the same warnings. The script runs to completion, but auto-tuning does not appear to take effect…

Cannot find config for target=llvm -device=tracing, workload=('conv2d',xxx
Cannot find config for target=llvm -device=tracing, workload=('conv2d',xxx

Are you comparing with the default schedule?

I compared the standard inference time of the MXNet (CPU) model against the TVM-compiled model. In both cases inference was done in Python.

You can compare the tuned schedule with the default schedule to see whether the regression comes from the tuned schedule itself being slow.
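A rough sketch of such a comparison, reusing mod, params, target, and input_shape from the tutorial; tuning.log is the tutorial's default log name, and the input name "data" follows get_network:

import numpy as np
import tvm
from tvm import autotvm, relay
from tvm.contrib import graph_runtime

def time_inference(use_tuning_log):
    # Build with or without the tuned schedules from the log applied.
    if use_tuning_log:
        with autotvm.apply_history_best("tuning.log"):
            with relay.build_config(opt_level=3):
                graph, lib, built_params = relay.build(mod, target=target, params=params)
    else:
        with relay.build_config(opt_level=3):
            graph, lib, built_params = relay.build(mod, target=target, params=params)

    # Run the compiled module on random input and report the mean runtime.
    ctx = tvm.cpu()
    module = graph_runtime.create(graph, lib, ctx)
    module.set_input("data", np.random.uniform(size=input_shape).astype("float32"))
    module.set_input(**built_params)
    ftimer = module.module.time_evaluator("run", ctx, number=100)
    return ftimer().mean

print("default schedule: %.3f ms" % (time_inference(False) * 1000))
print("tuned schedule  : %.3f ms" % (time_inference(True) * 1000))

If the two numbers are close, the tuned configs are probably not being applied (matching the warnings above); if the tuned build is genuinely slower, the problem is in the tuning itself.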