[Auto-tune] Error occurs during inference when using auto-tuned schedule


#1

Dear TVM community,

I found that the auto-tuned schedule for conv2d [in_channels=80, out_channels=192, kernel_size=(3,3), strides=(1,1), padding=(0,0)] makes inference fail, while the default schedule works well.

The relay program is:

import tvm
from tvm import relay

def get_network(name, batch_size):
    """Get the symbol definition and random weight of a network"""
    weight_name = 'weight'
    input_shape = (1, 80, 73, 73)
    output_shape = (1, 192, 71, 71)

    # random (uninitialized) weight for the single conv2d layer
    params = {}
    weight = tvm.nd.empty(shape=(192, 80, 3, 3))
    params[weight_name] = weight

    # data: 1 x 80 x 73 x 73 -> conv2d -> 1 x 192 x 71 x 71
    v = relay.var('data', shape=input_shape)
    v = relay.nn.conv2d(v,
            weight=relay.var(weight_name, shape=weight.shape),
            strides=(1, 1), padding=(0, 0),
            channels=192, kernel_size=(3, 3))
    fn = relay.Function(relay.analysis.free_vars(v), v)
    mod = relay.Module.from_expr(fn)
    return mod, params, input_shape, output_shape
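
The rest of the script follows the AutoTVM tune_relay_cuda tutorial. Below is a minimal sketch of the tune-and-evaluate flow, assuming tutorial-style names such as target, log_file, tune_tasks, and tuning_option (these are illustrative, not my exact script):

import numpy as np
from tvm import autotvm
from tvm.contrib import graph_runtime

target = tvm.target.cuda()
log_file = 'conv2d.log'

def tune_and_evaluate(tuning_opt):
    # extract AutoTVM tasks from the relay program
    mod, params, input_shape, out_shape = get_network('conv2d', batch_size=1)
    tasks = autotvm.task.extract_from_program(mod['main'], target=target,
                                              params=params,
                                              ops=(relay.op.nn.conv2d,))

    # run the tuning jobs (tune_tasks is the helper from the tutorial)
    tune_tasks(tasks, **tuning_opt)

    # compile the model with the best schedules found during tuning
    with autotvm.apply_history_best(log_file):
        with relay.build_config(opt_level=3):
            graph, lib, params = relay.build_module.build(mod, target=target,
                                                          params=params)

        # create the graph runtime and feed a random input
        ctx = tvm.context(str(target), 0)
        module = graph_runtime.create(graph, lib, ctx)
        data = tvm.nd.array(np.random.uniform(size=input_shape).astype('float32'))
        module.set_input('data', data)
        module.set_input(**params)

        # measure inference time; this is where the error is raised
        ftimer = module.module.time_evaluator('run', ctx, number=1, repeat=600)
        prof_res = np.array(ftimer().results) * 1000  # convert to millisecond
        print('Mean inference time: %.2f ms' % np.mean(prof_res))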

Running this script produces the following output, and the error message is:

Extract tasks...
Tuning...
[Task  1/ 1]  Current/Best: 2233.51/3996.47 GFLOPS | Progress: (100/100) | 634.76 s Done.
Compile...
Evaluate inference time cost...
Traceback (most recent call last):

  File "/home/yaoyao/PycharmProjects/play_ground/demo.py", line 154, in <module>
    tune_and_evaluate(tuning_option)

  File "/home/yaoyao/PycharmProjects/play_ground/demo.py", line 148, in tune_and_evaluate
    prof_res = np.array(ftimer().results) * 1000  # convert to millisecond

  File "/home/yaoyao/anaconda3/lib/python3.7/site-packages/tvm-0.6.dev0-py3.7-linux-x86_64.egg/tvm/module.py", line 198, in evaluator
    blob = feval(*args)

  File "tvm/_ffi/_cython/./function.pxi", line 310, in tvm._ffi._cy3.core.FunctionBase.__call__

  File "tvm/_ffi/_cython/./function.pxi", line 245, in tvm._ffi._cy3.core.FuncCall

  File "tvm/_ffi/_cython/./function.pxi", line 234, in tvm._ffi._cy3.core.FuncCall3

  File "tvm/_ffi/_cython/./base.pxi", line 171, in tvm._ffi._cy3.core.CALL

tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (5) /home/yaoyao/anaconda3/lib/python3.7/site-packages/tvm-0.6.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(TVMFuncCall+0x65) [0x7f2de73be845]
  [bt] (4) /home/yaoyao/anaconda3/lib/python3.7/site-packages/tvm-0.6.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(+0xc26c07) [0x7f2de7416c07]
  [bt] (3) /home/yaoyao/anaconda3/lib/python3.7/site-packages/tvm-0.6.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(+0xc268ba) [0x7f2de74168ba]
  [bt] (2) /home/yaoyao/anaconda3/lib/python3.7/site-packages/tvm-0.6.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(tvm::runtime::GraphRuntime::Run()+0x37) [0x7f2de7426557]
  [bt] (1) /home/yaoyao/anaconda3/lib/python3.7/site-packages/tvm-0.6.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(+0xc38727) [0x7f2de7428727]
  [bt] (0) /home/yaoyao/anaconda3/lib/python3.7/site-packages/tvm-0.6.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(+0xbe5125) [0x7f2de73d5125]
  File "/home/yaoyao/repos/tvm/src/runtime/module_util.cc", line 73
TVMError: Check failed: ret == 0 (-1 vs. 0) : Assert fail: (73 == int32(arg2.shape[2])), Argument arg2.shape[2] has an unsatisfied constraint

Any help would be greatly appreciated!

UPDATE:
I tried the same code on an NVIDIA TITAN Xp and it works well without this problem.
However, the error occurs when I run the code on a GTX 1070, GTX 1080, or RTX 2080 Ti.


#2

When the model is compiled with the tuned schedule at opt_level = 3, the shapes of the compiled module's two inputs (data and weight) are
(1, 80, 73, 73) and (4, 4, 80, 192), and in this case the above error occurs.

When the model is compiled with the tuned schedule at opt_level = 2, the input shapes are
(1, 80, 73, 73) and (192, 80, 3, 3), and in this case the module works well.

So the problem might come from some optimization related to data layout?
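
For reference, a minimal sketch of how the expected parameter shapes can be compared between the two opt levels (reusing mod, params, target, and log_file from the sketch in post #1; illustrative only, not my exact script):

# Build the same module at opt_level 2 and 3 and print the shapes of the
# parameters the compiled graph expects; at opt_level = 3 the conv2d weight
# is expected in the transformed (4, 4, 80, 192) layout rather than (192, 80, 3, 3).
with autotvm.apply_history_best(log_file):
    for opt_level in (2, 3):
        with relay.build_config(opt_level=opt_level):
            graph, lib, built_params = relay.build_module.build(mod, target=target,
                                                                params=params)
        print('opt_level =', opt_level,
              {name: p.shape for name, p in built_params.items()})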