GPU schedule fails on 7x7 depth-wise conv when num_channels is a multiple of 32

Description

When deploying models to the GPU target, I noticed an interesting failure case: the build fails when a convolution has the following attributes:

  • kernel size: 7
  • stride: 2
  • padding: 3
  • depth-wise conv (groups = input_channels)
  • number of channels: a multiple of 32

The default schedule raises the following error:

tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (3) /home/ligeng/Workspace/tvm/build/libtvm.so(TVMFuncCall+0x65) [0x7f0c03ed5725]
  [bt] (2) /home/ligeng/Workspace/tvm/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::PackedFunc tvm::runtime::detail::PackFuncVoidAddr_<4, tvm::runtime::CUDAWrappedFunc>(tvm::runtime::CUDAWrappedFunc, std::vector<tvm::runtime::detail::ArgConvertCode, std::allocator<tvm::runtime::detail::ArgConvertCode> > const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0xb6) [0x7f0c03f52f56]
  [bt] (1) /home/ligeng/Workspace/tvm/build/libtvm.so(tvm::runtime::CUDAWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*, void**) const+0x832) [0x7f0c03f52dd2]
  [bt] (0) /home/ligeng/Workspace/tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x43) [0x7f0c037d4563]
  File "/home/ligeng/Workspace/tvm/src/runtime/cuda/cuda_module.cc", line 111
  File "/home/ligeng/Workspace/tvm/src/runtime/module_util.cc", line 73
CUDAError: Check failed: ret == 0 (-1 vs. 0) : cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_INVALID_PTX

Even when I tune it with AutoTVM, the error is not fixed. Any ideas?

Minimal code to reproduce

import mxnet
import mxnet.gluon

import tvm
import tvm.relay as relay
import tvm.contrib.graph_runtime

# Depth-wise 7x7 conv: groups == channels, channels a multiple of 32
channels = 32
layer = mxnet.gluon.nn.Conv2D(channels=channels, kernel_size=(7, 7), strides=(2, 2), padding=(3, 3), groups=channels)
layer.initialize()
x = mxnet.nd.random.uniform(-1, 1, (1, channels, 14, 14))
layer(x)  # run once so the deferred parameter shapes are inferred
mod, params = relay.frontend.from_mxnet(layer, {'data': (1, channels, 14, 14)})
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(mod, target='cuda', params=params)
ctx = tvm.gpu(0)
tvm_module = tvm.contrib.graph_runtime.create(graph, lib, ctx)
ftimer = tvm_module.module.time_evaluator('run', ctx)
ftimer()  # the CUDA module is loaded lazily, so the error surfaces here
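
To help diagnose, the generated device code can be dumped after relay.build and inspected for anything nvrtc/ptxas might reject. This is just a sketch using the lib handle from above; imported_modules[0] should be the CUDA module that relay.build produced:

dev_module = lib.imported_modules[0]  # CUDA module attached to the host module
print(dev_module.get_source())        # generated device code, for inspection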

It seems the default GPU schedule has a problem with the 7x7 kernel. My current workaround is to auto-tune the model and use the searched schedule as a replacement, roughly as in the sketch below.
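
A minimal sketch of that workaround, assuming a TVM version with the AutoTVM Relay integration (mod and params come from the reproduction script above; the log file name and trial count are arbitrary):

from tvm import autotvm

# Extract the conv2d tuning tasks from the Relay program
tasks = autotvm.task.extract_from_program(mod['main'], target='cuda',
                                          params=params, ops=(relay.op.nn.conv2d,))

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=10, repeat=1, min_repeat_ms=100))

log_file = 'depthwise_conv2d_cuda.log'  # arbitrary file name
for task in tasks:
    tuner = autotvm.tuner.XGBTuner(task)
    tuner.tune(n_trial=200, measure_option=measure_option,
               callbacks=[autotvm.callback.log_to_file(log_file)])

# Rebuild with the best searched schedules instead of the default one
with autotvm.apply_history_best(log_file):
    with relay.build_config(opt_level=3):
        graph, lib, params = relay.build(mod, target='cuda', params=params)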

What do you mean? Auto-tuning can fix your problem, right?

AutoTVM did not find a correct schedule on my first trial; not lucky enough :|

When I tried one more time, it fixed the problem.
