Description
When deploying models to a GPU platform, I noticed an interesting failure case: a convolution with the following attributes
- kernel size: 7
- stride: 2
- padding: 3
- depthwise convolution (groups = input_channels)
- a channel count that is a multiple of 32

With these attributes, the given schedule raises the following error:
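For reference, the attributes above (kernel 7, stride 2, padding 3 on a 14×14 input, as in the repro below) yield a 7×7 output. This is just standard convolution shape arithmetic, shown only to pin down the exact shapes involved in the failing case:

```python
# Standard conv output-size formula; independent of TVM.
def conv_out_size(in_size, kernel, stride, pad):
    return (in_size + 2 * pad - kernel) // stride + 1

# Failing config from the bug report: 14x14 input, kernel 7, stride 2, pad 3
out_hw = conv_out_size(14, kernel=7, stride=2, pad=3)
print(out_hw)  # -> 7
```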
```
tvm._ffi.base.TVMError: Traceback (most recent call last):
[bt] (3) /home/ligeng/Workspace/tvm/build/libtvm.so(TVMFuncCall+0x65) [0x7f0c03ed5725]
[bt] (2) /home/ligeng/Workspace/tvm/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::PackedFunc tvm::runtime::detail::PackFuncVoidAddr_<4, tvm::runtime::CUDAWrappedFunc>(tvm::runtime::CUDAWrappedFunc, std::vector<tvm::runtime::detail::ArgConvertCode, std::allocator<tvm::runtime::detail::ArgConvertCode> > const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0xb6) [0x7f0c03f52f56]
[bt] (1) /home/ligeng/Workspace/tvm/build/libtvm.so(tvm::runtime::CUDAWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*, void**) const+0x832) [0x7f0c03f52dd2]
[bt] (0) /home/ligeng/Workspace/tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x43) [0x7f0c037d4563]
File "/home/ligeng/Workspace/tvm/src/runtime/cuda/cuda_module.cc", line 111
File "/home/ligeng/Workspace/tvm/src/runtime/module_util.cc", line 73
CUDAError: Check failed: ret == 0 (-1 vs. 0) : cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_INVALID_PTX
```
Even tuning the layer with AutoTVM does not fix the error. Any ideas?
Minimal code to reproduce
```python
import mxnet
import mxnet.gluon
import tvm
import tvm.relay as relay
import tvm.contrib.graph_runtime

# Depthwise conv: kernel 7, stride 2, padding 3, groups == channels
channels = 32
layer = mxnet.gluon.nn.Conv2D(channels=channels, kernel_size=(7, 7),
                              strides=(2, 2), padding=(3, 3), groups=channels)
layer.initialize()

# Run once in MXNet to confirm the layer itself works
x = mxnet.nd.random.uniform(-1, 1, (1, channels, 14, 14))
layer(x)

# Import into Relay and build for CUDA
mod, params = relay.frontend.from_mxnet(layer, {'data': (1, channels, 14, 14)})
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(mod, target='cuda', params=params)

# Running the built module triggers CUDA_ERROR_INVALID_PTX
ctx = tvm.gpu(0)
tvm_module = tvm.contrib.graph_runtime.create(graph, lib, ctx)
ftimer = tvm_module.module.time_evaluator('run', ctx)
ftimer()
```
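One way to narrow this down is to inspect the CUDA source that TVM generated for the failing kernel before it gets compiled to PTX. A minimal sketch, assuming the standard runtime module API where the CUDA kernel code lives in `lib.imported_modules` and is retrievable via `get_source()`:

```python
# Sketch: dump the generated CUDA C source of a built Relay module.
# `lib` is assumed to be the module returned by relay.build(...) above;
# on a CUDA build, its first imported module holds the device code.
def dump_cuda_source(lib):
    """Return the generated CUDA source for a built runtime module."""
    cuda_mod = lib.imported_modules[0]
    return cuda_mod.get_source()

# Usage after relay.build(...):
#   print(dump_cuda_source(lib))
```

Comparing this dump between a working channel count and a failing one (a multiple of 32) may show what the schedule generates differently in the invalid-PTX case.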