[ONNX] Failed to execute certain ONNX model on CUDA `CUDA_ERROR_INVALID_PTX`


#1

Hi there,

When I was trying to execute ProxylessNAS on CUDA devices, following the pipeline PyTorch -> ONNX -> TVM, I met following errors with cuda error code CUDA_ERROR_INVALID_PTX

Testing proxyless_net1
Exception occurs: 
Traceback (most recent call last):
  [bt] (3) /home/yaoyao/miniconda3/envs/python37/lib/python3.7/site-packages/tvm-0.6.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(TVMFuncCall+0x65) [0x7f5f53b04ff5]
  [bt] (2) /home/yaoyao/miniconda3/envs/python37/lib/python3.7/site-packages/tvm-0.6.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::detail::PackFuncVoidAddr_<4, tvm::runtime::CUDAWrappedFunc>(tvm::runtime::CUDAWrappedFunc, std::vector<tvm::runtime::detail::ArgConvertCode, std::allocator<tvm::runtime::detail::ArgConvertCode> > const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0xb6) [0x7f5f53b81fb6]
  [bt] (1) /home/yaoyao/miniconda3/envs/python37/lib/python3.7/site-packages/tvm-0.6.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(tvm::runtime::CUDAWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*, void**) const+0x832) [0x7f5f53b81e32]
  [bt] (0) /home/yaoyao/miniconda3/envs/python37/lib/python3.7/site-packages/tvm-0.6.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x43) [0x7f5f533c2a33]
  File "/home/yaoyao/repos/tvm/src/runtime/cuda/cuda_module.cc", line 111
  File "/home/yaoyao/repos/tvm/src/runtime/module_util.cc", line 73
CUDAError: Check failed: ret == 0 (-1 vs. 0) : cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_INVALID_PTX
Testing proxyless_net2
Successfully.

After looking around, CUDA_ERROR_INVALID_PTX hints the problem might be related with PTX JIT compiler. However, the tvm runtime works well when I load the ResNet from torchvision. Could you help on this?

The code to re-produce is attached on Github and the environment we are using is

  • GPU: Nvidia GTX 1080
  • TVM: latest commit 45878ff2ab111b55448cf62e34bc58eb3e36b671
  • CUDA: 10.1
  • CUDNN: 7.0

#2

Update, we have figured out that the issue might be caused by 7x7 conv kernel. After replacing 7x7 kernels to 3x3, the module can run without error.


#3

Have you tuned your model with AutoTVM?
The default config might be invalid (too many threads, too much shared memory …)


#4

The opt_level is set to 1. I have tried to set it to 3, but the same error still occurs…


#5

I’ve tried to tune with AutoTVM, however it is still the same error. Is there anything special with 7x7 conv2d?


#6

Did you always meet the error during tuning of 7x7 conv?


#7

This happens randomly. In some cases, the problem can run without any error. I haven’t figured out what exactly triggers it.


#8

continued at GPU schedule fails on 7x7 depth-wise conv when num_channels is multiple of 32

Current workthrough is to auto tune 7x7 kernels instead of using the default schedule.