[ONNX] Failed to execute certain ONNX model on CUDA `CUDA_ERROR_INVALID_PTX`

Hi there,

When I was trying to execute ProxylessNAS on CUDA devices, following the pipeline PyTorch -> ONNX -> TVM, I met the following error, with CUDA error code `CUDA_ERROR_INVALID_PTX`:

```
Testing proxyless_net1
Exception occurs: 
Traceback (most recent call last):
  [bt] (3) /home/yaoyao/miniconda3/envs/python37/lib/python3.7/site-packages/tvm-0.6.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(TVMFuncCall+0x65) [0x7f5f53b04ff5]
  [bt] (2) /home/yaoyao/miniconda3/envs/python37/lib/python3.7/site-packages/tvm-0.6.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::detail::PackFuncVoidAddr_<4, tvm::runtime::CUDAWrappedFunc>(tvm::runtime::CUDAWrappedFunc, std::vector<tvm::runtime::detail::ArgConvertCode, std::allocator<tvm::runtime::detail::ArgConvertCode> > const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0xb6) [0x7f5f53b81fb6]
  [bt] (1) /home/yaoyao/miniconda3/envs/python37/lib/python3.7/site-packages/tvm-0.6.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(tvm::runtime::CUDAWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*, void**) const+0x832) [0x7f5f53b81e32]
  [bt] (0) /home/yaoyao/miniconda3/envs/python37/lib/python3.7/site-packages/tvm-0.6.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x43) [0x7f5f533c2a33]
  File "/home/yaoyao/repos/tvm/src/runtime/cuda/cuda_module.cc", line 111
  File "/home/yaoyao/repos/tvm/src/runtime/module_util.cc", line 73
CUDAError: Check failed: ret == 0 (-1 vs. 0) : cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_INVALID_PTX
Testing proxyless_net2
Successfully.
```

After looking around, `CUDA_ERROR_INVALID_PTX` hints that the problem might be related to the PTX JIT compiler. However, the TVM runtime works well when I load ResNet from torchvision. Could you help with this?
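In case it helps, the device code that ends up in the failing module can be dumped for inspection. A minimal sketch, assuming `lib` is the library returned by `relay.build` (see the pipeline sketch below):

```python
# Grab the imported CUDA module from the host library; get_source()
# returns the generated CUDA C that NVCC compiles to the PTX which
# cuModuleLoadData later hands to the driver's JIT.
cuda_mod = lib.imported_modules[0]
print(cuda_mod.type_key)      # "cuda"
print(cuda_mod.get_source())
```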

The code to reproduce is attached on GitHub (a minimal sketch of the pipeline is shown after the list below), and the environment we are using is:

  • GPU: NVIDIA GTX 1080
  • TVM: latest commit 45878ff2ab111b55448cf62e34bc58eb3e36b671
  • CUDA: 10.1
  • cuDNN: 7.0
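For reference, the pipeline looks roughly like this. This is a minimal sketch, not the actual repro script; the ONNX file name `proxyless.onnx` and the input name `"0"` are placeholders:

```python
import numpy as np
import onnx
import tvm
from tvm import relay
from tvm.contrib import graph_runtime

# Load the ONNX graph exported from PyTorch via torch.onnx.export.
onnx_model = onnx.load("proxyless.onnx")
shape_dict = {"0": (1, 3, 224, 224)}
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

# Compile for CUDA. The PTX is only handed to the driver's JIT when the
# module is first used, so CUDA_ERROR_INVALID_PTX surfaces at run time.
target = "cuda"
with relay.build_config(opt_level=1):
    graph, lib, params = relay.build(mod, target, params=params)

ctx = tvm.gpu(0)
module = graph_runtime.create(graph, lib, ctx)
module.set_input(**params)
module.set_input("0", np.random.uniform(size=(1, 3, 224, 224)).astype("float32"))
module.run()  # <- the CUDAError above is raised here
```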

Update: we have figured out that the issue might be caused by the 7x7 conv kernels. After replacing the 7x7 kernels with 3x3 ones, the module runs without error.

Have you tuned your model with AutoTVM?
The default schedule config might be invalid (too many threads, too much shared memory, …). One way to check is sketched below.
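A small sketch of that check, assuming the same TVM version as above; it queries the device limits that a generated kernel's launch configuration must stay within:

```python
import tvm

# Physical limits of the target GPU; a default schedule that exceeds
# them can produce device code the driver's JIT refuses to load.
ctx = tvm.gpu(0)
print("device:", ctx.device_name)
print("compute version:", ctx.compute_version)
print("max threads per block:", ctx.max_threads_per_block)
print("max shared memory per block (bytes):", ctx.max_shared_memory_per_block)
print("warp size:", ctx.warp_size)
```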

The opt_level is set to 1. I have tried setting it to 3, but the same error still occurs…

I’ve tried tuning with AutoTVM; however, I still get the same error. Is there anything special about 7x7 conv2d?

Did you always hit the error while tuning the 7x7 conv?

This happens randomly. In some cases, the program can run without any error. I haven’t figured out what exactly triggers it.

Continued at: GPU schedule fails on 7x7 depth-wise conv when num_channels is multiple of 32

The current workaround is to auto-tune the 7x7 kernels instead of using the default schedule, roughly as sketched below.
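A minimal AutoTVM sketch, assuming `mod`, `params`, and `target` from the pipeline above; the trial count and log file name are placeholders:

```python
from tvm import autotvm, relay

log_file = "proxyless_tune.log"  # placeholder name

# Extract tuning tasks for the conv2d ops (this includes the 7x7 kernels).
tasks = autotvm.task.extract_from_program(
    mod["main"], target=target, params=params, ops=(relay.op.nn.conv2d,)
)

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(timeout=10),
    runner=autotvm.LocalRunner(number=10, repeat=1, min_repeat_ms=150, timeout=4),
)

for task in tasks:
    tuner = autotvm.tuner.XGBTuner(task)
    tuner.tune(
        n_trial=min(200, len(task.config_space)),
        measure_option=measure_option,
        callbacks=[autotvm.callback.log_to_file(log_file)],
    )

# Rebuild with the tuned schedules instead of the defaults.
with autotvm.apply_history_best(log_file):
    with relay.build_config(opt_level=3):
        graph, lib, params = relay.build(mod, target, params=params)
```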