When I run the auto-tuning examples, I consistently hit runtime errors. For example, this is the output of the “Tuning High Performance Convolution on NVIDIA GPUs” script:
```
Traceback (most recent call last):
  File "./run/tune_conv2d_cuda.py", line 152, in <module>
    func(a_tvm, w_tvm, c_tvm)
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/_ffi/function.py", line 128, in __call__
    return f(*args)
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/_ffi/_ctypes/function.py", line 185, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/_ffi/base.py", line 72, in check_call
    raise TVMError(py_str(_LIB.TVMGetLastError()))
tvm._ffi.base.TVMError: [08:36:36] /usr/local/tvm/src/runtime/module_util.cc:53: Check failed: ret == 0 (-1 vs. 0)
[08:36:36] /usr/local/tvm/src/runtime/cuda/cuda_module.cc:91: CUDAError: cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_UNKNOWN
```
However, using the same container, my team member does not see these errors. Additionally, I can run simple PyTorch scripts on the GPU without a problem.
I found some related issues on the forum, but they didn’t give any hints as to how the problem was resolved:
Any ideas on what could cause this, even if the error comes from the CUDA side? CUDA_ERROR_UNKNOWN is not very descriptive, but perhaps you’ve seen similar issues before.
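In case it helps to narrow things down: here is a quick sanity check I ran to see whether the CUDA driver can even be initialized from Python, bypassing TVM entirely. This is just a sketch; it assumes the Linux driver library name `libcuda.so.1` and only calls `cuInit`, which is the first driver call every CUDA program makes. If `cuInit` already fails here, the problem is at the driver/container level rather than in TVM.

```python
import ctypes

def cuda_init_status():
    """Probe the CUDA driver directly via ctypes and report the result.

    cuModuleLoadData failing with CUDA_ERROR_UNKNOWN often means the
    driver stack itself is unhealthy (e.g. kernel-module/driver mismatch
    inside a container), which this check can surface.
    """
    try:
        # libcuda.so.1 ships with the NVIDIA driver, not the CUDA toolkit.
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return "libcuda.so.1 not found (no NVIDIA driver visible in this environment)"
    # CUresult cuInit(unsigned int Flags); 0 == CUDA_SUCCESS
    err = libcuda.cuInit(0)
    return "cuInit returned %d (0 means CUDA_SUCCESS)" % err

print(cuda_init_status())
```

On my machine this prints the raw `CUresult` code, so a nonzero value here would point away from TVM and toward the container's GPU setup.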