I got the following error while tuning the GPU on a TX2 remotely:
DEBUG:autotvm:No: 158 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(RuntimeError('Except caught from RPC call: [13:18:50] /home/nvidia/src/tvm/src/runtime/module_util.cc:53: Check failed: ret == 0 (-1 vs. 0) [13:18:50] /home/nvidia/src/tvm/src/runtime/cuda/cuda_module.cc:91: CUDAError: cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_INVALID_PTX\n\n'),), error_no=4, all_cost=4.549376487731934, timestamp=1544188730.826613) [('tile_f', [1, 2, 2, 4]), ('tile_y', [1, 1, 1, 1]), ('tile_x', [1, 1, 1, 1]), ('tile_rc', [64, 2]), ('tile_ry', [3, 1]), ('tile_rx', [3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],direct,None,1184
DEBUG:autotvm:No: 159 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(RuntimeError('Except caught from RPC call: [13:18:51] /home/nvidia/src/tvm/src/runtime/module_util.cc:53: Check failed: ret == 0 (-1 vs. 0) [13:18:51] /home/nvidia/src/tvm/src/runtime/cuda/cuda_module.cc:91: CUDAError: cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_INVALID_PTX\n\n'),), error_no=4, all_cost=5.119001150131226, timestamp=1544188731.4662838) [('tile_f', [2, 1, 8, 1]), ('tile_y', [1, 1, 1, 1]), ('tile_x', [1, 1, 1, 1]), ('tile_rc', [16, 8]), ('tile_ry', [1, 3]), ('tile_rx', [1, 3]), ('auto_unroll_max_step', 512), ('unroll_explicit', 1)],direct,None,5437
DEBUG:autotvm:No: 160 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(RuntimeError('Except caught from RPC call: [13:18:52] /home/nvidia/src/tvm/src/runtime/module_util.cc:53: Check failed: ret == 0 (-1 vs. 0) [13:18:52] /home/nvidia/src/tvm/src/runtime/cuda/cuda_module.cc:91: CUDAError: cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_INVALID_PTX\n\n'),), error_no=4, all_cost=5.690924644470215, timestamp=1544188732.0810251) [('tile_f', [8, 2, 1, 1]), ('tile_y', [1, 1, 1, 1]), ('tile_x', [1, 1, 1, 1]), ('tile_rc', [32, 4]), ('tile_ry', [1, 3]), ('tile_rx', [1, 3]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],direct,None,4271
WARNING:autotvm:Too many errors happen in the tuning. Now is in debug mode
I set target and target_host as follows:
target = 'cuda'
target_host = 'llvm -target=aarch64-linux-gnu'
What I have tried:

- Tune the CPU remotely: worked.
- Tune the GPU locally on the TX2: it runs without error at the beginning, but tuning eventually fails because the TX2 runs out of memory (Error 2).
- Compile the module on an x86 PC for the CPU and copy the tar file to the TX2: worked, with
  target = 'llvm -target=aarch64-linux-gnu'
  target_host = 'llvm -target=aarch64-linux-gnu'
- Compile the module on an x86 PC for the GPU and copy the tar file to the TX2, with
  target = 'cuda'
  target_host = 'llvm -target=aarch64-linux-gnu'
  but got the following error:
Traceback (most recent call last):
File "script/run_yolov3_tx2.py", line 82, in <module>
m.run()
File "/home/nvidia/src/tvm/python/tvm/contrib/graph_runtime.py", line 155, in run
self._run()
File "/home/nvidia/src/tvm/python/tvm/_ffi/_ctypes/function.py", line 185, in __call__
ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
File "/home/nvidia/src/tvm/python/tvm/_ffi/base.py", line 72, in check_call
raise TVMError(py_str(_LIB.TVMGetLastError()))
tvm._ffi.base.TVMError: [14:19:51] /home/nvidia/src/tvm/src/runtime/module_util.cc:53: Check failed: ret == 0 (-1 vs. 0) [14:19:51] /home/nvidia/src/tvm/src/runtime/cuda/cuda_module.cc:91: CUDAError: cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_INVALID_PTX
Stack trace returned 10 entries:
[bt] (0) /home/nvidia/src/tvm/build/libtvm.so(dmlc::StackTrace[abi:cxx11]()+0x118) [0x7f98123cc0]
[bt] (1) /home/nvidia/src/tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x44) [0x7f981248ec]
[bt] (2) /home/nvidia/src/tvm/build/libtvm.so(tvm::runtime::CUDAWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*, void**) const+0x6d8) [0x7f98768e80]
[bt] (3) /home/nvidia/src/tvm/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::detail::PackFuncVoidAddr_<8, tvm::runtime::CUDAWrappedFunc>(tvm::runtime::CUDAWrappedFunc, std::vector<tvm::runtime::detail::ArgConvertCode, std::allocator<tvm::runtime::detail::ArgConvertCode> > const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0xd8) [0x7f98769168]
[bt] (4) /home/nvidia/src/tvm/build/libtvm.so(TVMFuncCall+0x70) [0x7f98717a70]
[bt] (5) so/yolov3.tx2.gpu.tvm.tar.so(+0x1ce4) [0x7f7dbe1ce4]
[bt] (6) so/yolov3.tx2.gpu.tvm.tar.so(fuse_conv2d_broadcast_mul_broadcast_add_leaky_relu+0x4e0) [0x7f7dbe17d0]
[bt] (7) /home/nvidia/src/tvm/build/libtvm.so(+0x8a02b4) [0x7f9871e2b4]
[bt] (8) /home/nvidia/src/tvm/build/libtvm.so(+0x8d4ff4) [0x7f98752ff4]
[bt] (9) /home/nvidia/src/tvm/build/libtvm.so(tvm::runtime::GraphRuntime::Run()+0x40) [0x7f98751408]
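Both failures hit the same CUDA_ERROR_INVALID_PTX in cuModuleLoadData, so my guess is that the generated kernels target the wrong compute capability: the TX2's integrated Pascal GPU is sm_62, while a build on the x86 PC may default to the host GPU's architecture. The helper below is my own illustrative sketch (not TVM API); the table values are NVIDIA's published compute capabilities for Jetson boards:

```python
# Compute capabilities of Jetson boards (NVIDIA's published values).
# The TX2 is Pascal with compute capability 6.2, so kernels built for it
# must target sm_62; PTX built for a newer arch fails to load on the board.
JETSON_CUDA_ARCH = {
    "tx1": "sm_53",
    "nano": "sm_53",
    "tx2": "sm_62",
    "xavier": "sm_72",
}

def cuda_arch_for(board: str) -> str:
    """Return the CUDA arch string for a Jetson board name (hypothetical helper)."""
    try:
        return JETSON_CUDA_ARCH[board.lower()]
    except KeyError:
        raise ValueError(f"unknown board: {board!r}")

print(cuda_arch_for("tx2"))  # sm_62
```

If this is the cause, I believe TVM's autotvm has a set_cuda_target_arch helper that forces the arch before cross-compilation, though I have not verified it on this TVM version.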