Hi, everyone.
Here is what happened.
I have two devices: A, with an x86 CPU and a 1080 Ti, and B (a Xavier), with an ARM CPU and an NVIDIA GPU.
TVM is fully installed on each device, and I can autotune my model on each one individually.
Now I want to autotune my model for B from A over RPC, because tuning directly on B is slow.
But this error occurs:
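For reference, this is roughly what the measurement setup in my tuning script looks like (the tracker host, port, and the device key `xavier` are placeholders for my actual values; B is registered with the tracker as an RPC server beforehand):

```python
from tvm import autotvm

# B runs:  python -m tvm.exec.rpc_server --tracker=<A_IP>:9190 --key=xavier
measure_option = autotvm.measure_option(
    # Kernels are compiled locally on A ...
    builder=autotvm.LocalBuilder(timeout=10),
    # ... and measured remotely on B through the RPC tracker.
    runner=autotvm.RPCRunner(
        "xavier",        # device key the RPC server registered with
        host="0.0.0.0",  # tracker host (placeholder)
        port=9190,       # tracker port (placeholder)
        number=20,
        repeat=3,
        timeout=4,
    ),
)
```

I then pass this `measure_option` to the tuner's `tune()` call as usual.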
```
DEBUG:autotvm:No: 503 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(InstantiationError('Traceback (most recent call last):
  [bt] (1) /home/pzq/tvm_cuda10/build/libtvm.so(TVMFuncCall+0x61) [0x7fc72c9d8521]
  [bt] (0) /home/pzq/tvm_cuda10/build/libtvm.so(+0x122b75b) [0x7fc72c9d375b]
  File "/home/pzq/tvm_cuda10/python/tvm/_ffi/_ctypes/function.py", line 72, in cfun
    rv = local_pyfunc(*pyargs)
  File "/home/pzq/tvm_cuda10/python/tvm/autotvm/measure/measure_methods.py", line 607, in verify_pass
    raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel',),), error_no=1, all_cost=0.29375886917114258, timestamp=1565772783.6429822) [('tile_b', [36, 1, 1, 1]), ('tile_y', [1, 1, 4, 1]), ('tile_x', [1, 1, 1600, 6]), ('tile_rc', [1, 4]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],winograd,None,406375
DEBUG:autotvm:No: 504 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(RuntimeError('Traceback (most recent call last):
  [bt] (3) /mnt/nvme/pzq/Desktop/tvm/build/libtvm.so(TVMFuncCall+0x70) [0x7f8437a7a0]
  [bt] (2) /mnt/nvme/pzq/Desktop/tvm/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::detail::PackFuncVoidAddr_<4, tvm::runtime::CUDAWrappedFunc>(tvm::runtime::CUDAWrappedFunc, std::vector<tvm::runtime::detail::ArgConvertCode, std::allocator<tvm::runtime::detail::ArgConvertCode> > const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0xe8) [0x7f843ed3e0]
  [bt] (1) /mnt/nvme/pzq/Desktop/tvm/build/libtvm.so(tvm::runtime::CUDAWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*, void**) const+0x6cc) [0x7f843ed214]
  [bt] (0) /mnt/nvme/pzq/Desktop/tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x4c) [0x7f83c9108c]
  File "/home/pzq/Desktop/tvm/src/runtime/cuda/cuda_module.cc", line 111
TVMErr',),), error_no=4, all_cost=4.153692960739136, timestamp=1565772786.2338314) [('tile_b', [36, 1, 1, 1]), ('tile_y', [1, 2, 2, 1]), ('tile_x', [200, 16, 1, 3]), ('tile_rc', [1, 4]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],winograd,None,107584
```
I have noticed that GFLOPS is always 0.00/0.00 and `error_no` is 1 or 4.
What could the problem be? What should I do?
I have CUDA 10.1 on A and CUDA 10.0 on B.
Does the CUDA version on the tuning machine A have to match the one on the target device B exactly?
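Related question: since A compiles the kernels, do I also need to tell autotvm which GPU architecture to build for? I found `set_cuda_target_arch` in `measure_methods.py` and wonder whether something like the following is required. (This is a guess, not a confirmed fix; I believe the Xavier's Volta GPU is compute capability 7.2, i.e. `sm_72`, while A's 1080 Ti is `sm_61`.)

```python
from tvm.autotvm.measure.measure_methods import set_cuda_target_arch

# Without this, A might build CUDA kernels for its own local GPU
# (sm_61 on the 1080 Ti) instead of B's Xavier GPU (sm_72).
set_cuda_target_arch("sm_72")
```

If the kernels were being compiled for the wrong architecture, that might explain the `error_no=4` runtime failures from `cuda_module.cc` when they run on B.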