[SOLVED] [AutoTVM] RuntimeError (Return code=4) during autotuning

Hi,
I’m running the TVM autotuner and getting the following error.

Get devices for measurement successfully!
No: 1   GFLOPS: 0.00/0.00       result: MeasureResult(costs=(RuntimeError('[12:37:17] /homes/tharindu/tvm/src/runtime/rpc/rpc_session.cc:942: Check failed: code == RPCCode::kReturn code=4\n\n',),), error_no=4, all_cost=10.457988500595093, timestamp=1540323437.7072837)    [('tile_h', 14), ('tile_w', 7)],,None,20
No: 2   GFLOPS: 0.00/0.00       result: MeasureResult(costs=(RuntimeError('[12:37:28] /homes/tharindu/tvm/src/runtime/rpc/rpc_session.cc:942: Check failed: code == RPCCode::kReturn code=4\n\n',),), error_no=4, all_cost=10.44745397567749, timestamp=1540323448.0087845)     [('tile_h', 4), ('tile_w', 4)],,None,9
No: 3   GFLOPS: 0.00/0.00       result: MeasureResult(costs=(RuntimeError('[12:37:38] /homes/tharindu/tvm/src/runtime/rpc/rpc_session.cc:942: Check failed: code == RPCCode::kReturn code=4\n\n',),), error_no=4, all_cost=10.438765048980713, timestamp=1540323458.3325357)    [('tile_h', 14), ('tile_w', 2)],,None,4
No: 4   GFLOPS: 0.00/0.00       result: MeasureResult(costs=(RuntimeError('[12:37:48] /homes/tharindu/tvm/src/runtime/rpc/rpc_session.cc:942: Check failed: code == RPCCode::kReturn code=4\n\n',),), error_no=4, all_cost=10.431469678878784, timestamp=1540323468.6489294)    [('tile_h', 4), ('tile_w', 28)],,None,49
No: 5   GFLOPS: 0.00/0.00       result: MeasureResult(costs=(RuntimeError('[12:37:58] /homes/tharindu/tvm/src/runtime/rpc/rpc_session.cc:942: Check failed: code == RPCCode::kReturn code=4\n\n',),), error_no=4, all_cost=10.44149136543274, timestamp=1540323478.9663706)     [('tile_h', 28), ('tile_w', 8)],,None,30
No: 6   GFLOPS: 0.00/0.00       result: MeasureResult(costs=(RuntimeError('[12:38:09] /homes/tharindu/tvm/src/runtime/rpc/rpc_session.cc:942: Check failed: code == RPCCode::kReturn code=4\n\n',),), error_no=4, all_cost=10.436156034469604, timestamp=1540323489.2876792)    [('tile_h', 56), ('tile_w', 2)],,None,7
No: 7   GFLOPS: 0.00/0.00       result: MeasureResult(costs=(RuntimeError('[12:38:19] /homes/tharindu/tvm/src/runtime/rpc/rpc_session.cc:942: Check failed: code == RPCCode::kReturn code=4\n\n',),), error_no=4, all_cost=10.43612790107727, timestamp=1540323499.6087935)     [('tile_h', 28), ('tile_w', 16)],,None,46
No: 8   GFLOPS: 0.00/0.00       result: MeasureResult(costs=(RuntimeError('[12:38:29] /homes/tharindu/tvm/src/runtime/rpc/rpc_session.cc:942: Check failed: code == RPCCode::kReturn code=4\n\n',),), error_no=4, all_cost=10.433260440826416, timestamp=1540323509.9270568)    [('tile_h', 56), ('tile_w', 56)],,None,63
No: 9   GFLOPS: 0.00/0.00       result: MeasureResult(costs=(RuntimeError('[12:38:40] /homes/tharindu/tvm/src/runtime/rpc/rpc_session.cc:942: Check failed: code == RPCCode::kReturn code=4\n\n',),), error_no=4, all_cost=10.43850064277649, timestamp=1540323520.2430663)     [('tile_h', 28), ('tile_w', 4)],,None,14
No: 10  GFLOPS: 0.00/0.00       result: MeasureResult(costs=(RuntimeError('[12:38:50] /homes/tharindu/tvm/src/runtime/rpc/rpc_session.cc:942: Check failed: code == RPCCode::kReturn code=4\n\n',),), error_no=4, all_cost=10.452775478363037, timestamp=1540323530.5664165)    [('tile_h', 7), ('tile_w', 4)],,None,10

Is there a way to get a more detailed/verbose description of the error so I can debug it? Or does anybody know what return code 4 (code=4) represents in the AutoTVM context?

Thanks,

Code 4 is kShutdown; see https://github.com/dmlc/tvm/blob/3a1bb8c7d4ce8ef5a0b15964a08bc583fab17459/src/runtime/rpc/rpc_session.h#L31. If I remember correctly, I’ve hit this in the past when my kernel had bugs that caused crashes or segfaults (out-of-bounds accesses, etc.). One trick is to pin a single failing configuration (here, tile_h = 14, tile_w = 7), then run it locally (or remotely via TVM RPC) and see what errors you get.
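
To pick a configuration to reproduce, it can help to pull the failing knob settings out of the tuner output. The following is a minimal sketch that assumes the log lines have the same shape as the ones pasted in this thread; the helper names (parse_tuner_line, failing_configs) are made up for illustration and are not part of TVM.

```python
import ast
import re

# Assumed line shape (copied from the log in this thread):
#   No: <n> ... error_no=<k>, all_cost=..., timestamp=<t>)   [(<knob>, <value>), ...],,None,<idx>
LINE_RE = re.compile(
    r"No:\s*(?P<no>\d+)\s.*?error_no=(?P<error_no>\d+).*?"
    r"timestamp=[\d.]+\)\s*(?P<config>\[.*\])"
)
# The RPC check failure prints "code=<n>"; other failures may not include it.
RPC_CODE_RE = re.compile(r"\bcode=(\d+)")


def parse_tuner_line(line):
    """Return a dict describing one trial, or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    rpc = RPC_CODE_RE.search(line)
    return {
        "trial": int(m.group("no")),
        "error_no": int(m.group("error_no")),
        "rpc_code": int(rpc.group(1)) if rpc else None,
        # The knob list is printed as a Python literal, so literal_eval is enough.
        "config": dict(ast.literal_eval(m.group("config"))),
    }


def failing_configs(lines):
    """Collect the knob settings of every trial whose error_no is non-zero."""
    parsed = (parse_tuner_line(line) for line in lines)
    return [p["config"] for p in parsed if p and p["error_no"] != 0]


# First trial from the log above, kept as a raw string so "\n" stays literal.
sample = r"No: 1   GFLOPS: 0.00/0.00       result: MeasureResult(costs=(RuntimeError('[12:37:17] /homes/tharindu/tvm/src/runtime/rpc/rpc_session.cc:942: Check failed: code == RPCCode::kReturn code=4\n\n',),), error_no=4, all_cost=10.457988500595093, timestamp=1540323437.7072837)    [('tile_h', 14), ('tile_w', 7)],,None,20"

print(parse_tuner_line(sample))
```

Any one of the returned configurations can then be hard-coded into the schedule and run outside the tuner, where a crash or a long runtime is easier to observe directly.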

I’ve also hit this when the runner’s number argument was set too high.

Thanks for the valuable suggestions. I figured out the problem in my case: the code generated by my schedule with default optimizations takes too long to run, so the process gets killed by autotvm.LocalRunner once it exceeds the default timeout. After I increased the timeout, tuning works as expected.

measure_option = autotvm.measure_option(builder=autotvm.LocalBuilder(), runner=autotvm.LocalRunner(number=5, timeout=100))
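
For choosing a timeout value, a rough back-of-the-envelope estimate may be enough. The sketch below assumes each measurement runs the kernel about number × repeat times, and that the runner defaults are number=4 and repeat=3; both are assumptions to check against your TVM version. The function name, seconds_per_run, and safety_factor are hypothetical and only serve the illustration. The all_cost values of roughly 10.4 s in the original log would be consistent with a default timeout of about 10 s, though that is also an inference, not something the log states.

```python
def min_runner_timeout(seconds_per_run, number=4, repeat=3, safety_factor=2.0):
    """Rough lower bound, in seconds, for the runner timeout of one measurement.

    seconds_per_run: measured or guessed time of a single kernel execution.
    number, repeat:  values passed to the runner (defaults here are assumed).
    safety_factor:   hypothetical headroom for upload, warm-up, and noise.
    """
    if seconds_per_run <= 0:
        raise ValueError("seconds_per_run must be positive")
    if number < 1 or repeat < 1:
        raise ValueError("number and repeat must be at least 1")
    return seconds_per_run * number * repeat * safety_factor


# A kernel that takes 1.5 s per run, timed with number=5 and 3 repeats.
estimate = min_runner_timeout(1.5, number=5, repeat=3)
print(estimate)  # 45.0
```

The same arithmetic shows why a large number can trigger the failure on its own: the total run time grows linearly with it while the timeout stays fixed.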

Hope this helps someone.

Note: I have also encountered this issue a couple of times. In my case it was caused by the host and the device running different versions of TVM, and it was fixed by making sure both were built from the same commit.

Hi, I recently encountered the same error_no 4, but with a lot of RPC backtraces. Here is the log.

No: 1 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(RuntimeError('Traceback (most recent call last):\n [bt] (5) /home/huangt/MyWork/for_tvm/tvm/build/libtvm.so(TVMFuncCall+0x61) [0x7fdb6d17ef41]\n [bt] (4) /home/huangt/MyWork/for_tvm/tvm/build/libtvm.so(+0x1b71972) [0x7fdb6d1a7972]\n [bt] (3) /home/huangt/MyWork/for_tvm/tvm/build/libtvm.so(tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0x274) [0x7fdb6d1aa1f4]\n [bt] (2) /home/huangt/MyWork/for_tvm/tvm/build/libtvm.so(tvm::runtime::RPCClientSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)+0x57) [0x7fdb6d1bd087]\n [bt] (1) /home/huangt/MyWork/for_tvm/tvm/build/libtvm.so(tvm::runtime::RPCEndpoint::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)>)+0x39c) [0x7fdb6d1b435c]\n [bt] (0) /home/huangt/MyWork/for_tvm/tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x61) [0x7fdb6c2e9d61]\n File "/home/huangt/MyWork/for_tvm/tvm/src/runtime/rpc/rpc_endpoint.cc", line 81'),), error_no=4, all_cost=0.8075673580169678, timestamp=1617869857.029351) [('reorder_sparse_conv', (1, 0))],None,1

Could you please share some debugging tips for this issue? Many thanks.