Hi, I am running the AutoTVM example for mobile GPU on Android. It runs for a while, but at some point I get a runtime error (I have turned on debug log output and also set n_trial = 20):
Extract tasks...
Tuning...
INFO:autotvm:Get devices for measurement successfully!
DEBUG:autotvm:No: 1 GFLOPS: 6.20/6.20 result: MeasureResult(costs=(0.0372740654,), error_no=0, all_cost=3.4236087799072266, timestamp=1558606325.1270492) [('tile_bna', 16), ('tile_bnb', 2), ('tile_t1', [16, 4]), ('tile_t2', [16, 4]), ('c_unroll', [16, 4]), ('yt', 8)],winograd,None,17559
DEBUG:autotvm:No: 2 GFLOPS: 0.00/6.20 result: MeasureResult(costs=(RuntimeError('Traceback (most recent call last):\n [bt] (3) /home/SERILOCAL/n.perto/Documents/tvm/build/libtvm.so(TVMFuncCall+0x61) [0x7fe0569097d1]\n [bt] (2) /home/SERILOCAL/n.perto/Documents/tvm/build/libtvm.so(+0x9e486b) [0x7fe05694c86b]\n [bt] (1) /home/SERILOCAL/n.perto/Documents/tvm/build/libtvm.so(+0x9d9db7) [0x7fe056941db7]\n [bt] (0) /home/SERILOCAL/n.perto/Documents/tvm/build/libtvm.so(+0x172ab2) [0x7fe0560daab2]\n File "/home/SERILOCAL/n.perto/Documents/tvm/src/runtime/rpc/rpc_session.cc", line 962\nTVMError: Check failed: code == RPCCode::kReturn: code=4',),), error_no=4, all_cost=9.246337175369263, timestamp=1558606330.5984135) [('tile_bna', 16), ('tile_bnb', 16), ('tile_t1', [4, 16]), ('tile_t2', [32, 2]), ('c_unroll', [16, 4]), ('yt', 2)],winograd,None,7649
DEBUG:autotvm:No: 3 GFLOPS: 0.00/6.20 result: MeasureResult(costs=(RuntimeError('Traceback (most recent call last):\n [bt] (3) /home/SERILOCAL/n.perto/Documents/tvm/build/libtvm.so(TVMFuncCall+0x61) [0x7fe0569097d1]\n [bt] (2) /home/SERILOCAL/n.perto/Documents/tvm/build/libtvm.so(+0x9e486b) [0x7fe05694c86b]\n [bt] (1) /home/SERILOCAL/n.perto/Documents/tvm/build/libtvm.so(+0x9d9db7) [0x7fe056941db7]\n [bt] (0) /home/SERILOCAL/n.perto/Documents/tvm/build/libtvm.so(+0x172ab2) [0x7fe0560daab2]\n File "/home/SERILOCAL/n.perto/Documents/tvm/src/runtime/rpc/rpc_session.cc", line 962\nTVMError: Check failed: code == RPCCode::kReturn: code=4',),), error_no=4, all_cost=19.06636929512024, timestamp=1558606340.6348014) [('tile_bna', 2), ('tile_bnb', 16), ('tile_t1', [4, 16]), ('tile_t2', [1, 64]), ('c_unroll', [64, 1]), ('yt', 8)],winograd,None,15871
DEBUG:autotvm:No: 4 GFLOPS: 0.00/6.20 result: MeasureResult(costs=('',), error_no=7, all_cost=5, timestamp=1558606442.982189) [('tile_bna', 2), ('tile_bnb', 8), ('tile_t1', [1, 64]), ('tile_t2', [16, 4]), ('c_unroll', [8, 8]), ('yt', 16)],winograd,None,23791
DEBUG:autotvm:No: 5 GFLOPS: 0.00/6.20 result: MeasureResult(costs=('',), error_no=7, all_cost=5, timestamp=1558606443.0251107) [('tile_bna', 2), ('tile_bnb', 2), ('tile_t1', [8, 8]), ('tile_t2', [1, 64]), ('c_unroll', [32, 2]), ('yt', 2)],winograd,None,7256
DEBUG:autotvm:No: 6 GFLOPS: 0.00/6.20 result: MeasureResult(costs=('',), error_no=7, all_cost=5, timestamp=1558606443.0252697) [('tile_bna', 8), ('tile_bnb', 1), ('tile_t1', [64, 1]), ('tile_t2', [8, 8]), ('c_unroll', [16, 4]), ('yt', 2)],winograd,None,7878
DEBUG:autotvm:No: 7 GFLOPS: 0.00/6.20 result: MeasureResult(costs=('',), error_no=7, all_cost=5, timestamp=1558606443.0372736) [('tile_bna', 2), ('tile_bnb', 8), ('tile_t1', [1, 64]), ('tile_t2', [32, 2]), ('c_unroll', [16, 4]), ('yt', 16)],winograd,None,22391
DEBUG:autotvm:No: 8 GFLOPS: 0.00/6.20 result: MeasureResult(costs=('',), error_no=7, all_cost=5, timestamp=1558606443.0462408) [('tile_bna', 16), ('tile_bnb', 1), ('tile_t1', [4, 16]), ('tile_t2', [32, 2]), ('c_unroll', [32, 2]), ('yt', 16)],winograd,None,21104
DEBUG:autotvm:No: 9 GFLOPS: 0.00/6.20 result: MeasureResult(costs=('',), error_no=7, all_cost=5, timestamp=1558606443.0463283) [('tile_bna', 16), ('tile_bnb', 16), ('tile_t1', [32, 2]), ('tile_t2', [8, 8]), ('c_unroll', [16, 4]), ('yt', 1)],winograd,None,3024
DEBUG:autotvm:No: 10 GFLOPS: 0.00/6.20 result: MeasureResult(costs=('',), error_no=7, all_cost=5, timestamp=1558606443.0531096) [('tile_bna', 8), ('tile_bnb', 4), ('tile_t1', [16, 4]), ('tile_t2', [16, 4]), ('c_unroll', [64, 1]), ('yt', 4)],winograd,None,10213
DEBUG:autotvm:No: 11 GFLOPS: 0.00/6.20 result: MeasureResult(costs=('',), error_no=7, all_cost=5, timestamp=1558606443.0574272) [('tile_bna', 8), ('tile_bnb', 8), ('tile_t1', [32, 2]), ('tile_t2', [32, 2]), ('c_unroll', [32, 2]), ('yt', 16)],winograd,None,21043
DEBUG:autotvm:No: 12 GFLOPS: 0.00/6.20 result: MeasureResult(costs=('',), error_no=7, all_cost=5, timestamp=1558606443.0574942) [('tile_bna', 8), ('tile_bnb', 4), ('tile_t1', [8, 8]), ('tile_t2', [64, 1]), ('c_unroll', [32, 2]), ('yt', 2)],winograd,None,6213
DEBUG:autotvm:No: 13 GFLOPS: 0.00/6.20 result: MeasureResult(costs=('',), error_no=7, all_cost=5, timestamp=1558606564.5806856) [('tile_bna', 16), ('tile_bnb', 2), ('tile_t1', [32, 2]), ('tile_t2', [8, 8]), ('c_unroll', [16, 4]), ('yt', 4)],winograd,None,12809
DEBUG:autotvm:No: 14 GFLOPS: 0.00/6.20 result: MeasureResult(costs=('',), error_no=7, all_cost=5, timestamp=1558606564.592348) [('tile_bna', 16), ('tile_bnb', 16), ('tile_t1', [2, 32]), ('tile_t2', [1, 64]), ('c_unroll', [8, 8]), ('yt', 1)],winograd,None,4874
DEBUG:autotvm:No: 15 GFLOPS: 0.00/6.20 result: MeasureResult(costs=('',), error_no=7, all_cost=5, timestamp=1558606564.5987117) [('tile_bna', 1), ('tile_bnb', 16), ('tile_t1', [64, 1]), ('tile_t2', [32, 2]), ('c_unroll', [16, 4]), ('yt', 2)],winograd,None,7545
DEBUG:autotvm:No: 16 GFLOPS: 0.00/6.20 result: MeasureResult(costs=('',), error_no=7, all_cost=5, timestamp=1558606564.5987988) [('tile_bna', 2), ('tile_bnb', 8), ('tile_t1', [8, 8]), ('tile_t2', [2, 32]), ('c_unroll', [32, 2]), ('yt', 1)],winograd,None,2191
DEBUG:autotvm:No: 17 GFLOPS: 0.00/6.20 result: MeasureResult(costs=('',), error_no=7, all_cost=5, timestamp=1558606564.6042001) [('tile_bna', 8), ('tile_bnb', 1), ('tile_t1', [64, 1]), ('tile_t2', [2, 32]), ('c_unroll', [64, 1]), ('yt', 4)],winograd,None,10678
DEBUG:autotvm:No: 18 GFLOPS: 0.00/6.20 result: MeasureResult(costs=('',), error_no=7, all_cost=5, timestamp=1558606564.6101496) [('tile_bna', 8), ('tile_bnb', 2), ('tile_t1', [1, 64]), ('tile_t2', [8, 8]), ('c_unroll', [64, 1]), ('yt', 8)],winograd,None,15383
DEBUG:autotvm:No: 19 GFLOPS: 0.00/6.20 result: MeasureResult(costs=('',), error_no=7, all_cost=5, timestamp=1558606564.6102304) [('tile_bna', 2), ('tile_bnb', 1), ('tile_t1', [1, 64]), ('tile_t2', [16, 4]), ('c_unroll', [32, 2]), ('yt', 16)],winograd,None,21326
DEBUG:autotvm:No: 20 GFLOPS: 0.00/6.20 result: MeasureResult(costs=('',), error_no=7, all_cost=5, timestamp=1558606564.6102855) [('tile_bna', 4), ('tile_bnb', 16), ('tile_t1', [1, 64]), ('tile_t2', [1, 64]), ('c_unroll', [8, 8]), ('yt', 8)],winograd,None,19597
DEBUG:autotvm:XGB load 20 entries from history log file
Traceback (most recent call last):
  File "tutorials/autotvm/tune_relay_mobile_gpu.py", line 358, in <module>
    tune_and_evaluate(tuning_option)
  File "tutorials/autotvm/tune_relay_mobile_gpu.py", line 314, in tune_and_evaluate
    tune_tasks(tasks, **tuning_opt)
  File "tutorials/autotvm/tune_relay_mobile_gpu.py", line 295, in tune_tasks
    autotvm.callback.log_to_file(tmp_log_file)])
  File "/home/SERILOCAL/n.perto/Documents/tvm/python/tvm/autotvm/tuner/xgboost_tuner.py", line 86, in tune
    super(XGBTuner, self).tune(*args, **kwargs)
  File "/home/SERILOCAL/n.perto/Documents/tvm/python/tvm/autotvm/tuner/tuner.py", line 108, in tune
    measure_batch = create_measure_batch(self.task, measure_option)
  File "/home/SERILOCAL/n.perto/Documents/tvm/python/tvm/autotvm/measure/measure.py", line 252, in create_measure_batch
    attach_objects = runner.set_task(task)
  File "/home/SERILOCAL/n.perto/Documents/tvm/python/tvm/autotvm/measure/measure_methods.py", line 212, in set_task
    raise RuntimeError("Cannot get remote devices from the tracker. "
RuntimeError: Cannot get remote devices from the tracker. Please check the status of tracker by 'python -m tvm.exec.query_rpc_tracker --port [THE PORT YOU USE]' and make sure you have free devices on the queue status.
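To make the log above easier to read, here is a rough sketch that tallies the error_no values from the DEBUG lines. The name table is an assumption based on my reading of MeasureErrorNo in tvm/autotvm/measure/measure.py and may differ between TVM versions; tally_errors and ERROR_NAMES are just illustrative names, not part of TVM.

```python
import re
from collections import Counter

# ASSUMPTION: mapping as I understand tvm.autotvm.measure.MeasureErrorNo;
# please check it against your own TVM checkout.
ERROR_NAMES = {
    0: "NO_ERROR",
    1: "INSTANTIATION_ERROR",
    2: "COMPILE_HOST",
    3: "COMPILE_DEVICE",
    4: "RUNTIME_DEVICE",
    5: "WRONG_ANSWER",
    6: "BUILD_TIMEOUT",
    7: "RUN_TIMEOUT",
    8: "UNKNOWN_ERROR",
}

ERROR_NO_RE = re.compile(r"error_no=(\d+)")


def tally_errors(log_lines):
    """Count measurement outcomes per error name in AutoTVM debug log lines."""
    counts = Counter()
    for line in log_lines:
        # Only the per-trial lines carry a MeasureResult.
        if not line.startswith("DEBUG:autotvm:No:"):
            continue
        match = ERROR_NO_RE.search(line)
        if match is None:
            continue
        code = int(match.group(1))
        counts[ERROR_NAMES.get(code, "UNKNOWN(%d)" % code)] += 1
    return counts


# Shortened example lines in the same shape as the log above.
example = [
    "INFO:autotvm:Get devices for measurement successfully!",
    "DEBUG:autotvm:No: 1 GFLOPS: 6.20/6.20 result: MeasureResult(costs=(0.03,), error_no=0, all_cost=3.4, timestamp=1.0)",
    "DEBUG:autotvm:No: 2 GFLOPS: 0.00/6.20 result: MeasureResult(costs=('code=4',), error_no=4, all_cost=9.2, timestamp=2.0)",
    "DEBUG:autotvm:No: 4 GFLOPS: 0.00/6.20 result: MeasureResult(costs=('',), error_no=7, all_cost=5, timestamp=3.0)",
]
print(dict(tally_errors(example)))
```

If that mapping is right, the 20 trials above break down into 1 success (No. 1), 2 runtime errors on the device (No. 2 and 3), and 17 run timeouts (No. 4 to 20).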
I am running it locally with port forwarding, as mentioned here. I cannot connect the devices to the same network, so that is the only option for me.
Querying the tracker just after the crash outputs:
Tracker address localhost:9190
Server List
----------------------------
server-address key
----------------------------
----------------------------
Queue Status
---------------------------
key total free pending
---------------------------
0 0 18
---------------------------
And after a little while:
Tracker address localhost:9190
Server List
----------------------------
server-address key
----------------------------
127.0.0.1:43289 server:
----------------------------
Queue Status
---------------------------
key total free pending
---------------------------
1 1 0
---------------------------
Do you have any idea what could be causing this problem and what I can do to solve it?
Thanks