[AutoTVM] Cannot get local GPU in offline environment


#1

This problem occurs when I run tune_conv2d_cuda.py (a tutorial) in an offline environment. I didn't change the code, and the default auto-tuner uses the local GPU. However, I can run tune_relay_x86.py (auto-tuning on CPU) and opt_conv_cuda.py (manual optimization) in the same offline environment.

My environment:
offline
GPU: 3 x K80
OS: Ubuntu 16.04

The detailed error info:
ConfigSpace (len=10454400, space_map=
0 tile_f: Split(policy=all, product=512, num_outputs=4) len=220
1 tile_y: Split(policy=all, product=7, num_outputs=4) len=4
2 tile_x: Split(policy=all, product=7, num_outputs=4) len=4
3 tile_rc: Split(policy=all, product=512, num_outputs=3) len=55
4 tile_ry: Split(policy=all, product=3, num_outputs=3) len=3
5 tile_rx: Split(policy=all, product=3, num_outputs=3) len=3
6 auto_unroll_max_step: OtherOption([0, 512, 1500]) len=3
7 unroll_explicit: OtherOption([0, 1]) len=2
)
Traceback (most recent call last):
  File "tune_conv2d_cuda.py", line 201, in <module>
    callbacks=[autotvm.callback.log_to_file('conv2d.log')])
  File "/home/dl/tvm/python/tvm/autotvm/tuner/xgboost_tuner.py", line 86, in tune
    super(XGBTuner, self).tune(*args, **kwargs)
  File "/home/dl/tvm/python/tvm/autotvm/tuner/tuner.py", line 108, in tune
    measure_batch = create_measure_batch(self.task, measure_option)
  File "/home/dl/tvm/python/tvm/autotvm/measure/measure.py", line 252, in create_measure_batch
    attach_objects = runner.set_task(task)
  File "/home/dl/tvm/python/tvm/autotvm/measure/measure_methods.py", line 342, in set_task
    super(LocalRunner, self).set_task(task)
  File "/home/dl/tvm/python/tvm/autotvm/measure/measure_methods.py", line 212, in set_task
    raise RuntimeError("Cannot get remote devices from the tracker. "
RuntimeError: Cannot get remote devices from the tracker. Please check the status of tracker by 'python -m tvm.exec.query_rpc_tracker --port [THE PORT YOU USE]' and make sure you have free devices on the queue status.


#2

You can try manually starting an RPC tracker and server to troubleshoot. Switch the LocalRunner code in the tutorial to RPCRunner, then start a tracker and a server manually with something like:
python3 -m tvm.exec.rpc_tracker --port 9190
python3 -m tvm.exec.rpc_server --tracker 0.0.0.0:9190 --key k80
Note that you should run these commands in separate, persistent shells.
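
For reference, the LocalRunner-to-RPCRunner switch might look roughly like the sketch below. It assumes the tracker and server were started with the commands above (key k80, port 9190) and that the tuning script connects from the same machine via 127.0.0.1. The helper name build_measure_option, the RPC_SETTINGS dict, and the repeat/min_repeat_ms/timeout values are illustrative assumptions, not something taken from the tutorial.

```python
# Connection settings; these must match the tracker/server commands.
# 127.0.0.1 is an assumption: the script runs on the same host as the tracker.
RPC_SETTINGS = {"key": "k80", "host": "127.0.0.1", "port": 9190}


def build_measure_option(settings=RPC_SETTINGS):
    """Build an AutoTVM measure option that uses RPCRunner instead of LocalRunner.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so this file can be loaded without TVM installed.
    from tvm import autotvm

    return autotvm.measure_option(
        builder=autotvm.LocalBuilder(),
        runner=autotvm.RPCRunner(
            settings["key"],          # must equal the --key passed to rpc_server
            host=settings["host"],    # tracker host
            port=settings["port"],    # tracker port
            repeat=3,                 # assumed values; adjust as needed
            min_repeat_ms=100,
            timeout=4,
        ),
    )

# In the tuning script, pass the result to the tuner, e.g.:
#   tuner.tune(n_trial=20, measure_option=build_measure_option(), callbacks=[...])
```

The key has to be identical on both sides: if the server registers as k80 and the runner asks for a different key, the tracker will report no free devices.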


#3

Thank you very much.
The RPC connection is now established, but after changing the code I still get the error above. I then ran the following command and still cannot get the device info.

code change:

python3 -m tvm.exec.query_rpc_tracker
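
Before querying the tracker, it may help to rule out a plain connectivity problem by checking whether anything is listening on the tracker port at all. Below is a minimal sketch using only the Python standard library. It assumes the tracker from #2 is on 127.0.0.1:9190, and the function name tracker_port_open is hypothetical. It only tests that a TCP connection can be opened; it does not speak the tracker protocol or list devices.

```python
import socket


def tracker_port_open(host="127.0.0.1", port=9190, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds, else False."""
    try:
        # create_connection raises OSError on refusal, timeout, or bad address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if tracker_port_open():
        print("Something is listening on 127.0.0.1:9190")
    else:
        print("Nothing reachable on 127.0.0.1:9190; is the tracker running?")
```

If this reports that nothing is reachable, the tracker is not running on that port (or is bound elsewhere), and the query command cannot succeed either.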


#4

What command did you use to start the tracker?