How to tune CNN networks with multiple GPU devices?


#1

Hi, I noticed that the context is always set to GPU 0 in this tutorial, like below:

ctx = tvm.context(str(target), 0)

How can I use all the available GPUs to accelerate the tuning process?


#2

You have to use the RPC tracker and RPC server mode.

You can register one server per GPU, something like:

CUDA_VISIBLE_DEVICES=0 python3 -m tvm.exec.rpc_server --key titanx --tracker ...
CUDA_VISIBLE_DEVICES=1 python3 -m tvm.exec.rpc_server --key titanx --tracker ...
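For completeness, a minimal sketch of the full setup on a single machine. The key name `titanx`, the address `127.0.0.1`, and port `9190` are example values, not part of the original commands; adjust them to your cluster:

```shell
# Start the tracker on the master node (port 9190 is an example).
python3 -m tvm.exec.rpc_tracker --host 0.0.0.0 --port 9190 &

# Register one RPC server per GPU; CUDA_VISIBLE_DEVICES makes each
# server see exactly one device, so the tracker schedules across them.
CUDA_VISIBLE_DEVICES=0 python3 -m tvm.exec.rpc_server --key titanx --tracker 127.0.0.1:9190 &
CUDA_VISIBLE_DEVICES=1 python3 -m tvm.exec.rpc_server --key titanx --tracker 127.0.0.1:9190 &

# Verify that all servers registered under the key before tuning.
python3 -m tvm.exec.query_rpc_tracker --host 127.0.0.1 --port 9190
```

The query command should list one free server per GPU under the `titanx` key; if a server is missing, check that its tracker address is reachable.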

#3

Thanks for your reply. I just tried this method, but now I have a few more problems.

  1. I launched the auto-tuning script on one of my compute servers, which has 8 Titan V GPUs. All the CPU cores are fully utilized, while the GPUs sit almost idle during the entire tuning process.

  2. I launched the auto-tuning script on my master node (server A), with 3 other servers registered as compute nodes (python -m tvm.exec.rpc_server --tracker=${HOST_IP}:9190 --key=titanv). This time, CPU utilization is 100% and GPU utilization is 0% on the master node, while both CPU and GPU utilization are 0% on all 3 compute nodes.

  3. Timeout error messages are displayed in both of the experiments above:

INFO:RPCServer:Timeout in RPC session, kill..
INFO:RPCServer:connection from ('172.16.18.5', 44906)
INFO:RPCServer:load_module /tmp/tmp80qs9j9c/tmp_func_3495b12be1f98a20.tar
INFO:RPCServer:Timeout in RPC session, kill..
INFO:RPCServer:connection from ('172.16.18.5', 44930)
INFO:RPCServer:load_module /tmp/tmp5sgzmkfz/tmp_func_ad070c06fc932366.tar
INFO:RPCServer:Timeout in RPC session, kill..
INFO:RPCServer:connection from ('172.16.18.5', 44954)
INFO:RPCServer:load_module /tmp/tmpr_a7v68l/tmp_func_29ffa46b8cce806d.tar
INFO:RPCServer:Timeout in RPC session, kill..
INFO:RPCServer:connection from ('172.16.18.5', 45010)
INFO:RPCServer:load_module /tmp/tmplyvzz6fl/tmp_func_9852b87c445c798.tar
INFO:RPCServer:Timeout in RPC session, kill..
INFO:RPCServer:connection from ('172.16.18.5', 45116)
INFO:RPCServer:load_module /tmp/tmpir7h6rw0/tmp_func_22ec387d451f013d.tar
INFO:RPCServer:Timeout in RPC session, kill..
INFO:RPCServer:connection from ('172.16.18.5', 45142)
INFO:RPCServer:load_module /tmp/tmp0wfbyz3h/tmp_func_46d1b12b7dc698e1.tar
INFO:RPCServer:Timeout in RPC session, kill..
INFO:RPCServer:connection from ('172.16.18.5', 45164)

#4

The low GPU utilization is expected: the GPUs are only used for quick measurements (depending on your measurement settings).

Timeouts are also expected. But if there are too many of them, consider setting a higher timeout in the tuning options here: https://github.com/dmlc/tvm/blob/001ab52509ce1f43fcbdad4d11c1ef2bcad04e10/tutorials/autotvm/tune_nnvm_cuda.py#L372
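As a sketch of where that timeout lives, here is a tuning-option configuration in the style of the linked tutorial. The key `titanv`, the tracker host, and the numeric values are placeholders; only the parameter names follow the autotvm API:

```python
from tvm import autotvm

# Measurement configuration for tuning over the RPC tracker.
measure_option = autotvm.measure_option(
    # Timeout for compiling one candidate kernel locally.
    builder=autotvm.LocalBuilder(timeout=10),
    runner=autotvm.RPCRunner(
        "titanv",            # key the RPC servers registered with (placeholder)
        host="127.0.0.1",    # tracker address (placeholder)
        port=9190,
        number=20,
        repeat=3,
        timeout=20,          # raise this if you see many "Timeout in RPC session" messages
    ),
)
```

The runner's `timeout` bounds how long one remote measurement may take; candidates that exceed it are killed and logged as the timeouts shown above, so a value that is too low discards valid but slow kernels.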

If you see a speedup (or all servers printing serving messages) and reasonable GFLOPS during tuning, it is fine. Do not expect linear scaling from using multiple machines.