How to tune CNN networks with multiple GPU devices?


#1

Hi, I noticed that the context is always set to GPU 0 in this tutorial, like below:

ctx = tvm.context(str(target), 0)

How can I use all the available GPUs to accelerate the tuning process?


#2

You have to use the RPC tracker and RPC server mode.

You can register one server per GPU, something like:

CUDA_VISIBLE_DEVICES=0 python3 -m tvm.exec.rpc_server --key titanx --tracker ...
CUDA_VISIBLE_DEVICES=1 python3 -m tvm.exec.rpc_server --key titanx --tracker ...
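For completeness, a minimal sketch of the full setup on a single machine. The key name `titanx`, the address `127.0.0.1`, and port `9190` are example values, not part of the original commands; adjust them to your cluster:

```shell
# Start the tracker on the master node (port 9190 is an example).
python3 -m tvm.exec.rpc_tracker --host 0.0.0.0 --port 9190 &

# Register one RPC server per GPU; CUDA_VISIBLE_DEVICES makes each
# server see exactly one device, so the tracker schedules across them.
CUDA_VISIBLE_DEVICES=0 python3 -m tvm.exec.rpc_server --key titanx --tracker 127.0.0.1:9190 &
CUDA_VISIBLE_DEVICES=1 python3 -m tvm.exec.rpc_server --key titanx --tracker 127.0.0.1:9190 &

# Verify that all servers registered under the key before tuning.
python3 -m tvm.exec.query_rpc_tracker --host 127.0.0.1 --port 9190
```

The query command should list one free server per GPU under the `titanx` key; if a server is missing, check that its tracker address is reachable.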

#3

Thanks for your reply. I just tried this method, but now I have a few more problems.

  1. I launched the auto-tuning script on one of my compute servers, which has 8 Titan V GPUs. All the CPU cores are fully utilized, while the GPUs sit almost idle during the entire tuning process.

  2. I launched the auto-tuning script on my master node (server A), with 3 other servers registered as compute nodes (python -m tvm.exec.rpc_server --tracker=${HOST_IP}:9190 --key=titanv). This time, CPU utilization is 100% and GPU utilization is 0% on the master node, while both CPU and GPU utilization are 0% on all 3 compute nodes.

  3. Timeout error messages are displayed in both of the experiments above:

INFO:RPCServer:Timeout in RPC session, kill..
INFO:RPCServer:connection from ('172.16.18.5', 44906)
INFO:RPCServer:load_module /tmp/tmp80qs9j9c/tmp_func_3495b12be1f98a20.tar
INFO:RPCServer:Timeout in RPC session, kill..
INFO:RPCServer:connection from ('172.16.18.5', 44930)
INFO:RPCServer:load_module /tmp/tmp5sgzmkfz/tmp_func_ad070c06fc932366.tar
INFO:RPCServer:Timeout in RPC session, kill..
INFO:RPCServer:connection from ('172.16.18.5', 44954)
INFO:RPCServer:load_module /tmp/tmpr_a7v68l/tmp_func_29ffa46b8cce806d.tar
INFO:RPCServer:Timeout in RPC session, kill..
INFO:RPCServer:connection from ('172.16.18.5', 45010)
INFO:RPCServer:load_module /tmp/tmplyvzz6fl/tmp_func_9852b87c445c798.tar
INFO:RPCServer:Timeout in RPC session, kill..
INFO:RPCServer:connection from ('172.16.18.5', 45116)
INFO:RPCServer:load_module /tmp/tmpir7h6rw0/tmp_func_22ec387d451f013d.tar
INFO:RPCServer:Timeout in RPC session, kill..
INFO:RPCServer:connection from ('172.16.18.5', 45142)
INFO:RPCServer:load_module /tmp/tmp0wfbyz3h/tmp_func_46d1b12b7dc698e1.tar
INFO:RPCServer:Timeout in RPC session, kill..
INFO:RPCServer:connection from ('172.16.18.5', 45164)

#4

The low GPU utilization is expected: the GPUs are only used for quick measurements (depending on your measurement settings).

Timeouts are also expected. But if there are too many of them, consider setting a higher timeout in the tuning options here: https://github.com/dmlc/tvm/blob/001ab52509ce1f43fcbdad4d11c1ef2bcad04e10/tutorials/autotvm/tune_nnvm_cuda.py#L372
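As a sketch of where that timeout lives, here is a tuning-option configuration in the style of the linked tutorial. The key `titanv`, the tracker host, and the numeric values are placeholders; only the parameter names follow the autotvm API:

```python
from tvm import autotvm

# Measurement configuration for tuning over the RPC tracker.
measure_option = autotvm.measure_option(
    # Timeout for compiling one candidate kernel locally.
    builder=autotvm.LocalBuilder(timeout=10),
    runner=autotvm.RPCRunner(
        "titanv",            # key the RPC servers registered with (placeholder)
        host="127.0.0.1",    # tracker address (placeholder)
        port=9190,
        number=20,
        repeat=3,
        timeout=20,          # raise this if you see many "Timeout in RPC session" messages
    ),
)
```

The runner's `timeout` bounds how long one remote measurement may take; candidates that exceed it are killed and logged as the timeouts shown above, so a value that is too low discards valid but slow kernels.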

If you see a speedup (or all servers printing serving messages) and reasonable GFLOPS during tuning, it is fine. Do not expect linear scaling from using multiple machines.