Error when running the tune_conv2d_cuda example

Terex12 · November 18, 2019, 9:26pm

Hi,

I am trying to use CUDA auto-tuning by running tune_conv2d_cuda.py from the tutorial section. However, it never finishes. When I stop the process manually, it reports "ConnectionRefusedError: [Errno 111] Connection refused". I built TVM with LLVM 10 and GCC 6.3, with the CUDA and LLVM flags turned on and the VTA flags turned off. I run the code with Python 3.5, and the hardware is a Tesla P100. How can I solve this issue? Is it related to RPC?

Below are the hardware info, a screenshot of the frozen process, and the error message.

[Screenshots not preserved: hardware info; frozen process; error message]

Hzfengsy · November 19, 2019, 1:13am

If you only want to use local GPUs, you should switch from RPCRunner to LocalRunner. Alternatively, start an RPC server; see details at https://docs.tvm.ai/tutorials/autotvm/tune_relay_cuda.html#scale-up-measurement-by-using-multiple-devices
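A "ConnectionRefusedError: [Errno 111]" means nothing was listening on the address the client tried to reach. If you go the RPC route, a quick check that the tracker port accepts connections before tuning can rule that out. This is a minimal standard-library sketch; the helper name `tracker_reachable` and port 9190 are assumptions for illustration, not part of TVM.

```python
import socket


def tracker_reachable(host="127.0.0.1", port=9190, timeout=2.0):
    """Return True if some process accepts TCP connections on host:port.

    Hypothetical helper, not a TVM API. ConnectionRefusedError
    ([Errno 111] on Linux) means no process is listening there,
    e.g. the RPC tracker was never started.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers ConnectionRefusedError, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Port 9190 is an assumed tracker port; adjust it to your own setup.
    if not tracker_reachable("127.0.0.1", 9190):
        print("No RPC tracker on 127.0.0.1:9190; start one or use LocalRunner.")
```

As far as I know, LocalRunner in this TVM version also starts its own local tracker and server behind the scenes, which may explain why the error can show up even when RPCRunner is not used.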

Terex12 · November 19, 2019, 1:21am

I think tune_conv2d_cuda.py already uses LocalRunner (line 190 of the script):

    measure_option = autotvm.measure_option(
        builder=autotvm.LocalBuilder(),
        runner=autotvm.LocalRunner(repeat=3, min_repeat_ms=100, timeout=4)
    )

Am I correct?

Hzfengsy · November 19, 2019, 1:50am

Sorry, I misunderstood the problem. It seems that tuning has finished and you got the best config. The only remaining problem may be that the RPC server does not exit. I don't think that affects the result, so it is OK to kill the process after the final output.

Terex12 · November 19, 2019, 8:33pm

I think the issue is related to the Python version I was using. With Python 3.5.2, the program froze because of threads that were never joined. After switching to Python 3.6.3, the issue is gone.
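The thread does not say which threads were left unjoined, so the following is only a generic sketch of the failure mode and two mitigations: failing fast on an older interpreter, and stopping and joining background threads so nothing keeps the process alive at exit. `require_python` and `run_worker_and_join` are hypothetical helpers, not TVM APIs, and the 3.6 floor comes from the report above rather than from an official requirement.

```python
import sys
import threading
import time


def require_python(min_version=(3, 6)):
    """Raise early on interpreters older than min_version (hypothetical helper).

    Uses %-formatting instead of f-strings so it still runs on Python 3.5.
    """
    if sys.version_info < min_version:
        raise RuntimeError(
            "Python %d.%d+ required, found %s"
            % (min_version[0], min_version[1], sys.version.split()[0])
        )


def run_worker_and_join(timeout=5.0):
    """Start a background worker, then signal it to stop and join it.

    A non-daemon thread that is never stopped keeps the interpreter alive
    at exit, which looks like a hang. Returns (finished, tick_count).
    """
    stop = threading.Event()
    ticks = []

    def worker():
        # Loop until the main thread sets the stop event.
        while not stop.is_set():
            ticks.append(time.monotonic())
            stop.wait(0.01)

    t = threading.Thread(target=worker)
    t.start()
    time.sleep(0.05)   # let the worker do some work
    stop.set()         # ask the worker to finish
    t.join(timeout)    # wait for it, so nothing lingers at interpreter exit
    return not t.is_alive(), len(ticks)
```

Calling `require_python()` at the top of a tuning script turns a silent freeze on an old interpreter into an immediate, readable error.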