Autotuning: Error with RPC Tracker

Hi, I’m running the tune_relay_cuda.py tutorial with the local runner instead of the default RPC runner.

In tune_relay_cuda.py:

'measure_option': autotvm.measure_option(
    builder=autotvm.LocalBuilder(timeout=10),
    runner=autotvm.LocalRunner(number=20, repeat=3, timeout=4, min_repeat_ms=150)
    #runner=autotvm.RPCRunner(
    #    '1080ti',  # change the device key to your key
    #    '0.0.0.0', 9190,
    #    number=20, repeat=3, timeout=4, min_repeat_ms=150)
    )

However, I keep hitting the following error during a single tuning run. Tuning does not stop at the first error; the same error is reported many times.

WARNING:RPCTracker:Invalid connection from TCPSocket: ('198.108.67.48', 35266)
ERROR:asyncio:Exception in callback None()
handle: <Handle cancelled>
Traceback (most recent call last):
  File "/usr/lib/python3.6/asyncio/events.py", line 145, in _run
    self._callback(*self._args)
  File "/home/sunggg/.local/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 139, in _handle_events
    handler_func(fileobj, events)
  File "/home/sunggg/Projects/tvm/python/tvm/rpc/tornado_util.py", line 38, in _event_handler
    self._event_handler(events)
  File "/home/sunggg/Projects/tvm/python/tvm/rpc/tornado_util.py", line 75, in _event_handler
    if self._update_read() and (events & self._ioloop.WRITE):
  File "/home/sunggg/Projects/tvm/python/tvm/rpc/tornado_util.py", line 112, in _update_read
    self.on_message(msg)
  File "/home/sunggg/Projects/tvm/python/tvm/rpc/tracker.py", line 198, in on_message
    self._init_conn(message)
  File "/home/sunggg/Projects/tvm/python/tvm/rpc/tracker.py", line 181, in _init_conn
    magic = struct.unpack('<i', message)[0]
struct.error: unpack requires a buffer of 4 bytes

WARNING:RPCTracker:Invalid connection from TCPSocket: ('172.104.84.223', 53804)
 .... repeated message...

Can anyone give me a hint on how to fix this, or on what may be causing it?

AFAIK, RPC is the module that collects results from multiple devices, so I’m not sure why I get this with the local runner, or whether I can ignore these errors.

Appreciate your help.

LocalRunner will launch a local RPC server anyway so this is expected. From my experience this warning is fine as long as your tuning logs look fine.
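
For context on the last frame of the traceback in the question: `struct.unpack('<i', ...)` only accepts a buffer of exactly 4 bytes, so a connection whose first message has any other length produces this `struct.error`. The sketch below reproduces that failure mode in isolation. `try_parse_magic` is a hypothetical helper written for illustration, not TVM code, and the sample payloads are made up.

```python
import struct


def try_parse_magic(message: bytes):
    """Return the little-endian int32 in `message`, or None if it is not 4 bytes.

    Hypothetical helper for illustration only; it is not part of TVM.
    """
    if len(message) != 4:
        # A well-formed handshake would start with exactly 4 bytes.
        return None
    return struct.unpack("<i", message)[0]


# A 4-byte buffer unpacks cleanly.
well_formed = struct.pack("<i", 12345)  # made-up value, not TVM's real magic
print(try_parse_magic(well_formed))

# Anything else (an empty read, an unrelated client, ...) is rejected
# instead of raising.
print(try_parse_magic(b""))
print(try_parse_magic(b"GET / HTTP/1.1\r\n"))

# Calling struct.unpack directly on such a buffer raises the error from the log.
try:
    struct.unpack("<i", b"GET / HTTP/1.1\r\n")
except struct.error as exc:
    print("struct.error:", exc)
```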

Thank you for sharing your experience!

When I check the log, tuning of the current kernel seems to stop when that error occurs, and the tuner moves on to the next kernel. So I guess it ends up returning a sub-optimal configuration, even though it might have found a better one without the error. (For this experiment, I disabled early stopping.)

Any experience or thoughts on what may cause this error? I appreciate your help.

So you set n_trial to some number and found that, after this warning, the log file contains fewer tuned configs than that? If I remember correctly, tuning should not stop after this warning.
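
One way to check this is to count the records in the tuning log. The sketch below assumes the log is a JSON-lines file in which each record has a `result` field (or `r` in older logs) whose second element is an error code, with 0 meaning success. That layout is an assumption about the AutoTVM log format, so verify it against a line of your own log file first. `summarize_log` and the sample lines are hypothetical.

```python
import json
from collections import Counter


def summarize_log(lines):
    """Count records by outcome in an iterable of JSON-lines log records."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumed layout: "result" (or "r") = [costs, error_no, all_cost, timestamp]
        result = record.get("result", record.get("r"))
        if result is None:
            counts["unparsed"] += 1  # no recognizable result field
            continue
        counts["ok" if result[1] == 0 else "failed"] += 1
    return counts


# Made-up sample records, only to show the shape the sketch expects.
sample = [
    '{"result": [[0.0012], 0, 1.5, 1700000000.0]}',
    '{"result": [[1e9], 4, 10.0, 1700000001.0]}',
    '{"r": [[0.0009], 0, 1.2, 1700000002.0]}',
]
print(summarize_log(sample))
```

With a real log you would pass an open file object instead of `sample`, using whatever log file name your tuning script writes to.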

This is what I’m observing. I set n_trial to 1000.

[Task 9/16] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/1000) | 0.00 s
terminate called without an active exception

[Task 11/16] Current/Best: 194.60/ 240.15 GFLOPS | Progress: (824/1000) | 2121.35 s
WARNING:RPCTracker:TCPSocket: ('104.152.52.25', 42745): Error in RPC Tracker: [Errno 104] Connection reset by peer

Other tasks are completed without any error.

I have never seen the exception at Task 9. For Task 11, tuning usually keeps going and the progress is printed on the next line (so you will see two [Task 11/16] lines on the console). If these are the only tasks that hit problems, you can re-tune just these two to save time.
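
Re-tuning only the affected tasks amounts to picking them out of the task list before running the tuning loop again. A minimal sketch, assuming the console's `[Task 9/16]` numbering is 1-based and follows the order in which the list is passed in; if your tuning loop iterates the tasks in another order (for example over `reversed(tasks)`), apply that ordering before indexing. `select_tasks` is a hypothetical helper and the strings stand in for real task objects.

```python
def select_tasks(tasks, one_based_ids):
    """Return the tasks at the given 1-based positions, in the order requested."""
    selected = []
    for task_id in one_based_ids:
        if not 1 <= task_id <= len(tasks):
            raise IndexError(f"task {task_id} is outside 1..{len(tasks)}")
        selected.append(tasks[task_id - 1])
    return selected


# Placeholder list standing in for the 16 extracted tasks.
all_tasks = [f"task_{i}" for i in range(1, 17)]

# Only the two tasks that hit problems in the console output.
to_retune = select_tasks(all_tasks, [9, 11])
print(to_retune)
```

The selected list would then be passed to the same tuning loop as before.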

Oh, gotcha. Thank you for the clarification. Since I’m new to this framework, I wanted to double-check with someone experienced before I report this number. Again, I appreciate it.

Hi @comaniac, it seems you are very experienced with TVM. Could you help me look into the error described at https://discuss.tvm.ai/t/relay-opencl-inference-result-error/5965? Thanks a lot!

Hi @comaniac, I have run into the following error on Windows. Could you help me resolve it? Thanks.

2023-11-23 19:13:34.298 WARNING Invalid connection from TCPSocket: ('127.0.0.1', 50179)
ERROR:asyncio:Exception in callback AddThreadSelectorEventLoop._handle_select([<socket.socke…0.1', 50087)>, 1416], [])
handle: <Handle AddThreadSelectorEventLoop._handle_select([<socket.socke…0.1', 50087)>, 1416], [])>
Traceback (most recent call last):
  File "C:\Python38\lib\asyncio\events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "C:\Python38\lib\site-packages\tornado\platform\asyncio.py", line 647, in _handle_select
    self._handle_event(r, self._readers)
  File "C:\Python38\lib\site-packages\tornado\platform\asyncio.py", line 661, in _handle_event
    callback()
  File "C:\Python38\lib\site-packages\tornado\platform\asyncio.py", line 206, in _handle_events
    handler_func(fileobj, events)
  File "C:\Python38\lib\site-packages\tvm-0.12.0.dev16+gde9f365df-py3.8-win-amd64.egg\tvm\rpc\tornado_util.py", line 41, in _event_handler
    self._event_handler(events)
  File "C:\Python38\lib\site-packages\tvm-0.12.0.dev16+gde9f365df-py3.8-win-amd64.egg\tvm\rpc\tornado_util.py", line 79, in _event_handler
    if self._update_read() and (events & self._ioloop.WRITE):
  File "C:\Python38\lib\site-packages\tvm-0.12.0.dev16+gde9f365df-py3.8-win-amd64.egg\tvm\rpc\tornado_util.py", line 118, in _update_read
    self.on_message(msg)
  File "C:\Python38\lib\site-packages\tvm-0.12.0.dev16+gde9f365df-py3.8-win-amd64.egg\tvm\rpc\tracker.py", line 215, in on_message
    self._init_conn(message)
  File "C:\Python38\lib\site-packages\tvm-0.12.0.dev16+gde9f365df-py3.8-win-amd64.egg\tvm\rpc\tracker.py", line 198, in _init_conn
    magic = struct.unpack("<i", message)[0]
struct.error: unpack requires a buffer of 4 bytes