RPC tracker could not start scheduler


#1

Hi,

I am following this tutorial https://docs.tvm.ai/tutorials/autotvm/tune_nnvm_arm.html?highlight=arm and got stuck trying to get the RPC tracker to work. My device appears in the server list when I query the tracker, but it does not show up in the queue status.

After some further digging in the code, I found that each time a new device connects to the server, it sends 2 ‘update’ requests to update the server list. But isn’t it supposed to send 1 ‘put’ request to trigger the scheduler and 1 ‘update’ request?

Android version: 6.0.1

Any help will be greatly appreciated. Thank you!


#2

What is the output of python -m tvm.exec.query_rpc_tracker --host=[host] --port=[port], and what are the settings you are using on your Android device?


#3

Server List
----------------------------
server-address          key
----------------------------
10.73.194.143:47290     server:android
10.73.186.140:54754     server:android1
----------------------------

Queue Status
----------------------------
key     free     pending
----------------------------

Here’s my query_rpc_tracker output. I have two devices running the RPC app on the same local network. My settings for the app are:
Address: 10.46.144.59
Port: 9190
Key: android
Keep RPC Alive enabled

Here’s how I ran android tracker in the background:
python3 -m tvm.exec.rpc_tracker --host=10.46.144.59 --port 9190 --no-fork

Thanks


#4

If possible, can you check the output of logcat (grep for System.err and tvm) after you press Start RPC or the RPC Activity appears on the screen?
One thing to double check: are all your devices on the same subnet, and are there any virtual machines involved in the process that would complicate the networking situation?

It shouldn’t affect this case, but you can also try updating the RPC app as there was a recent update.
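
The same-subnet question can be checked mechanically once the tracker host's address and netmask are known. The following is a minimal sketch using only Python's standard `ipaddress` module; the function name `same_subnet` is made up for illustration, and the netmask in the usage note is an assumption to be replaced with whatever ifconfig reports on the tracker machine.

```python
import ipaddress


def same_subnet(host_ip, netmask, other_ip):
    """Return True if other_ip falls inside the subnet defined by host_ip/netmask."""
    # strict=False lets us pass a host address rather than the network address.
    network = ipaddress.ip_network(f"{host_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(other_ip) in network
```

For example, `same_subnet("10.46.144.59", "255.255.254.0", "10.73.194.143")` compares the tracker address with one of the device addresses from the server list. A False result only means traffic between the two goes through a router, which may or may not filter the ports the RPC protocol uses.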


#5

Hi eqy,

Here’s my logcat output
08-22 16:07:09.107 20933 20945 W System.err: java.net.SocketException: recvfrom failed: ETIMEDOUT (Connection timed out)
08-22 16:07:09.107 20933 20945 W System.err: at libcore.io.IoBridge.maybeThrowAfterRecvfrom(IoBridge.java:588)
08-22 16:07:09.107 20933 20945 W System.err: at libcore.io.IoBridge.recvfrom(IoBridge.java:552)
08-22 16:07:09.107 20933 20945 W System.err: at java.net.PlainSocketImpl.read(PlainSocketImpl.java:481)
08-22 16:07:09.107 20933 20945 W System.err: at java.net.PlainSocketImpl.access$000(PlainSocketImpl.java:37)
08-22 16:07:09.107 20933 20945 W System.err: at java.net.PlainSocketImpl$PlainSocketInputStream.read(PlainSocketImpl.java:237)
08-22 16:07:09.107 20933 20945 W System.err: at ml.dmlc.tvm.rpc.Utils.recvAll(Utils.java:35)
08-22 16:07:09.107 20933 20945 W System.err: at ml.dmlc.tvm.rpc.Utils.recvString(Utils.java:83)
08-22 16:07:09.107 20933 20945 W System.err: at ml.dmlc.tvm.rpc.ConnectTrackerServerProcessor.register(ConnectTrackerServerProcessor.java:223)
08-22 16:07:09.107 20933 20945 W System.err: at ml.dmlc.tvm.rpc.ConnectTrackerServerProcessor.run(ConnectTrackerServerProcessor.java:109)
08-22 16:07:09.107 20933 20945 W System.err: at ml.dmlc.tvm.tvmrpc.RPCProcessor.run(RPCProcessor.java:67)
08-22 16:07:09.107 20933 20945 W System.err: Caused by: android.system.ErrnoException: recvfrom failed: ETIMEDOUT (Connection timed out)
08-22 16:07:09.107 20933 20945 W System.err: at libcore.io.Posix.recvfromBytes(Native Method)
08-22 16:07:09.107 20933 20945 W System.err: at libcore.io.Posix.recvfrom(Posix.java:189)
08-22 16:07:09.107 20933 20945 W System.err: at libcore.io.BlockGuardOs.recvfrom(BlockGuardOs.java:250)
08-22 16:07:09.107 20933 20945 W System.err: at libcore.io.IoBridge.recvfrom(IoBridge.java:549)

I am using docker for this and here’s the ifconfig result:
docker0 Link encap:Ethernet HWaddr 02:42:d9:d3:2a:58
inet addr:172.17.0.1 Bcast:0.0.0.0 Mask:255.255.0.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

enp0s25 Link encap:Ethernet HWaddr d8:9e:f3:1e:55:fc
inet addr:10.46.144.59 Bcast:10.46.145.255 Mask:255.255.254.0
inet6 addr: 2002:c023:9c17:431:da9e:f3ff:fe1e:55fc/64 Scope:Global
inet6 addr: fe80::da9e:f3ff:fe1e:55fc/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:26061075 errors:0 dropped:84 overruns:0 frame:0
TX packets:23243491 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:11712760350 (11.7 GB) TX bytes:24302367020 (24.3 GB)
Interrupt:20 Memory:f7100000-f7120000

I am able to ping the Android device from Docker, and the RPC tracker lists the devices that are connected to it. Does that mean two-way communication is working?

Thanks,


#6

I do not know if ping is enough to guarantee that mutual communication is working, since the protocol depends on specific ports being accessible. Is there any way to try running the tracker in a different environment?
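
Since ping does not exercise TCP ports, one rough way to test whether a specific port is reachable is to attempt a plain TCP connection to it. This is a minimal sketch using Python's standard `socket` module; the helper name `tcp_port_reachable` is hypothetical and not part of TVM. It checks only one direction and one port, so a True result does not prove the whole RPC handshake will succeed.

```python
import socket


def tcp_port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout seconds."""
    try:
        # create_connection performs the TCP handshake; the socket is closed on exit.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Running `tcp_port_reachable("10.46.144.59", 9190)` from another machine on the device's network (address and port taken from the tracker command earlier in the thread) would show whether the tracker port is reachable from that side, including through any Docker port mapping.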