VTA autotuning from tutorial fails with one PYNQ, but succeeds with two PYNQs


#1

Dear tvm community,

I am trying to follow the VTA auto-tuning tutorial, using the master as of 7 Oct. (76c2392).

I am facing the following issue:

  • I have two PYNQs. When I try to use only one PYNQ, I get an error that my device is not tracked:

    Extracted 10 conv2d tasks:  
    (1, 14, 14, 256, 512, 1, 1, 0, 0, 2, 2)
    (1, 28, 28, 128, 256, 1, 1, 0, 0, 2, 2)
    (1, 56, 56, 64, 128, 1, 1, 0, 0, 2, 2)
    (1, 56, 56, 64, 64, 3, 3, 1, 1, 1, 1)
    (1, 28, 28, 128, 128, 3, 3, 1, 1, 1, 1)
    (1, 56, 56, 64, 128, 3, 3, 1, 1, 2, 2)
    (1, 14, 14, 256, 256, 3, 3, 1, 1, 1, 1)
    (1, 28, 28, 128, 256, 3, 3, 1, 1, 2, 2)
    (1, 7, 7, 512, 512, 3, 3, 1, 1, 1, 1)
    (1, 14, 14, 256, 512, 3, 3, 1, 1, 2, 2)
    Tuning...
    [Task  1/10]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/1000) | 0.00 s
    Traceback (most recent call last):
    
    File "tutorials/autotvm/tune_relay_vta.py", line 424, in <module>
      tune_and_evaluate(tuning_option)
    
    File "tutorials/autotvm/tune_relay_vta.py", line 381, in tune_and_evaluate
      tune_tasks(tasks, **tuning_opt)
    
    File "tutorials/autotvm/tune_relay_vta.py", line 285, in tune_tasks
      autotvm.callback.log_to_file(tmp_log_file)])
    
    File "/home/did/tvm/python/tvm/autotvm/tuner/tuner.py", line 108, in tune
      measure_batch = create_measure_batch(self.task, measure_option)
    
    File "/home/did/tvm/python/tvm/autotvm/measure/measure.py", line 252, in create_measure_batch
      attach_objects = runner.set_task(task)
    
    File "/home/did/tvm/python/tvm/autotvm/measure/measure_methods.py", line 211, in set_task
      raise RuntimeError("Cannot get remote devices from the tracker. "
    
    RuntimeError: Cannot get remote devices from the tracker. Please check the status of tracker by     'python -m tvm.exec.query_rpc_tracker --port [THE PORT YOU USE]' and make sure you have free devices on the queue status.
    
    

    However, I have verified that the device is tracked:
    python3 -m tvm.exec.query_rpc_tracker --host=0.0.0.0 --port=9190

    Tracker address 0.0.0.0:9190
    
    Server List
    ----------------------------  
    server-address  key
    ----------------------------
    192.168.2.98:44802      server:pynq
    ----------------------------
    
    Queue Status
    ----------------------------
    key    total  free  pending
    ----------------------------
    pynq   1      1     0
    ----------------------------
    

    … and I have verified that I can run the basic conv2d test on both PYNQs (so the end-to-end flow works):
    python tvm/vta/tests/python/integration/test_benchmark_topi_conv2d.py

    When I use both PYNQs, I am able to run the tutorial* (i.e. I do not get the previous error). However, it is odd that during the first autotuning phase most of the work takes place on one PYNQ while the other is idle. After a timeout (which occurs after 1 hour), I get this report on the ‘idle’ PYNQ:

    xilinx@pynq:~/tvm$ sudo ./apps/vta_rpc/start_rpc_server_to_tracker.py
    INFO:RPCServer:bind to 0.0.0.0:9091
    INFO:RPCServer:connection from ('192.168.2.1', 43478)
    INFO:root:Skip reconfig_runtime due to same config.
    INFO:root:Program FPGA with 1x16_i8w8a32_15_15_18_17.bit 
    
    INFO:RPCServer:Timeout in RPC session, kill..
    INFO:RPCServer:connection from ('192.168.2.1', 56800)
    INFO:root:Program FPGA with 1x16_i8w8a32_15_15_18_17.bit 
    INFO:root:Skip reconfig_runtime due to same config.
    INFO:root:Loading VTA library: /home/xilinx/tvm/vta/python/vta/../../../build/libvta.so
    INFO:RPCServer:load_module /tmp/tmpwl6adhvf/tmp_func_1dbb6a8d7de86cfb.tar
    INFO:RPCServer:Finish serving ('192.168.2.1', 56800)
    INFO:RPCServer:connection from ('192.168.2.1', 56824)
    INFO:root:Program FPGA with 1x16_i8w8a32_15_15_18_17.bit 
    ... and a long log of same lines continues ....
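
A long server log like the one above is easier to compare across boards when it is reduced to event counts. The following is a minimal sketch that only does string matching on the message texts visible in the log above; the function name `summarize_rpc_log` and the counter names are made up for illustration and are not part of TVM.

```python
from collections import Counter


def summarize_rpc_log(log_text):
    """Count the RPC server events that appear in a log dump.

    The matched substrings are copied from the log lines shown above;
    other TVM versions may word these messages differently.
    """
    counts = Counter()
    for line in log_text.splitlines():
        if "connection from" in line:
            counts["connections"] += 1
        elif "Timeout in RPC session" in line:
            counts["timeouts"] += 1
        elif "Finish serving" in line:
            counts["finished"] += 1
        elif "load_module" in line:
            counts["modules_loaded"] += 1
        elif "Program FPGA" in line:
            counts["fpga_programmed"] += 1
    return dict(counts)


sample_log = """\
INFO:RPCServer:connection from ('192.168.2.1', 43478)
INFO:root:Skip reconfig_runtime due to same config.
INFO:root:Program FPGA with 1x16_i8w8a32_15_15_18_17.bit
INFO:RPCServer:Timeout in RPC session, kill..
INFO:RPCServer:connection from ('192.168.2.1', 56800)
INFO:root:Program FPGA with 1x16_i8w8a32_15_15_18_17.bit
INFO:RPCServer:load_module /tmp/tmpwl6adhvf/tmp_func_1dbb6a8d7de86cfb.tar
INFO:RPCServer:Finish serving ('192.168.2.1', 56800)
"""

summary = summarize_rpc_log(sample_log)
print(summary)
```

A session that connects and programs the FPGA but never reaches `load_module` before the timeout would show up as a connection without a matching "finished" event.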
    

To summarize my questions:

  • When I use multiple PYNQs, do I need to set any environment variables (or other configuration) on the PYNQs or on the host, or is it enough to register the PYNQs with the host/tracker using this command?
    xilinx@pynq:~/tvm$ sudo ./apps/vta_rpc/start_rpc_server_to_tracker.py (I have changed the IP in this file to point to the host.)
  • Why does the tutorial succeed* when I use two PYNQs (even with the initial 1-hour phase in which one PYNQ is idle), but fail when I use only one PYNQ (tested with both boards)?
  • When I use multiple PYNQs, do I need to change any configuration on the PYNQs apart from their IP (e.g. the hostname), or is flashing them with the 2.4 image enough?

Kind regards,
Dionysios

*When I use two PYNQs, the evaluation of the tuned network fails; this issue is described here: VTA autotuning from tutorial fails to evaluate the tuned network

#2

Hi Dionysios,

Thanks for the detailed bug report. I'll try to address your bullet points below:

  1. There is no need to set up any different environment variables on the PYNQs. The registration with the tracker should happen automatically, and the tracker can then manage the resource allocation.
  2. This is strange and shouldn’t happen. Could the other PYNQ be set up incorrectly? Perhaps start_rpc_server_to_tracker.py was not run with sudo privileges? Does it fail regardless of which PYNQ board is connected to the tracker? That can help us narrow down the bug.
  3. There is no need to change the configuration of the PYNQs; you should be good to go.
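
For reference on the host side: to my recollection, the tuning tutorial script picks up the tracker address from the `TVM_TRACKER_HOST` and `TVM_TRACKER_PORT` environment variables and falls back to defaults otherwise. The sketch below illustrates that lookup pattern only; the helper name `tracker_address` is hypothetical, and the variable names and defaults are assumptions that should be checked against your checkout of tune_relay_vta.py.

```python
import os


def tracker_address(env=os.environ):
    """Resolve the tracker host/port, falling back to defaults.

    Assumption: the variable names and the defaults (0.0.0.0, 9190)
    mirror what the tutorial script uses; verify against your copy.
    """
    host = env.get("TVM_TRACKER_HOST", "0.0.0.0")
    port = int(env.get("TVM_TRACKER_PORT", 9190))
    return host, port


# With nothing set, the defaults apply.
print(tracker_address({}))
# With overrides, the port string is converted to an int.
print(tracker_address({"TVM_TRACKER_HOST": "192.168.2.1",
                       "TVM_TRACKER_PORT": "9191"}))
```

Nothing in this lookup is per-board, which is consistent with the boards only needing to register with the tracker.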

Thierry


#3

Hi @thierry, thanks a lot for your reply.

I have experimented a lot with setups and configurations to debug this issue, with no luck.
The problem still persists in my setup, and I believe it is reproducible since I am using the default configuration. Quick replies to the points above:

  1. My tracker is able to track the two PYNQ boards; there is no connection problem.
  2. It fails regardless of which PYNQ board is connected to the tracker. What I’ve discovered after several runs is the following:
    • PYNQ#1 is assigned the first workload and freezes (if I leave it running, it eventually times out as listed above). PYNQ#2 is assigned workloads and completes the autotuning successfully.
    • After the previous successful execution (in which only one PYNQ board did the work), if I re-execute the same script the opposite happens, i.e. PYNQ#2 freezes and PYNQ#1 does the work.
    • So if I register only one device with the tracker, I cannot autotune, since that PYNQ is frozen and the autotuning script cannot find any available PYNQ to do the work. (When a PYNQ is frozen, the status on the tracker is total:1 free:0 pending:1.)
    • If I register two PYNQs, the job gets done, since the status on the tracker is total:2 free:1 pending:1, but obviously autotuning does not scale with the number of PYNQs, so I get no speedup.
  3. Indeed, I just have the default configuration from the installation instructions and the tutorial, so I believe the above is reproducible with just one PYNQ.
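
To make the frozen state described above easy to spot, here is a minimal sketch that parses the "Queue Status" table printed by `python3 -m tvm.exec.query_rpc_tracker` and flags keys that have pending requests but no free device. It assumes the table layout shown in my first post; `parse_queue_status` and `starved_keys` are hypothetical helper names, not TVM functions.

```python
def parse_queue_status(summary_text):
    """Extract {key: {"total", "free", "pending"}} from tracker summary text."""
    status = {}
    in_queue_section = False
    for line in summary_text.splitlines():
        if line.strip() == "Queue Status":
            in_queue_section = True
            continue
        fields = line.split()
        # Data rows have a key followed by three integer columns;
        # the header row and the dashed separators are skipped.
        if (in_queue_section and len(fields) == 4
                and all(f.isdigit() for f in fields[1:])):
            key, total, free, pending = fields
            status[key] = {"total": int(total),
                           "free": int(free),
                           "pending": int(pending)}
    return status


def starved_keys(status):
    """Keys with waiting requests but no free device."""
    return [key for key, c in status.items()
            if c["free"] == 0 and c["pending"] > 0]


# Sample text in the layout from the first post, with the counts
# I see when the single registered PYNQ is frozen.
frozen_summary = """\
Queue Status
----------------------------
key    total  free  pending
----------------------------
pynq   1      0     1
----------------------------
"""

status = parse_queue_status(frozen_summary)
print(status, starved_keys(status))
```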

I would be grateful if you could guide me on how to debug this issue.

Regards,
Dionysios