VTA autotuning from tutorial fails to evaluate the tuned network

Dear TVM community,

I am trying to follow the VTA auto-tuning tutorial, using master as of 7 Oct. (commit 76c2392).

I am using two PYNQ boards* and I get the following output:

did@tinizong:~/tvm$ python3 tutorials/autotvm/tune_relay_vta.py 
Extract tasks...
/home/did/tvm/python/tvm/autotvm/task/relay_integration.py:128: UserWarning: Invalid shape during AutoTVM task creation
  warnings.warn("Invalid shape during AutoTVM task creation")
Extracted 10 conv2d tasks:
(1, 14, 14, 256, 512, 1, 1, 0, 0, 2, 2)
(1, 28, 28, 128, 256, 1, 1, 0, 0, 2, 2)
(1, 56, 56, 64, 128, 1, 1, 0, 0, 2, 2)
(1, 56, 56, 64, 64, 3, 3, 1, 1, 1, 1)
(1, 28, 28, 128, 128, 3, 3, 1, 1, 1, 1)
(1, 56, 56, 64, 128, 3, 3, 1, 1, 2, 2)
(1, 14, 14, 256, 256, 3, 3, 1, 1, 1, 1)
(1, 28, 28, 128, 256, 3, 3, 1, 1, 2, 2)
(1, 7, 7, 512, 512, 3, 3, 1, 1, 1, 1)
(1, 14, 14, 256, 512, 3, 3, 1, 1, 2, 2)
Tuning...
[Task  1/10]  Current/Best:   17.01/  28.87 GFLOPS | Progress: (480/1000) | 432.89 s Done.
[Task  2/10]  Current/Best:    0.00/  31.51 GFLOPS | Progress: (576/1000) | 497.62 s Done.
[Task  3/10]  Current/Best:    1.82/  43.42 GFLOPS | Progress: (1000/1000) | 991.62 s Done.
[Task  4/10]  Current/Best:   13.38/  46.87 GFLOPS | Progress: (1000/1000) | 891.48 s Done.
[Task  5/10]  Current/Best:    5.07/  39.03 GFLOPS | Progress: (1000/1000) | 1039.65 s Done.
[Task  6/10]  Current/Best:    0.55/  44.61 GFLOPS | Progress: (1000/1000) | 943.45 s Done.
[Task  7/10]  Current/Best:    0.00/  40.45 GFLOPS | Progress: (1000/1000) | 1080.38 s Done.
[Task  8/10]  Current/Best:    0.00/   9.57 GFLOPS | Progress: (1000/1000) | 1777.26 s Done.
[Task  9/10]  Current/Best:    3.53/  12.58 GFLOPS | Progress: (1000/1000) | 2305.25 s Done.
[Task 10/10]  Current/Best:    0.37/  12.02 GFLOPS | Progress: (480/1000) | 709.03 s Done.
Compile...
Upload...
Traceback (most recent call last):

  File "tutorials/autotvm/tune_relay_vta.py", line 424, in <module>
    tune_and_evaluate(tuning_option)

  File "tutorials/autotvm/tune_relay_vta.py", line 402, in tune_and_evaluate
    remote.upload(temp.relpath("graphlib.o"))

  File "/home/did/tvm/python/tvm/rpc/client.py", line 102, in upload
    self._remote_funcs["upload"](target, blob)

  File "tvm/_ffi/_cython/./function.pxi", line 310, in tvm._ffi._cy3.core.FunctionBase.__call__

  File "tvm/_ffi/_cython/./function.pxi", line 245, in tvm._ffi._cy3.core.FuncCall

  File "tvm/_ffi/_cython/./function.pxi", line 234, in tvm._ffi._cy3.core.FuncCall3

  File "tvm/_ffi/_cython/./base.pxi", line 171, in tvm._ffi._cy3.core.CALL

tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (3) /home/did/tvm/build/libtvm.so(TVMFuncCall+0x61) [0x7ff275089f51]
  [bt] (2) /home/did/tvm/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::RPCModuleNode::WrapRemote(void*)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0x3b) [0x7ff2750e04fb]
  [bt] (1) /home/did/tvm/build/libtvm.so(tvm::runtime::RPCSession::CallFunc(void*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*, tvm::runtime::PackedFunc const*)+0x154) [0x7ff2750e8ca4]
  [bt] (0) /home/did/tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x32) [0x7ff2748c91b2]
  File "/home/did/tvm/src/runtime/rpc/rpc_session.cc", line 962
TVMError: Check failed: code == RPCCode::kReturn: code=4

So tuning itself seems to work, but I cannot evaluate the tuned network: the script fails at the upload step, right after compilation.
I built the Cython FFI for TVM with:

pip3 install --user cython
sudo make cython3 

Could this be the cause, given that the Python traceback passes through tvm/_ffi/_cython?

Kind regards,
Dionysios

*There is an issue when I use two PYNQs: one of them sits idle for 1 hour, which is described here

This one is a little odd to me; have you worked around this issue, or are you still dealing with the problem? It should be relatively easy to jump straight to the inference portion of the tutorial example by skipping tuning and just loading the schedules from the existing tuning log. Let me know!
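
A minimal sketch of the "skip tuning, just load the schedules" idea, assuming the log written by the earlier tuning run is still on disk. The helper `should_tune` is hypothetical (it is not part of TVM or the tutorial); the commented lines show roughly where it would sit inside the tutorial's `tune_and_evaluate`, and they rely on the tutorial's own variables (`tasks`, `tuning_opt`, `target`, `log_file`).

```python
import os


def should_tune(log_file, force=False):
    """Return True if tuning should run, False if an existing log can be reused.

    Hypothetical helper for illustration only; not part of TVM or the tutorial.
    """
    # A log counts as reusable if it exists and is not empty.
    has_records = os.path.isfile(log_file) and os.path.getsize(log_file) > 0
    return force or not has_records


# Intended use inside the tutorial's tune_and_evaluate(), shown as comments
# because it depends on variables defined in the tutorial script:
#
#     if should_tune(log_file):
#         print("Tuning...")
#         tune_tasks(tasks, **tuning_opt)
#
#     # The tutorial then compiles with the best records found in the log:
#     with autotvm.tophub.context(target, extra_files=[log_file]):
#         ...
```

With this in place, a second run goes directly to "Compile..." and "Upload...", which makes it much quicker to reproduce the RPC error without waiting for tuning to finish again. Note that the check only looks at whether the log is non-empty; it does not verify that every task has a valid record.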