[autotvm][RPC error] VGG16 with input shape (768, 768) on ARM CPU

@kevinthesun @FrozenGene

I have a PyTorch model based on VGG16. I can autotune it with input shape (224, 224), but when I try to autotune with input shape (768, 768), the measurements fail with error code 7, which according to https://github.com/apache/incubator-tvm/blob/master/python/tvm/autotvm/measure/measure.py#L64 is:

```
RUN_TIMEOUT = 7 # timeout during run
```

I have the latest TVM on the host and the latest TVM runtime on the remote device.

From the tuning log:

```
{"input": ["llvm -target=armv7l-linux-gnueabihf", "conv2d_NCHWc.x86", [["TENSOR", [1, 3, 768, 768], "float32"], ["TENSOR", [64, 3, 3, 3], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "NCHW", "float32"], {}], "config": {"index": 239, "code_hash": null, "entity": [["tile_ic", "sp", [-1, 3]], ["tile_oc", "sp", [-1, 1]], ["tile_ow", "sp", [-1, 8]], ["unroll_kw", "ot", false]]}, "result": [[1000000000.0], 7, 150, 1587243236.4542387], "version": 0.2, "tvm_version": "0.7.dev1"}
```
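To read a record like this one, the `"result"` field can be unpacked as `[costs, error_no, all_cost, timestamp]`. Below is a minimal sketch that decodes one log line with the standard `json` module. It assumes the layout of AutoTVM log format version 0.2 and error codes that mirror `MeasureErrorNo` in `measure.py`. Only `RUN_TIMEOUT = 7` is quoted in this post; the other names are assumed from the TVM source. `describe_record` is a hypothetical helper, not a TVM API.

```python
import json

# Error codes assumed to mirror tvm.autotvm.measure.MeasureErrorNo;
# only RUN_TIMEOUT = 7 is quoted in this post.
ERROR_NAMES = {
    0: "NO_ERROR",
    1: "INSTANTIATION_ERROR",
    2: "COMPILE_HOST",
    3: "COMPILE_DEVICE",
    4: "RUNTIME_DEVICE",
    5: "WRONG_ANSWER",
    6: "BUILD_TIMEOUT",
    7: "RUN_TIMEOUT",
    8: "UNKNOWN_ERROR",
}

# The tuning-log record quoted above, copied verbatim.
LOG_LINE = '{"input": ["llvm -target=armv7l-linux-gnueabihf", "conv2d_NCHWc.x86", [["TENSOR", [1, 3, 768, 768], "float32"], ["TENSOR", [64, 3, 3, 3], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "NCHW", "float32"], {}], "config": {"index": 239, "code_hash": null, "entity": [["tile_ic", "sp", [-1, 3]], ["tile_oc", "sp", [-1, 1]], ["tile_ow", "sp", [-1, 8]], ["unroll_kw", "ot", false]]}, "result": [[1000000000.0], 7, 150, 1587243236.4542387], "version": 0.2, "tvm_version": "0.7.dev1"}'


def describe_record(line):
    """Summarize one AutoTVM log line (layout assumed from log version 0.2)."""
    record = json.loads(line)
    _target, task_name, args, _kwargs = record["input"]
    costs, error_no, all_cost, _timestamp = record["result"]
    return {
        "task": task_name,
        "data_shape": tuple(args[0][1]),  # args[0] is ["TENSOR", shape, dtype]
        "config_index": record["config"]["index"],
        "error": ERROR_NAMES.get(error_no, "UNKNOWN(%d)" % error_no),
        "cost_s": min(costs),  # 1e9 appears to be a placeholder for a failed run
        "all_cost_s": all_cost,
    }


if __name__ == "__main__":
    print(describe_record(LOG_LINE))
```

Running it over the whole temporary log file (one `describe_record` call per line) makes it easy to count how many configs of each shape hit `RUN_TIMEOUT`.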

Tasks:

```
Task(func_name=conv2d_NCHWc.x86, args=(('TENSOR', (1, 128, 384, 384), 'float32'), ('TENSOR', (128, 128, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'NCHW', 'float32'), kwargs={}, workload=('conv2d_NCHWc.x86', ('TENSOR', (1, 128, 384, 384), 'float32'), ('TENSOR', (128, 128, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'NCHW', 'float32'))
Task(func_name=conv2d_NCHWc.x86, args=(('TENSOR', (1, 64, 384, 384), 'float32'), ('TENSOR', (128, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'NCHW', 'float32'), kwargs={}, workload=('conv2d_NCHWc.x86', ('TENSOR', (1, 64, 384, 384), 'float32'), ('TENSOR', (128, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'NCHW', 'float32'))
Task(func_name=conv2d_NCHWc.x86, args=(('TENSOR', (1, 64, 768, 768), 'float32'), ('TENSOR', (64, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'NCHW', 'float32'), kwargs={}, workload=('conv2d_NCHWc.x86', ('TENSOR', (1, 64, 768, 768), 'float32'), ('TENSOR', (64, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'NCHW', 'float32'))
Task(func_name=conv2d_NCHWc.x86, args=(('TENSOR', (1, 3, 768, 768), 'float32'), ('TENSOR', (64, 3, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'NCHW', 'float32'), kwargs={}, workload=('conv2d_NCHWc.x86', ('TENSOR', (1, 3, 768, 768), 'float32'), ('TENSOR', (64, 3, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'NCHW', 'float32'))
```
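To see how much heavier these workloads are than at 224x224, here is a minimal sketch that counts conv2d FLOPs (2 per multiply-accumulate) from the Task arguments above. It assumes NCHW data, OIHW kernels, `groups == 1`, and the 4-tuple padding read as (top, left, bottom, right). The 6 GFLOPS figure is only the best throughput reported for Task 1 in the console output, used here as an illustrative assumption; actual throughput differs per task. `conv2d_flops` and `conv2d_out_size` are hypothetical helpers, not TVM functions.

```python
def conv2d_out_size(size, kernel, stride, pad_before, pad_after, dilation):
    """Output length along one spatial axis of a 2-D convolution."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + pad_before + pad_after - effective_kernel) // stride + 1


def conv2d_flops(data_shape, kernel_shape, strides, padding, dilation):
    """FLOPs (2 per multiply-accumulate) of an NCHW conv2d with groups == 1.

    padding is a 4-tuple, assumed to be (top, left, bottom, right).
    """
    n, in_c, h, w = data_shape
    out_c, kernel_in_c, kh, kw = kernel_shape
    assert in_c == kernel_in_c, "grouped convolution is not handled here"
    pad_t, pad_l, pad_b, pad_r = padding
    out_h = conv2d_out_size(h, kh, strides[0], pad_t, pad_b, dilation[0])
    out_w = conv2d_out_size(w, kw, strides[1], pad_l, pad_r, dilation[1])
    return 2 * n * out_c * out_h * out_w * in_c * kh * kw


# (data shape, kernel shape) pairs from the Task list above.
WORKLOADS = [
    ((1, 128, 384, 384), (128, 128, 3, 3)),
    ((1, 64, 384, 384), (128, 64, 3, 3)),
    ((1, 64, 768, 768), (64, 64, 3, 3)),
    ((1, 3, 768, 768), (64, 3, 3, 3)),
]

if __name__ == "__main__":
    assumed_gflops = 6.0  # illustrative: Task 1's best, not measured per task
    for data, kernel in WORKLOADS:
        gflop = conv2d_flops(data, kernel, (1, 1), (1, 1, 1, 1), (1, 1)) / 1e9
        print(data, kernel, "%.2f GFLOP, ~%.1f s per run" % (gflop, gflop / assumed_gflops))
```

For the 1x64x768x768 layer this gives about 43.49 GFLOP versus about 3.70 GFLOP for the same layer at 224x224, so at roughly 6 GFLOPS a single run already takes around 7 s, and the runner usually executes each candidate several times (its `number`/`repeat` settings).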

Console output:

```
Tuning...
[Task  1/23]  Current/Best:    3.57/   6.06 GFLOPS | Progress: (336/336) | 3404.30 s Done.
[Task  2/23]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/336) | 0.00 s
Traceback (most recent call last):
  File "tune_relay.py", line 270, in <module>
    tune_and_evaluate(tuning_option)
  File "tune_relay.py", line 228, in tune_and_evaluate
    tune_tasks(tasks, **tuning_opt)
  File "tune_relay.py", line 206, in tune_tasks
    autotvm.callback.log_to_file(tmp_log_file)])
  File "/home/workplace/tvm/python/tvm/autotvm/tuner/xgboost_tuner.py", line 90, in tune
    super(XGBTuner, self).tune(*args, **kwargs)
  File "/home/workplace/tvm/python/tvm/autotvm/tuner/tuner.py", line 108, in tune
    measure_batch = create_measure_batch(self.task, measure_option)
  File "/home/workplace/tvm/python/tvm/autotvm/measure/measure.py", line 253, in create_measure_batch
    attach_objects = runner.set_task(task)
  File "/home/workplace/tvm/python/tvm/autotvm/measure/measure_methods.py", line 215, in set_task
    raise RuntimeError("Cannot get remote devices from the tracker. "
RuntimeError: Cannot get remote devices from the tracker. Please check the status of tracker by 'python -m tvm.exec.query_rpc_tracker --port [THE PORT YOU USE]' and make sure you have free devices on the queue status.
```

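It seems to me that 768x768 is quite a large input: every feature map has about 11.8 times as many elements as at 224x224 (768^2 / 224^2). Task 1 completes, but Task 2 fails before its first trial because no free device can be obtained from the tracker. Could the tensor size cause RPC issues similar to the ones in "Error while loading params to target in RPC session"?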
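To put a number on "quite a large input", here is a minimal sketch that computes the size of a dense tensor. It assumes plain float32 storage and ignores any extra buffers that layout transforms or the runtime may allocate. `tensor_bytes` and `mib` are hypothetical helpers for illustration.

```python
from math import prod

# Bytes per element for a few common dtypes.
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "int8": 1}


def tensor_bytes(shape, dtype="float32"):
    """Size in bytes of a dense tensor with the given shape and dtype."""
    return prod(shape) * BYTES_PER_ELEMENT[dtype]


def mib(num_bytes):
    """Convert bytes to mebibytes."""
    return num_bytes / (1024 * 1024)


if __name__ == "__main__":
    for side in (224, 768):
        size = tensor_bytes((1, 64, side, side))
        print("1x64x%dx%d float32 feature map: %.2f MiB" % (side, side, mib(size)))
```

A 1x64x768x768 float32 feature map is 144 MiB, compared with 12.25 MiB at 224x224, and the 64-to-64 channel conv2d task needs both an input and an output of that size on the device during each measurement.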

cc @FrozenGene @kevinthesun @merrymercy @tqchen