Auto-tuning a convolutional network for Mobile GPU doesn't work

Hello,

I'm trying to auto-tune a pre-trained model, squeezenet_v1.1, following the guide below, but I run into the same problem with other models as well (resnet, inception_v3):
https://docs.tvm.ai/tutorials/autotvm/tune_nnvm_mobile_gpu.html#sphx-glr-tutorials-autotvm-tune-nnvm-mobile-gpu-py

However, when I run the command "python3 tune_nnvm_mobile_gpu.py", I get the many log messages shown below, and something seems to be wrong. Could you point out what the problem is?

Thanks,
Inki Dae

Logs

daeinki@daeinki-linux:~/project/public/tvm_test$ python3 tune_nnvm_mobile_gpu.py
Extract tasks…
/home/daeinki/.local/lib/python3.5/site-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/tag.py:32: UserWarning: Tag 'broadcast' declared via TagScope was not used.
warnings.warn("Tag '%s' declared via TagScope was not used." % (self.tag,))
DEBUG:root:lower function fuse_conv2d_relu
DEBUG:root:// attr [pad_temp] storage_scope = "global"
allocate pad_temp[float32 * 1 * 3 * 225 * 225]
// attr [compute] storage_scope = "global"
allocate compute[float32 * 802816]
// attr [tensor] storage_scope = "global"
allocate tensor[float32 * 64 * 1 * 1]

produce tensor {
  for (ax1, 0, 1000) {
    tensor[ax1] = (exp((input0[ax1] - tensor[0]))/tensor[0])
  }
}

Tuning…

[Task 20/22] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (1/1) | 0.16 s Done.
DEBUG:autotvm:XGB load 16 entries from history log file
[Task 21/22] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/1) | 0.00 sINFO:autotvm:Get devices for measurement successfully!
DEBUG:autotvm:No: 1 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(RuntimeError(‘Except caught from RPC call: TVMCall CFunc Error:\nTraceback (most recent call last):\n File “/usr/lib/tvm/python/tvm/_ffi/_ctypes/function.py”, line 55, in cfun\n rv = local_pyfunc(*pyargs)\n File “/usr/lib/tvm/python/tvm/rpc/server.py”, line 50, in load_module\n m = _load_module(path)\n File “/usr/lib/tvm/python/tvm/module.py”, line 222, in load\n _cc.create_shared(path + “.so”, files)\n File “/usr/lib/tvm/python/tvm/contrib/cc.py”, line 33, in create_shared\n _linux_shared(output, objects, options, cc)\n File “/usr/lib/tvm/python/tvm/contrib/cc.py”, line 53, in _linux_shared\n cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)\n File “/usr/lib/python2.7/subprocess.py”, line 710, in init\n errread, errwrite)\n File “/usr/lib/python2.7/subprocess.py”, line 1327, in _execute_child\n raise child_exception\nOSError: [Errno 2] No such file or directory\n’,),), error_no=4, all_cost=1.1189663410186768, timestamp=1542185572.3326507) [(‘tile_co’, [4, 4]), (‘tile_oh’, [1, 55]), (‘tile_ow’, [11, 5]), (‘reorder_0’, [0, 1, 2, 3, 4, 5, 6, 9, 7, 8]), (‘ann_reduce’, [‘none’, ‘none’]), (‘ann_spatial’, [‘vec’, ‘none’, ‘unroll’])],direct,None,10997
[Task 21/22] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (1/1) | 1.34 s Done.
DEBUG:autotvm:XGB load 17 entries from history log file
[Task 22/22] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/1) | 0.00 sINFO:autotvm:Get devices for measurement successfully!
DEBUG:autotvm:No: 1 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(RuntimeError(‘Except caught from RPC call: TVMCall CFunc Error:\nTraceback (most recent call last):\n File “/usr/lib/tvm/python/tvm/_ffi/_ctypes/function.py”, line 55, in cfun\n rv = local_pyfunc(*pyargs)\n File “/usr/lib/tvm/python/tvm/rpc/server.py”, line 50, in load_module\n m = _load_module(path)\n File “/usr/lib/tvm/python/tvm/module.py”, line 222, in load\n _cc.create_shared(path + “.so”, files)\n File “/usr/lib/tvm/python/tvm/contrib/cc.py”, line 33, in create_shared\n _linux_shared(output, objects, options, cc)\n File “/usr/lib/tvm/python/tvm/contrib/cc.py”, line 53, in _linux_shared\n cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)\n File “/usr/lib/python2.7/subprocess.py”, line 710, in init\n errread, errwrite)\n File “/usr/lib/python2.7/subprocess.py”, line 1327, in _execute_child\n raise child_exception\nOSError: [Errno 2] No such file or directory\n’,),), error_no=4, all_cost=6.960962295532227, timestamp=1542185579.8855002) [(‘tile_co’, [32, 2]), (‘tile_oh’, [7, 16]), (‘tile_ow’, [7, 16]), (‘reorder_0’, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), (‘ann_reduce’, [‘unroll’, ‘unroll’]), (‘ann_spatial’, [‘unroll’, ‘vec’, ‘none’])],direct,None,83063
[Task 22/22] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (1/1) | 7.20 s Done.
DEBUG:autotvm:Finish loading 22 records
INFO:autotvm:Extract 0 best records from the odroid.squeezenet_v1.1.log.tmp
DEBUG:autotvm:Finish loading 0 records

Could you try skipping the tuning and only running the evaluation, and see whether that works? This is just to check whether the problem is in the tuning itself, or in the compile / RPC setup.

Basically, comment out these lines:

# run tuning tasks
print("Tuning...")
tune_tasks(tasks, **tuning_opt)

# compile kernels with history best records
with autotvm.apply_history_best(log_file):
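
For reference, a minimal sketch of what the remaining evaluation-only path would look like, based on the tutorial (net and params come from the tutorial's get_network(); the device key, tracker address, target strings, and input shape below are only placeholders):

import numpy as np
import nnvm.compiler
import tvm
from tvm import rpc
from tvm.contrib import graph_runtime

# Placeholders -- use the same values as in your tuning script.
device_key = "your-device-key"
tracker_host, tracker_port = "0.0.0.0", 9190
target = tvm.target.create("opencl -device=mali")
target_host = "llvm -target=arm-linux-gnueabihf"
input_shape = (1, 3, 224, 224)

# Compile without applying any tuning log.
with nnvm.compiler.build_config(opt_level=3):
    graph, lib, params = nnvm.compiler.build(
        net, target=target, target_host=target_host,
        shape={"data": input_shape}, params=params)

# Export the library, upload it over RPC, and time it on the device.
lib.export_library("net.tar")
remote = rpc.connect_tracker(tracker_host, tracker_port).request(device_key)
remote.upload("net.tar")
rlib = remote.load_module("net.tar")

ctx = remote.cl(0)  # Mali GPU via OpenCL
module = graph_runtime.create(graph, rlib, ctx)
data = np.random.uniform(size=input_shape).astype("float32")
module.set_input("data", tvm.nd.array(data))
module.set_input(**params)

ftimer = module.module.time_evaluator("run", ctx, number=1, repeat=10)
prof_res = np.array(ftimer().results) * 1000  # convert to millisecond
print("Mean inference time (std dev): %.2f ms (%.2f ms)"
      % (np.mean(prof_res), np.std(prof_res)))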

Thanks for the reply.

I looked into this problem and found out that the toolchain wasn't installed on the target device. However, after installing the toolchain and the relevant libraries, I ran into another problem. Please refer to the log below.
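
(For reference, the original OSError: [Errno 2] No such file or directory came from subprocess not finding the C compiler when TVM tried to turn the uploaded .tar into a .so on the device. A quick check that would have caught it, just as a sketch to run on the target:)

# Check on the target device that a C compiler is on PATH, since
# tvm.contrib.cc shells out to it when building the .so from the uploaded .tar.
from distutils.spawn import find_executable

for compiler in ("g++", "gcc", "cc"):
    print("%s -> %s" % (compiler, find_executable(compiler)))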

According to the log messages, it fails to load the dynamic shared library below, which was compiled by the toolchain on the target:
/tmp/tmpY0HqPz/tmp_func_fba6f348b9e87e67.tar.so.
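
(One possible cause of "failed to map segment from shared object" is that /tmp on the device is mounted noexec, or that the tmpfs behind it is too small to map the library; I have not verified which applies here. A quick check, as a sketch to run on the target:)

# Print how /tmp is mounted on the target; a "noexec" option here would
# prevent dlopen() from mapping shared objects placed under /tmp.
with open("/proc/mounts") as mounts:
    for line in mounts:
        fields = line.split()
        if fields[1] == "/tmp":
            print("device=%s type=%s options=%s" % (fields[0], fields[2], fields[3]))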

Does anyone have an idea how to resolve this problem?

Thanks,
Inki Dae

Logs

Tuning…
[Task 1/22] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/1) | 0.00 sINFO:autotvm:Get devices for measurement successfully!
DEBUG:autotvm:No: 1 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(RuntimeError(‘Except caught from RPC call: TVMCall CFunc Error:\nTraceback (most recent call last):\n File “/usr/lib/tvm/python/tvm/_ffi/_ctypes/function.py”, line 55, in cfun\n rv = local_pyfunc(*pyargs)\n File “/usr/lib/tvm/python/tvm/rpc/server.py”, line 50, in load_module\n m = _load_module(path)\n File “/usr/lib/tvm/python/tvm/module.py”, line 225, in load\n return _LoadFromFile(path, fmt)\n File “/usr/lib/tvm/python/tvm/_ffi/_ctypes/function.py”, line 185, in call\n ctypes.byref(ret_val), ctypes.byref(ret_tcode)))\n File “/usr/lib/tvm/python/tvm/_ffi/base.py”, line 66, in check_call\n raise TVMError(py_str(LIB.TVMGetLastError()))\nTVMError: [10:31:26] /home/abuild/rpmbuild/BUILD/tvm-0.5/src/runtime/dso_module.cc:93: Check failed: lib_handle != nullptr Failed to load dynamic shared library /tmp/tmpZqkuAN/tmp_func_1c542b2754fe4c37.tar.so /tmp/tmpZqkuAN/tmp_func_1c542b2754fe4c37.tar.so: failed to map segment from shared object\n\n’,),), error_no=4, all_cost=2.2115097045898438, timestamp=1542248011.8370514) [(‘tile_bna’, 8), (‘tile_bnb’, 4), (‘tile_t1’, [32, 2]), (‘tile_t2’, [32, 8]), (‘c_unroll’, [8, 8]), (‘yt’, 4)],winograd,None,15963
[Task 1/22] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (1/1) | 2.61 s Done.

Even after commenting out the two lines below, it still didn't work, and I ran into the same problem when trying the RPC test.
#print("Tuning...")
#tune_tasks(tasks, **tuning_opt)

However, the RPC test worked well after changing the file path as below:
remote.upload("net.tar", target='/opt/usr/net.tar')
rlib = remote.load_module("/opt/usr/net.tar")

And the result:
Upload…
Evaluate inference time cost…
Mean inference time (std dev): 181.88 ms (1.28 ms)

So, is there any way to change the file path where the generated .so module is placed when tune_tasks is called?
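
(If the temporary path simply comes from Python's tempfile module, one untested workaround might be to set TMPDIR on the device before starting the RPC server, so that the temporary modules land somewhere writable and executable. A rough sketch, with the path, tracker address, and key only as examples:)

import os
import subprocess

# Untested sketch: make Python's tempfile use a path that allows executable
# mappings, then start the stock RPC server with that environment.
tmp_path = "/opt/usr/tvm_tmp"   # hypothetical path on the target
if not os.path.isdir(tmp_path):
    os.makedirs(tmp_path)

env = dict(os.environ, TMPDIR=tmp_path)
subprocess.call(
    ["python", "-m", "tvm.exec.rpc_server",
     "--tracker=192.168.0.10:9190", "--key=mali"],
    env=env)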

Thanks,
Inki Dae

It seems something is wrong on my platform.

I modified module.py to change the file path (from /tmp to /opt), and it worked well:
Tuning…
[Task 1/22] Current/Best: 8.09/ 9.08 GFLOPS | Progress: (10/10) | 21.83 s Done.
[Task 2/22] Current/Best: 3.34/ 8.14 GFLOPS | Progress: (10/10) | 25.80 s Done.
[Task 3/22] Current/Best: 2.90/ 11.54 GFLOPS | Progress: (10/10) | 31.87 s Done.
[Task 4/22] Current/Best: 0.98/ 6.38 GFLOPS | Progress: (10/10) | 42.56 s Done.
[Task 5/22] Current/Best: 0.11/ 0.37 GFLOPS | Progress: (10/10) | 113.57 s Done.
[Task 6/22] Current/Best: 2.82/ 2.82 GFLOPS | Progress: (10/10) | 22.77 s Done.
[Task 7/22] Current/Best: 0.52/ 1.99 GFLOPS | Progress: (10/10) | 113.09 s Done.
[Task 8/22] Current/Best: 1.07/ 4.11 GFLOPS | Progress: (10/10) | 11.56 s Done.
[Task 9/22] Current/Best: 0.44/ 1.24 GFLOPS | Progress: (10/10) | 24.25 s Done.
[Task 10/22] Current/Best: 0.00/ 3.42 GFLOPS | Progress: (10/10) | 9.26 s Done.
[Task 11/22] Current/Best: 5.37/ 5.40 GFLOPS | Progress: (10/10) | 53.82 s Done.
[Task 12/22] Current/Best: 0.12/ 0.28 GFLOPS | Progress: (10/10) | 8.17 s Done.
[Task 13/22] Current/Best: 0.22/ 0.93 GFLOPS | Progress: (10/10) | 17.75 s Done.
[Task 14/22] Current/Best: 2.32/ 2.32 GFLOPS | Progress: (10/10) | 17.68 s Done.
[Task 15/22] Current/Best: 0.00/ 2.71 GFLOPS | Progress: (10/10) | 34.08 s Done.
[Task 16/22] Current/Best: 0.62/ 0.62 GFLOPS | Progress: (10/10) | 13.70 s Done.
[Task 17/22] Current/Best: 0.00/ 1.78 GFLOPS | Progress: (10/10) | 7.22 s Done.
[Task 18/22] Current/Best: 0.21/ 2.36 GFLOPS | Progress: (10/10) | 29.81 s Done.
[Task 19/22] Current/Best: 4.24/ 4.24 GFLOPS | Progress: (10/10) | 108.33 s Done.
[Task 20/22] Current/Best: 0.00/ 0.56 GFLOPS | Progress: (10/10) | 7.95 s Done.
[Task 21/22] Current/Best: 0.15/ 0.32 GFLOPS | Progress: (10/10) | 18.46 s Done.
[Task 22/22] Current/Best: 1.42/ 2.32 GFLOPS | Progress: (10/10) | 81.45 s Done.
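
With tuning finished, the tuned records can then be applied at compile time as in the tutorial, for example (log_file, net, params, and the target settings are the same placeholders as in the tuning script):

# Compile with the best schedules found during tuning; log_file is the
# odroid.squeezenet_v1.1.log written by tune_tasks().
from tvm import autotvm

with autotvm.apply_history_best(log_file):
    with nnvm.compiler.build_config(opt_level=3):
        graph, lib, params = nnvm.compiler.build(
            net, target=target, target_host=target_host,
            shape={"data": input_shape}, params=params)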

Thanks,
Inki Dae
