Auto-tuning a convolutional network for Mobile GPU doesn't work

Hello,

I'm trying to auto-tune a pre-trained model, squeezenet_v1.1, following the guide below, but I run into the same problem with other models as well (resnet, inception_v3):
https://docs.tvm.ai/tutorials/autotvm/tune_nnvm_mobile_gpu.html#sphx-glr-tutorials-autotvm-tune-nnvm-mobile-gpu-py

However, when I run the command "python3 tune_nnvm_mobile_gpu.py", I get the many log messages shown below, and something seems to be wrong. Could you point out what the problem is?

Thanks,
Inki Dae

Logs

daeinki@daeinki-linux:~/project/public/tvm_test$ python3 tune_nnvm_mobile_gpu.py
Extract tasks…
/home/daeinki/.local/lib/python3.5/site-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/tag.py:32: UserWarning: Tag 'broadcast' declared via TagScope was not used.
warnings.warn("Tag '%s' declared via TagScope was not used." % (self.tag,))
DEBUG:root:lower function fuse_conv2d_relu
DEBUG:root:// attr [pad_temp] storage_scope = "global"
allocate pad_temp[float32 * 1 * 3 * 225 * 225]
// attr [compute] storage_scope = "global"
allocate compute[float32 * 802816]
// attr [tensor] storage_scope = "global"
allocate tensor[float32 * 64 * 1 * 1]

produce tensor {
  for (ax1, 0, 1000) {
    tensor[ax1] = (exp((input0[ax1] - tensor[0]))/tensor[0])
  }
}

Tuning…

[Task 20/22] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (1/1) | 0.16 s Done.
DEBUG:autotvm:XGB load 16 entries from history log file
[Task 21/22] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/1) | 0.00 sINFO:autotvm:Get devices for measurement successfully!
DEBUG:autotvm:No: 1 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(RuntimeError(‘Except caught from RPC call: TVMCall CFunc Error:\nTraceback (most recent call last):\n File “/usr/lib/tvm/python/tvm/_ffi/_ctypes/function.py”, line 55, in cfun\n rv = local_pyfunc(*pyargs)\n File “/usr/lib/tvm/python/tvm/rpc/server.py”, line 50, in load_module\n m = _load_module(path)\n File “/usr/lib/tvm/python/tvm/module.py”, line 222, in load\n _cc.create_shared(path + “.so”, files)\n File “/usr/lib/tvm/python/tvm/contrib/cc.py”, line 33, in create_shared\n _linux_shared(output, objects, options, cc)\n File “/usr/lib/tvm/python/tvm/contrib/cc.py”, line 53, in _linux_shared\n cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)\n File “/usr/lib/python2.7/subprocess.py”, line 710, in init\n errread, errwrite)\n File “/usr/lib/python2.7/subprocess.py”, line 1327, in _execute_child\n raise child_exception\nOSError: [Errno 2] No such file or directory\n’,),), error_no=4, all_cost=1.1189663410186768, timestamp=1542185572.3326507) [(‘tile_co’, [4, 4]), (‘tile_oh’, [1, 55]), (‘tile_ow’, [11, 5]), (‘reorder_0’, [0, 1, 2, 3, 4, 5, 6, 9, 7, 8]), (‘ann_reduce’, [‘none’, ‘none’]), (‘ann_spatial’, [‘vec’, ‘none’, ‘unroll’])],direct,None,10997
[Task 21/22] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (1/1) | 1.34 s Done.
DEBUG:autotvm:XGB load 17 entries from history log file
[Task 22/22] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/1) | 0.00 sINFO:autotvm:Get devices for measurement successfully!
DEBUG:autotvm:No: 1 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(RuntimeError(‘Except caught from RPC call: TVMCall CFunc Error:\nTraceback (most recent call last):\n File “/usr/lib/tvm/python/tvm/_ffi/_ctypes/function.py”, line 55, in cfun\n rv = local_pyfunc(*pyargs)\n File “/usr/lib/tvm/python/tvm/rpc/server.py”, line 50, in load_module\n m = _load_module(path)\n File “/usr/lib/tvm/python/tvm/module.py”, line 222, in load\n _cc.create_shared(path + “.so”, files)\n File “/usr/lib/tvm/python/tvm/contrib/cc.py”, line 33, in create_shared\n _linux_shared(output, objects, options, cc)\n File “/usr/lib/tvm/python/tvm/contrib/cc.py”, line 53, in _linux_shared\n cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)\n File “/usr/lib/python2.7/subprocess.py”, line 710, in init\n errread, errwrite)\n File “/usr/lib/python2.7/subprocess.py”, line 1327, in _execute_child\n raise child_exception\nOSError: [Errno 2] No such file or directory\n’,),), error_no=4, all_cost=6.960962295532227, timestamp=1542185579.8855002) [(‘tile_co’, [32, 2]), (‘tile_oh’, [7, 16]), (‘tile_ow’, [7, 16]), (‘reorder_0’, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), (‘ann_reduce’, [‘unroll’, ‘unroll’]), (‘ann_spatial’, [‘unroll’, ‘vec’, ‘none’])],direct,None,83063
[Task 22/22] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (1/1) | 7.20 s Done.
DEBUG:autotvm:Finish loading 22 records
INFO:autotvm:Extract 0 best records from the odroid.squeezenet_v1.1.log.tmp
DEBUG:autotvm:Finish loading 0 records

Could you try skipping the tuning and only running the evaluation, and see whether that works? This is just to check whether the problem is in the tuning itself, or in the compile / RPC setup.

Basically, comment out these lines:

# run tuning tasks
print("Tuning...")
tune_tasks(tasks, **tuning_opt)

# compile kernels with history best records
with autotvm.apply_history_best(log_file):
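
For reference, a minimal sketch of what the remaining evaluation-only path would look like, based on the tutorial (net and params come from the tutorial's get_network(); the device key, tracker address, target strings, and input shape below are only placeholders):

import numpy as np
import nnvm.compiler
import tvm
from tvm import rpc
from tvm.contrib import graph_runtime

# Placeholders -- use the same values as in your tuning script.
device_key = "your-device-key"
tracker_host, tracker_port = "0.0.0.0", 9190
target = tvm.target.create("opencl -device=mali")
target_host = "llvm -target=arm-linux-gnueabihf"
input_shape = (1, 3, 224, 224)

# Compile without applying any tuning log.
with nnvm.compiler.build_config(opt_level=3):
    graph, lib, params = nnvm.compiler.build(
        net, target=target, target_host=target_host,
        shape={"data": input_shape}, params=params)

# Export the library, upload it over RPC, and time it on the device.
lib.export_library("net.tar")
remote = rpc.connect_tracker(tracker_host, tracker_port).request(device_key)
remote.upload("net.tar")
rlib = remote.load_module("net.tar")

ctx = remote.cl(0)  # Mali GPU via OpenCL
module = graph_runtime.create(graph, rlib, ctx)
data = np.random.uniform(size=input_shape).astype("float32")
module.set_input("data", tvm.nd.array(data))
module.set_input(**params)

ftimer = module.module.time_evaluator("run", ctx, number=1, repeat=10)
prof_res = np.array(ftimer().results) * 1000  # convert to millisecond
print("Mean inference time (std dev): %.2f ms (%.2f ms)"
      % (np.mean(prof_res), np.std(prof_res)))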

Thanks for the reply.

I looked into this problem and found out that the toolchain wasn't installed on the target device. However, after installing the toolchain and the relevant libraries, I ran into another problem. Please refer to the log below.
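
(For reference, the original OSError: [Errno 2] No such file or directory came from subprocess not finding the C compiler when TVM tried to turn the uploaded .tar into a .so on the device. A quick check that would have caught it, just as a sketch to run on the target:)

# Check on the target device that a C compiler is on PATH, since
# tvm.contrib.cc shells out to it when building the .so from the uploaded .tar.
from distutils.spawn import find_executable

for compiler in ("g++", "gcc", "cc"):
    print("%s -> %s" % (compiler, find_executable(compiler)))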

According to the log messages, it fails to load the dynamic shared library below, which was compiled by the toolchain on the target:
/tmp/tmpY0HqPz/tmp_func_fba6f348b9e87e67.tar.so.
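
(One possible cause of "failed to map segment from shared object" is that /tmp on the device is mounted noexec, or that the tmpfs behind it is too small to map the library; I have not verified which applies here. A quick check, as a sketch to run on the target:)

# Print how /tmp is mounted on the target; a "noexec" option here would
# prevent dlopen() from mapping shared objects placed under /tmp.
with open("/proc/mounts") as mounts:
    for line in mounts:
        fields = line.split()
        if fields[1] == "/tmp":
            print("device=%s type=%s options=%s" % (fields[0], fields[2], fields[3]))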

Does anyone have an idea how to resolve this problem?

Thanks,
Inki Dae

Logs

Tuning…
[Task 1/22] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/1) | 0.00 sINFO:autotvm:Get devices for measurement successfully!
DEBUG:autotvm:No: 1 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(RuntimeError(‘Except caught from RPC call: TVMCall CFunc Error:\nTraceback (most recent call last):\n File “/usr/lib/tvm/python/tvm/_ffi/_ctypes/function.py”, line 55, in cfun\n rv = local_pyfunc(*pyargs)\n File “/usr/lib/tvm/python/tvm/rpc/server.py”, line 50, in load_module\n m = _load_module(path)\n File “/usr/lib/tvm/python/tvm/module.py”, line 225, in load\n return _LoadFromFile(path, fmt)\n File “/usr/lib/tvm/python/tvm/_ffi/_ctypes/function.py”, line 185, in call\n ctypes.byref(ret_val), ctypes.byref(ret_tcode)))\n File “/usr/lib/tvm/python/tvm/_ffi/base.py”, line 66, in check_call\n raise TVMError(py_str(LIB.TVMGetLastError()))\nTVMError: [10:31:26] /home/abuild/rpmbuild/BUILD/tvm-0.5/src/runtime/dso_module.cc:93: Check failed: lib_handle != nullptr Failed to load dynamic shared library /tmp/tmpZqkuAN/tmp_func_1c542b2754fe4c37.tar.so /tmp/tmpZqkuAN/tmp_func_1c542b2754fe4c37.tar.so: failed to map segment from shared object\n\n’,),), error_no=4, all_cost=2.2115097045898438, timestamp=1542248011.8370514) [(‘tile_bna’, 8), (‘tile_bnb’, 4), (‘tile_t1’, [32, 2]), (‘tile_t2’, [32, 8]), (‘c_unroll’, [8, 8]), (‘yt’, 4)],winograd,None,15963
[Task 1/22] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (1/1) | 2.61 s Done.

Even after commenting out the two lines below, it still didn't work, and I ran into the same problem when trying the RPC test.
#print("Tuning...")
#tune_tasks(tasks, **tuning_opt)

However, the RPC test worked well after changing the file path as below:
remote.upload("net.tar", target='/opt/usr/net.tar')
rlib = remote.load_module("/opt/usr/net.tar")

And the result:
Upload…
Evaluate inference time cost…
Mean inference time (std dev): 181.88 ms (1.28 ms)

So, is there any way to change the file path where the generated .so module is placed when tune_tasks is called?
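
(If the temporary path simply comes from Python's tempfile module, one untested workaround might be to set TMPDIR on the device before starting the RPC server, so that the temporary modules land somewhere writable and executable. A rough sketch, with the path, tracker address, and key only as examples:)

import os
import subprocess

# Untested sketch: make Python's tempfile use a path that allows executable
# mappings, then start the stock RPC server with that environment.
tmp_path = "/opt/usr/tvm_tmp"   # hypothetical path on the target
if not os.path.isdir(tmp_path):
    os.makedirs(tmp_path)

env = dict(os.environ, TMPDIR=tmp_path)
subprocess.call(
    ["python", "-m", "tvm.exec.rpc_server",
     "--tracker=192.168.0.10:9190", "--key=mali"],
    env=env)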

Thanks,
Inki Dae

It seems something is wrong on my platform.

I modified module.py to change the file path (from /tmp to /opt), and it worked well:
Tuning…
[Task 1/22] Current/Best: 8.09/ 9.08 GFLOPS | Progress: (10/10) | 21.83 s Done.
[Task 2/22] Current/Best: 3.34/ 8.14 GFLOPS | Progress: (10/10) | 25.80 s Done.
[Task 3/22] Current/Best: 2.90/ 11.54 GFLOPS | Progress: (10/10) | 31.87 s Done.
[Task 4/22] Current/Best: 0.98/ 6.38 GFLOPS | Progress: (10/10) | 42.56 s Done.
[Task 5/22] Current/Best: 0.11/ 0.37 GFLOPS | Progress: (10/10) | 113.57 s Done.
[Task 6/22] Current/Best: 2.82/ 2.82 GFLOPS | Progress: (10/10) | 22.77 s Done.
[Task 7/22] Current/Best: 0.52/ 1.99 GFLOPS | Progress: (10/10) | 113.09 s Done.
[Task 8/22] Current/Best: 1.07/ 4.11 GFLOPS | Progress: (10/10) | 11.56 s Done.
[Task 9/22] Current/Best: 0.44/ 1.24 GFLOPS | Progress: (10/10) | 24.25 s Done.
[Task 10/22] Current/Best: 0.00/ 3.42 GFLOPS | Progress: (10/10) | 9.26 s Done.
[Task 11/22] Current/Best: 5.37/ 5.40 GFLOPS | Progress: (10/10) | 53.82 s Done.
[Task 12/22] Current/Best: 0.12/ 0.28 GFLOPS | Progress: (10/10) | 8.17 s Done.
[Task 13/22] Current/Best: 0.22/ 0.93 GFLOPS | Progress: (10/10) | 17.75 s Done.
[Task 14/22] Current/Best: 2.32/ 2.32 GFLOPS | Progress: (10/10) | 17.68 s Done.
[Task 15/22] Current/Best: 0.00/ 2.71 GFLOPS | Progress: (10/10) | 34.08 s Done.
[Task 16/22] Current/Best: 0.62/ 0.62 GFLOPS | Progress: (10/10) | 13.70 s Done.
[Task 17/22] Current/Best: 0.00/ 1.78 GFLOPS | Progress: (10/10) | 7.22 s Done.
[Task 18/22] Current/Best: 0.21/ 2.36 GFLOPS | Progress: (10/10) | 29.81 s Done.
[Task 19/22] Current/Best: 4.24/ 4.24 GFLOPS | Progress: (10/10) | 108.33 s Done.
[Task 20/22] Current/Best: 0.00/ 0.56 GFLOPS | Progress: (10/10) | 7.95 s Done.
[Task 21/22] Current/Best: 0.15/ 0.32 GFLOPS | Progress: (10/10) | 18.46 s Done.
[Task 22/22] Current/Best: 1.42/ 2.32 GFLOPS | Progress: (10/10) | 81.45 s Done.
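
With tuning finished, the tuned records can then be applied at compile time as in the tutorial, for example (log_file, net, params, and the target settings are the same placeholders as in the tuning script):

# Compile with the best schedules found during tuning; log_file is the
# odroid.squeezenet_v1.1.log written by tune_tasks().
from tvm import autotvm

with autotvm.apply_history_best(log_file):
    with nnvm.compiler.build_config(opt_level=3):
        graph, lib, params = nnvm.compiler.build(
            net, target=target, target_host=target_host,
            shape={"data": input_shape}, params=params)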

Thanks,
Inki Dae
