- I am trying to run float16 yolov3-tiny on an rk3399 (OpenCL/Mali target), but I get this error:
File yolov3-tiny.cfg exists, skip.
File yolov3-tiny.weights exists, skip.
layer filters size input output
0 conv 16 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 16 0.150 BFLOPs
1 max 2 x 2 / 2 416 x 416 x 16 -> 208 x 208 x 16
2 conv 32 3 x 3 / 1 208 x 208 x 16 -> 208 x 208 x 32 0.399 BFLOPs
3 max 2 x 2 / 2 208 x 208 x 32 -> 104 x 104 x 32
4 conv 64 3 x 3 / 1 104 x 104 x 32 -> 104 x 104 x 64 0.399 BFLOPs
5 max 2 x 2 / 2 104 x 104 x 64 -> 52 x 52 x 64
6 conv 128 3 x 3 / 1 52 x 52 x 64 -> 52 x 52 x 128 0.399 BFLOPs
7 max 2 x 2 / 2 52 x 52 x 128 -> 26 x 26 x 128
8 conv 256 3 x 3 / 1 26 x 26 x 128 -> 26 x 26 x 256 0.399 BFLOPs
9 max 2 x 2 / 2 26 x 26 x 256 -> 13 x 13 x 256
10 conv 512 3 x 3 / 1 13 x 13 x 256 -> 13 x 13 x 512 0.399 BFLOPs
11 max 2 x 2 / 1 13 x 13 x 512 -> 13 x 13 x 512
12 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
13 conv 256 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 256 0.089 BFLOPs
14 conv 512 3 x 3 / 1 13 x 13 x 256 -> 13 x 13 x 512 0.399 BFLOPs
15 conv 255 1 x 1 / 1 13 x 13 x 512 -> 13 x 13 x 255 0.044 BFLOPs
16 yolo
17 route 13
18 conv 128 1 x 1 / 1 13 x 13 x 256 -> 13 x 13 x 128 0.011 BFLOPs
19 upsample 2x 13 x 13 x 128 -> 26 x 26 x 128
20 route 19 8
21 conv 256 3 x 3 / 1 26 x 26 x 384 -> 26 x 26 x 256 1.196 BFLOPs
22 conv 255 1 x 1 / 1 26 x 26 x 256 -> 26 x 26 x 255 0.088 BFLOPs
23 yolo
Loading weights from yolov3-tiny.weights...Done!
Converting darknet to nnvm symbols...
Input shape [{'data': (1, 3, 416, 416)}]
Extract tasks...
Compile with [rk3399.yolov3.log]...
Export...
Upload Net...
Upload Params...
Evaluate inference time cost...
Traceback (most recent call last):
File "./tune_yolov3.py", line 283, in <module>
prof_res = np.array(ftimer().results) * 1000 # convert to millisecond
File "/usr/lib64/python3.7/site-packages/tvm/module.py", line 194, in evaluator
blob = feval(*args)
File "tvm/_ffi/_cython/./function.pxi", line 304, in tvm._ffi._cy3.core.FunctionBase.__call__
File "tvm/_ffi/_cython/./function.pxi", line 239, in tvm._ffi._cy3.core.FuncCall
File "tvm/_ffi/_cython/./function.pxi", line 228, in tvm._ffi._cy3.core.FuncCall3
File "tvm/_ffi/_cython/./base.pxi", line 168, in tvm._ffi._cy3.core.CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
[bt] (8) /usr/lib64/libtvm.so(+0x943bd4) [0x7fa2aeebd4]
[bt] (7) /usr/lib64/libtvm.so(+0x94af14) [0x7fa2af5f14]
[bt] (6) /usr/lib64/libtvm.so(+0x9494d4) [0x7fa2af44d4]
[bt] (5) /usr/lib64/libtvm.so(+0x942d3c) [0x7fa2aedd3c]
[bt] (4) /usr/lib64/libtvm.so(+0x941d24) [0x7fa2aecd24]
[bt] (3) /usr/lib64/libtvm.so(+0x9419ac) [0x7fa2aec9ac]
[bt] (2) /usr/lib64/libtvm.so(+0x94fbc4) [0x7fa2afabc4]
[bt] (1) /usr/lib64/libtvm.so(+0x950c40) [0x7fa2afbc40]
[bt] (0) /usr/lib64/libtvm.so(+0x91f058) [0x7fa2aca058]
^
tensor[((((ax0_ax1_fused_ax2_fused_ax3_fused_outer * 256) + ((int)get_group_id(0))) * 256) + ((int)get_local_id(0)))] = (half)((((((((ax0_ax1_fused_ax2_fused_ax3_fused_outer * 256) + ((int)get_group_id(0))) * 256) + ((int)get_local_id(0))) % 196) < 182) && ((((((ax0_ax1_fused_ax2_fused_ax3_fused_outer * 256) + ((int)get_group_id(0))) * 256) + ((int)get_local_id(0))) % 14) < 13)) ? input0[(((((((((ax0_ax1_fused_ax2_fused_ax3_fused_outer * 256) + ((int)get_group_id(0))) * 256) + ((int)get_local_id(0))) / 196) * 13) + ((((((ax0_ax1_fused_ax2_fused_ax3_fused_outer * 256) + ((int)get_group_id(0))) * 256) + ((int)get_local_id(0))) % 196) / 14)) * 13) + (((((ax0_ax1_fused_ax2_fused_ax3_fused_outer * 256) + ((int)get_group_id(0))) * 256) + ((int)get_local_id(0))) % 14))] : (half)-inff);
File "/builddir/build/BUILD/tvm/src/runtime/opencl/opencl_module.cc", line 235
TVMError: Except caught from RPC call: [02:32:24] /builddir/build/BUILD/tvm/src/runtime/module_util.cc:73: Check failed: ret == 0 (-1 vs. 0) : TVMError: OpenCL build error for device=0x7f9cb107f0<source>:730:790: error: use of undeclared identifier 'inff'
error: Compiler frontend failed (error code 59)
[cbalint@yoda yolo-tiny-mali-float16]$ mcedit tune_yolov3.py
-
Let me know if you need any more information. A summary so far:
- targeting arm-cpu works fine (both float16 and float32)
- targeting opencl-mali does not work (both float16 and float32)
- I was able to autotune all conv operators targeting opencl-mali-float16
-
I am not sure where inff (float infinity?) is coming from; it makes no sense to me.
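One possible explanation for the inff token, offered as a guess and not confirmed against the TVM sources: if a code generator writes a float constant by formatting the value and then appending an f suffix, negative infinity comes out as -inf followed by f, which reads as -inff. A -inf constant would be plausible here as the padding value for the stride-1 max pool (layer 11), since the failing kernel indexes a 14 x 14 plane (196 = 14 * 14) built from a 13 x 13 input. The Python sketch below only imitates that formatting; naive_literal and guarded_literal are hypothetical helpers, not TVM functions.

```python
import math


def c_type(dtype: str) -> str:
    """Map a dtype name to the OpenCL C type used in the cast."""
    return "half" if dtype == "float16" else "float"


def naive_literal(value: float, dtype: str = "float16") -> str:
    """Hypothetical emitter: format the value, then always append 'f'.

    For finite values this yields a valid literal such as 1.500000e+00f.
    For infinity the formatted text is 'inf', so the result is 'inff',
    which a C compiler parses as an (undeclared) identifier.
    """
    return f"({c_type(dtype)}){value:e}f"


def guarded_literal(value: float, dtype: str = "float16") -> str:
    """Hypothetical fix: special-case non-finite values before formatting."""
    if math.isnan(value):
        text = "NAN"
    elif math.isinf(value):
        text = "-INFINITY" if value < 0 else "INFINITY"
    else:
        text = f"{value:e}f"
    return f"({c_type(dtype)}){text}"


if __name__ == "__main__":
    neg_inf = float("-inf")
    print(naive_literal(neg_inf))    # same token as in the failing kernel
    print(guarded_literal(neg_inf))  # uses the INFINITY macro instead
    print(naive_literal(1.5))        # finite values are unaffected
```

If this is indeed the cause, emitting the INFINITY macro (which OpenCL C defines) for non-finite constants would avoid the undeclared identifier error.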
-
If necessary I will open an issue with a sample, but let's hear opinions first.
I am using tvm master @17Apr2019 with LLVM 8.0; the same happens with LLVM 6.0.
The same error is reproduced here: https://github.com/dmlc/tvm/issues/3023