I am trying to tune a single convolutional layer for the NVIDIA GPU on a Jetson TX2, following the tutorial in tvm/tutorials/autotvm/tune_nnvm_cuda.py.
The layer is defined in PyTorch:
model = nn.Sequential(nn.Conv2d(in_chns, out_chns, ksize, stride, pad, groups=1, bias=use_bias),)
and exported via ONNX, then imported into NNVM.
I am cross-compiling on a host machine (x86_64-linux-gnu), targeting CUDA on the TX2 (target=tvm.target.cuda() and target_host="llvm -target=aarch64-linux-gnu").
When I try to run the autotuner, I always see 0.00/ 0.00 GFLOPS:
[Task 1/ 1] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (96/1000) | 87.82 s
I tried enabling the logger as suggested in the tutorial and got this output:
INFO:autotvm:Get devices for measurement successfully!
DEBUG:autotvm:No: 1 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(RuntimeError('Except caught from RPC call: TVMCall CFunc Error:\nTraceback (most recent call last):\n File "/home/nvidia/tvm/python/tvm/_ffi/_ctypes/function.py", line 54, in cfun\n try:\n File "/home/nvidia/tvm/python/tvm/rpc/server.py", line 50, in load_module\n m = _load_module(path)\n File "/home/nvidia/tvm/python/tvm/module.py", line 222, in load\n _cc.create_shared(path + ".so", files)\n File "/home/nvidia/tvm/python/tvm/contrib/cc.py", line 33, in create_shared\n _linux_shared(output, objects, options, cc)\n File "/home/nvidia/tvm/python/tvm/contrib/cc.py", line 58, in _linux_shared\n raise RuntimeError(msg)\nRuntimeError: Compilation error:\n/usr/bin/ld: /tmp/tmphj1i03/lib.o: Relocations in generic ELF (EM: 62)\n/usr/bin/ld: /tmp/tmphj1i03/lib.o: Relocations in generic ELF (EM: 62)\n/tmp/tmphj1i03/lib.o: error adding symbols: File in wrong format\ncollect2: error: ld returned 1 exit status\n\n',),), error_no=4, all_cost=2.0872275829315186, timestamp=1536378632.1542888) [('tile_f', [1, 1, 1, 1]), ('tile_y', [2, 56, 2, 1]), ('tile_x', [14, 1, 8, 2]), ('tile_rc', [1, 3]), ('tile_ry', [5, 1]), ('tile_rx', [1, 5]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],direct,None,271508
DEBUG:autotvm:No: 2 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(RuntimeError('Except caught from RPC call: TVMCall CFunc Error:\nTraceback (most recent call last):\n File "/home/nvidia/tvm/python/tvm/_ffi/_ctypes/function.py", line 54, in cfun\n try:\n File "/home/nvidia/tvm/python/tvm/rpc/server.py", line 50, in load_module\n m = _load_module(path)\n File "/home/nvidia/tvm/python/tvm/module.py", line 222, in load\n _cc.create_shared(path + ".so", files)\n File "/home/nvidia/tvm/python/tvm/contrib/cc.py", line 33, in create_shared\n _linux_shared(output, objects, options, cc)\n File "/home/nvidia/tvm/python/tvm/contrib/cc.py", line 58, in _linux_shared\n raise RuntimeError(msg)\nRuntimeError: Compilation error:\n/usr/bin/ld: /tmp/tmpZ3S7Sa/lib.o: Relocations in generic ELF (EM: 62)\n/usr/bin/ld: /tmp/tmpZ3S7Sa/lib.o: Relocations in generic ELF (EM: 62)\n/tmp/tmpZ3S7Sa/lib.o: error adding symbols: File in wrong format\ncollect2: error: ld returned 1 exit status\n\n',),), error_no=4, all_cost=3.4724888801574707, timestamp=1536378631.6049912) [('tile_f', [1, 1, 1, 1]), ('tile_y', [112, 1, 2, 1]), ('tile_x', [8, 14, 1, 2]), ('tile_rc', [1, 3]), ('tile_ry', [5, 1]), ('tile_rx', [1, 5]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],direct,None,667532
DEBUG:autotvm:No: 3 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(RuntimeError('Except caught from RPC call: TVMCall CFunc Error:\nTraceback (most recent call last):\n File "/home/nvidia/tvm/python/tvm/_ffi/_ctypes/function.py", line 54, in cfun\n try:\n File "/home/nvidia/tvm/python/tvm/rpc/server.py", line 50, in load_module\n m = _load_module(path)\n File "/home/nvidia/tvm/python/tvm/module.py", line 222, in load\n _cc.create_shared(path + ".so", files)\n File "/home/nvidia/tvm/python/tvm/contrib/cc.py", line 33, in create_shared\n _linux_shared(output, objects, options, cc)\n File "/home/nvidia/tvm/python/tvm/contrib/cc.py", line 58, in _linux_shared\n raise RuntimeError(msg)\nRuntimeError: Compilation error:\n/usr/bin/ld: /tmp/tmpVrOrAD/lib.o: Relocations in generic ELF (EM: 62)\n/usr/bin/ld: /tmp/tmpVrOrAD/lib.o: Relocations in generic ELF (EM: 62)\n/tmp/tmpVrOrAD/lib.o: error adding symbols: File in wrong format\ncollect2: error: ld returned 1 exit status\n\n',),), error_no=4, all_cost=2.699096441268921, timestamp=1536378632.7313952) [('tile_f', [1, 1, 1, 1]), ('tile_y', [1, 16, 2, 7]), ('tile_x', [112, 1, 1, 2]), ('tile_rc', [3, 1]), ('tile_ry', [5, 1]), ('tile_rx', [1, 5]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],direct,None,214964
One of the responses in this thread encountered EM: 62 errors as well, so I made sure to set the target and target host correctly. However, the problem still persists.
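As a sanity check on the linker error above: as far as I know, EM: 62 is the ELF e_machine code for x86-64, while AArch64 is 183, which would suggest that lib.o is being built for the host rather than for the TX2. Below is a minimal sketch for checking which architecture an object file targets, using only the Python standard library. The names elf_machine and describe_machine are made up for illustration and are not part of TVM.

```python
import struct

# e_machine codes from the ELF specification (only the two relevant here).
ELF_MACHINES = {62: "x86-64", 183: "AArch64"}


def elf_machine(path):
    """Return the e_machine field of an ELF file as an integer."""
    with open(path, "rb") as f:
        header = f.read(20)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file: %s" % path)
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, right after e_type.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return machine


def describe_machine(path):
    """Return a readable architecture name for an ELF file."""
    machine = elf_machine(path)
    return ELF_MACHINES.get(machine, "unknown (EM: %d)" % machine)
```

Calling describe_machine on a lib.o saved from the host build should report AArch64 if target_host is really being applied. If it reports x86-64, the object file is being generated for the host architecture.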
Does anyone have suggestions on what could be causing this?