I also tried it with the TVM CUDA version; the TVM build is based on the tvmai/ci-gpu:v0.64 image (TVM dev0.71).
It produces the following error:
[Task 53/55] Current/Best: 7418.46/8419.66 GFLOPS | Progress: (960/2000) | 2493.48 s Done.
[Task 54/55] Current/Best: 6438.97/17485.65 GFLOPS | Progress: (896/2000) | 2699.27 s Done.
[Task 55/55] Current/Best: 1054.10/5853.12 GFLOPS | Progress: (1984/2000) | 5311.74 s Done.
Compile...
[02:50:51] /root/tvm/src/te/schedule/bound.cc:119: not in feed graph consumer = extern(argsort_gpu, 0x698c6a0)
Evaluate inference time cost...
Traceback (most recent call last):
File "tune_relay_cuda_ssd.py", line 250, in <module>
    tune_and_evaluate(tuning_option)
File "tune_relay_cuda_ssd.py", line 243, in tune_and_evaluate
    prof_res = np.array(ftimer().results) * 1000  # convert to millisecond
File "/root/tvm/python/tvm/runtime/module.py", line 215, in evaluator
    blob = feval(*args)
File "/root/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 225, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
[bt] (6) /root/tvm/build/libtvm.so(TVMFuncCall+0x61) [0x7f81d7d14841]
[bt] (5) /root/tvm/build/libtvm.so(+0xea3d67) [0x7f81d7d59d67]
[bt] (4) /root/tvm/build/libtvm.so(+0xea38ca) [0x7f81d7d598ca]
[bt] (3) /root/tvm/build/libtvm.so(tvm::runtime::GraphRuntime::Run()+0x47) [0x7f81d7d6e0e7]
[bt] (2) /root/tvm/build/libtvm.so(+0xeb8057) [0x7f81d7d6e057]
[bt] (1) /root/tvm/build/libtvm.so(+0xe79d66) [0x7f81d7d2fd66]
[bt] (0) /root/tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x67) [0x7f81d731c367]
File "/root/tvm/src/runtime/library_module.cc", line 78
TVMError: Check failed: ret == 0 (-1 vs. 0) : Assert fail: (num_args == 4), fused_nn_softmax_2: num_args should be 4
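For context, the call that triggers the crash is the standard evaluation step from the tutorial, which converts the profiling results of TVM's time evaluator from seconds to milliseconds. A minimal sketch of just that reporting step is below; the dummy `results` list stands in for the real `ftimer().results` (an assumption here, since the real values require a successfully built module, which is exactly what fails above):

```python
import numpy as np

# Dummy per-repeat wall-clock times in seconds. In the real script these
# come from ftimer().results, where ftimer is produced by the module's
# time evaluator; here they are hard-coded so the snippet runs standalone.
results = [0.0123, 0.0119, 0.0125]

# Same conversion as in tune_relay_cuda_ssd.py: seconds -> milliseconds.
prof_res = np.array(results) * 1000

print("Mean inference time (std dev): %.2f ms (%.2f ms)"
      % (np.mean(prof_res), np.std(prof_res)))
```

In the failing run, the error is raised inside `feval(*args)` before any timing results are produced, so the conversion above is never reached.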