[Solved] Different results with the same config


#1

I compiled TVM on two machines. The first machine (with a 1080Ti) works well, but the second machine (with a TitanX) fails with the same config. Does anyone know the reason? The script and the log are below. (BTW, the second machine did not have enough disk space to compile LLVM, so I downloaded the pre-built binaries from the website instead. cmake then ran without any errors, but the script fails when I run it.)

Script:

    python tune_nnvm_cuda.py

DEBUG Log:

    Extract tasks...
    Tuning...
    [Task  1/12]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (160/2000) | 115.42 sWARNING:autotvm:Too many errors happen in the tuning. Now is in debug mode
    DEBUG:autotvm:No: 161	GFLOPS: 0.00/0.00	result: MeasureResult(costs=(InstantiationError('Skipped because of invalid gpu kernel',),), error_no=1, all_cost=0.20407891273498535, timestamp=1543484151.439227)	[('tile_b', [16, 1, 1, 1]), ('tile_y', [4, 1, 1, 128]), ('tile_x', [1, 1, 2, 8]), ('tile_rc', [2, 256]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],winograd,None,454070
    DEBUG:autotvm:No: 162	GFLOPS: 0.00/0.00	result: MeasureResult(costs=(RuntimeError('Except caught from RPC call: TVMCall CFunc Error:\nTraceback (most recent call last):\n  File "/home/zhangxiaoyang/tvm/python/tvm/_ffi/_ctypes/function.py", line 55, in cfun\n    rv = local_pyfunc(*pyargs)\n  File "/home/zhangxiaoyang/tvm/python/tvm/rpc/server.py", line 50, in load_module\n    m = _load_module(path)\n  File "/home/zhangxiaoyang/tvm/python/tvm/module.py", line 226, in load\n    _cc.create_shared(path + ".so", files)\n  File "/home/zhangxiaoyang/tvm/python/tvm/contrib/cc.py", line 34, in create_shared\n    _linux_shared(output, objects, options, cc)\n  File "/home/zhangxiaoyang/tvm/python/tvm/contrib/cc.py", line 60, in _linux_shared\n    raise RuntimeError(msg)\nRuntimeError: Compilation error:\n/usr/bin/ld: /tmp/tmpcCxlnE/lib.o: relocation R_X86_64_32S against `.rodata.cst16\' can not be used when making a shared object; recompile with -fPIC\n/tmp/tmpcCxlnE/lib.o: error adding symbols: Bad value\ncollect2: error: ld returned 1 exit status\n\n',),), error_no=4, all_cost=1.0360040664672852, timestamp=1543484163.120505)	[('tile_b', [16, 1, 1, 1]), ('tile_y', [16, 1, 16, 2]), ('tile_x', [1, 1, 2, 8]), ('tile_rc', [8, 64]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],winograd,None,53545
    …………

#2

Oh, I have solved this problem. It is caused by LLVM. The pre-built LLVM 7.0 binaries do not work well with TVM. There are two solutions:
1. Compile LLVM 7.0 manually
2. Download different pre-built binaries, such as LLVM 6.0
After that, TVM works well.
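
If it helps anyone hitting the same thing, here is a minimal sketch for checking which LLVM a machine would pick up before building TVM. It assumes an `llvm-config` binary is on the PATH (or that you pass its full path), and the helper names `parse_llvm_major` and `llvm_major_from_config` are made up for this example; they are not part of TVM. It only reports the major version and cannot tell a pre-built LLVM from a self-compiled one.

```python
import re
import shutil
import subprocess


def parse_llvm_major(version_text):
    """Return the major version from text such as '7.0.0', or None if it does not parse."""
    match = re.match(r"\s*(\d+)\.", version_text)
    return int(match.group(1)) if match else None


def llvm_major_from_config(llvm_config="llvm-config"):
    """Run `llvm-config --version` and return the major version.

    Returns None when the binary cannot be found. The default name
    'llvm-config' is an assumption; pass the full path if your binary
    lives elsewhere or has a versioned name.
    """
    path = shutil.which(llvm_config)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_llvm_major(result.stdout)
```

Calling `llvm_major_from_config()` on both machines and comparing the results is a quick way to see whether they are building against different LLVM versions. In my case the failing machine was using the pre-built 7.0 binaries.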