Compile error when running TVM on Ubuntu 18.04 with the 4.15 kernel

Code:
func = tvm.build(s, [A, W, B], target, target_host=target_host)

# save compiled module
temp = util.tempdir()
path_lib = temp.relpath("deploy_lib.so")
func.export_library(path_lib)

Error:
Traceback (most recent call last):
  File "layerize_test_new.py", line 327, in <module>
    verify_workloads(tvm.cl(), 1, tvm.target.intel_graphics(), target_host)
  File "layerize_test_new.py", line 309, in verify_workloads
    target_host=target_host, remote=remote)
  File "layerize_test_new.py", line 160, in verify_conv2d_nchw
    func.export_library(path_lib)
  File "/home/aws_cam/workplace/tvm/python/tvm/module.py", line 128, in export_library
    fcompile(file_name, files, **kwargs)
  File "/home/aws_cam/workplace/tvm/python/tvm/contrib/cc.py", line 33, in create_shared
    _linux_shared(output, objects, options, cc)
  File "/home/aws_cam/workplace/tvm/python/tvm/contrib/cc.py", line 60, in _linux_shared
    raise RuntimeError(msg)
RuntimeError: Compilation error:
/usr/bin/ld: /tmp/tmpuc9l75ul/lib.o: relocation R_X86_64_32S against `.bss' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status

Question: where is lib.o built from? I am trying to add -fPIC to its compilation.
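One way to confirm that lib.o itself, rather than the link command, is the non-PIC part is to inspect its relocations. The following is a minimal sketch, assuming binutils `readelf` is installed; the helper names `find_non_pic_relocations` and `read_relocations` are hypothetical and not part of TVM.

```python
import subprocess

# Absolute 32-bit relocation types that ld rejects when making a shared object.
NON_PIC_TYPES = ("R_X86_64_32", "R_X86_64_32S")


def find_non_pic_relocations(readelf_output):
    """Return the relocation rows whose type is an absolute 32-bit one."""
    hits = []
    for line in readelf_output.splitlines():
        fields = line.split()
        # Relocation rows are: offset, info, type, symbol value, symbol name...
        if len(fields) >= 3 and fields[2] in NON_PIC_TYPES:
            hits.append(line.strip())
    return hits


def read_relocations(object_path):
    """Run `readelf -r` on an object file and return its text output."""
    result = subprocess.run(
        ["readelf", "-r", object_path],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout
```

With a saved copy of the object file, `find_non_pic_relocations(read_relocations("lib.o"))` returning a non-empty list would suggest the object was generated without position-independent code.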

The source for lib.cc comes from

and the compilation happens at

I think -fPIC is already there.

Hi @srkreddy1238, thanks for your reply. I printed out cmd from dmlc/tvm/blob/6ab05082ceebf1fb7dd775ad3c09ef872aab3a1d/python/tvm/contrib/cc.py#L40, but it seems that this compilation is for deploy_lib.so (the final exported library) rather than lib.o, which is an intermediate result.
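To illustrate the distinction, here is a simplified sketch of how a link command for a shared library might be assembled. It is not TVM's actual cc.py code, and `build_link_command` is a hypothetical name. The point is that -fPIC on this command applies only to the link step; it does not change how the already-compiled lib.o was generated.

```python
def build_link_command(output, objects, options=None, cc="g++"):
    """Assemble a shared-library link command (illustrative only)."""
    cmd = [cc, "-shared", "-fPIC", "-o", output]
    # Accept either a single object path or a list of paths.
    cmd += [objects] if isinstance(objects, str) else list(objects)
    if options:
        cmd += list(options)
    return cmd


cmd = build_link_command("deploy_lib.so", ["lib.o"])
print(" ".join(cmd))  # g++ -shared -fPIC -o deploy_lib.so lib.o
```

If lib.o contains absolute relocations, the linker will still reject it no matter which flags are added here.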

This post seems to have the same problem as "RuntimeError: relocation R_X86_64_32S against `.bss' can not be used when making a shared object in _linux_shared". We share the same Linux kernel version, 4.15.0-43-generic.

Yep, I see it now.

What about the LLVM and GCC versions?

LLVM is 7.0.1, which is the latest version that supports Ubuntu 18.04, and GCC is 7.3.0.

Surprising, I tried the same config and it works fine.

I tested the llvm target and it worked. I’m testing on DeepLens, which uses opencl --device=intel_graphics, and that fails.

@srkreddy1238 What target did you try on?

I checked on LLVM.
I will try intel_graphics later and let you know if I have any luck.

Try GCC 4.9? I once hit a build issue on Ubuntu 18.04 with the default GCC version.

Just an update on this. TVM has a bug when running GPU tests on Ubuntu 18.04 when compiled with LLVM 7.0+. I reproduced it on an AWS EC2 p2 instance. @tqchen

The CPU target fails too: compilation itself succeeds and the model runs, but exporting the library fails. Downgrading LLVM works.

It would be great if we could look into the compatibility issue with the latest LLVM mainline.

Yeah, I think I had the same issue on Ubuntu 16.04 with the Linux 4.4 kernel and LLVM 7.0 when running tests/python/unittest/test_runtime_graph.py, but it looks like LLVM 6.0 works.

One quick workaround: use Clang and Clang++ to build the TVM project. You could use

cmake .. -DCMAKE_CXX_COMPILER="clang++" -DCMAKE_C_COMPILER="clang"

Then we can avoid this issue.
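If the configure step is scripted, the same command can be assembled in Python with a check that the compilers are actually on PATH. This is only a sketch; `cmake_configure_args` and its `require_installed` parameter are hypothetical names, and the flags are the ones from the command above.

```python
import shutil


def cmake_configure_args(source_dir="..", c_compiler="clang",
                         cxx_compiler="clang++", require_installed=True):
    """Build the cmake argument list that selects Clang for C and C++."""
    if require_installed:
        for tool in (c_compiler, cxx_compiler):
            # shutil.which returns None when the executable is not on PATH.
            if shutil.which(tool) is None:
                raise FileNotFoundError(tool + " was not found on PATH")
    return [
        "cmake",
        source_dir,
        "-DCMAKE_CXX_COMPILER=" + cxx_compiler,
        "-DCMAKE_C_COMPILER=" + c_compiler,
    ]
```

The returned list can be passed to `subprocess.run(...)` from inside the build directory.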

In fact, this is not a TVM issue. It is an ABI incompatibility between Clang and GCC in how they handle LLVM's Optional data structure: the trivially copyable optimization in the OptionalStorage type is enabled when compiling with Clang and disabled when compiling with GCC.

In short, if LLVM was compiled with Clang, any project that links the LLVM library should be built with Clang too. If LLVM was compiled with GCC, the project should be built with GCC too.

Unfortunately, the prebuilt LLVM packages seem to be compiled with Clang, not GCC. The LLVM community is also aware of this: https://lists.llvm.org/pipermail/llvm-dev/2018-October/126603.html

This bug report confirms the issue too: https://bugs.llvm.org/show_bug.cgi?id=39427. A patch fixing it has been merged: https://reviews.llvm.org/D54540, and LLVM 7.1.0 should contain this patch.

For TVM release 0.5, we should publish a note for LLVM 7.0 users:

  • If you want to use GCC to build TVM, please compile LLVM 7.0 yourself with GCC.

  • If you want to use the prebuilt LLVM 7.0 packages on Ubuntu, please use Clang to build TVM.

  • Don’t use LLVM 7.0 until LLVM fixes it.
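The rule behind these options can be written down as a small check. This is a sketch under stated assumptions: only the 7.0.x series is treated as affected (based on the reports in this thread), and `llvm_series` and `check_toolchain` are hypothetical helpers, not part of TVM or LLVM.

```python
def llvm_series(version):
    """Return (major, minor) from a version string such as '7.0.1'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def check_toolchain(llvm_version, llvm_built_with, tvm_built_with):
    """Classify a toolchain combination as 'ok' or 'mismatch'.

    Assumption: only LLVM 7.0.x is affected by the Clang/GCC ABI difference.
    """
    if llvm_series(llvm_version) != (7, 0):
        return "ok"
    # Within 7.0.x, LLVM and TVM need to be built with the same compiler.
    if llvm_built_with.lower() == tvm_built_with.lower():
        return "ok"
    return "mismatch"
```

For example, the setup reported earlier in the thread (prebuilt LLVM 7.0.1, presumably Clang-built, with TVM built by GCC) would be classified as a mismatch.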

Moreover, there is a similar error we should be aware of: LLVM ERROR: Only small and large code models are allowed on AArch64

It has the same cause.

I suggest we don’t add an ugly workaround and instead wait for LLVM 7.1.0.

@tqchen @Laurawly @yzhliu @zhiics

@FrozenGene This does fix the LLVM 7.0+ issue for me. But for the OpenCL backend, I still get the following error:

: CommandLine Error: Option 'disable-symbolication' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options

OpenCL is another issue, see https://github.com/intel/compute-runtime/issues/122

@Laurawly One workaround I can come up with is using RPC: one machine has only the TVM runtime, and the other is built with OpenCL. Then we use RPC to tune and run.

Yes, we used something similar in the early days of AutoTVM.