Compile Error when running tvm on ubuntu 18.04 4.15 kernel

Laurawly · January 15, 2019, 1:55am

Code:
func = tvm.build(s, [A, W, B], target, target_host=target_host)

# save compiled module
temp = util.tempdir()
path_lib = temp.relpath("deploy_lib.so")
func.export_library(path_lib)

Error:
Traceback (most recent call last):
  File "layerize_test_new.py", line 327, in <module>
    verify_workloads(tvm.cl(), 1, tvm.target.intel_graphics(), target_host)
  File "layerize_test_new.py", line 309, in verify_workloads
    target_host=target_host, remote=remote)
  File "layerize_test_new.py", line 160, in verify_conv2d_nchw
    func.export_library(path_lib)
  File "/home/aws_cam/workplace/tvm/python/tvm/module.py", line 128, in export_library
    fcompile(file_name, files, **kwargs)
  File "/home/aws_cam/workplace/tvm/python/tvm/contrib/cc.py", line 33, in create_shared
    _linux_shared(output, objects, options, cc)
  File "/home/aws_cam/workplace/tvm/python/tvm/contrib/cc.py", line 60, in _linux_shared
    raise RuntimeError(msg)
RuntimeError: Compilation error:
/usr/bin/ld: /tmp/tmpuc9l75ul/lib.o: relocation R_X86_64_32S against `.bss' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status

Question: where is lib.o built from? I am trying to add -fPIC to its compilation.
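One way to confirm which object file carries the offending relocation is to scan the output of `readelf -r`. The following is only a sketch: the helper names are hypothetical, it assumes binutils `readelf` is installed, and it assumes the usual 64-bit ELF column layout where the relocation type is the third field.

```python
import subprocess

# Absolute 32-bit relocation types that x86-64 linkers reject in shared objects.
NON_PIC_RELOCS = ("R_X86_64_32", "R_X86_64_32S")


def find_non_pic_relocations(readelf_output):
    """Return the lines of `readelf -r` output whose type is a non-PIC relocation."""
    hits = []
    for line in readelf_output.splitlines():
        fields = line.split()
        # Assumed layout: offset, info, type, symbol value, symbol name + addend.
        if len(fields) >= 3 and fields[2] in NON_PIC_RELOCS:
            hits.append(line.strip())
    return hits


def check_object(path):
    """Run readelf on an object file (hypothetical helper) and list suspect relocations."""
    result = subprocess.run(
        ["readelf", "-r", "-W", path],
        check=True, capture_output=True, text=True,
    )
    return find_non_pic_relocations(result.stdout)
```

Calling `check_object` on a copy of the intermediate lib.o would show whether that object was generated without position-independent code, in which case passing -fPIC to the final link step cannot help.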

srkreddy1238 · January 15, 2019, 5:12am

The source for lib.cc comes from

https://github.com/dmlc/tvm/blob/3bbfc2dcba8646c9b795d8159ee3381085c4779d/src/codegen/codegen.cc#L33


  }
  std::string build_f_name = "codegen.build_" + mode;
  // the build function.
  const PackedFunc* bf = runtime::Registry::Get(build_f_name);
  CHECK(bf != nullptr)
      << "Target " << target << " is not enabled";
  runtime::Module m = (*bf)(funcs, target);
  return m;
}


std::string PackImportsToC(const runtime::Module& mod, bool system_lib) {
  std::string bin;
  dmlc::MemoryStringStream ms(&bin);
  dmlc::Stream* stream = &ms;
  uint64_t sz = static_cast<uint64_t>(mod->imports().size());
  stream->Write(sz);
  for (runtime::Module im : mod->imports()) {
    CHECK_EQ(im->imports().size(), 0U)
        << "Only support simply one-level hierarchy";
    std::string tkey = im->type_key();
    stream->Write(tkey);

and the compilation happens at

https://github.com/dmlc/tvm/blob/6ab05082ceebf1fb7dd775ad3c09ef872aab3a1d/python/tvm/contrib/cc.py#L40


        The compile string.
    """
    if sys.platform == "darwin" or sys.platform.startswith('linux'):
        _linux_shared(output, objects, options, cc)
    elif sys.platform == "win32":
        _windows_shared(output, objects, options)
    else:
        raise ValueError("Unsupported platform")


def _linux_shared(output, objects, options, cc="g++"):
    cmd = [cc]
    cmd += ["-shared", "-fPIC"]
    if sys.platform == "darwin":
        cmd += ["-undefined", "dynamic_lookup"]
    cmd += ["-o", output]
    if isinstance(objects, str):
        cmd += [objects]
    else:
        cmd += objects
    if options:

I think -fPIC is already there.

Laurawly · January 15, 2019, 5:21am

Hi @srkreddy1238, thanks for your reply. I printed out cmd from dmlc/tvm/blob/6ab05082ceebf1fb7dd775ad3c09ef872aab3a1d/python/tvm/contrib/cc.py#L40, but that command appears to build deploy_lib.so (the final exported library) rather than lib.o, which is an intermediate result.
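To illustrate the distinction, here is a minimal standalone sketch that mirrors the quoted `_linux_shared` excerpt and prints the command it would assemble. The function name `build_link_command` is hypothetical, and the handling of `options` is an assumption because the quoted excerpt is cut off at that point.

```python
import sys


def build_link_command(output, objects, options=None, cc="g++"):
    """Assemble a link command in the style of the quoted _linux_shared excerpt."""
    cmd = [cc, "-shared", "-fPIC"]
    if sys.platform == "darwin":
        cmd += ["-undefined", "dynamic_lookup"]
    cmd += ["-o", output]
    # Accept either a single object path or a list of paths.
    cmd += [objects] if isinstance(objects, str) else list(objects)
    if options:
        # Assumption: extra options are appended; the excerpt is truncated here.
        cmd += list(options)
    return cmd


cmd = build_link_command("deploy_lib.so", ["lib.o"])
print(" ".join(cmd))
```

The command takes lib.o as an input that has already been compiled, so the -fPIC on this line applies only to the link step and does not regenerate the code inside lib.o.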

Laurawly · January 15, 2019, 5:23am

This seems to be the same problem as the post "RuntimeError: relocation R_X86_64_32S against `.bss' can not be used when making a shared object in _linux_shared". We share the same Linux kernel version, 4.15.0-43-generic.

srkreddy1238 · January 15, 2019, 5:50am

Yep, I see it now.

What about the LLVM and GCC versions?

Laurawly · January 15, 2019, 6:47pm

LLVM is 7.0.1, which is the latest version that supports Ubuntu 18.04, and GCC is 7.3.0.

srkreddy1238 · January 16, 2019, 5:32am

Surprising. I tried with the same config and it works fine.

Laurawly · January 16, 2019, 7:07pm

I tested the llvm target and it worked. I'm testing on DeepLens, which uses opencl --device=intel_graphics, and that fails.

Laurawly · January 16, 2019, 9:30pm

@srkreddy1238 What target did you try on?

srkreddy1238 · January 17, 2019, 5:18am

I checked on LLVM.
I will try intel_graphics later and let you know if I have any luck.

jackwish · January 19, 2019, 5:59am

Try gcc 4.9? I once ran into a build issue on Ubuntu 18.04 with the default gcc version.

Laurawly · January 28, 2019, 10:41pm

Just an update on this. TVM has a bug when running GPU tests on Ubuntu 18.04 when compiled with LLVM 7.0+. Reproduced on an AWS EC2 p2 instance. @tqchen

yzhliu · January 28, 2019, 11:22pm

The CPU target is affected too: compilation succeeds and the model runs, but exporting the library fails. Downgrading LLVM works.

tqchen · January 29, 2019, 3:41am

It would be great if we could look into the compatibility issue with the latest LLVM mainline.

zhiics · January 29, 2019, 5:30am

Yeah, I think I hit the same issue on Ubuntu 16.04 with the Linux 4.4 kernel and LLVM 7.0 when running tests/python/unittest/test_runtime_graph.py, but it looks like LLVM 6.0 works.

FrozenGene · January 29, 2019, 4:11pm

One quick workaround: build the TVM project with Clang and Clang++. You could use

cmake .. -DCMAKE_CXX_COMPILER="clang++" -DCMAKE_C_COMPILER="clang"

That avoids this issue.

In fact, this is not a TVM issue. It is an ABI incompatibility between Clang and GCC in how they handle LLVM's Optional data structure: the trivially copyable optimization in the OptionalStorage type is enabled when compiling with Clang and disabled when compiling with GCC.

In short, if LLVM was compiled with Clang, a project that links the LLVM library should be built with Clang too. If LLVM was compiled with GCC, a project that links the LLVM library should be built with GCC too.

Unfortunately, the prebuilt LLVM packages seem to be compiled with Clang, not GCC. The LLVM community is also aware of this: https://lists.llvm.org/pipermail/llvm-dev/2018-October/126603.html

This bug report confirms the issue as well: https://bugs.llvm.org/show_bug.cgi?id=39427 and a patch fixing it has been merged: https://reviews.llvm.org/D54540. LLVM 7.1.0 should contain this patch.

For the TVM 0.5 release, we should publish a note for LLVM 7.0 users:

If you want to use GCC to build TVM, please compile LLVM 7.0 with GCC yourself.
If you want to use the prebuilt LLVM 7.0 packages on Ubuntu, please use Clang to build TVM.
Otherwise, don't use LLVM 7.0 until LLVM fixes it.
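The compiler-matching rule above can be sketched as a small helper that returns the CMake flags for building TVM. This is only an illustration: the function name `matching_cmake_flags` is hypothetical, and it assumes you already know which compiler your LLVM was built with.

```python
def matching_cmake_flags(llvm_built_with):
    """Return CMake flags that build TVM with the same compiler family as LLVM.

    llvm_built_with: "clang" or "gcc" (how the linked LLVM library was compiled).
    """
    compilers = {
        "clang": ("clang++", "clang"),
        "gcc": ("g++", "gcc"),
    }
    if llvm_built_with not in compilers:
        raise ValueError("expected 'clang' or 'gcc', got %r" % (llvm_built_with,))
    cxx, cc = compilers[llvm_built_with]
    return ["-DCMAKE_CXX_COMPILER=" + cxx, "-DCMAKE_C_COMPILER=" + cc]


# Prebuilt LLVM 7.0 packages appear to be built with Clang, per the note above.
print("cmake .. " + " ".join(matching_cmake_flags("clang")))
```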

Moreover, there is a similar error worth noting: LLVM ERROR: Only small and large code models are allowed on AArch64

It has the same cause.

I suggest we avoid adding ugly workarounds and wait for LLVM 7.1.0.

@tqchen @Laurawly @yzhliu @zhiics

Laurawly · January 29, 2019, 9:54pm

@FrozenGene That does fix the LLVM 7.0+ issue for me. But with the OpenCL backend, I still get the following error:

: CommandLine Error: Option 'disable-symbolication' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options

FrozenGene · January 30, 2019, 1:37am

OpenCL is a separate issue; see https://github.com/intel/compute-runtime/issues/122

FrozenGene · January 30, 2019, 1:55am

@Laurawly One workaround I can come up with is using RPC. One machine has only the TVM runtime, and the other is built with OpenCL. Then we use RPC to tune and run.
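A rough sketch of the RPC side of this workaround, assuming the TVM Python API of that era (`tvm.rpc.connect`, `upload`, `load_module`, `cl`) and assuming an RPC server is already running on the OpenCL machine. The function name, host, port and library path are placeholders, not values from this thread.

```python
import os


def load_on_remote(host, port, lib_path):
    """Upload an exported library to a TVM RPC server and load it there.

    Hypothetical helper: host, port and lib_path are supplied by the caller.
    """
    # Imported lazily so this sketch can be read without TVM installed.
    from tvm import rpc

    remote = rpc.connect(host, port)        # connect to the remote RPC server
    remote.upload(lib_path)                 # copy the library to the remote side
    func = remote.load_module(os.path.basename(lib_path))
    ctx = remote.cl(0)                      # OpenCL context on the remote device
    return func, ctx
```

With this split, the process that loads LLVM for compilation and the process that loads the OpenCL runtime live on different machines, which is the point of the workaround.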

eqy · January 30, 2019, 2:15am

Yes, we used something similar in the early days of AutoTVM.