[SOLVED] How to build and run an NNVM model in a library independently instead of over RPC


I built a ResNet model from MXNet with NNVM, but the generated library is not loadable on the remote device (a Raspberry Pi), even though I set the ARM CPU target when building the lib.
When the actual loading call tvm.module.load(path_to_lib) runs, it dumps the following error.
My host OS is macOS High Sierra (10.13), just in case that matters.

python3 run.py
Traceback (most recent call last):
  File "run.py", line 26, in <module>
    loaded_lib = tvm.module.load(cfg.lib_path)
  File "/home/pi/repos/tvm/python/tvm/module.py", line 225, in load
    return _LoadFromFile(path, fmt)
  File "/home/pi/repos/tvm/python/tvm/_ffi/_ctypes/function.py", line 185, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "/home/pi/repos/tvm/python/tvm/_ffi/base.py", line 66, in check_call
    raise TVMError(py_str(_LIB.TVMGetLastError()))
tvm._ffi.base.TVMError: [23:08:51] /home/pi/repos/tvm/src/runtime/dso_module.cc:93: Check failed: lib_handle_ != nullptr Failed to load dynamic shared library /home/pi/workspace/tvm_resnet/config/resnet.so /home/pi/workspace/tvm_resnet/config/resnet.so: invalid ELF header

Stack trace returned 7 entries:
[bt] (0) /home/pi/repos/tvm/build/libtvm_runtime.so(dmlc::StackTrace[abi:cxx11]()+0x138) [0x7079ab2c]
[bt] (1) /home/pi/repos/tvm/build/libtvm_runtime.so(+0x26634) [0x707a9634]
[bt] (2) /home/pi/repos/tvm/build/libtvm_runtime.so(tvm::runtime::Module::LoadFromFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x420) [0x707b9e1c]
[bt] (3) /home/pi/repos/tvm/build/libtvm_runtime.so(+0x3a30c) [0x707bd30c]
[bt] (4) /home/pi/repos/tvm/build/libtvm_runtime.so(TVMFuncCall+0x40) [0x7079e210]
[bt] (5) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-arm-linux-gnueabihf.so(ffi_call_VFP+0x54) [0x767d5a90]
[bt] (6) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-arm-linux-gnueabihf.so(ffi_call+0x12c) [0x767d5528]

I paste the code for building the library and for running the inference below, along with the error message.
Has anyone seen and solved this problem?

the code for building the lib

import nnvm
import nnvm.compiler
import tvm
from mxnet.gluon.model_zoo.vision import get_model

block = get_model(cfg.model_name, pretrained=True)
net, params = nnvm.frontend.from_mxnet(block)
net = nnvm.sym.softmax(net)

target_option = 'llvm -device=arm_cpu -mtriple=armv7l-none-linux-gnueabihf -mcpu=cortex-a53 -mattr=+neon'
target_host = 'llvm -device=arm_cpu -mtriple=armv7-none-linux-gnueabihf -mcpu=cortex-a53 -mattr=+neon'

target = tvm.target.create(target_option)

with nnvm.compiler.build_config(opt_level=3):
    graph, lib, params = nnvm.compiler.build(
        net, target, target_host=target_host, shape={"data": in_shape}, params=params, dtype="float32")

# generate lib instead of RPC
lib.export_library(cfg.lib_path)  # writes resnet.so (the file that failed to load on the Pi)
with open(cfg.json_path, "w") as fo:
    fo.write(graph.json())
with open(cfg.params_path, "wb") as fo:
    fo.write(nnvm.compiler.save_param_dict(params))

the code for running the inference.

loaded_lib = tvm.module.load(cfg.lib_path)  #  <- failed
loaded_json = open(cfg.json_path).read()
loaded_params = bytearray(open(cfg.params_path, "rb").read())

ctx = tvm.cpu(0)
fcreate = tvm.get_global_func("tvm.graph_runtime.create")

gmodule = fcreate(loaded_json, loaded_lib, ctx.device_type, ctx.device_id)
gmodule["load_params"](loaded_params)  # load the saved weights
set_input, get_output, run = gmodule["set_input"], gmodule["get_output"], gmodule["run"]
set_input("data", tvm.nd.array(x.astype('float32')))  # x: the preprocessed input image
run()

out = tvm.nd.empty(cfg.out_shape)
get_output(0, out)


This seems to say that the generated binary is not readable as ARM code, so I inspected its format with nm and pasted the result below. It looks a bit odd, because there is only a single undefined symbol.

>> nm config/resnet.so
                 U dyld_stub_binder

Usually it’s like below (setting: target = ‘llvm’, target_host=‘llvm’)

>> nm config/resnet.so
0000000000031048 D ___TVMAPISetLastError
0000000000031058 D ___TVMBackendAllocWorkspace
0000000000031060 D ___TVMBackendFreeWorkspace
0000000000031050 D ___TVMBackendParallelLaunch
                 U ___bzero
0000000000030e4a S ___tvm_main__
                 U _expf
0000000000000e40 T _fuse___layout_transform___11
0000000000023600 T _fuse___layout_transform___flatten
000000000000a100 T _fuse__contrib_conv2d_NCHWc_broadcast_mul_broadcast_add
00000000000136f0 T _fuse__contrib_conv2d_NCHWc_broadcast_mul_broadcast_add_1
                 U _memcpy
                 U dyld_stub_binder
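For reference, the format check nm is hinting at can be reproduced directly from the file's magic bytes; a small sketch (hypothetical helper, not part of TVM):

```python
# Magic numbers for the two formats in play: ELF (Linux) and Mach-O (macOS).
ELF_MAGIC = b"\x7fELF"
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xfe\xed\xfa\xcf",  # 32-/64-bit Mach-O, big-endian
    b"\xce\xfa\xed\xfe", b"\xcf\xfa\xed\xfe",  # 32-/64-bit Mach-O, little-endian
}

def classify_magic(magic):
    """Classify the first four bytes of a binary file."""
    if magic == ELF_MAGIC:
        return "ELF"
    if magic in MACHO_MAGICS:
        return "Mach-O"
    return "unknown"

def binary_format(path):
    """Read a file's magic bytes and classify it."""
    with open(path, "rb") as f:
        return classify_magic(f.read(4))
```

Running binary_format("config/resnet.so") on the broken file would presumably report "Mach-O": the undefined dyld_stub_binder symbol is macOS-specific, so the host toolchain linked a Mach-O library, which explains the "invalid ELF header" error on the Pi.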


To export the .so library, you need to pass a cross-compiler; please refer to this question:


Does your "cfg.lib_path" end with ".so"?

Replacing it with “*.tar” can work.

You can use lib.export_library("*.tar") or lib.save("*.o")

Loading module Param and JSON file to statically linked C++ application

Thanks @kgomaa for your reply and the reference.
I quickly checked the linked question, and it is actually what I really wanted to know.
I probably should have searched for this beforehand.
I still have a fundamental question: with RPC, why don't we need to pass a cross-compiler path, even though RPC also has to produce an ARM executable binary (ARM ISA)?
If you know the reason, please tell me.

Also, thanks for your reply.
Does that mean the build or load function changes its behavior, i.e. how it generates the binary, depending on the file extension?


Yes. In the tutorial (https://docs.tvm.ai/tutorials/nnvm/deploy_model_on_rasp.html#deploy-the-model-remotely-by-rpc) we use “*.tar”.

Using *.tar or *.o delays the linking until it happens on the device. We then use the gcc on the Pi, so we don't need a cross compiler.
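The extension-based dispatch described here can be sketched in plain Python (a hypothetical export_module helper, not TVM's actual implementation; the fcompile callback mirrors the compiler-callback idea of export_library):

```python
import os
import tarfile

def export_module(output_path, object_files, fcompile=None):
    """Sketch of extension-based dispatch (hypothetical, not TVM's code).

    - "*.tar": just archive the object files; linking is deferred to the
      target device, so no cross toolchain is needed on the host.
    - otherwise: hand the object files to a linker callback, which must be
      a cross-linker when host and target architectures differ.
    """
    if output_path.endswith(".tar"):
        with tarfile.open(output_path, "w") as tar:
            for obj in object_files:
                tar.add(obj, arcname=os.path.basename(obj))
    else:
        if fcompile is None:
            raise ValueError("linking a shared library needs a compiler callback")
        fcompile(output_path, object_files)
```

The point of the dispatch is that the .tar branch never invokes a compiler at all, which is why exporting a .tar on a macOS or x86 host works for an ARM target.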


OK, I see; I'd like to try the tar approach later.
I also want to ask: does your comment mean the library generation step is done on the Pi as well, or on a host computer without an ARM CPU, such as an x86 machine?


By using ".tar" or ".o", we generate the object files ("*.o") by LLVM cross-compilation but do not link them. LLVM can generate ARM code without a cross toolchain such as aarch64-gcc.

Then we link them on the pi. Linking is cheap.
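The load side can be pictured the same way; a rough sketch of unpacking the archive and linking natively on the device (hypothetical helpers, not TVM's loader code):

```python
import os
import subprocess
import tarfile

def unpack_objects(tar_path, workdir):
    """Extract the object files shipped in the .tar into workdir."""
    with tarfile.open(tar_path) as tar:
        tar.extractall(workdir)
    return sorted(os.path.join(workdir, name) for name in os.listdir(workdir))

def link_shared(out_so, objects, cc="g++"):
    """Link the objects with the device's native toolchain -- on the Pi
    this is the local g++, so no cross compiler is involved here."""
    subprocess.check_call([cc, "-shared", "-fPIC", "-o", out_so] + objects)
```

Since the objects were already compiled for ARM by LLVM on the host, the only work left on the Pi is this cheap native link step.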


Finally, the library generated with .tar works, and I also understand the compile-option details now.
Many thanks!


If your target is rasp3b, please use tvm.target.arm_cpu("rasp3b") or "llvm -device=arm_cpu -model=bcm2837 -target=armv7l-linux-gnueabihf -mattr=+neon" as your target.

We need -model=bcm2837 to match the parameters optimized for rasp3b.