Undefined symbol: VTAMemFree

First, thanks for the great repo. I followed the tutorial to compile the TVM repo with VTA accelerator support. After the compilation, I tried to run matrix_multiply.py, but I ran into the following error:

tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) /home/xilinx/tvm/build/libtvm.so(tvm::runtime::RPCSession::ServerLoop()+0x104) [0x7fb113099c]
  [bt] (7) /home/xilinx/tvm/build/libtvm.so(tvm::runtime::RPCSession::HandleUntilReturnEvent(tvm::runtime::TVMRetValue*, bool, tvm::runtime::PackedFunc const*)+0x1a0) [0x7fb1130678]
  [bt] (6) /home/xilinx/tvm/build/libtvm.so(tvm::runtime::RPCSession::EventHandler::HandleNextEvent(tvm::runtime::TVMRetValue*, bool, tvm::runtime::PackedFunc const*)+0x574) [0x7fb113682c]
  [bt] (5) /home/xilinx/tvm/build/libtvm.so(tvm::runtime::RPCSession::EventHandler::HandleRecvPackedSeqArg()+0x14c) [0x7fb1135c64]
  [bt] (4) /home/xilinx/tvm/build/libtvm.so(tvm::runtime::RPCSession::EventHandler::SwitchToState(tvm::runtime::RPCSession::EventHandler::State)+0x320) [0x7fb1134b70]
  [bt] (3) /home/xilinx/tvm/build/libtvm.so(tvm::runtime::RPCSession::EventHandler::HandlePackedCall()+0x678) [0x7fb112fdb0]
  [bt] (2) /home/xilinx/tvm/build/libtvm.so(void tvm::runtime::RPCSession::EventHandler::CallHandler<void (*)(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>(void (*)(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*))+0x7c) [0x7fb113445c]
  [bt] (1) /home/xilinx/tvm/build/libtvm.so(tvm::runtime::RPCModuleLoad(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)+0xb8) [0x7fb112ef78]
  [bt] (0) /home/xilinx/tvm/build/libtvm.so(+0xae91a0) [0x7fb10d01a0]
  File "/home/xilinx/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 78, in cfun
    rv = local_pyfunc(*pyargs)
  File "/home/xilinx/tvm/vta/python/vta/exec/rpc_server.py", line 59, in load_module
    load_vta_dll()
  File "/home/xilinx/tvm/vta/python/vta/exec/rpc_server.py", line 53, in load_vta_dll
    runtime_dll.append(ctypes.CDLL(dll_path, ctypes.RTLD_GLOBAL))
  File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
TVMError: Except caught from RPC call: OSError: /home/xilinx/tvm/vta/python/vta/../../../build/libvta.so: undefined symbol: VTAMemFree

On the host side, I used LLVM 9.0.0, and for the FPGA, I followed the documentation here: https://docs.tvm.ai/vta/install.html. I am not sure what is causing the problem. Any help would be greatly appreciated.

Hello, I’m trying VTA on Xilinx too. I tried to reproduce your issue, but I got the “Successful matrix multiply test!” message from “matrix_multiply.py”. I’m using LLVM 9.0.1 and an Avnet Ultra96-V2 Xilinx FPGA device.

My steps were:

  • Use commit a449d8b1fe (Feb 28 2020)
  • Build libraries for HOST according to https://docs.tvm.ai/install/from_source.html (0.7dev1)
  • Build libraries for PYNQ TARGET according to https://docs.tvm.ai/vta/install.html (0.7dev1)
  • Run “sudo ./apps/vta_rpc/start_rpc_server.sh” on PYNQ TARGET
  • Run “python3 vta/tutorials/matrix_multiply.py” on HOST, and got “Successful matrix multiply test!” message

With these steps, the “libtvm.so” library was NOT generated on the PYNQ TARGET.

(Only the two libraries “libvta.so” and “libtvm_runtime.so” were generated.)

Could you let us know which commit version you are using?

Thanks for the reply. I should mention that I am using the ZCU102 FPGA platform. I have already built the VTA bitstream, and I already have PYNQ running on this platform. Based on your reply, I recompiled VTA on both the host and the client side. This time I get this error:

Traceback (most recent call last):

  File "~/vta/tutorials/matrix_multiply.py", line 403, in <module>
    f = remote.load_module("gemm.o")

  File "/tvm/python/tvm/rpc/client.py", line 148, in load_module
    return base._LoadRemoteModule(self._sess, path)

  File "~/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 213, in __call__
    raise get_last_ffi_error()

tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) /home/xilinx/tvm/build/libtvm_runtime.so(tvm::runtime::RPCSession::ServerLoop()+0x104) [0x7f986ad214]
  [bt] (7) /home/xilinx/tvm/build/libtvm_runtime.so(tvm::runtime::RPCSession::HandleUntilReturnEvent(tvm::runtime::TVMRetValue*, bool, tvm::runtime::PackedFunc const*)+0x1a0) [0x7f986acef0]
  [bt] (6) /home/xilinx/tvm/build/libtvm_runtime.so(tvm::runtime::RPCSession::EventHandler::HandleNextEvent(tvm::runtime::TVMRetValue*, bool, tvm::runtime::PackedFunc const*)+0x574) [0x7f986b3bfc]
  [bt] (5) /home/xilinx/tvm/build/libtvm_runtime.so(tvm::runtime::RPCSession::EventHandler::HandleRecvPackedSeqArg()+0x14c) [0x7f986b3034]
  [bt] (4) /home/xilinx/tvm/build/libtvm_runtime.so(tvm::runtime::RPCSession::EventHandler::SwitchToState(tvm::runtime::RPCSession::EventHandler::State)+0x320) [0x7f986b1f40]
  [bt] (3) /home/xilinx/tvm/build/libtvm_runtime.so(tvm::runtime::RPCSession::EventHandler::HandlePackedCall()+0x678) [0x7f986ac628]
  [bt] (2) /home/xilinx/tvm/build/libtvm_runtime.so(void tvm::runtime::RPCSession::EventHandler::CallHandler<void (*)(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>(void (*)(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*))+0x7c) [0x7f986b182c]
  [bt] (1) /home/xilinx/tvm/build/libtvm_runtime.so(tvm::runtime::RPCModuleLoad(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)+0xb8) [0x7f986ab7f0]
  [bt] (0) /home/xilinx/tvm/build/libtvm_runtime.so(+0x2bd00) [0x7f98637d00]
  File "/home/xilinx/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 78, in cfun
    rv = local_pyfunc(*pyargs)
  File "/home/xilinx/tvm/vta/python/vta/exec/rpc_server.py", line 60, in load_module
    return _load_module(file_name)
  File "/home/xilinx/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 213, in __call__
    raise get_last_ffi_error()
  [bt] (1) /home/xilinx/tvm/build/libtvm_runtime.so(TVMFuncCall+0x70) [0x7f9863be20]
  [bt] (0) /home/xilinx/tvm/build/libtvm_runtime.so(+0x2bd00) [0x7f98637d00]
  File "/home/xilinx/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 78, in cfun
    rv = local_pyfunc(*pyargs)
  File "/home/xilinx/tvm/python/tvm/rpc/server.py", line 67, in load_module
    m = _load_module(path)
  File "/home/xilinx/tvm/python/tvm/runtime/module.py", line 388, in load_module
    _cc.create_shared(path + ".so", path)
  File "/home/xilinx/tvm/python/tvm/contrib/cc.py", line 48, in create_shared
    _linux_compile(output, objects, options, cc)
  File "/home/xilinx/tvm/python/tvm/contrib/cc.py", line 182, in _linux_compile
    raise RuntimeError(msg)
TVMError: Except caught from RPC call: RuntimeError: Compilation error:
/tmp/tmpapw17oo3/gemm.o: error adding symbols: File in wrong format
collect2: error: ld returned 1 exit status

This time I tried both llvm-9 and llvm-4, but with no success. Any help would be appreciated.


Thank you for your information.

I checked out the latest code and recompiled, but couldn’t reproduce the issue…

Your log says gemm.o has the wrong format, so could you check the compiled “gemm.o”?

In my environment, I tried with the modification below:

kuroy@kuroy-VirtualBox:~/git/tvm$ git log|head -n 3
commit d992468d80af816f0413fc43c2ee1c02f7fe19c3
Author: Yao Wang <kevinthesunwy@gmail.com>
Date:   Thu Mar 5 17:17:35 2020 -0800

kuroy@kuroy-VirtualBox:~/git/tvm$ git diff
diff --git a/vta/config/vta_config.json b/vta/config/vta_config.json
index 0591bb486..013420cff 100644
--- a/vta/config/vta_config.json
+++ b/vta/config/vta_config.json
@@ -1,5 +1,5 @@
 {
-  "TARGET" : "sim",
+  "TARGET" : "ultra96",
   "HW_VER" : "0.0.1",
   "LOG_INP_WIDTH" : 3,

diff --git a/vta/tutorials/matrix_multiply.py b/vta/tutorials/matrix_multiply.py
index 444762684..54e9bc5fd 100644
--- a/vta/tutorials/matrix_multiply.py
+++ b/vta/tutorials/matrix_multiply.py
@@ -52,7 +52,7 @@ port = int(os.environ.get("VTA_PYNQ_RPC_PORT", "9091"))
 
 # We configure both the bitstream and the runtime system on the Pynq
 # to match the VTA configuration specified by the vta_config.json file.
-if env.TARGET == "pynq":
+if env.TARGET == "pynq" or env.TARGET == "ultra96":
 
     # Make sure that TVM was compiled with RPC=1
     assert tvm.runtime.enabled("rpc")
@@ -395,6 +395,7 @@ my_gemm = vta.build(s, [A, B, C], "ext_dev", env.target_host, name="my_gemm")
 # Write the compiled module into an object file.
 temp = util.tempdir()
 my_gemm.save(temp.relpath("gemm.o"))
+my_gemm.save("./gemm.o")

With the modification at matrix_multiply.py L395, gemm.o was saved in the current directory after running “python3 matrix_multiply.py”.

My “gemm.o”’s format is below.

The ZCU102 and Ultra96-V2 have 64-bit ARM cores, so I think you should get a similar AArch64 file format.

kuroy@kuroy-VirtualBox:~/git/tvm$ readelf gemm.o -h
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           AArch64
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          12160 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           64 (bytes)
  Number of section headers:         20
  Section header string table index: 1

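If readelf isn’t handy, the same check can be done in a few lines of Python. This is just a sketch I use to inspect object files; it only parses the fields shown in the readelf output above (ELF class and machine).

```python
import struct

def elf_info(header: bytes):
    """Return (elf_class, e_machine) from raw ELF header bytes.

    elf_class is 32 or 64; e_machine 40 (0x28) is 32-bit ARM,
    183 (0xB7) is AArch64.
    """
    assert header[:4] == b"\x7fELF", "not an ELF file"
    elf_class = {1: 32, 2: 64}[header[4]]        # EI_CLASS byte
    (machine,) = struct.unpack_from("<H", header, 18)  # e_machine field
    return elf_class, machine

# Usage on the object in question:
# with open("gemm.o", "rb") as f:
#     print(elf_info(f.read(20)))   # (64, 183) would mean AArch64
```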
BTW, how about the PYNQ-side log?

  • Which VTA_TARGET was set?
  • Did any problem occur before loading “gemm.o”?

My PYNQ-side log is below.

xilinx@pynq:~/tvm$ sudo ./apps/vta_rpc/start_rpc_server.sh
INFO:RPCServer:bind to 0.0.0.0:9091
INFO:RPCServer:connection from ('192.168.174.8', 50770)
INFO:root:Rebuild runtime:
 output=/home/xilinx/tvm/vta/python/vta/../../../build/libvta.so,
 cflags=-O2
        -std=c++11
        -I/home/xilinx/tvm/include
        -I/home/xilinx/tvm/vta/include
        -I/home/xilinx/tvm/3rdparty/dlpack/include
        -I/home/xilinx/tvm/3rdparty/dmlc-core/include
        -DVTA_TARGET=ultra96
        -DVTA_HW_VER=0.0.1
        -DVTA_LOG_INP_WIDTH=3
        -DVTA_LOG_WGT_WIDTH=3
        -DVTA_LOG_ACC_WIDTH=5
        -DVTA_LOG_BATCH=0
        -DVTA_LOG_BLOCK=4
        -DVTA_LOG_UOP_BUFF_SIZE=15
        -DVTA_LOG_INP_BUFF_SIZE=15
        -DVTA_LOG_WGT_BUFF_SIZE=18
        -DVTA_LOG_ACC_BUFF_SIZE=17
        -DVTA_LOG_BLOCK_IN=4
        -DVTA_LOG_BLOCK_OUT=4
        -DVTA_LOG_OUT_WIDTH=3
        -DVTA_LOG_OUT_BUFF_SIZE=15
        -DVTA_LOG_BUS_WIDTH=7
        -DVTA_IP_REG_MAP_RANGE=0x1000
        -DVTA_FETCH_ADDR=0xA0000000
        -DVTA_LOAD_ADDR=0xA0001000
        -DVTA_COMPUTE_ADDR=0xA0002000
        -DVTA_STORE_ADDR=0xA0003000
        -DVTA_FETCH_INSN_COUNT_OFFSET=16
        -DVTA_FETCH_INSN_ADDR_OFFSET=24
        -DVTA_LOAD_INP_ADDR_OFFSET=16
        -DVTA_LOAD_WGT_ADDR_OFFSET=24
        -DVTA_COMPUTE_DONE_WR_OFFSET=16
        -DVTA_COMPUTE_DONE_RD_OFFSET=24
        -DVTA_COMPUTE_UOP_ADDR_OFFSET=32
        -DVTA_COMPUTE_BIAS_ADDR_OFFSET=40
        -DVTA_STORE_OUT_ADDR_OFFSET=16
        -DVTA_COHERENT_ACCESSES=true,
 source=/home/xilinx/tvm/vta/src/device_api.cc
        /home/xilinx/tvm/vta/src/runtime.cc
        /home/xilinx/tvm/vta/src/pynq/pynq_driver.cc,
 ldflags=-L/usr/lib
        -l:libcma.so
INFO:root:Program FPGA with 1x16_i8w8a32_15_15_18_17.bit
INFO:root:Loading VTA library: /home/xilinx/tvm/vta/python/vta/../../../build/libvta.so
INFO:RPCServer:load_module /tmp/tmpsv64dqfq/gemm.o

Thanks for the reply. Interesting. It seems my machine is generating the ELF32 format, which is wrong:

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           ARM
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          10020 (bytes into file)
  Flags:                             0x5000000, Version5 EABI
  Size of this header:               52 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           40 (bytes)
  Number of section headers:         21
  Section header string table index: 1

Here is also the log from the Xilinx side:

INFO:RPCServer:bind to 0.0.0.0:9091
INFO:RPCServer:connection from ('10.16.26.186', 43262)
INFO:root:Rebuild runtime:
 output=/home/xilinx/tvm/vta/python/vta/../../../build/libvta.so,
 cflags=-O2
	-std=c++11
	-I/home/xilinx/tvm/include
	-I/home/xilinx/tvm/vta/include
	-I/home/xilinx/tvm/3rdparty/dlpack/include
	-I/home/xilinx/tvm/3rdparty/dmlc-core/include
	-DVTA_TARGET=ultra96
	-DVTA_HW_VER=0.0.1
	-DVTA_LOG_INP_WIDTH=3
	-DVTA_LOG_WGT_WIDTH=3
	-DVTA_LOG_ACC_WIDTH=5
	-DVTA_LOG_BATCH=0
	-DVTA_LOG_BLOCK=4
	-DVTA_LOG_UOP_BUFF_SIZE=15
	-DVTA_LOG_INP_BUFF_SIZE=15
	-DVTA_LOG_WGT_BUFF_SIZE=18
	-DVTA_LOG_ACC_BUFF_SIZE=17
	-DVTA_LOG_BLOCK_IN=4
	-DVTA_LOG_BLOCK_OUT=4
	-DVTA_LOG_OUT_WIDTH=3
	-DVTA_LOG_OUT_BUFF_SIZE=15
	-DVTA_LOG_BUS_WIDTH=7
	-DVTA_IP_REG_MAP_RANGE=0x1000
	-DVTA_FETCH_ADDR=0xA0000000
	-DVTA_LOAD_ADDR=0xA0001000
	-DVTA_COMPUTE_ADDR=0xA0002000
	-DVTA_STORE_ADDR=0xA0003000
	-DVTA_FETCH_INSN_COUNT_OFFSET=16
	-DVTA_FETCH_INSN_ADDR_OFFSET=24
	-DVTA_LOAD_INP_ADDR_OFFSET=16
	-DVTA_LOAD_WGT_ADDR_OFFSET=24
	-DVTA_COMPUTE_DONE_WR_OFFSET=16
	-DVTA_COMPUTE_DONE_RD_OFFSET=24
	-DVTA_COMPUTE_UOP_ADDR_OFFSET=32
	-DVTA_COMPUTE_BIAS_ADDR_OFFSET=40
	-DVTA_STORE_OUT_ADDR_OFFSET=16
	-DVTA_COHERENT_ACCESSES=true,
 source=/home/xilinx/tvm/vta/src/device_api.cc
	/home/xilinx/tvm/vta/src/runtime.cc
	/home/xilinx/tvm/vta/src/pynq/pynq_driver.cc,
 ldflags=-L/usr/lib
	-l:libcma.so
INFO:root:Program FPGA with vta.bit 
INFO:root:Loading VTA library: /home/xilinx/tvm/vta/python/vta/../../../build/libvta.so
INFO:RPCServer:Finish serving ('10.16.26.186', 43262)

I am wondering how I can solve the ELF format problem.
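My understanding so far: an ELF32/ARM object from the host usually means the cross-compilation target triple was 32-bit. The mapping below is my guess at what the vta_config.json TARGET setting implies (the triple strings are assumptions, not verified against my tree; check env.target_host in your own checkout):

```python
# Assumed mapping from vta_config.json TARGET to the LLVM target triple
# the host uses for cross-compilation (these strings are a guess --
# verify against env.target_host in your TVM checkout).
ASSUMED_TRIPLES = {
    "pynq": "armv7-none-linux-gnueabihf",  # 32-bit ARM -> ELF32 objects
    "ultra96": "aarch64-linux-gnu",        # 64-bit ARM -> ELF64 objects
}

def expects_elf64(target: str) -> bool:
    """True if the assumed triple for `target` produces 64-bit objects."""
    return ASSUMED_TRIPLES.get(target, "").startswith("aarch64")

print(expects_elf64("ultra96"))  # True: ZCU102-class boards want ELF64
```

If the host-side config still resolves to a 32-bit triple, that would explain the ELF32 gemm.o.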

I fixed the problem. Now it is stuck at:

// attr [iter_var(vta, , vta)] coproc_scope = 3
vta.coproc_dep_pop(2, 3)
produce C {
  VTAStoreBuffer2D(tvm_thread_context(VTATLSCommandHandle()), 0, 4, C, 0, 16, 1, 16)
}
vta.coproc_sync()

It seems to be related to cache coherency on the Ultra MPSoC family.

Hi, @mrb256 ,

I’m glad to hear you solved the problem. Nice work!

But I couldn’t work out from your last post what you modified, even after searching my checkout for “VTAStoreBuffer2D” and other keywords, so could you let me know the version of TVM you’re using?

I hit a problem when I used VTALoadBuffer2D() on the Ultra MPSoC with a cache-coherency-modified PYNQ image, so I’m very interested in your modification.

  • Does my problem depend on the Ultra MPSoC, or on the TVM code?
  • Will cache-coherency-related issues occur with other APIs, like VTALoadBuffer2D()?

I’d be very glad if you commented on my thread. (Posting your results for “test_vta_insn.py” and “test_benchmark_topi_conv2d_transpose.py” on the ZCU102 would be very helpful!)

Hello @mrb256, How did you solve the problem?

I have the same issue, with the wrong ELF format of gemm.o (32-bit) generated for the ZCU102 target.