[VTA][Relay] Deploy Relay model using VTA on FPGA target

Hello all. I’m trying to write and run a Tiny YOLO Relay model using VTA on an FPGA target.
However, I encountered the errors below when building the model,
so I would like to ask a few questions.
Questions:

  1. Can graphs written with Relay work with VTA the way NNVM graphs do?
  2. What is causing this error?
  3. Is there a good way to debug and track down errors like this?
Traceback (most recent call last):
  File "tinyyolo_quantized.py", line 502, in <module>
    graph, lib, params = generate_graph(device)
  File "tinyyolo_quantized.py", line 71, in generate_graph
    graph, lib, params = relay.build(func, target=target, target_host=target_host, params=params)
  File "/tvm/python/tvm/relay/build_module.py", line 275, in build
    lowered_funcs, target=target, target_host=target_host)
  File "/tvm/python/tvm/build_module.py", line 607, in build
    mhost = codegen.build_module(fhost_all, str(target_host))
  File "/tvm/python/tvm/codegen.py", line 20, in build_module
    return _Build(lowered_func, target)
  File "/tvm/python/tvm/_ffi/_ctypes/function.py", line 185, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "/tvm/python/tvm/_ffi/base.py", line 71, in check_call
    raise TVMError(py_str(_LIB.TVMGetLastError()))
tvm._ffi.base.TVMError: [15:55:42] /tvm/src/codegen/llvm/llvm_module.cc:173: LLVM module verification failed with the following errors:
Call parameter type does not match function signature!
  %3 = alloca i32, align 4
 i8*  %31 = call i8* @VTABufferCPUPtr(i8* %14, i32* nonnull %3)
Call parameter type does not match function signature!
  %3 = alloca i32, align 4
 i8*  %31 = call i8* @VTABufferCPUPtr(i8* %14, i32* nonnull %3)
Call parameter type does not match function signature!
  %3 = alloca float, align 4
 i8*  %31 = call i8* @VTABufferCPUPtr(i8* %14, float* nonnull %3)
Call parameter type does not match function signature!
  %3 = alloca i32, align 4
 i8*  %31 = call i8* @VTABufferCPUPtr(i8* %14, i32* nonnull %3)
Call parameter type does not match function signature!
  %3 = alloca i32, align 4
 i8*  %31 = call i8* @VTABufferCPUPtr(i8* %14, i32* nonnull %3)
Call parameter type does not match function signature!
  %3 = alloca i32, align 4
 i8*  %31 = call i8* @VTABufferCPUPtr(i8* %14, i32* nonnull %3)
Call parameter type does not match function signature!
  %3 = alloca i32, align 4
 i8*  %31 = call i8* @VTABufferCPUPtr(i8* %14, i32* nonnull %3)
Call parameter type does not match function signature!
  %3 = alloca i32, align 4
 i8*  %31 = call i8* @VTABufferCPUPtr(i8* %14, i32* nonnull %3)
Call parameter type does not match function signature!
  %3 = alloca i32, align 4
 i8*  %31 = call i8* @VTABufferCPUPtr(i8* %14, i32* nonnull %3)


Stack trace returned 9 entries:
[bt] (0) 0   libtvm.dylib                        0x0000000114d9be70 dmlc::StackTrace(unsigned long) + 464
[bt] (1) 1   libtvm.dylib                        0x0000000114d9bb54 dmlc::LogMessageFatal::~LogMessageFatal() + 52
[bt] (2) 2   libtvm.dylib                        0x00000001155f9522 tvm::codegen::LLVMModuleNode::Init(tvm::Array<tvm::LoweredFunc, void> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) + 1922
[bt] (3) 3   libtvm.dylib                        0x00000001155f8bb0 std::__1::__function::__func<tvm::codegen::$_1, std::__1::allocator<tvm::codegen::$_1>, void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&) + 544
[bt] (4) 4   libtvm.dylib                        0x0000000114e6dd2f tvm::codegen::Build(tvm::Array<tvm::LoweredFunc, void> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 559
[bt] (5) 5   libtvm.dylib                        0x0000000114da7023 std::__1::__function::__func<tvm::codegen::$_0, std::__1::allocator<tvm::codegen::$_0>, void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&) + 483
[bt] (6) 6   libtvm.dylib                        0x00000001156097f6 TVMFuncCall + 70
[bt] (7) 7   _ctypes.cpython-36m-darwin.so       0x0000000110fc6e5f ffi_call_unix64 + 79
[bt] (8) 8   ???                                 0x00007ffee5edc0e0 0x0 + 140732755984608

Really appreciate any comments or suggestions.

I think VTA does not work with Relay so far; VTA adds some custom passes into NNVM. @thierry, correct me if I’m wrong.

Thank you for your reply. I see.
I understand custom passes are applied via vta.build_config.
I tried to read through the code, but I could not find the part where the NNVM passes are invoked.
If you don’t mind, could you let me know which part of the code depends on NNVM?

I’ll try to answer, and I hope someone will correct me if I am wrong.

You are right: vta.build_config defines the list of passes that will be appended to the lowering process.
These are applied after the HW-independent graph optimizations of NNVM.
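
For reference, here is roughly how vta.build_config is wrapped around the NNVM build in the old VTA examples. This is a sketch based on my reading of the resnet tutorial: `sym`, `shape_dict`, `dtype_dict`, `params`, `target`, and `target_host` are assumed to come from the earlier frontend import, and the exact arguments may differ in your TVM version.

```python
import nnvm.compiler
import vta

# vta.build_config() is a context manager that appends the VTA-specific
# lowering passes to the normal TVM lowering pipeline; the NNVM
# graph-level optimizations (opt_level=3) still run first.
with nnvm.compiler.build_config(opt_level=3):
    with vta.build_config():
        graph, lib, params = nnvm.compiler.build(
            sym, target, shape_dict, dtype_dict,
            params=params, target_host=target_host)
```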

I think what is meant here are actually other passes, namely those in vta.graph.
These are passes which are applied to the graph representation of the workload before the actual NNVM graph optimizations are done.
You can see this in the resnet tutorial of VTA.
To my understanding, without these preprocessing passes nothing will get offloaded to the FPGA fabric of the VTA.
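
For context, the old resnet.py tutorial applies those vta.graph passes right before calling nnvm.compiler.build, roughly as follows. I am writing this from memory, so treat it as a sketch: `sym`, `shape_dict`, `dtype_dict`, and `target` come from the earlier frontend import, and the exact signature of pack() may differ.

```python
import vta
import vta.graph

env = vta.get_env()

# Graph-level preprocessing applied before NNVM's own optimizations:
# remove redundant casts, clean up the fused conv2d patterns, and pack
# batch/channel dimensions into the layout the VTA tensor core expects.
sym = vta.graph.clean_cast(sym)
sym = vta.graph.clean_conv_fuse(sym)
if target.device_name == "vta":
    # The trailing packing-factor argument is my best recollection;
    # check the tutorial source for the exact signature of pack().
    sym = vta.graph.pack(sym, shape_dict, dtype_dict, env.BLOCK_OUT)
```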

Thank you for your kind reply.
So vta.graph.pack() is the graph-rewriting step that enables offloading to the FPGA, and that rewriting is done with NNVM, is that right?

I actually do not have the best understanding of what the functions of vta.graph do.
Refer to my post [VTA&TVM] Questions after investigating resnet.py tutorial to see my general understanding and feel free to correct me if I am wrong.

I wouldn’t really put it that way (again see my other thread for more details).
It is sadly a little more complicated.

vta.graph.pack() does some memory layout transformations (especially for the conv2d layers) so that the tensors match the memory layout that VTA expects.
(By the way, the transformations themselves are built out of NNVM operations.)
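
To make “batch & channel packed” concrete, here is a small NumPy illustration of my own (not VTA code): an NCHW tensor is reshaped and transposed into NCHW{n}n{c}c, where n and c are the hardware batch and channel block sizes.

```python
import numpy as np

# Hypothetical block sizes; on VTA these come from the hardware config
# (roughly env.BATCH and env.BLOCK_IN / env.BLOCK_OUT).
n_block, c_block = 1, 16

x = np.random.rand(1, 32, 14, 14)   # a plain NCHW activation
N, C, H, W = x.shape

# NCHW -> (N/n, n, C/c, c, H, W) -> (N/n, C/c, H, W, n, c)
packed = x.reshape(N // n_block, n_block,
                   C // c_block, c_block, H, W).transpose(0, 2, 4, 5, 1, 3)
print(packed.shape)   # (1, 2, 14, 14, 1, 16), i.e. layout "NCHW1n16c"
```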
You can tell that this is a requirement for offloading to the FPGA fabric from the layout check that guards the packed conv2d compute rule.
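
Sketched from memory (the real check lives in VTA’s conv2d registration, e.g. vta/python/vta/top/vta_conv2d.py, and the exact code may differ):

```python
def is_packed_layout(layout):
    """Return True if `layout` carries packed batch ("n") and channel ("c")
    sub-dimensions, e.g. "NCHW1n16c"."""
    if layout == "NCHW":
        return False
    return "n" in layout and "c" in layout

# Inside the conv2d compute registration for VTA, the packed_conv2d rule
# is guarded by a check along these lines, so plain NCHW graphs never
# pick up the VTA compute/schedule.
assert is_packed_layout("NCHW1n16c")
assert not is_packed_layout("NCHW")
```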


This check basically means “the packed_conv2d compute rule is only valid if the layout is in the batch & channel packed format”.

Thank you for your reply. It helps me understand VTA.
I’ll read your post [VTA&TVM] Questions after investigating resnet.py tutorial carefully, and let’s continue the discussion about VTA.

@yzhliu @aca88 @makihiro Hi, I was wondering whether you have handled nn.upsample in the graph_pack process. I am currently trying to run a U-Net through VTA, but I ran into some problems during graph_pack, and I suspect the cause is nn.upsample or torch.cat. A full description can be found in another post I wrote: Can Upsample be implemented on VTA in graph_pack?. Sorry to bother you; I have only recently started working with TVM and VTA, and I have not been able to solve this after a long time of trying. Have you encountered this problem, or can you offer any suggestions? Thank you!
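
For anyone else landing here: in the Relay-era flow, graph_pack is typically invoked as in the deploy_classification tutorial, roughly as below. This is a sketch, assuming `mod` and `params` come from an earlier frontend import; the start_name/stop_name boundaries are model-specific, and operators between those boundaries that the pass does not know how to pack are a plausible source of failures like the one described.

```python
import tvm
from tvm import relay
import vta
from vta.top import graph_pack

env = vta.get_env()

# Quantize first, then rewrite the Relay graph into VTA's packed layout.
# start_name/stop_name select the first and last operators of the region
# that gets packed for the FPGA; everything outside stays on the CPU.
with tvm.transform.PassContext(opt_level=3):
    with relay.quantize.qconfig(global_scale=8.0, skip_conv_layers=[0]):
        mod = relay.quantize.quantize(mod, params=params)
    relay_prog = graph_pack(
        mod["main"],
        env.BATCH,
        env.BLOCK_OUT,
        env.WGT_WIDTH,
        start_name="nn.max_pool2d",        # example boundaries; adjust per model
        stop_name="nn.global_avg_pool2d")
```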