How to integrate a target-specific library for codegen?

Dear all, I am quite lost on the codegen and lowering flow, and I am not able to figure out how to integrate a cross-compiled, target-specific external library into codegen so that it is used when generating the module (model) library.

I have a target-specific, cross-compiled library that contains NN operator implementations optimized for the target, and it needs to be integrated for an Arm target.

Please help me…

You can try to follow this tutorial to build a codegen: https://docs.tvm.ai/dev/relay_bring_your_own_codegen.html

@comaniac Thank you for the response. To build this codegen, must the external library be compilable with GCC, or can a cross-compiled library work as well?

This flow fits both cases. See the description of case 2, "You want to generate any other graph representations."

@comaniac, help me clarify one doubt. I have gone through the tutorial; in my understanding, it covers two parts:

  1. We can integrate a GCC-compiled external library to use an external operator implementation, like rocBLAS.
  2. We can integrate a new runtime that has its own graph representation, like TensorRT. Please correct me if I missed anything.

But in my case, I only have the optimized operator implementations in the external library, which is cross-compiled for my target (not compilable with GCC). It does not contain any runtime.

I tried the following:

  1. I tried a rocBLAS-style integration. If the library is GCC-compiled, I can link it to TVM and call its API from an operator as an external function, as done in src/runtime/contrib/rocblas. But in my case the library is GCC-incompatible, so linking fails when building the TVM source code.
  2. Basically, I have to call the operator implementations in my (cross-compiled) library when the model is compiled for Arm. But I am missing the hook: where and how do I integrate this library so that this interfacing happens internally during model compilation?

Can this case also be covered by the flow in https://docs.tvm.ai/dev/relay_bring_your_own_codegen.html?

Regards, Albin

In this case you need a customized runtime. The flow would be like the following:

  • The codegen accepts a subgraph and generates code that invokes functions in your library.
  • When exporting the TVM-built model library, the SaveToBinary of your codegen will be called; you need to cross-compile the generated code at this point.
  • At runtime, TVM will invoke your customized runtime when it sees an external function call. Your customized runtime does not have to be a graph execution engine; it can be just a function dispatcher. For example, it first loads the binary compiled in the previous step and establishes a map such as subgraph1 -> function pointer. As a result, when TVM asks your runtime to run subgraph1, your runtime looks up the corresponding function pointer and invokes the function.
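A minimal sketch of this dispatcher idea (Python for brevity; a real BYOC runtime would be C++ and would load the cross-compiled binary — here plain callables stand in for the resolved function pointers, and all names are illustrative):

```python
# Sketch of a function-dispatcher runtime (illustrative, not TVM's API).
# A real runtime loads the binary produced by SaveToBinary and resolves
# actual function pointers; plain Python callables stand in for them here.

class DispatcherRuntime:
    def __init__(self):
        # Map: subgraph name -> function "pointer" (resolved at load time).
        self.table = {}

    def load(self, symbol_table):
        # Stand-in for loading the cross-compiled binary.
        self.table.update(symbol_table)

    def run(self, subgraph_name, *args):
        # TVM asks the runtime to run "subgraph_0"; look it up and invoke it.
        fn = self.table[subgraph_name]
        return fn(*args)

# Usage: two "library functions" registered under subgraph names.
rt = DispatcherRuntime()
rt.load({"subgraph_0": lambda x, y: x + y,
         "subgraph_1": lambda x: x * 2})
print(rt.run("subgraph_0", 3, 4))  # 7
```

The point is that the runtime itself knows nothing about graphs; it only maps external function names to implementations.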

@comaniac The operator computes still have to call the library functions, right? At TVM compilation time, libtvm.so cannot take a dependency on the Arm-optimized library, because the library is not GCC/Linux-compatible. Could you please help me understand how we can avoid this issue?

You can generate code and write it to a text file, so that you can leverage a system call to compile it. Since the compilation is a separate process, it doesn’t have to be built with TVM.
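As a sketch of that idea (the cross-compiler name, flags, and paths below are assumptions for illustration, not anything TVM prescribes):

```python
# Sketch: dump generated C code to a file and compile it in a separate
# process, so libtvm.so itself never links against the target library.
import os
import subprocess
import tempfile

generated_c = """
#include <stdint.h>
int32_t subgraph_0(int32_t a, int32_t b) { return a + b; }
"""

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "subgraph_0.c")
with open(src, "w") as f:
    f.write(generated_c)

# Compose the cross-compilation command (toolchain name is an assumption).
cmd = ["aarch64-linux-gnu-gcc", "-shared", "-fPIC", src,
       "-o", os.path.join(workdir, "subgraph_0.so"),
       "-L/opt/arm/lib", "-lnnops"]
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # invoke when the toolchain is installed
```

Because the compile step is a plain system call, it can use any toolchain, regardless of what TVM itself was built with.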

@comaniac I have analyzed the code further and tried to prototype a solution, following the process below.

  1. I used a packed function registered with TVM_REGISTER_GLOBAL to call my library API. For this, I created a temporary Linux-compatible library with dummy API functions so that the libtvm.so build succeeds and the symbols are resolved.
  2. My plan is to link the model's cross-compiled library function symbols at runtime against the actual Arm-compatible library.
  3. But I am facing one problem: in the model library, the visible symbol is the external function called from the operator (the Arm library API is not visible in the symbol table), whereas the Arm library exposes a normal API. So direct linking is not possible. Can you please suggest a solution here?
  4. If I have to link the Arm library when cross-compiling the model library (relay.build), is this possible?
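The "resolve the real library at runtime" idea in step 2 can be illustrated with ctypes, looking up symbols by name from a shared library loaded at run time (here libm stands in for the Arm library; this is only an analogy, not the TVM mechanism):

```python
# Illustration: resolving library symbols at runtime rather than at
# libtvm.so link time. libm is a stand-in for the target library.
import ctypes
import ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library("m"))
cos = libm.cos                      # symbol looked up by name at runtime
cos.restype = ctypes.c_double
cos.argtypes = [ctypes.c_double]
print(cos(0.0))  # 1.0
```

The dispatch happens by symbol name at load time, which is essentially what a dlopen/dlsym-based runtime hook would do for the real Arm library.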

When calling export_library, the C code generated by your codegen will be compiled together with the rest of the parts, and you can pass library linking options to the compiler at that point.
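For illustration, a hedged sketch of passing such options (the paths, library name, and cross-compiler here are assumptions; if I recall the API correctly, export_library forwards keyword arguments like cc and options to the compile function):

```python
# Hypothetical helper composing linker options for the target library.
def arm_link_options(lib_dir, lib_name):
    return ["-L" + lib_dir, "-l" + lib_name]

print(arm_link_options("/opt/arm/lib", "nnops"))
# ['-L/opt/arm/lib', '-lnnops']

# With a TVM-built module (requires tvm; shown for illustration only):
# lib.export_library("net.so", cc="aarch64-linux-gnu-g++",
#                    options=arm_link_options("/opt/arm/lib", "nnops"))
```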

In this case, you need to build TVM with the ARM library; you can achieve this by adding the dependency to the CMake configs. You can refer to the DNNL codegen example and see how the MKL-DNN dependency was added.

Alternatively, you can generate a graph representation (e.g., in JSON format) in your codegen instead of C code. In this case, you implement a customized runtime that accepts the graph and calls the APIs in the ARM library. Of course, the runtime has to be compiled with the ARM library, and you still need to add the dependency to the CMake configs.
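For instance, such a serialized subgraph might look like the following (this schema is purely hypothetical, just to illustrate what a codegen could emit for a custom runtime to parse):

```json
{
  "subgraph": "subgraph_0",
  "nodes": [
    {"id": 0, "op": "input", "shape": [1, 3, 224, 224]},
    {"id": 1, "op": "conv2d", "inputs": [0], "attrs": {"kernel": [3, 3]}},
    {"id": 2, "op": "relu", "inputs": [1]}
  ],
  "outputs": [2]
}
```

The custom runtime would walk this structure and invoke the matching ARM library API for each node.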

A few queries:

  1. To build TVM with the ARM library, do we have to build the entire TVM source code with the ARM toolchain?
  2. Was the DNNL support in TVM added for a cross-compiled DNNL library? I have gone through the source code, and it does not use the TVM_REGISTER_GLOBAL external-function method. And the CMake dependency you are talking about is cmake/modules/contrib/DNNL.cmake, right?
  3. For the alternative solution you are suggesting, can you please give me an example of generating a graph representation in codegen instead of C, if one is available?
  1. To build TVM with the ARM library, do we have to build the entire TVM source code with the ARM toolchain?

Simply speaking, yes.

  2. Was the DNNL support in TVM added for a cross-compiled DNNL library? I have gone through the source code, and it does not use the TVM_REGISTER_GLOBAL external-function method.

DNNL support doesn't use TVM_REGISTER_GLOBAL because we didn't implement one-to-one op mappings like other contribs such as cBLAS/cuBLAS/cuDNN. Other contribs register their APIs to the corresponding TVM ops, and those are directly invoked when a user specifies the target (e.g., -libs=cblas). The DNNL codegen example, on the other hand, generates API calls by traversing subgraphs. In this case, we only need to include the DNNL library when we compile the generated code.
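The difference between the two styles can be sketched like this (Python pseudocode of the idea, not TVM's actual API; all names are made up for illustration):

```python
# Sketch of the "one-to-one op mapping" style used by cBLAS/cuBLAS-like
# contribs: a global registry maps an op name directly to an external call.
# (The DNNL codegen instead emits such calls while traversing a subgraph.)
REGISTRY = {}

def register_global(name):
    """Stand-in for TVM_REGISTER_GLOBAL: record a function under a name."""
    def wrap(fn):
        REGISTRY[name] = fn
        return fn
    return wrap

@register_global("contrib.myblas.matmul")
def matmul(a, b):
    # Toy matmul standing in for an external BLAS call.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# One-to-one mapping: the op name resolves directly to the registered call.
out = REGISTRY["contrib.myblas.matmul"]([[1, 0], [0, 1]], [[2, 3], [4, 5]])
print(out)  # [[2, 3], [4, 5]]
```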

And the CMake dependency you are talking about is cmake/modules/contrib/DNNL.cmake, right?

Yes.

  3. For the alternative solution you are suggesting, can you please give me an example of generating a graph representation in codegen instead of C, if one is available?

@zhiics and I are working on a general JSON codegen with a DNNL runtime engine as a more practical example. Hopefully we can file a PR next week.

In the DNNL solution, TVM expects to link the DNNL library at TVM load time (when we do “import tvm”) via this CMake change:

  find_library(EXTERN_LIBRARY_DNNL dnnl)
  list(APPEND TVM_RUNTIME_LINKER_LIBS ${EXTERN_LIBRARY_DNNL}) 

But in my case, I do not have such a Linux-compatible library to link with TVM at this point.

Is it possible to follow the DNNL flow and make the changes below to overcome my situation?

  1. I create a dummy Linux library (all the library APIs with dummy definitions) and use it for TVM source-code compilation and linking.
  2. At “relay.build”/codegen time, in “export library” (as shown in “update_lib”), I will try to link the original ARM library and map the library API calls. Does this change require runtime modifications?
  3. With the above solution, the TVM source code does not need to be compiled with the ARM toolchain.
  4. Just for confirmation: as of now, there is no integration in TVM that requires the entire TVM source code to be cross-compiled, right?
  5. Which is more realistic: the above method, or the alternative solution you suggested (generate a TXT/JSON graph representation in codegen instead of C, plus a customized runtime to map the functions to the ARM library API)?

If this library is the Arm Compute Library, we at Arm are working on a proof-of-concept integration into the BYOC codegen, pretty much with a JSON graph representation in the codegen instead of using the C runtime.

Ramana


@ramana-arm I am basically trying this for research purposes. Could you please let me know when these changes will be ready and uploaded?

@comaniac Could you please help clarify my queries?