Standardize GraphRuntime Exports into a Single DLL

Standardize NN Module Exports into a Single DLL

The TVM runtime module defines a unified API for serialization. This API lets us export modules that contain multiple components. This RFC proposes to use the same mechanism to export the graph runtime.

Background about Module Serialization

Let us first walk through the current module serialization mechanism.

llvm_mod:imported_modules
  - cuda_mod
  - opencl_mod

llvm_mod.export_library for the module above is implemented in the following steps (pseudo code).

llvm_mod.save("function_objs.o")
# call ModuleNode.SaveToBinary to serialize the blobs
blob0 = cuda_mod.SaveToBinary()
blob1 = opencl_mod.SaveToBinary()
combined_blob = concat(["cuda", blob0, "opencl", blob1])
# Hack to move binary blobs into a c file
# so that it can be embedded into a DLL
PackToC(combined_blob, "blob_pack.cc")
run("cc -shared -o lib.so function_objs.o blob_pack.cc")

Summary of the steps:

  • If the module contains normal LLVM or C functions, we can just save it to an object file.
  • For other modules, we implement customized serialization (in SaveToBinary). We call the SaveToBinary function of each module to serialize it to a binary blob.
  • Finally, we pack the binary blobs into a symbol that can be recovered from the dynamic library (see https://github.com/apache/incubator-tvm/blob/master/src/codegen/codegen.cc#L52).
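The packing step can be sketched in plain Python. This is only an illustration of the idea, not TVM's actual binary layout: the length-prefixed framing and the names pack_blobs and unpack_blobs are assumptions made for this example.

```python
import struct


def pack_blobs(entries):
    """Concatenate (type_key, blob) pairs into one length-prefixed byte string.

    Hypothetical framing for illustration: a uint64 entry count, then for
    each entry a uint64 length followed by the bytes, for key and blob alike.
    """
    out = [struct.pack("<Q", len(entries))]
    for type_key, blob in entries:
        key_bytes = type_key.encode("utf-8")
        out.append(struct.pack("<Q", len(key_bytes)))
        out.append(key_bytes)
        out.append(struct.pack("<Q", len(blob)))
        out.append(blob)
    return b"".join(out)


def unpack_blobs(data):
    """Inverse of pack_blobs: recover the list of (type_key, blob) pairs."""
    offset = 0

    def read_chunk():
        # Read one length-prefixed chunk and advance the shared offset.
        nonlocal offset
        (size,) = struct.unpack_from("<Q", data, offset)
        offset += 8
        chunk = data[offset:offset + size]
        offset += size
        return chunk

    (count,) = struct.unpack_from("<Q", data, offset)
    offset += 8
    entries = []
    for _ in range(count):
        type_key = read_chunk().decode("utf-8")
        blob = read_chunk()
        entries.append((type_key, blob))
    return entries


# Stand-ins for the results of cuda_mod.SaveToBinary() and opencl_mod.SaveToBinary().
combined_blob = pack_blobs([("cuda", b"\x01\x02"), ("opencl", b"kernel")])
```

Storing each type key next to its blob is what lets a loader later pick the right deserializer for every entry.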

When we call module.load("lib.so"), we load the dynamic library and then inspect the special binary blob symbol. If the blob symbol exists, we deserialize the modules from the blob by calling the corresponding load functions (registered in the global registry), and then recover them into the imported_modules field of the DSOModule (see https://github.com/apache/incubator-tvm/blob/master/src/runtime/module_util.cc#L38).
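The load path can be sketched as a registry lookup per blob. The Module class, the _LOADERS dictionary, and the register_loader decorator below are hypothetical stand-ins for the runtime's module type and global registry, and the blobs are assumed to be already split into (type_key, blob) pairs.

```python
# Hypothetical global registry mapping a module type key to its load function.
_LOADERS = {}


def register_loader(type_key):
    """Decorator that records a load function under the given type key."""
    def decorator(func):
        _LOADERS[type_key] = func
        return func
    return decorator


class Module:
    """Minimal stand-in for a runtime module with an import list."""

    def __init__(self, type_key, payload=b""):
        self.type_key = type_key
        self.payload = payload
        self.imported_modules = []


@register_loader("cuda")
def load_cuda(blob):
    return Module("cuda", blob)


@register_loader("opencl")
def load_opencl(blob):
    return Module("opencl", blob)


def load_library(entries):
    """Rebuild a DSO module from already-unpacked (type_key, blob) pairs."""
    dso_mod = Module("dso")
    for type_key, blob in entries:
        if type_key not in _LOADERS:
            raise KeyError("no loader registered for module type %r" % type_key)
        # Each deserialized module becomes a direct import of the DSO module.
        dso_mod.imported_modules.append(_LOADERS[type_key](blob))
    return dso_mod


lib = load_library([("cuda", b"ptx"), ("opencl", b"cl")])
```

Because dispatch goes through the registry, a user-defined module type only needs to register its own load function to take part in the same flow.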

Generalize the Mechanism for Graph Runtime and VM

The above mechanism is great for packaging a TVM-generated module that imports multiple device modules. We can serialize arbitrary modules (CUDA, or user-defined ones) into a single DLL without worrying about how to load them separately and link them together. However, when it comes to deploying the graph runtime (which is also a Module!), we take a different approach.


model = relay.frontend.from_xyz()
# the corresponding library is already part of the graph_rt_mod
graph_json, lib, params = relay.build()
lib.export_library("xyz.so")

exec_lib = tvm.module.load("xyz.so")
exec_graph_mod = graph_runtime.create(graph_json, exec_lib)

In the above code, the constructor of the graph runtime takes two arguments: a json (metadata) and a library (the supporting low-level functions). We pass these two arguments to graph_runtime.create in order to get a graph runtime executor. This convention has worked well for us so far. Nevertheless, it treats the graph runtime differently from other runtime modules. As we start to introduce new NN runtimes such as the relay VM, it becomes hard to keep track of the different export conventions.

This RFC proposes to standardize the export of all modules, including the graph runtime and VM, into a single shared library. Specifically, we want to support the following new code:


model = relay.frontend.from_xyz()
# the corresponding library is already part of the graph_rt_mod
compiled_graph_mod, params = relay.build()
# both work
compiled_graph_mod.export_library("xyz.so")
compiled_graph_mod.export_library("xyz.tar")
# load the graph runtime.
exec_graph_mod = tvm.module.load("xyz.so")

The import relation of the compiled_graph_mod is shown as follows:

compiled_graph_mod:imported_modules
- llvm_mod:imported_modules
   - cuda_mod
   - opencl_mod

Instead of passing back a separate lib, we simply store the dependent library in the imported_modules field of the graph runtime.

Enhanced Serialization Convention

We only need a small enhancement to the original serialization convention to support the above code. We can still use SaveToBinary to serialize compiled_graph_mod. However, when we read it back, we need to return the compiled_graph_mod as the primary module and put the DSO module under its import list.

In order to do so, we also need to serialize the import hierarchy of the module. The new serialization blob format can be defined in a backward-compatible way by appending a key "_import_tree" at the end that stores the import relation. We will fall back to the old behavior if "_import_tree" is not available.
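One way to picture the enhanced convention is the sketch below. It is not the proposed on-disk format: the JSON encoding of the tree, the "_lib" placeholder key for the DSO module, and the Module class are all assumptions made for illustration. Only the "_import_tree" key and the fallback behavior come from the text above.

```python
import json


class Module:
    """Minimal stand-in for a runtime module with an import list."""

    def __init__(self, type_key, payload=b""):
        self.type_key = type_key
        self.payload = payload
        self.imported_modules = []


def serialize(root):
    """Flatten a module hierarchy into (key, blob) entries plus "_import_tree"."""
    order, children = [], []

    def visit(mod):
        # Pre-order walk: record the module, then the indices of its imports.
        index = len(order)
        order.append(mod)
        children.append([])
        for child in mod.imported_modules:
            children[index].append(visit(child))
        return index

    visit(root)
    # The DSO module has no blob of its own; the hypothetical "_lib" key
    # only marks its position in the hierarchy.
    entries = [("_lib", b"") if m.type_key == "dso" else (m.type_key, m.payload)
               for m in order]
    entries.append(("_import_tree", json.dumps(children).encode("utf-8")))
    return entries


def deserialize(entries, dso_mod):
    """Rebuild the hierarchy; dso_mod stands in for the loaded shared library."""
    if not entries or entries[-1][0] != "_import_tree":
        # Old format: every blob is a direct import of the DSO module.
        dso_mod.imported_modules = [Module(k, b) for k, b in entries]
        return dso_mod
    children = json.loads(entries[-1][1].decode("utf-8"))
    mods = [dso_mod if k == "_lib" else Module(k, b) for k, b in entries[:-1]]
    for mod, child_ids in zip(mods, children):
        mod.imported_modules = [mods[i] for i in child_ids]
    return mods[0]  # the primary module, e.g. the graph runtime


graph_mod = Module("graph_runtime", b"{}")
llvm_mod = Module("dso")
llvm_mod.imported_modules = [Module("cuda", b"ptx"), Module("opencl", b"cl")]
graph_mod.imported_modules = [llvm_mod]

restored = deserialize(serialize(graph_mod), Module("dso"))
```

Here restored is the graph runtime module with the DSO module as its single import, while a blob list without "_import_tree" still loads with the DSO module as the root.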

We should be able to serialize an arbitrary set of modules in this way. However, we can only have one DSO module (e.g. the source module) in the hierarchy due to the restriction of a single DLL (which works for our current use cases).

Summary

In this RFC, we described a way to unify all the runtime module exports. We believe such standardization will bring long term benefits, especially as we start to introduce new NN runtimes such as VM.

This RFC does not, however, advocate using a shared library for all module exports. In some cases it is still helpful to separate the generated library from the metadata. We will keep the graph_runtime.create interface for those use cases, and allow users to call graph_mod.save_graph_json("xyz.json") to save the compiled result.


Unifying the export interface at this point is good design. One small question: in the RFC, it looks like the return value of relay.build() will change slightly, from graph_json, lib, params to compiled_graph_mod, params.

It may look like a breaking change. Will we also keep the original tvm.build() interface?

The interface change and the DLL integration can happen in two steps.

Since the change affects the return value, we will likely have to break the original interface to do so, but we will discuss it with the community in an RFC.

@yangjunpro I think the original tvm.build() interface can remain the same. The main difference will be how we deploy the compiled model, as the artifacts would change if we discard the json.

@tqchen

for

compiled_graph_mod:imported_modules
- llvm_mod:imported_modules
   - cuda_mod
   - opencl_mod

Should it be the following? I meant: is llvm_mod:imported_modules one of the modules of compiled_graph_mod?

compiled_graph_mod:imported_modules
    - llvm_mod:imported_modules
        - cuda_mod
        - opencl_mod

Unifying is good, though we would have to modify many places in the infrastructure, such as SaveToBinary and dso_module loadfile_so. However, I think it should go smoothly, and I don't see any blockers so far.

As for discarding the json, I think that is actually preferable. People sometimes ask me about the security of deployment. I explain that the .json doesn't expose the network, but our function names still expose some operator information, such as fuse_conv2d_relu. So I have said we could encrypt the .json. But if we could pack it into the binary, as we do with CUDA / OpenCL kernel code, this problem would be solved better. I think it is a good way to go and would give us a better deployment experience.
