Standardize NN Module Exports into a Single DLL
The TVM runtime module defines a unified API for serialization. This API lets us export modules that contain multiple components. This RFC proposes using the same mechanism to export the graph runtime.
Background about Module Serialization
Let us first walk through the current module serialization mechanism.
    llvm_mod:imported_modules
      - cuda_mod
      - opencl_mod
llvm_mod.export_library for the module above is implemented in the following steps (pseudocode):

    llvm_mod.save("function_objs.o")
    # Call ModuleNode.SaveToBinary to serialize the blobs.
    blob0 = cuda_mod.SaveToBinary()
    blob1 = opencl_mod.SaveToBinary()
    combined_blob = concat(["cuda", blob0, "opencl", blob1])
    # Hack to move binary blobs into a C file
    # so that they can be embedded into a DLL.
    PackToC(combined_blob, "blob_pack.cc")
    run("cc -shared -o lib.so function_objs.o blob_pack.cc")
Summary of the steps:
- If the module contains normal LLVM or C functions, we can simply save it to an object file.
- For other modules, we implement customized serialization (in SaveToBinary). We call the SaveToBinary function of each module to serialize it to a binary blob.
- Finally, we pack the binary blobs into a symbol that can be recovered from the dynamic library (see https://github.com/apache/incubator-tvm/blob/master/src/codegen/codegen.cc#L52).
When we call module.load("lib.so"), we load the dynamic library and then inspect the special binary blob symbol. If the blob symbol exists, we deserialize the modules from the blob by calling the corresponding load functions (registered in the global registry), then recover them into the imported_modules field of the DSOModule (see https://github.com/apache/incubator-tvm/blob/master/src/runtime/module_util.cc#L38).
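The save and load round trip described above can be sketched in plain Python. This is a minimal illustration, not TVM's implementation: ToyModule, LOADERS, register_loader, pack_imports, and unpack_imports are hypothetical names, and the length-prefixed byte layout is an assumption chosen for clarity rather than the actual blob format.

```python
import io
import struct

# Hypothetical registry mapping a type key to a loader function.
# It stands in for the global registry mentioned in the text.
LOADERS = {}


def register_loader(type_key):
    def wrap(fn):
        LOADERS[type_key] = fn
        return fn
    return wrap


class ToyModule:
    """Stand-in for a runtime module (not the TVM class)."""

    def __init__(self, type_key, payload):
        self.type_key = type_key
        self.payload = payload
        self.imported_modules = []

    def save_to_binary(self):
        # A real device module would emit PTX, OpenCL source, etc.
        return self.payload


@register_loader("cuda")
def _load_cuda(blob):
    return ToyModule("cuda", blob)


@register_loader("opencl")
def _load_opencl(blob):
    return ToyModule("opencl", blob)


def _write_bytes(stream, data):
    # Length-prefixed field: 8-byte little-endian size, then the bytes.
    stream.write(struct.pack("<Q", len(data)))
    stream.write(data)


def _read_bytes(stream):
    (size,) = struct.unpack("<Q", stream.read(8))
    return stream.read(size)


def pack_imports(modules):
    """Concatenate (type_key, blob) pairs into one binary blob."""
    out = io.BytesIO()
    out.write(struct.pack("<Q", len(modules)))
    for mod in modules:
        _write_bytes(out, mod.type_key.encode("utf-8"))
        _write_bytes(out, mod.save_to_binary())
    return out.getvalue()


def unpack_imports(blob):
    """Rebuild modules by dispatching each type key to its loader."""
    stream = io.BytesIO(blob)
    (count,) = struct.unpack("<Q", stream.read(8))
    modules = []
    for _ in range(count):
        key = _read_bytes(stream).decode("utf-8")
        modules.append(LOADERS[key](_read_bytes(stream)))
    return modules
```

In this sketch the combined blob plays the role of the symbol embedded in the shared library, and the list returned by unpack_imports corresponds to what would be placed in the imported_modules field of the DSO module.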
Generalize the Mechanism for Graph Runtime and VM
The above mechanism works well for packaging a TVM-generated module that imports multiple device modules. We can serialize arbitrary modules (CUDA, or user-defined ones) into a single DLL without worrying about how to load them separately and link them together. However, when it comes to deploying the graph runtime (which is also a Module!), we take a different approach.
    model = relay.frontend.from_xyz()
    # the corresponding library is already part of the graph_rt_mod
    graph_json, lib, params = relay.build()
    lib.export_library("xyz.so")
    exec_lib = tvm.module.load("xyz.so")
    exec_graph_mod = graph_runtime.create(graph_json, exec_lib)
In the above code, the constructor of the graph runtime takes two arguments: a JSON string (metadata) and a library (the supporting low-level functions). We pass these two arguments to graph_runtime.create to get a graph runtime executor. This convention has worked well so far. Nevertheless, it treats the graph runtime differently from other runtime modules. As we start to introduce new NN runtimes such as the Relay VM, it becomes hard to keep track of the different export conventions.
This RFC proposes to standardize the export of all modules, including the graph runtime and VM, into a single shared library. Specifically, we want to support the following new code:
    model = relay.frontend.from_xyz()
    # the corresponding library is already part of the graph_rt_mod
    compiled_graph_mod, params = relay.build()
    # both work
    compiled_graph_mod.export_library("xyz.so")
    compiled_graph_mod.export_library("xyz.tar")
    # load the graph runtime.
    exec_graph_mod = tvm.module.load("xyz.so")
The import relation of the compiled_graph_mod is shown as follows:
    compiled_graph_mod:imported_modules
      - llvm_mod:imported_modules
        - cuda_mod
        - opencl_mod
Instead of passing back a separate lib, we simply store the dependent library in the imported_modules field of the graph runtime.
Enhanced Serialization Convention
We only need a small enhancement to the original serialization convention to support the above code. We can still use SaveToBinary to serialize compiled_graph_mod. However, when we read it back, we need to return compiled_graph_mod as the primary module and put the DSO module in its import list.
To do so, we also need to serialize the import hierarchy of the module. The new serialization blob format can be defined in a backward-compatible way by appending a key "_import_tree" at the end that stores the import relations. We fall back to the old behavior if "_import_tree" is not available.
We should be able to serialize an arbitrary set of modules in this way. However, we can only have one DSO module (e.g. the source module) in the hierarchy, due to the restriction of a single DLL (which works for our current use cases).
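One way to picture the enhanced convention is the sketch below. It is only an illustration under stated assumptions: the RFC specifies just that an "_import_tree" key is appended at the end, so the JSON child-index encoding, the "_lib" placeholder key for the DSO module, and the names ToyModule, flatten, save, and load are all hypothetical. The sketch also assumes the imports form a tree (no module is imported twice).

```python
import json


class ToyModule:
    """Stand-in for a runtime module (not the TVM class)."""

    def __init__(self, type_key, payload=b""):
        self.type_key = type_key
        self.payload = payload
        self.imported_modules = []


def flatten(root):
    """Return modules in BFS order and, per module, its import indices."""
    order, children = [root], []
    i = 0
    while i < len(order):
        kids = []
        for child in order[i].imported_modules:
            order.append(child)
            kids.append(len(order) - 1)
        children.append(kids)
        i += 1
    return order, children


def save(root):
    """Serialize to (key, blob) entries, with the import tree appended last."""
    order, children = flatten(root)
    entries = []
    for mod in order:
        if mod.type_key == "_lib":
            # The DSO module's code lives in the shared library itself,
            # so only a placeholder entry is written (assumed convention).
            entries.append(("_lib", b""))
        else:
            entries.append((mod.type_key, mod.payload))
    entries.append(("_import_tree", json.dumps(children).encode("utf-8")))
    return entries


def load(entries, dso_module):
    """Rebuild the hierarchy; fall back to the old layout without a tree."""
    if entries and entries[-1][0] == "_import_tree":
        tree = json.loads(entries[-1][1].decode("utf-8"))
        entries = entries[:-1]
    else:
        # Old format: every entry is a direct import of the DSO module.
        dso_module.imported_modules = [ToyModule(k, b) for k, b in entries]
        return dso_module
    mods = [dso_module if k == "_lib" else ToyModule(k, b) for k, b in entries]
    for mod, kids in zip(mods, tree):
        mod.imported_modules = [mods[j] for j in kids]
    return mods[0]  # the primary module is the first entry
```

With a tree present, load returns the graph module as the primary module and places the DSO module under its import list; without one, it returns the DSO module with the deserialized modules as its direct imports, matching the existing behavior.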
In this RFC, we described a way to unify all the runtime module exports. We believe such standardization will bring long term benefits, especially as we start to introduce new NN runtimes such as VM.
This RFC does not, however, advocate using a shared library for all module exports. In some cases it is still helpful to separate the generated library from the metadata. We will keep the graph_runtime.create interface for those use cases, and allow users to call graph_mod.save_graph_json("xyz.json") to save the compiled result.