Implementation Design of Standardize GraphRuntime Exports into a Single DLL

Background RFC: Standardize GraphRuntime Exports into a Single DLL

As the background RFC describes, we want to standardize GraphRuntime exports into a single DLL. Following the RFC specification, and thanks to the discussion with @tqchen, I have completed a prototype of it. So I think it is time to open this RFC to share the implementation design and discuss it with the community.

First, we want to accomplish two goals:

  1. Keep backward compatibility.

    • API compatibility: calls like graphruntime.create(json, lib, ctx) should keep working.
    • Binary compatibility: a previously exported DLL should work in the new runtime system, whether it is a single CPU shared library or a GPU shared library (i.e. LLVM <- CUDA Mod / OpenCL Mod).
  2. Allow GraphRuntime and future runtimes to be exported into a single DLL.

To accomplish these two goals, we should support the following two kinds of hierarchy:

DSO Module: imported modules
    - CUDA Modules
    - OpenCL Modules
    - ...
X Module: imported modules
  - DSO Module: imported modules
    - CUDA Modules
    - OpenCL Modules
    - ...

Here, the DSO Module contains an LLVM module / C source module, and the X Module could be a GraphRuntime module, a VM module, or something else. We don’t restrict imports to two levels, i.e. you could construct modules like this:

X Module: imported modules
  - DSO Module: imported modules
    - CUDA Modules
    - OpenCL Modules
    - W Module: imported modules
       - ...
  - Y Module
  - Z Module: imported modules
    - ...

So, to support this, we should first make a small change to def export_library, which currently only supports exporting DSO modules; it should now support exporting more kinds of modules.

Enhanced Serialization Convention

To distinguish the old and new behaviour, we should add an _import_tree attribute, as the RFC Standardize GraphRuntime Exports into a Single DLL describes. And to keep compatibility, we shouldn’t break the blob layout, i.e. we should keep the blob layout as before:

blob_size
type_key
Logic
 ...

So if we want to serialize _import_tree, we must append it to the end, like this:

blob_size
type_key
Logic
...
_import_tree
_import_tree_data_structure

Here, one important question is _import_tree_data_structure: how should we design it? When we load the modules back, the logic mirrors what we do in serialization, but the import relationships that were available during serialization no longer exist at load time. So we need a data structure that lets us find a module’s parents when we meet it (a module may be imported by multiple parents). This RFC’s design proposes std::unordered_map<uint64_t, std::vector<uint64_t>> import_tree, where the key is a module index and the value is the list of its parents’ module indices. The module index is generated automatically and the logic is very simple: we do one DFS, and when we meet a new module, we give it an index and record the relationship between it and its parents in an adjacency list. After this, we can construct the import_tree easily by looping over the adjacency list and recording it. You can see more detail in ModuleGraph::InitEdge and ModuleGraph::ConstructImportTree.
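The indexing step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in the PR: the Module class and build_import_tree function are hypothetical stand-ins, and giving the root an empty parent list is a choice made here for readability.

```python
from collections import defaultdict


class Module:
    """Hypothetical stand-in for a runtime module (not the TVM class)."""

    def __init__(self, type_key, imports=()):
        self.type_key = type_key
        self.imports = list(imports)


def build_import_tree(root):
    """Return (modules in index order, {module index: sorted parent indices})."""
    index_of = {}                 # id(module) -> assigned index
    order = []                    # modules in the order the DFS first meets them
    children = defaultdict(list)  # adjacency list: parent index -> child indices

    def visit(mod):
        # A module that was already indexed is shared by several parents;
        # reuse its index instead of assigning a new one.
        if id(mod) in index_of:
            return index_of[id(mod)]
        idx = len(order)
        index_of[id(mod)] = idx
        order.append(mod)
        for child in mod.imports:
            children[idx].append(visit(child))
        return idx

    visit(root)

    # Invert the adjacency list so each module maps to its parents.
    import_tree = {i: [] for i in range(len(order))}
    for parent, kids in children.items():
        for kid in kids:
            if parent not in import_tree[kid]:
                import_tree[kid].append(parent)
    return order, {idx: sorted(parents) for idx, parents in import_tree.items()}
```

For example, if a GraphRuntime module imports both an LLVM module and a CUDA module, and the LLVM module also imports the same CUDA module, the CUDA module gets a single index whose parent list contains both of its importers.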

During serialization, we have one new key, _lib, besides _import_tree. _lib is for the DSO module and the module referring to the library symbols. See tqchen’s previous PR: https://github.com/apache/incubator-tvm/pull/4481. With it, we can also get the module order as the modules appear (the DSO module may not be the first module). Note that we keep compatibility with previous DLLs, as our load logic is like this:

if (tkey == "_lib")
  ...
else if (tkey == "_import_tree")
  ...
else
  ...

Because a previous DLL has neither _lib nor _import_tree, it can only enter the else branch. The _lib key will not affect it.
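To make the compatibility argument concrete, here is a minimal Python sketch of the dispatch. It assumes the blob has already been decoded into a list of (tkey, payload) pairs, which is a simplification made for illustration; load_blob, the "dso" placeholder, and the rule that a legacy blob attaches every module to the containing DSO module (following the first hierarchy above) are assumptions, not the PR’s code.

```python
def load_blob(records):
    """Rebuild (modules, edges) from decoded (tkey, payload) records.

    modules is the list of type keys in index order and edges is a sorted
    list of (parent index, child index) import relations.
    """
    modules = []
    import_tree = None  # {module index: [parent indices]} when present

    for tkey, payload in records:
        if tkey == "_lib":
            # Placeholder for the DSO module; its position in the record
            # list gives its index among the modules.
            modules.append("dso")
        elif tkey == "_import_tree":
            import_tree = payload
        else:
            # Any other key is an ordinary serialized module, e.g. "cuda".
            modules.append(tkey)

    if import_tree is None:
        # Legacy blob: only the else branch ran, so every module is
        # imported directly by the DSO module that contains the blob.
        modules = ["dso"] + modules
        edges = [(0, child) for child in range(1, len(modules))]
    else:
        edges = sorted(
            (parent, child)
            for child, parents in import_tree.items()
            for parent in parents
        )
    return modules, edges
```

A legacy record list such as [("cuda", ...), ("opencl", ...)] never touches the two new branches, so it is wired exactly as before, while a new record list can place the DSO module at any position and restore arbitrary import relations.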

Finally, when will we not produce _import_tree in this RFC’s design? In only one situation: we have only one DSO module and it doesn’t contain anything else. So if you have a module like GraphRuntime <- LLVM, we will produce it. If it is LLVM <- CUDA, we will still produce it.
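The rule can be written as a small predicate. This is a sketch only: needs_import_tree is a hypothetical helper, and the set of type keys treated as DSO modules ("llvm" and "c") is an assumption based on the description of the DSO Module above.

```python
# Assumed type keys for modules that compile into the shared library itself.
DSO_TYPE_KEYS = {"llvm", "c"}


def needs_import_tree(root_type_key, imported_type_keys):
    """Return False only for a lone DSO module with no imports."""
    is_lone_dso = root_type_key in DSO_TYPE_KEYS and not imported_type_keys
    return not is_lone_dso
```

Under these assumptions, needs_import_tree("llvm", []) is the only case from the paragraph above that skips _import_tree; both GraphRuntime <- LLVM and LLVM <- CUDA emit it.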

After this RFC, exporting our future runtimes (GraphRuntime, VM and so on) should become easier. Once we agree on it, I will implement the GraphRuntime export as a showcase example. Discussion is welcome.

WIP implementation code: https://github.com/apache/incubator-tvm/pull/4532

@tqchen