Thanks, everyone, for the valuable discussion. Based on it, we came up with several API design proposals for op-based annotation. We will have separate discussions for other issues that relate to BYOC but not directly to op-based annotation.
Annotator Implementation
Annotator will be implemented by developers/vendors under python/tvm/relay/op/contrib/<compiler_name>/external_compiler.py. To simplify, we use dnnl as an example compiler name in the rest of this post.
A.1: A New Register Helper.
This is the original proposal. We ask developers to implement a checker for each op to indicate if we should annotate that op or not:
def conv2d(attrs, args):
    return True
If the checker simply needs to return True or False, we provide a helper to reduce the effort:
_register_external_op_helper("conv2d")
These two implementations are fully equivalent, and developers can use whichever they prefer.
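To make the equivalence of the two forms above concrete, here is a minimal, self-contained sketch of how such a helper could be layered on a per-op checker registry. The registry dict `_EXTERNAL_OP_CHECKERS`, the decorator `register_external_op`, and the query function `should_annotate` are hypothetical stand-ins for illustration, not TVM APIs; the actual helper would presumably hook into Relay's op registry.

```python
# Hypothetical stand-in for an op-checker registry; not a TVM API.
_EXTERNAL_OP_CHECKERS = {}


def register_external_op(op_name):
    """Decorator that records a checker function for `op_name`."""
    def _decorator(func):
        _EXTERNAL_OP_CHECKERS[op_name] = func
        return func
    return _decorator


def _register_external_op_helper(op_name, supported=True):
    """Register a checker that always returns `supported`."""
    @register_external_op(op_name)
    def _checker(attrs, args):
        return supported
    return _checker


# Explicit checker, written out in full.
@register_external_op("conv2d")
def conv2d(attrs, args):
    return True


# One-line helper form for ops whose answer does not depend on attrs/args.
_register_external_op_helper("relu")
_register_external_op_helper("softmax", supported=False)


def should_annotate(op_name, attrs=None, args=None):
    """Ops without a registered checker are never annotated."""
    checker = _EXTERNAL_OP_CHECKERS.get(op_name)
    return bool(checker and checker(attrs, args))
```

In this sketch the helper is just a closure around the explicit form, so both end up as the same kind of entry in the registry.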
A.2: Use tvm.relay.op.register Directly
Suggested by @tqchen, we can reuse the Relay op register instead of introducing a new register:
from ... import op as reg

reg.register("nn.conv2d", "FTVMExternalCompiler", lambda r, g, c: c == "dnnl")

@reg.register("nn.batch_norm", "FTVMExternalCompiler")
def batch_norm(attrs, args, compiler):
    if compiler != "dnnl":
        return False
    return  # check with attrs and args
The most important benefit of this approach is that we do not introduce any new APIs. On the other hand, developers have to write one function per op. Of course, we can still add the following practice to the tutorial:
def _reg_op(op_name, supported=True):
    @reg.register(op_name, "FTVMExternalCompiler")
    def _func_wrapper(attrs, args, compiler):
        return supported if compiler == "dnnl" else False
    return _func_wrapper

_reg_op("nn.conv2d")
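For readers who want to try the A.2 call shapes without a TVM build, the following is a rough, self-contained mock. The `register` function here only imitates the two usages shown above (direct call with a value, and decorator); it is not the real `tvm.relay.op.register`. The `is_supported` query and the `axis` check inside `batch_norm` are assumptions made up for illustration.

```python
# Mock of a per-op attribute registry; it only imitates the call shapes above.
_OP_ATTRS = {}

ATTR = "FTVMExternalCompiler"


def register(op_name, attr_key, value=None):
    """Register `value` directly, or act as a decorator when `value` is None."""
    def _register(func):
        _OP_ATTRS.setdefault(op_name, {})[attr_key] = func
        return func
    return _register(value) if value is not None else _register


# Direct form: the checker only looks at the compiler name.
register("nn.conv2d", ATTR, lambda attrs, args, compiler: compiler == "dnnl")


# Decorator form: the checker can also inspect attrs and args.
@register("nn.batch_norm", ATTR)
def batch_norm(attrs, args, compiler):
    if compiler != "dnnl":
        return False
    # Hypothetical rule for illustration: only offload when axis == 1.
    return attrs.get("axis", 1) == 1


def is_supported(op_name, attrs, args, compiler):
    """How an annotation pass might query the registered checker (assumed)."""
    checker = _OP_ATTRS.get(op_name, {}).get(ATTR)
    return checker is not None and bool(checker(attrs, args, compiler))
```

Because the compiler name is an argument rather than part of the registration key, a single attribute key covers every external compiler; the cost is that each checker has to reject compilers it does not know about.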
End-User Interface
For end-users who know nothing about annotation and external codegen, we have the following options to put it all together:
E.1: A separate API
The first approach asks users to explicitly call a dedicated API that we provide to perform graph partitioning:
mod = tvm.IRModule()
mod["main"] = # A Relay function
mod = relay.build_extern(mod, external_compiler="dnnl", patterns=[pattern1, pattern2, ...])
relay.build_module(mod, params)
where build_extern is an API calling MergeComposite (if patterns is not empty), AnnotateCompiler, and PartitionGraph sequentially.
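The pass ordering described here can be sketched as plain function composition. In the sketch below the "module" is just a list of strings recording which passes ran, and `merge_composite`, `annotate_compiler`, and `partition_graph` are stub functions standing in for the passes named above; none of this is the actual TVM implementation.

```python
# Stub passes: each one appends a record of itself to the "module" (a list).
def merge_composite(mod, patterns):
    return mod + ["MergeComposite(%d patterns)" % len(patterns)]


def annotate_compiler(mod, compiler):
    return mod + ["AnnotateCompiler(%s)" % compiler]


def partition_graph(mod):
    return mod + ["PartitionGraph"]


def build_extern(mod, external_compiler, patterns=None):
    """Run the three steps in order; skip merging when no patterns are given."""
    passes = []
    if patterns:
        passes.append(lambda m: merge_composite(m, patterns))
    passes.append(lambda m: annotate_compiler(m, external_compiler))
    passes.append(partition_graph)
    for run_pass in passes:
        mod = run_pass(mod)
    return mod
```

Calling `build_extern([], "dnnl")` in this sketch runs only the annotate and partition stubs, while passing a non-empty `patterns` list puts the merge step first.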
The advantages of E.1 are high flexibility and extensibility. The drawback is an explicit function call. Note that since some passes, such as constant folding and QNN legalization, should be called before our passes, we will have a follow-up RFC discussing how to separate them from build_module if E.1 gets in.
If you vote for this approach, please also vote for the names of each API and argument. We provide some candidates for each of them.
- build_extern:
  - partition_module
  - extern_preprocess
- external_compiler:
- patterns:
  - composite_patterns
  - fuse_patterns
E.2: Integrate into build_module
Another approach, suggested by @mbaret, is to integrate those passes into the current TVM build process. Specifically, users only need to write one line of code:
relay.build_module(mod, params, external_compiler, patterns)
The advantage of E.2 might be a simpler programming model. The drawback is that it needs more changes in the TVM build module and compile engine. Again, if you vote for this approach, please also vote for the names of each API and argument.
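One way to picture E.2 is as optional arguments that default to "off", so existing callers are unaffected. The sketch below uses stub functions only: `_partition` stands in for the merge/annotate/partition sequence, `_compile` stands in for the existing build flow, and the "module" is a list of step names. The keyword defaults are an assumption, not a settled signature.

```python
def _partition(mod, external_compiler, patterns):
    """Stand-in for the merge/annotate/partition sequence."""
    steps = ["merge"] if patterns else []
    steps += ["annotate:" + external_compiler, "partition"]
    return mod + steps


def _compile(mod, params):
    """Stand-in for the existing build flow."""
    return {"steps": mod + ["build"], "params": params}


def build_module(mod, params=None, external_compiler=None, patterns=None):
    """Partition first only when an external compiler is requested."""
    if external_compiler is not None:
        mod = _partition(mod, external_compiler, patterns)
    return _compile(mod, params)
```

With this shape, `build_module(mod, params)` behaves as before, and the extra steps appear only when `external_compiler` is supplied.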
Please vote for both the A and E approaches and for the names of the APIs and arguments. You are also welcome to share your thoughts. Thanks!