[RFC] Op based annotation for external codegen

  • Will we allow composite functions to be treated as ‘ops’ for the annotation? This is necessary for the case where a series of operators are supported but the operators on their own are not.

Composite functions will not be treated as ops for op-based annotation, because all ops have to be registered in advance. We may implement op-based and function-based annotation in the same pass and let the function-based part deal with composite functions.

  • I’m interested to know the use case for custom annotation at this point. If we got rid of it, we could just directly call the check function in the graph partitioner which would be a lot simpler.

We will keep custom annotation as an option for developers to make sure the annotation mechanism is capable of covering all possibilities.

  • Do you think we could perform all the heterogeneous partitioning in this pass (e.g., GPU/CPU offload)?

I think that’s doable. @zhiics, do you have anything to add?

I also wonder whether it might be appropriate to split the lowering into two phases. The first phase would lower the graph in a way that preserves information, and this is where you can insert your annotation passes. The second phase would come after partitioning, where backend-specific lowering is done depending on where the subgraphs were offloaded. All of the passes that destroy information can be moved after partitioning (like the ‘combine_parallel’ passes).

Although this proposal seems reasonable to me, I’ll pass this to zhiics as he worked on the pass manager.

One interesting thing we can see here is that we have started to introduce op-related attributes for a specific pass (AnnotateCompiler).

It would be great to discuss the potential API design and namespacing.

For example, given that the AnnotateCompiler pass is specific to “dnnl”, it would be useful to keep some of the attributes local to dnnl.py and only make use of set_attr and get_attr in op as the generic API.
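
For illustration, a hypothetical dnnl.py could keep the attribute key private to the file and only touch the registry through the generic API; the key name below is made up:

# Hypothetical sketch: the attribute key stays local to dnnl.py; only the
# generic op registry API is used to set it.
from tvm.relay import op as reg

_EXTERN_ATTR = "FTVMExternalCompiler"  # illustrative name, local to this file

def _register_checker(op_name, checker):
    reg.register(op_name, _EXTERN_ATTR, checker)

_register_checker("nn.conv2d", lambda attrs, args: True)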

So the current proposal seems not too different from the original implementation that was removed in the annotation PR, is that right?

I also want to hear more about how this will play with composite functions. Op-based or function-based, composite functions need to be treated the same way as any other op during the annotation pass (that was the original motivation). So if the function-based pass is supposed to deal with composites, it also needs to deal with regular ops the way the op-based one would, in which case we might as well implement only the function-based pass.

+1 on having the pass also deal with composite functions - having a unified way to offload both composite functions and single ops to external runtimes would be hugely helpful!

@masahi @jonso Yes, we can also handle composite functions in the AnnotateCompiler pass, as we can check whether a call node’s op is a composite function or not.

@mbaret It is possible, but it needs more thought, as CPU/GPU heterogeneous execution is currently taken care of by the TVM codegen and runtime, which are part of the TVM build pipeline.

@mbaret We can separate the passes into two stages. It needs some refactoring.

@tqchen Could you please clarify a bit more about the set_attr and get_attr stuff? I think I don’t fully understand. Thank you.

I think what we really need is a more principled approach to defining composite ops and registering them as regular ops in Relay. Once we have these composite ops, we can register attributes on them, such as whether to expand them to elementary ops and whether to use an external codegen or external library. This can solve many problems; for example, the softmax issue can also be solved this way. And of course, we can add customized composite ops/rules for different backends.
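
To make the idea concrete, here is a rough sketch of what such attribute registration could look like, assuming a composite op had already been registered with Relay; both the op name and the attribute keys below are purely hypothetical:

from tvm.relay import op as reg

# Hypothetical attributes on a hypothetical composite op; neither the op
# name nor the attribute keys below exist in TVM today.
reg.register("contrib.conv2d_bias_relu", "FExpandComposite", False)
reg.register("contrib.conv2d_bias_relu", "FUseExternalCodegen",
             lambda attrs, args, compiler: compiler == "dnnl")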

The API _register_external_op_helper was quite specific to AnnotateCompiler. I wonder if we can reduce the amount of abstraction and instead use https://docs.tvm.ai/api/python/relay/op.html?highlight=op#tvm.relay.op.register directly with a clear attr name.

@haichen Yes, that’s probably the cleanest approach I can think of as well. Just talked to @comaniac with the same idea.

@tqchen Thanks. Let’s take a look first.

One thing I’m interested in is how similar this will end up looking to the device annotation mechanism that’s already present. The external codegens are going to mostly be tightly coupled to a particular device. In that case this looks to be an extension of the device annotation. For example, we might offload to GPU using ACL/TVM/TensorRT or to CPU using DNNL/ACL/TVM or to an accelerator using some custom compiler.

This is a fair consideration, but it needs more thought and refactoring, as @zhiics mentioned. Specifically, TVM currently supports three kinds of build pipelines:

  1. TVM codegen (codegen/build stage): This includes LLVM, CUDA, etc. The CPU/GPU heterogeneous execution is also handled in this build process.

  2. Third-party libraries (lowering stage): This includes CBLAS, CUBLAS, CuDNN, etc. Developers manually map Relay ops to corresponding library functions. The mapping is implemented in contrib and will be triggered when lowering.

  3. BYOC (Relay stage): This is the flow discussed in this RFC. Supported ops are annotated and partitioned at the Relay level and offloaded to external codegens.

As can be seen, these three pipelines happen at different stages, so we cannot easily determine the optimal device-offloading policy. We first need a plan that brings all the mechanisms to the same stage before we can work on this issue.

If it looks to be a significant refactoring effort, then maybe we can consider it out of scope for this RFC. But I think we should ensure we arrive at a design that is flexible enough to keep the option of unifying the three pipelines open in the future.

As another quick point relating to API, this is what I currently have to call to get the partitioning to happen:

f = relay.build_module.bind_params_by_name(mod["main"], params)
mod = tvm.IRModule()
mod["main"] = f
pattern_table = get_pattern_table(external_compiler)
mod = relay.transform.MergeComposite(pattern_table)(mod)
mod = relay.transform.AnnotateCompiler(external_compiler)(mod)
mod = relay.transform.PartitionGraph()(mod)

I think this unnecessarily exposes a lot of the implementation to the API when all the user really cares about is that their external codegen is used as part of the partitioning. Maybe we can move all of these passes inside the build function and provide a relatively clean API along the lines of relay.build_module(mod, params, external_codegens=["acl", "dnnl"])?

Thanks everyone for such a valuable discussion. Accordingly, we have come up with several API design proposals for op-based annotation. We will have separate discussions for other issues that relate to BYOC but not directly to op-based annotation.

Annotator Implementation

The annotator will be implemented by developers/vendors under python/tvm/relay/op/contrib/<compiler_name>/external_compiler.py. For simplicity, we use dnnl as the example compiler name in the rest of this post.

A.1: A New Register Helper

This is the original proposal. We ask developers to implement a checker for each op to indicate if we should annotate that op or not:

def conv2d(attrs, args):
    # Inspect attrs and args here; return True to annotate this op.
    return True

If the checker simply needs to return True or False, we provide a helper to reduce the efforts:

_register_external_op_helper("conv2d")

These two implementations are fully equivalent, and developers can use whichever they prefer.
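
For reference, the helper itself could be a thin wrapper over the generic op registry; a minimal sketch, reusing the FTVMExternalCompiler attribute key proposed in A.2 below:

from tvm.relay import op as reg

def _register_external_op_helper(op_name, supported=True):
    # Install a trivial checker that reports `supported` whenever the
    # external compiler is dnnl.
    @reg.register(op_name, "FTVMExternalCompiler")
    def _func_wrapper(attrs, args, compiler):
        return supported if compiler == "dnnl" else False
    return _func_wrapper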

A.2: Use tvm.relay.op.register Directly

As suggested by @tqchen, we can reuse the Relay op registry instead of introducing a new register:

from ... import op as reg

reg.register("nn.conv2d", "FTVMExternalCompiler", lambda r, g, c: c == "dnnl")

@reg.register("nn.batch_norm", "FTVMExternalCompiler")
def batch_norm(attrs, args, compiler):
    if compiler != "dnnl":
        return False
    return # check with attrs and args

The most important benefit of this approach is that we do not introduce any new APIs. On the other hand, developers have to write one function per op. Of course, we can still document the following pattern in the tutorial:

def _reg_op(op_name, supported=True):
    @reg.register(op_name, "FTVMExternalCompiler")
    def _func_wrapper(attrs, args, compiler):
        return supported if compiler == "dnnl" else False
    return _func_wrapper

_reg_op("nn.conv2d")

End-User Interface

For end-users who know nothing about annotation and external codegen, we have the following options to put it all together:

E.1: A Separate API

The first approach asks users to explicitly call a special API we provide to perform graph partitioning:

mod = tvm.IRModule()
mod["main"] = func  # a Relay function
mod = relay.build_extern(mod, external_compiler="dnnl", patterns=[pattern1, pattern2, ...])
relay.build_module(mod, params)

where build_extern is an API calling MergeComposite (if patterns is not empty), AnnotateCompiler, and PartitionGraph sequentially.
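
In other words, a sketch of what build_extern would do internally, using the pass names from this thread:

from tvm import relay

def build_extern(mod, external_compiler, patterns=None):
    # Fuse supported operator sequences into composite functions first,
    # then annotate and partition the graph for the external compiler.
    if patterns:
        mod = relay.transform.MergeComposite(patterns)(mod)
    mod = relay.transform.AnnotateCompiler(external_compiler)(mod)
    mod = relay.transform.PartitionGraph()(mod)
    return mod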

The advantages of E.1 are high flexibility and extensibility. The drawback is the extra explicit function call. Note that since some passes, such as constant folding and QNN legalization, should be called before our passes, we will have a follow-up RFC discussing how to separate them from build_module if E.1 gets in.

If you vote for this approach, please also vote for the names of each API and argument. We provide some candidates for each of them.

  • build_extern:
    • partition_module
    • extern_preprocess
  • external_compiler:
    • ext_compiler
    • compiler
  • patterns:
    • composite_patterns
    • fuse_patterns

E.2: Integrate into build_module

Another approach, suggested by @mbaret, is integrating those passes into the current TVM build process. Specifically, users only need to write one line of code:

relay.build_module(mod, params, external_compiler, patterns)

The advantage of E.2 might be a simpler programming model. The drawback is that it needs more changes in the TVM build module and compile engine. Again, if you vote for this approach, please also vote for the names of each API and argument.
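
For illustration, the integration point could look roughly like this (a sketch only; _build_module_impl stands in for the existing build pipeline and is not a real function):

def build_module(mod, params, external_compiler=None, patterns=None):
    # Run the partitioning passes up front when an external compiler is
    # requested, then fall through to the unmodified TVM build pipeline.
    if external_compiler is not None:
        mod = build_extern(mod, external_compiler, patterns)
    return _build_module_impl(mod, params)  # existing pipeline, assumed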

Please vote on both the A and E approaches and on the names of the APIs and arguments. You are also welcome to share your thoughts. Thanks :)

def _reg_op(op_name, supported=True):
    @reg.register(op_name, "FTVMExternalCompiler")
    def _func_wrapper(attrs, args, compiler):
        return supported if compiler == "dnnl" else False
    return _func_wrapper

_reg_op("nn.conv2d")

is quite clean. I am also okay with reg.register("nn.conv2d", "FTVMExternalCompiler", lambda r, g, c: c == "dnnl"), but it needs a bit more boilerplate.

For placeholder1, I prefer build_extern a bit more, although this is not actually a build; it is more about preprocessing the graph.

For placeholder2 and placeholder3, I would vote for compiler and fuse_pattern, respectively. We may want to avoid using pattern directly, as Relay already has Pattern and PatternNode.

Regarding A, I think it will be quite valuable to have composite functions and operators registered for annotation via the same (or a very similar) API, even if under the hood we use a different mechanism to support them. To me this makes an A.1-style approach preferable, as there’s more flexibility to have it behave differently for functions vs. operators.

For E.1, I’d just add the drawback that it implies all partitioning takes place on an unlowered Relay graph. I don’t think this is generally true (at least it hasn’t been for me), particularly when considering QNN operators.

Even if we adopt A.2, we can still support composite function registration by treating a composite function as a custom op. Since the A.1 implementation also makes use of relay.op.register, these two approaches basically share the same mechanism.
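
For example, the annotation pass could dispatch on the callee so that composite calls are handled like op calls; a sketch, relying on the "Composite" attribute that MergeComposite attaches to the functions it creates:

from tvm import relay

def _is_composite_call(call):
    # MergeComposite produces calls whose callee is a relay.Function
    # carrying a "Composite" attribute (set to the matched pattern name).
    fn = call.op
    return (isinstance(fn, relay.Function)
            and fn.attrs is not None
            and "Composite" in fn.attrs)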

For the problem you mentioned with E.1 (correct me if I misunderstood anything), you can still run any pass before the external build pipeline, so E.1 is just a handy API for end-users to invoke the external compiler flow.

To give an example, for an external codegen that needs params to be bound + qnn to be lowered (which is a pretty believable combination from my experience), the user would have to write:

f = relay.build_module.bind_params_by_name(mod["main"], params)
mod = tvm.IRModule()
mod["main"] = f
mod = relay.transform.QnnCanonicalize()(mod)
mod = relay.transform.QnnLegalize()(mod)
mod = relay.placeholder1(mod, placeholder2="X", placeholder3=[pattern1, pattern2, ...])

I think this puts an unnecessary burden on the user to understand the details of both the codegen and the Relay lowering passes. This seems like something that should be defined somewhere when the codegen is initially integrated, not logic that’s pushed up into the API.

OK, I see. I should clarify E.1 more. If E.1 is our final decision, we will have a follow-up RFC to discuss how we should separate the passes in build_module into two parts (platform-independent and platform-dependent) so that we have more flexible extensibility.

I’m a bit confused by placeholder1, placeholder2, and placeholder3. Are they real placeholders that you will later replace with formal names?

Yes, they are not placeholders for tensors but literally “placeholders” for the API and argument names we will settle on based on this discussion.

I suggest replacing these “placeholders” with temporary names that have real meanings. It will help people understand the proposal and choose the better option.