[RFC][External Codegen] Defining 'Composite' Relay operators

An issue encountered using the external codegen infrastructure is that it’s difficult to express many-to-one relationships between Relay and external ops. For example, a quantized convolution gets lowered to 4 Relay ops by the TFLite frontend:

    nn.pad
    qnn.conv2d
    nn.bias_add
    qnn.requantize

However, Arm Compute Library directly supports a quantized convolution as expressed by TFLite, so we need to map these 4 Relay operators to a single ACL operator. That means we need to annotate this group of 4 operators as supported even though, taken individually, they are not. This is different from performance-motivated fusion, where operators are cascaded together because doing so is more efficient.

One approach to this is to write a custom annotation pass that can detect the sequence of operators and annotate accordingly. However, we end up having to write repetitious logic to detect the sequence again when it comes to codegen. It would be preferable if there was a generic mechanism.
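To make the duplication concrete, here is a toy sketch (plain Python, not the TVM API) of the kind of ad-hoc sequence matching that would otherwise have to be written twice, once in the annotator and once in the codegen. `Call` is a stand-in for a Relay call node; a real pass would walk `relay.expr.Call` objects.

```python
from collections import namedtuple

# Minimal stand-in for a Relay call node: an operator name and its arguments.
Call = namedtuple("Call", ["op", "args"])

def is_qnn_conv2d_pattern(expr):
    """Return True if expr is requantize(bias_add(conv2d(pad(x))))."""
    seq = ["qnn.requantize", "nn.bias_add", "qnn.conv2d", "nn.pad"]
    for op in seq:
        if not isinstance(expr, Call) or expr.op != op:
            return False
        expr = expr.args[0]  # follow the data input down the chain
    return True

# Build the 4-op chain the TFLite frontend produces for a quantized conv2d.
x = Call("nn.pad", ["data"])
x = Call("qnn.conv2d", [x, "weight"])
x = Call("nn.bias_add", [x, "bias"])
x = Call("qnn.requantize", [x])
print(is_qnn_conv2d_pattern(x))  # True
```

With only operator-by-operator annotation, this logic (or its C++ twin) ends up copied into every pass that needs to recognise the group.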

An alternative proposal, and the subject of this RFC, is as follows. Introduce a new pass, MergeComposite, which accepts a dictionary of patterns indexed by name. The pass will find these patterns and wrap them in a Relay function marked ‘Composite’ with the name given in the dictionary (e.g. acl.qnn_conv2d). The calls to these composite functions can then be treated as the atomic units of annotation.

As an example, the following excerpt would go from this:

    %48 = nn.pad(%47, pad_width=[[0, 0], [1, 1], [1, 1], [0, 0]]);
    %49 = qnn.conv2d(%48, %acl_input19);
    %50 = nn.bias_add(%49, %acl_input20, axis=3);
    %51 = qnn.requantize(%50);
    %52 = nn.max_pool2d(%51, pool_size=[2, 2], strides=[2, 2], layout="NHWC");

To this:

    %51 = fn (%input10, %weight9, %bias9, Composite="acl.qnn_conv2d") {
      %48 = nn.pad(%input10, pad_width=[[0, 0], [1, 1], [1, 1], [0, 0]]);
      %49 = qnn.conv2d(%48, %weight9);
      %50 = nn.bias_add(%49, %bias9, axis=3);
      qnn.requantize(%50)
    };
    %52 = %51(%47, %acl_input19, %acl_input20);
    %53 = nn.max_pool2d(%52, pool_size=[2, 2], strides=[2, 2], layout="NHWC");

This is desirable as it allows us to extend the capabilities of a generic operator-by-operator annotation pass to handle the composite case. Once you arrive at a call to a function marked ‘Composite’, that function will be of a known form so a simple static traversal can be used. This can be leveraged both to check whether the function is supported and to generate the external code.
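As a toy illustration of that point (plain Python, not the TVM API — `CompositeFunction`, `SUPPORTED_COMPOSITES` and `is_supported` are hypothetical stand-ins), the annotator only needs to look at the ‘Composite’ name, because the wrapped body is guaranteed to be of a known form:

```python
# Composite names a hypothetical backend claims to support.
SUPPORTED_COMPOSITES = {"acl.qnn_conv2d", "acl.qnn_dense"}

class CompositeFunction:
    """Stand-in for a Relay function carrying the 'Composite' attribute."""
    def __init__(self, composite_name, body):
        self.composite_name = composite_name  # e.g. "acl.qnn_conv2d"
        self.body = body                      # known-form body, statically traversable

def is_supported(call_target):
    # The annotator treats the whole composite call as one atomic unit:
    # it checks the name only, and codegen later walks the known-form body.
    return (isinstance(call_target, CompositeFunction)
            and call_target.composite_name in SUPPORTED_COMPOSITES)

fn = CompositeFunction("acl.qnn_conv2d", body=None)
print(is_supported(fn))  # True
```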

I’ve been careful to avoid using the word ‘fuse’ or ‘fusion’ to explicitly distinguish between this case and that of operator fusion. It could be perfectly valid to fuse a ‘Composite’ function into another Relay operator (e.g. the composite qnn_conv2d might be fused with an activation).

To me it seems like something we can already do using manual annotation and partitioning. Maybe my PR below would be helpful.

It is possible to do it in quite a roundabout way, but what I am discussing here is a generic mechanism that doesn’t rely on ad hoc pattern matching code being dropped into custom annotation and codegen passes. This would be a standalone pass to which any 3rd party codegen could register merging patterns. This way you can still maintain a generic annotation mechanism and don’t need to include code like ‘DetectFusedConv2DBiasReLU’ in your codegen.

To give an example of how I expect this to look from the Python side:

    def qnn_conv_pattern():
        input = relay.var('input')
        weight = relay.var('weight')
        bias_weight = relay.var('bias')
        conv = relay.qnn.op.conv2d(
            input,
            weight,
            relay.const(0, 'int32'),
            relay.const(0, 'int32'),
            relay.const(0, 'float32'),
            relay.const(0, 'float32'),
        )
        bias = relay.nn.bias_add(conv, bias_weight)
        req = relay.qnn.op.requantize(
            bias,
            relay.const(0, 'float32'),
            relay.const(0, 'int32'),
            relay.const(0, 'float32'),
            relay.const(0, 'int32'),
        )
        return req

    pattern_table = {
        "acl.qnn_conv2d": qnn_conv_pattern()
    }
    mod = relay.transform.MergeComposite(pattern_table)(mod)

Ah I see. So my PR corresponds to the first approach you mentioned. I also thought that having to write pattern matching in C++ when I already have the same logic in the Python annotator is not great.

@mbaret thanks for the proposal. @zhiics and I have been thinking about similar ideas, as we fully understand how annoying it is to write a sophisticated annotator, but we haven’t found a good way to represent composite/fusion patterns.

For your proposal that specifies a map from a sequence of Relay ops to a single backend op, I have two concerns:

  1. A list of Relay ops can only represent a sequence. In general we wish this representation could cover all kinds of patterns (e.g., branches).

  2. It might be better to extend the current function attribute ‘Compiler’ to carry the backend op information instead of introducing a new ‘Composite’ attribute, but this point can be discussed further.

Just to address those concerns:

  1. I absolutely agree a sequence is insufficient. My proposal is actually for the patterns to be defined as Relay expressions with free vars. You can see the general idea if you look at the python snippet I posted as a reply. I’ve got a WIP implementation of this that I’ll try and get put up as a PR later today.

  2. I don’t like having to add a new attribute either, but I’m not sure how any of the existing ones can be sensibly extended. ‘Compiler’ already has to hold the information on the codegen, so appending a composite function name to this would, I think, be very confusing.

I’d also like to emphasize that this is not just useful in the annotator, but also in the codegen pass. Without a separate pass to perform the merging in some way, I don’t see how you can avoid having to write custom pattern matching code in both the annotator and the codegen pass.

For 1, I got your point of specifying a pattern as a single subgraph mapped to a backend op. I would summarize this approach in one sentence: ask developers to write a small Relay program as a pattern, and map the pattern to a backend op. While this approach is definitely general, it requires developers to fully understand what a Relay graph looks like. In your quantization example, for instance, developers have to first know that a quantized conv2d will be lowered to 4 Relay ops. We might need a more straightforward representation. However, I still look forward to seeing your PR of this approach. We can have more specific discussions with the community around the PR.

For 2, yeah I know what you meant, and I don’t have a better idea either for now so let’s keep yours first.

In addition, I completely agree that we need a pass to perform pattern merging. As we mentioned in an earlier topic, our final goal is to achieve the following flow:

    op-level annotation (by developers)
                    |
    merge subgraphs (partial automation with user-given pattern hints, or full automation)
                    |
    partition graph into external functions (automation, done)
                    |
    backend codegen

As you can see, the 3rd and 4th steps are already merged, and we are now working on a PR for the first step (naming and interface). Accordingly, this RFC fits our goal well, so I am just considering whether we should create a separate pass for merging subgraphs or extend the existing partitioning pass to support graph merging. Actually, the partition pass already has some merging logic (very limited, though).

In short, I’m glad to see this RFC and I’d love to see opinions from others, since we want to be as general as possible while avoiding over-engineered designs.

The PR is now up https://github.com/apache/incubator-tvm/pull/4771. It’s still partially WIP.

@mbaret, @comaniac I have another feature idea around this that I would like to get your opinions on:

I am still working on matching transformers. So far, I have written a couple of patterns that match different published transformer implementations. The computation is exactly the same, but the ops are ordered differently (this is because transformers do many computations in parallel, so the ordering is arbitrary).

Because of this, running MergeComposite on each implementation generates functions that actually have the same free vars (the transformer weights), but in a different order. This makes plugging in the external codegen really annoying: I basically have to re-define the same function with the arguments in different orders.

I would like to pass an optional argument into MergeComposite to declare an ordering of free vars. This would be an ordered list of std::string. Before creating the final function, the function params would be reordered to match the list using their name_hint().

This would allow me to write one function in the external codegen and have it always match many transformer patterns.

What do you think?

I’m hesitant to add too much to the API if it can be avoided. Is it possible we could just define a standard ordering on the free vars (maybe alphabetical) and have that enforce the order for you?
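A toy sketch of that alphabetical canonicalization (plain Python; `Var` stands in for relay.Var, and `canonical_params` is a hypothetical helper, not part of the TVM API) — the real pass would sort the function params by name_hint() before building the composite function:

```python
from collections import namedtuple

# Stand-in for relay.Var: only the name_hint matters for ordering.
Var = namedtuple("Var", ["name_hint"])

def canonical_params(free_vars):
    # Alphabetical order on name_hint gives a deterministic param order,
    # regardless of the order in which the pattern matcher found the vars.
    return sorted(free_vars, key=lambda v: v.name_hint)

params = [Var("weight"), Var("input"), Var("bias")]
print([v.name_hint for v in canonical_params(params)])  # ['bias', 'input', 'weight']
```

Any two matches of the same pattern would then present their free vars to the external codegen in the same order.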

That’s fine with me too, as long as it’s consistent. Mind if I send out a quick PR?

Go ahead 🙂

I’m not sure I fully understand the ordering issue, but I’ll refer to the unit test in the PR directly.

PR is here: https://github.com/apache/incubator-tvm/pull/4939