[RFC][Relay] Program Matching for Relay Pt 1: A pattern language

jroesch · February 28, 2020, 8:07pm

A pattern language for Relay

This RFC introduces a pattern language for describing topology of Relay graphs and matching them. The RFC is structured as follows, motivation, examples, design, and then a small mostly complete prototype with test cases.

Motivation

There are many places in TVM where we identify pure data-flow sub-graphs of the Relay program and attempt to transform them in some way example passes include fusion, quantization, external code generation, and device specific optimizations such as bitpacking, and layer slicing used by VTA.

Many of these passes today require a lots of boring boilerplate code in order to implement as well as requiring users to think in terms of visitors and AST matching. Many of these transformations can easily be described in terms of graph rewrites. In order to build a rewriter or other advanced machinery we first need a language of patterns to describe what we can match.

Such a language is not just useful for building a rewriter but also providing extension points for existing passes. For example the fusion pass could be parametrized by a set of fusion patterns which describes the capability of your hardware, and the quantization pass could take a set of patterns which describe which operators can be quantized on a given platform.

In the backend world, we could use the same machinery to build a higher level API using bring your own code generation. This API takes set of patterns describing your hardware capabilities and an external compiler, providing a relatively smooth heterogeneous experience out of the box.

Recently there has been lots of discussion on similar issues in the community, and we wanted to gather feedback and hopefully collaborate on a design that can benefit everyone working in this space. This RFC focuses on the pattern language with future applications to come later.

Examples

There are quite a few properties that are worth matching of operators below we examine how to match tree properties, and expand on some use cases that are not fully explored in the prototype. The first example is a simple case where we want to match one operator with a single input OR another operator with a single input, see the below diagram for a graphical representation and corresponding code.

Canvas_1

    def test_match_op_or():
        is_add_or_sub = is_op('add') | is_op('subtract')
        assert is_add_or_sub.match(relay.op.op.get("add"))
        assert is_add_or_sub.match(relay.op.op.get("subtract"))

The next example is a dense operation with any operator that is marked element-wise.

Canvas_2

    def test_no_match_attr():
        op = is_op('nn.dense').has_attr("TOpPattern", K_ELEMWISE)
        op_pat = op(wildcard(), wildcard())
        x = relay.var('x')
        y = relay.var('y')
        assert not op_pat.match(relay.op.nn.dense(x, y))

The next example is matching a diamond with two inputs at the top of the diamond.

    def test_match_diamond():
        # Pattern
        is_conv2d = is_op('nn.conv2d')(is_input(), is_input())
        path1 = is_op('nn.relu')(is_conv2d)
        path2 = is_op('nn.leaky_relu')(is_conv2d)
        diamond = is_op('add')(path1, path2)

        # Expr
        inp = relay.var('input')
        weight = relay.var('weight')
        conv2d = relay.op.nn.conv2d(inp, weight)
        relu = relay.op.nn.relu(conv2d)
        leaky_relu = relay.op.nn.leaky_relu(conv2d, alpha=0)
        out = relu + leaky_relu

        # Check
        assert diamond.match(out)

The final example we would like to match which is not yet implemented in the prototype is matching diamonds with a post-dominator relationship. Our plan is to embed dominator analysis as type of matching in the pattern language in order to allow for pattern matching with unknown topology. This is important because we want to able to use the language to describe fuse patterns, like elementwise operations followed by a conv2d.

    def test_match_dom_diamond():
        # Pattern
        is_conv2d = is_op('nn.conv2d')(is_input(), is_input())
        reduction = is_op('add')(wildcard(), wildcard())
    		diamond = dominates(is_conv2d, is_elemwise, reduction)

        # Expr
        inp = relay.var('input')
        weight = relay.var('weight')
        conv2d = relay.op.nn.conv2d(inp, weight)
        relu = relay.op.nn.relu(conv2d)
        leaky_relu = relay.op.nn.leaky_relu(conv2d, alpha=0)
        out = relu + leaky_relu

        # Check
        assert diamond.match(out)

Design

The pattern language proposed is designed to be a mirror of Relay’s IR with additional support for common scenarios. The goal of the pattern language is to provide a regular-expression like capability for matching data-flow graphs and doing rewriting.

The high level design is to introduce a language of patterns for now we propose the language as:

Pattern ::= expr
        | *
        | pattern(pattern1, ... patternN)
        | has_type(pattern, type)
        | has_attr(pattern, attr, attr_value)
        | is_input
        | named(name, pattern)
        | pattern1 `|` pattern2
        | dominates(parent_pattern, path_pattern, child_pattern)

The above language then provides a matching interface with both can select sub-graphs as well as verify that the graph does match the pattern.

Expression Pattern

Match a literal expression.

Wildcard

Match any expression.

Type Pattern

Check that the expression matched by the nested pattern has a particular type.

Attribute Pattern

Check that the operator matched by the pattern has an attribute with a particular value.

Input

Check that the expression is an input, i.e has no parents and is a variable.

Named

Name a pattern group for later naming the sub-group.

Alternation

Either match the first pattern or the second pattern.

Domination

Match the parent pattern for the route node and then check that the child pattern holds for each child along the domination path.

Above I enumerate the current working design and types of patterns. I have built an initial prototype that implements most of the patterns but is lacking named patterns and dominator patterns. Matthew Brookhart (now at OctoML) is spearheading the implementation of this and we will hopefully contribute a work-in-progress implementation soon.

Extensions

We can grow the above language in a variety of ways including multi-outputs, for example we can match finite multiple outputs using a design like below.

    def test_multi_output():
        # Pattern
        is_2outputs = is_op('2outputs')(is_input(), is_input())
    	  is_output1 = is_2outputs[0]
    		is_output2 = is_2outputs[1]
    		named

        # Expr
        inp = relay.var('input')
        weight = relay.var('weight')
        result = 2outputs(inp, weight)
        output1 = result[0]
        output2 = result[1]

        # Check
        assert is_output1.match(output1)
    	assert is_output2.match(output2)

Reference Prototype

For a working version of the pattern language see this GitHub Gist https://gist.github.com/jroesch/de9648556b1f7e1bb3db490e0f708813.

tqchen · February 28, 2020, 7:34pm

cc @mbaret @jwfromm @jonso @comaniac @zhiics @yzhliu @haichen @masahi

tqchen · February 28, 2020, 7:42pm

I think it is also good to get back to our vision about the high level graph optimizations. There are a few aspect of high level graph transformations (quantization, fusion, fission, partition)

A0: Capability description – HW backend A support conv2d-relu op
A1: Validation – Given a description, validate that a sub-function can be supported(via pattern matching)
A2: Objective – Run the transformation such that it gives us the best perf, minimum external codegen usage etc.
A3: Transformation strategy(optimization algorithm) – run a search, a greedy algorithm, or a ML guided algorithm to find the best transformation.

It is important to separate these concerns, and bring a unified base infra that can handle all the graph transformations.

Ideally, a developer only have to provide A0(via the pattern language) to describe the capability.
Then we can build various infra to take A0 as input, and provide capabilities in terms of A1, A2, and A3 to run the final transformation.

Having an unified graph pattern language is an important step towards that and can streamline related effort such as MergeComposite, extern codegen and graph partition in a unified way

tqchen · February 28, 2020, 8:28pm

A main highlight of the proposed pattern languages is the dominator pattern(it is actually wrt to the immediate dominator) to support non-fixed graph patterns(such as conv2d followed by elemwise ops), which is necessary for fusion.

In the case of the linear chain graph, we can just use a regex like patterns. The dominator based pattern seems to be a good generalization of that under DAGs. Because it is something that people have not seen before, it would be great to discuss possible APIs(both pattern and rewrite) to get a good interface for everyone

mbaret · February 28, 2020, 8:28pm

This looks awesome There have been quite a few requests for a more powerful pattern matcher in the MergeComposite pass and this looks like it should address most of those. Just a couple of questions.

Can we support matching a pattern which has an optional op in it? For example, matching conv->requantize and conv->bias_add->requantize using the same expression.
Is it the intention to include some rewriting infrastructure alongside this to make it easier to replace a matched subgraph with another (or turn a subgraph into a function)?

jwfromm · February 28, 2020, 9:15pm

I really like the regex style matching syntax! I think it’d be helpful to have an example showing how a matched subgraph could be used. For example, after matching a diamond in graph, how we would annotate that subgraph to run on a different device than the rest of the graph? Is it worth considering the post-match API in this RFC or would that be part of future work?

manupa-arm · February 28, 2020, 9:33pm

Sounds great! When you say “graph rewrites”, are you proposing a interface to define a “graph grammer” ?

tqchen · February 28, 2020, 9:38pm

I was think similar to to regex rewriting, e.g.

"def\(\s*\"(?P<fn_name>[a-z0-9|_]+)" -> "<fn_name>xyz"

Will allow us to represent the rewrite that extracts the function name after def and append xyz to it. We can brainstorm possible rewrites that we might do for graphs(the concrete DAG->DAG pattern is fine and we know how to do that, just wonder if there is a need for wildcard like patterns like the above)

lhutton1 · February 28, 2020, 9:36pm

I can see this being a very useful feature, it looks great! One question I had was will we also be able to specify certain properties for an op to match against? E.g. for a convolution op: data layout = NHWC, groups = 1

manupa-arm · February 28, 2020, 10:00pm

I guess there is a need for optional ops – as @mbaret pointed out (that could be handled by “?” pattern, I guess ) and a list of ops like [abc] in regex. Not sure about a full wildcard though.

manupa-arm · February 28, 2020, 10:19pm

In the multi output scenario, we need to make sure that any output is not an ancestor to any input of the same pattern through connections that are outside the pattern. Right? If the goal is targeting offloading to backends OR at least provide an interface to specify to avoid such matches ? (as discussed in [Relay] Improved graph partitioning algorithm)

tqchen · February 28, 2020, 10:43pm

What you are describing belongs to A3 – the algorithm e.g. graph partition need to ensure that the there is no overlapping patterns in the partitioned graph.

Of course, it would be interesting to ask whether the above line itself is a pattern(for a bigger graph) which could be overkill for now(then it comes back to A0)

masahi · March 1, 2020, 10:44am

Torch also has a mechanism for subgraph matching and rewrite. In their case, the pattern and replacement are both textual representation of their IR.

For example, their pass for automatically transforming fp32 graphs to quantized ones are based on pat matching. Here is an example of a pattern and its replacement.

github.com

pytorch/pytorch/blob/bbff06ee96f4ce7323f5bb1043b802833362e075/torch/csrc/jit/passes/quantization_patterns.h#L11-L23


  std::string conv2d = R"(
graph(%a_quant, %packed_params, %r_scale, %r_zero_point, %r_dtype, %stride, %padding, %dilation, %groups):
        %a_dequant = aten::dequantize(%a_quant)
        %w_quant : Tensor, %b : Tensor? = quantized::conv2d_unpack(%packed_params)
        %w_dequant = aten::dequantize(%w_quant)
        %r = aten::conv2d(%a_dequant, %w_dequant, %b, %stride, %padding, %dilation, %groups)
        %r_quant = aten::quantize_per_tensor(%r, %r_scale, %r_zero_point, %r_dtype)
        return (%r_quant) )";


  std::string quantized_conv2d = R"(
graph(%a_quant, %packed_params, %r_scale, %r_zero_point, %r_dtype, %stride, %padding, %dilation, %groups):
        %r_quant = quantized::conv2d(%a_quant, %packed_params, %stride, %padding, %dilation, %groups, %r_scale, %r_zero_point)
        return (%r_quant) )";

tqchen · March 1, 2020, 4:51pm

It is indeed interesting to see if we can have a textual repr for pattern that is related to IR. I wonder though how can we go beyond the concrete DAG->DAG pattern to wildcard related, as discussed in the current proposal

jonso · March 1, 2020, 7:53pm

+1, wildcards and generalizations are really important in making this a flexible system.

yzhliu · March 2, 2020, 12:21am

I wonder how practical can the infrastructure be shared with low-level tir?

tqchen · March 10, 2020, 8:36pm

Great discussions so far, it would be great if we can comment on API choices. So far I see the need to have ? pattern.

@jroesch can you incorporate the feedback and give a couple of API design choices, then we can try to conclude and move to implementation(by sending a formal RFC to github)

forrestm-quic · March 10, 2020, 10:20pm

Seconding @yzhliu’s question on sharing infrastructure and code with the low level TIR, especially w.r.t. python bindings. For the Hexagon backend, we want to do more work in the TIR but a lack of good pattern matching utilities in python is hindering our efforts there. A unified pattern matcher for relay and TIR would be a big +1 for us.

tqchen · March 11, 2020, 12:35am

To followup discussions on TIR pattern matching. it would be great to get a sense of potential use-cases

If the task is expression level pattern matching, we currently have a rewriting util that can be used now: https://github.com/apache/incubator-tvm/blob/master/src/arith/pattern_match.h which is used in the arithmetic simplifier
If the match involves match over a multi-level loop(e.g. tensorization) we will need specialized design mechanisms

jonso · April 1, 2020, 5:21pm

@jroesch, is there any update on this feature? I’m really excited to try it out