Complex Pattern Matching in Relay

With the new external codegen feature, I would like to try to plug in more complex operators such as Nvidia’s FasterTransformer.

This would require doing some relatively complex pattern matching in Relay to figure out which operators to replace with the faster transformer. Has anyone thought about doing this? Do you have any suggestions as to how to approach it?

cc @comaniac @zhiics

Thanks!

You might want to take a look at this PR we just merged this morning by @matt-arm.

Wow, perfect timing! Thanks a lot :slight_smile:

@matt-arm, I am currently working on writing a pattern for a transformer. I am realizing that transformer implementations may vary even though they all do the same core computation. For example, depending on the original framework the transformer is imported from, there may be extra nodes for reshape, identity, transpose, etc…

In this case, what we really care about is the core computation, not data mutation that may vary by implementation. How would you feel about adding an extra field to the parameter table, like “ops to ignore”? When we see one of these ops, we consider it a match and just move on.

Could you elaborate more on “ops to ignore”? From my understanding, as long as your pattern doesn’t cover an op, it will go through the original TVM flow instead of an external codegen, so this feature doesn’t seem necessary.

On the other hand, if you want a subgraph pattern to exclude some ops (something like conv2d -> reshape (ignore) -> ReLU?), then the design philosophy is to specify two patterns. After all, in this case we will have to generate two subgraphs anyhow.

Thanks for the quick response! Here’s my core problem: I am trying to avoid writing a ton of patterns for the same basic computation.

As an example, I am using the HuggingFace Transformer exported to ONNX. The start of the transformer, from the ONNX perspective, looks like MatMul -> Add. However, after importing to TVM, the ONNX frontend does a bunch of data mutation. This is to account for broadcasting and the fact that TVM does matrix multiplication as (m,k) x (n,k), where ONNX does matrix multiplication as (m,k) x (k,n). This means that the Relay expression becomes Reshape -> Reshape -> Transpose -> MatMul -> Reshape -> Add.
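To make the layout difference concrete, here is a small NumPy sketch (the shapes are arbitrary, chosen only for illustration): an ONNX MatMul takes its weight as (k, n), while a TVM dense-style matmul takes the same weight as (n, k) and multiplies against its transpose, which is why the frontend inserts a transpose on import.

```python
import numpy as np

# Arbitrary illustrative shapes.
m, k, n = 4, 8, 3
A = np.random.rand(m, k)
W_onnx = np.random.rand(k, n)   # ONNX MatMul weight layout: (k, n)
W_tvm = W_onnx.T                # TVM dense-style weight layout: (n, k)

# ONNX computes A @ W; dense-style matmul computes A @ W_tvm.T.
# The two agree only because the importer transposed the weight.
assert np.allclose(A @ W_onnx, A @ W_tvm.T)
```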

Different frontends may handle the reshape / transposes differently, but they will both do the core computation of MatMul -> Add. To avoid writing a ton of patterns, I would like an option to always consider these reshape and transpose operators as a match and just skip over them. In this case, the core MatMul -> Add pattern will always match, and any operators in between will be merged into the composite function.

A custom transformer implementation, such as Nvidia’s FasterTransformer, won’t care about these reshapes anyway.
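The “ops to ignore” idea can be sketched outside of Relay with a toy matcher over flat lists of op names (`match_with_ignore` and the op names are hypothetical, not TVM API): the core pattern matches as long as only ignorable data-mutation ops appear between its elements.

```python
def match_with_ignore(seq, pattern, ignore):
    """Return True if `pattern` appears in order in `seq`, treating any
    op in `ignore` as an automatic match that is skipped over."""
    it = iter(seq)
    for wanted in pattern:
        for op in it:
            if op == wanted:
                break            # core op matched, advance the pattern
            if op in ignore:
                continue         # ignorable data-mutation op: skip it
            return False         # unexpected op breaks the match
        else:
            return False         # sequence exhausted before the pattern
    return True

# The op sequence the ONNX importer produces in the example above.
imported = ["reshape", "reshape", "transpose", "matmul", "reshape", "add"]
assert match_with_ignore(imported, ["matmul", "add"], {"reshape", "transpose"})
assert not match_with_ignore(["matmul", "relu", "add"],
                             ["matmul", "add"], {"reshape", "transpose"})
```

A real implementation would of course walk a Relay graph rather than a flat list, and would have to merge the skipped ops into the composite function, but the matching rule itself is this simple.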

Oh I see, so your “ignore” means always match. Thanks for pointing out this issue. I think this issue is pretty common and we should resolve it anyway, although I can imagine this is not an easy problem to solve.

Supporting a “wildcard” or “always-match” mechanism implies that one user pattern can expand to an infinite number of concrete patterns. This will definitely increase the difficulty of pattern matching. We should have an RFC to discuss all possibilities for proposed solutions.
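To see why “always-match” blows up the pattern space, here is a toy enumeration (nothing in it is TVM API; the core pattern and ignore set mirror the example from earlier in the thread). It counts length-L op sequences that reduce to a fixed MatMul -> Add core after deleting ignorable ops; the count grows without bound in L, so one user pattern stands for infinitely many concrete patterns.

```python
from itertools import product
from math import comb

# Hypothetical core pattern and ignorable ops from the thread's example.
CORE = ("matmul", "add")
IGNORE = ("reshape", "transpose")

def matches(seq):
    """True if CORE appears in order in `seq` and every other op is ignorable."""
    i = 0
    for op in seq:
        if i < len(CORE) and op == CORE[i]:
            i += 1                      # consume the next core op
        elif op not in IGNORE:
            return False                # a non-ignorable stray op kills the match
    return i == len(CORE)

# Brute-force count of matching sequences of each length, checked against the
# closed form C(L, 2) * |IGNORE|**(L - 2): it grows without bound as L grows.
for L in range(2, 7):
    brute = sum(matches(s) for s in product(CORE + IGNORE, repeat=L))
    assert brute == comb(L, 2) * len(IGNORE) ** (L - 2)
```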

cc @zhiics

Cool, thanks a lot :slight_smile: I created an RFC here.