Thanks for the quick response! Here's my core problem: I am trying to avoid writing a ton of patterns for the same basic computation.
As an example, I am using a HuggingFace Transformer exported to ONNX. From the ONNX perspective, the start of the transformer looks like `MatMul -> Add`. After importing to TVM, however, the ONNX frontend does a bunch of data mutation to account for broadcasting and for the fact that TVM does matrix multiplication as (m, k) x (n, k), whereas ONNX does it as (m, k) x (k, n). As a result, the Relay expression becomes `Reshape -> Reshape -> Transpose -> MatMul -> Reshape -> Add`.
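To make the layout difference concrete, here is a minimal pure-Python sketch (not TVM code; the function names `matmul` and `dense` are just illustrative) showing that a TVM-style dense with a (n, k) weight computes the same result as an ONNX-style matmul with the (k, n) weight, i.e. the frontend's inserted `Transpose` only rearranges the weight layout:

```python
def matmul(a, b):
    """ONNX-style MatMul: (m, k) x (k, n)."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(n)]
            for i in range(m)]

def dense(x, w):
    """TVM-style dense: x is (m, k), w is (n, k); computes x @ w^T."""
    return [[sum(xi * wi for xi, wi in zip(row, wrow)) for wrow in w]
            for row in x]

x = [[1, 2], [3, 4]]        # (m, k)
w_onnx = [[5, 6], [7, 8]]   # (k, n), ONNX layout
w_tvm = [[5, 7], [6, 8]]    # (n, k), transposed for the TVM convention

assert matmul(x, w_onnx) == dense(x, w_tvm)  # same computation, different layout
```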
Different frontends may handle the reshapes and transposes differently, but they will all perform the core computation of `MatMul -> Add`. To avoid writing a ton of patterns, I would like an option to always treat these reshape and transpose operators as a match and simply skip over them. That way, the core `MatMul -> Add` pattern will always match, and any operators in between will be merged into the composite function.
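The skip-over behavior I have in mind can be sketched as a toy matcher over a linear sequence of op names (this is not TVM's dataflow pattern API, just an illustration of the semantics; `LAYOUT_OPS` and `matches_with_skips` are hypothetical names):

```python
# Operators the matcher treats as transparent (assumed set for illustration).
LAYOUT_OPS = {"reshape", "transpose"}

def matches_with_skips(ops, pattern, skip=LAYOUT_OPS):
    """Return True if `pattern` occurs in `ops` in order, with any
    operators in `skip` silently stepped over. Any other operator
    between pattern elements breaks the match."""
    it = iter(pattern)
    want = next(it, None)
    for op in ops:
        if op == want:
            want = next(it, None)   # matched this pattern element, advance
        elif op not in skip:
            return False            # a non-skippable op interrupts the match
    return want is None             # matched iff we consumed the whole pattern

# The frontend-mutated sequence still matches the core pattern:
assert matches_with_skips(
    ["reshape", "reshape", "transpose", "matmul", "reshape", "add"],
    ["matmul", "add"])

# A genuinely different computation does not:
assert not matches_with_skips(["matmul", "relu", "add"], ["matmul", "add"])
```

In real Relay the expression is a dataflow graph rather than a flat list, so the actual implementation would live in the pattern language, but the matching semantics I am after are the same: skipped operators are absorbed into the composite function rather than preventing the match.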
A custom transformer implementation, such as Nvidia’s FasterTransformer, won’t care about these reshapes anyway.