These days I am working on some tensorization stuff, and I found several things that make the current
tensorize interface insufficient.
First, the tensorization declaration interface requires a TVM op. Originally I supposed it served the purpose of software emulation: when the underlying hardware has no corresponding intrinsic support, we can use this op to replace the code segment and at least guarantee correctness.
However, after using this interface, I realized that the true purpose of this parameter is to indicate the shapes of the input/output data, and what we compute in the op does not actually matter. This is a little counterintuitive for developers, I suppose. Can we just have an OpaqueOp that only accepts input shapes and output shapes, and does nothing?
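To make the OpaqueOp idea concrete, here is a minimal plain-Python sketch of what such a declaration might carry. Everything in it is hypothetical: `OpaqueOp`, its fields, and `matches` are names I made up for illustration and are not part of TVM. It only shows that a shape-only description, with no compute body, is enough to check whether a region fits the intrinsic.

```python
from dataclasses import dataclass
from typing import Tuple

Shape = Tuple[int, ...]


@dataclass(frozen=True)
class OpaqueOp:
    """Hypothetical shape-only op: records shapes, defines no computation."""
    input_shapes: Tuple[Shape, ...]
    output_shapes: Tuple[Shape, ...]

    def matches(self, input_shapes, output_shapes) -> bool:
        """Return True if the given region shapes equal the declared ones."""
        ins = tuple(tuple(s) for s in input_shapes)
        outs = tuple(tuple(s) for s in output_shapes)
        return ins == self.input_shapes and outs == self.output_shapes


# A 16x16 matrix-vector intrinsic described purely by its shapes (assumed sizes).
gemv = OpaqueOp(input_shapes=((16, 16), (16,)), output_shapes=((16,),))
```

With something like this, the declaration would state only what the hardware consumes and produces, and the emulation body would become optional rather than mandatory.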
Second, I notice that tensorization is essentially a “primitive sugar” or “code transformation sugar” that offloads the IR under a certain loop level. The interface is not aware of whether the loop body is perfectly tiled. Thus, the primitive cannot be applied when the loop tiling is imperfect.
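The imperfect-tiling problem can be illustrated with a small plain-Python sketch. It does not use the TVM API; `split_extent`, `intrin_add`, and `tiled_add` are hypothetical names, and `intrin_add` is only a stand-in for a fixed-width hardware intrinsic. When the loop extent is not a multiple of the tile size, only the full tiles have the shape the intrinsic expects, and the leftover iterations need a separate path (here, a scalar fallback).

```python
def split_extent(extent, factor):
    """Return (number of full tiles, leftover iterations) for a loop split."""
    return divmod(extent, factor)


def intrin_add(dst, a, b, offset, width):
    """Stand-in for a fixed-width intrinsic: always handles exactly `width` elements."""
    for i in range(width):
        dst[offset + i] = a[offset + i] + b[offset + i]


def tiled_add(a, b, factor):
    """Element-wise add: full tiles go to the intrinsic, the tail to scalar code."""
    n = len(a)
    out = [0] * n
    full_tiles, tail = split_extent(n, factor)
    # Full tiles match the intrinsic's fixed shape, so they can be offloaded.
    for t in range(full_tiles):
        intrin_add(out, a, b, t * factor, factor)
    # The tail is shorter than `factor`, so the intrinsic cannot cover it.
    start = full_tiles * factor
    for i in range(start, start + tail):
        out[i] = a[i] + b[i]
    return out
```

For example, an extent of 10 with a tile size of 4 gives two full tiles plus a tail of 2. The point is only that the offloaded region and the tail need different treatment; how a scheduling primitive should express this is exactly the open question.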
I am curious if we can work around these two issues?