How to get a "general" shape-size matmul kernel


Dear all,

In Alibaba’s batch matmul optimization work, the author said:

“Meanwhile, a general batch matmul kernel suitable for most of the shapes will also be generated to provide a fall-back mechanism for the shapes which do not have a corresponding ahead-of-time generated kernel.”

But based on the TVM tutorials, I can only generate kernels for fixed shapes.

How can I build the “general batch matmul” kernel that is claimed in this work?



Can I use TOPI for this task?


It seems I should use a variable, based on this discussion.

Please let me know whether I’m on the right track…


You can use a variable, but that does not address the real issue which is that autotuning only tunes for concrete shapes. It is likely that you can get good coverage for a large number of shapes even when only tuning for a few concrete shapes, but there will likely be corner cases.
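This is not the original author's code, but the fall-back scheme from the quoted work can be sketched in plain Python (all names here are hypothetical): a table of shape-specialized kernels keyed by concrete shape, with one shape-agnostic kernel as the fallback for everything else.

```python
# Illustrative sketch (hypothetical names): dispatch to a kernel tuned
# for an exact shape when one exists, otherwise fall back to a generic
# kernel that handles any shape but is typically slower.

def generic_batch_matmul(a, b):
    """Shape-agnostic fallback: (batch, m, k) x (batch, k, n) -> (batch, m, n)."""
    batch, m, k = len(a), len(a[0]), len(a[0][0])
    n = len(b[0][0])
    out = [[[0.0] * n for _ in range(m)] for _ in range(batch)]
    for p in range(batch):
        for i in range(m):
            for j in range(n):
                acc = 0.0
                for kk in range(k):
                    acc += a[p][i][kk] * b[p][kk][j]
                out[p][i][j] = acc
    return out

# Ahead-of-time "tuned" kernels, keyed by concrete shape. In TVM these
# would be autotuned fixed-shape kernels; here they simply reuse the
# generic implementation for illustration.
specialized_kernels = {
    (1, 64, 64, 64): generic_batch_matmul,  # stand-in for a tuned kernel
}

def dispatch_batch_matmul(a, b):
    """Look up a specialized kernel for the runtime shape, else fall back."""
    shape = (len(a), len(a[0]), len(a[0][0]), len(b[0][0]))
    kernel = specialized_kernels.get(shape, generic_batch_matmul)
    return kernel(a, b)
```

The point of the sketch is only the dispatch structure: shapes seen during tuning hit a specialized kernel, and any other shape still produces a correct result through the fallback.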

Another way of providing general support for many shapes is to fix the splits of loop axes and to handle the uneven edges separately. Imagine you have a loop that is split by a factor of 128, but the current extent is not a multiple of this factor, e.g., 1000. One approach is to first do the 896 iterations according to the previous split, and then handle the remaining 104 iterations separately. However, we currently do not have perfect support for this use case.
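The split-with-tail idea above can be sketched in plain Python, using the numbers from the example (a factor of 128 over an extent of 1000 gives a 7 × 128 = 896-iteration main part plus a 104-iteration tail):

```python
def split_with_tail(extent, factor):
    """Split a loop of `extent` iterations into full blocks of `factor`
    iterations plus a separately handled tail (sketch of the idea above)."""
    main = (extent // factor) * factor  # largest multiple of factor <= extent
    visited = []
    # Main part: perfectly tiled, so the specialized inner kernel can
    # assume exactly `factor` iterations per block.
    for outer in range(extent // factor):
        for inner in range(factor):
            visited.append(outer * factor + inner)
    # Tail: the remaining extent - main iterations, handled by a
    # separate (unspecialized) loop.
    for i in range(main, extent):
        visited.append(i)
    return main, visited
```

For example, `split_with_tail(1000, 128)` does 896 iterations in full blocks and the last 104 in the tail loop, covering all 1000 indices exactly once.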

In practice, we find that single shape kernels offer the highest performance due to the high degree of specialization and that they are suitable for common use cases (e.g., NN inference).


Thank you very much for your reply, that helps a lot.


The CUDA kernel generated by TVM for a fixed size is difficult to read. We tried to manually modify a fixed-size kernel for general use, but eventually gave up. Is it possible to generate a kernel for variable shapes, using a fixed-size shape as a hint? That would greatly help those of us who want to use TVM for applications with variable-shaped inputs.