Loop partitioning and Tensorization - Work on different IR levels

Hey.

No, sorry, that was not what I meant.

I gave up on the idea of using the tensorize scheduling operation because ir_pass.LoopPartition works at a different level (i.e. the AST level), and I didn't feel like writing an intrin_func (needed for tensorize) template that would work for all sizes of my tiles (i.e. for the tail regions where the tiles are smaller than normal).
So I added an ir_pass in the last stage which (to my understanding) does something equivalent to what you would expect tensorize to do, and at least for my purpose it works.
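For context, the intrin_func I mean is the callback handed to tvm.decl_tensor_intrin. A minimal sketch of one, using the pre-0.7 `tvm.*` namespace and loosely following the public tensorize tutorial (the `gemv_update` extern symbol and the fixed extents `m`/`l` are just placeholders), looks like the code below; the fixed extents are exactly the pain point, because every tail tile size would need its own declaration:

```python
import tvm

def intrin_gemv(m, l):
    """Declare an (m x l) GEMV tensor intrinsic backed by an extern call."""
    a = tvm.placeholder((l,), name="a")
    b = tvm.placeholder((m, l), name="b")
    k = tvm.reduce_axis((0, l), name="k")
    c = tvm.compute((m,), lambda i: tvm.sum(a[k] * b[i, k], axis=k), name="c")

    Ab = tvm.decl_buffer(a.shape, a.dtype, name="A", offset_factor=1, strides=[1])
    Bb = tvm.decl_buffer(b.shape, b.dtype, name="B", offset_factor=1,
                         strides=[tvm.var("s1"), 1])
    Cb = tvm.decl_buffer(c.shape, c.dtype, name="C", offset_factor=1, strides=[1])

    def intrin_func(ins, outs):
        # Replace the naive loop nest with a call to a hand-written kernel.
        ib = tvm.ir_builder.create()
        aa, bb = ins
        cc = outs[0]
        ib.emit(tvm.call_extern("int32", "gemv_update",  # placeholder symbol
                                cc.access_ptr("w"),
                                aa.access_ptr("r"),
                                bb.access_ptr("r"),
                                m, l, bb.strides[0]))
        return ib.get()

    return tvm.decl_tensor_intrin(c.op, intrin_func, binds={a: Ab, b: Bb, c: Cb})
```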

The way I did it is very similar (if not identical) to this VTA construct, and it is also very similar to what you described, point by point:

  1. Instead of a new API, use pragmas
  2. I let ir_pass.LoopPartition peel the loops for me (see the sketch after this list)
  3. Same as above
  4. Same as above
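Roughly, in pre-0.7 API terms, the flow looks like the sketch below (the pragma name `my_accel_intrin` and the pass `replace_marked_loops` are made-up names for illustration, and the real pattern-matching pass is target specific, so it is left as a stub):

```python
import tvm

N, TILE = 1000, 64                      # N is not a multiple of TILE, so there is a tail

A = tvm.placeholder((N,), name="A")
B = tvm.compute((N,), lambda i: A[i] * 2.0, name="B")

s = tvm.create_schedule(B.op)
xo, xi = s[B].split(B.op.axis[0], factor=TILE)
# Mark the loop we want replaced with a pragma instead of introducing a new API.
s[B].pragma(xi, "my_accel_intrin")

def replace_marked_loops(stmt):
    # Placeholder for the custom lowering pass that would pattern-match the
    # pragma-annotated loops and swap in calls to the accelerator intrinsic;
    # the real pass is target specific and omitted here.
    return stmt

# partition_const_loop lets LoopPartition peel the tail iterations, so the
# pragma-marked body the custom pass sees has no likely(...) guards left.
with tvm.build_config(partition_const_loop=True,
                      add_lower_pass=[(2, replace_marked_loops)]):
    print(tvm.lower(s, [A, B], simple_mode=True))
```

At least for constant loop bounds, the peeling keeps the main body guard-free, which is what makes the pattern matching in the custom pass tractable.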

Thanks for the reply.
I feel that would be too much change for my case.
I am working on my workaround, and I think

writing an intrin_func (needed for tensorize) template that would work for all sizes of my tiles

it is possible to achieve that; it depends on how the tensorize kernel is written.
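As a rough sketch of what I mean (the `vadd_kernel` symbol and the concrete sizes are placeholders, and this is just one way to read "template"): if the extern kernel takes the tile length as a runtime argument, the intrin factory itself becomes the size template, and a tail tile only costs one extra declaration:

```python
import tvm

def intrin_vadd(n):
    """Declare an n-element vector-add intrinsic; n is a plain Python value,
    so the same factory covers full tiles and the smaller tail tile."""
    x = tvm.placeholder((n,), name="x")
    y = tvm.placeholder((n,), name="y")
    z = tvm.compute((n,), lambda i: x[i] + y[i], name="z")

    def intrin_func(ins, outs):
        xx, yy = ins
        zz = outs[0]
        # The extern kernel (placeholder name) receives n at runtime,
        # so one hand-written routine serves every tile size.
        return tvm.call_packed("vadd_kernel", xx, yy, zz, n)

    with tvm.build_config(offset_factor=16):
        return tvm.decl_tensor_intrin(z.op, intrin_func)

full_tile = intrin_vadd(64)   # main body tiles
tail_tile = intrin_vadd(40)   # tail tile, e.g. when the axis length is 104
```

Whether the tail loop can then be tensorized separately still depends on getting a clean, guard-free tail loop out of the schedule, which is where loop partitioning comes back in.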

Hi @baowenlei1, have you resolved the “likely(xxx)” tensorization problem? Or are “pragmas” the only way to do tiling correctly? Thanks!

I have not checked this site for a while. You can check the forked branch here: https://github.com/microsoft/onnxruntime-tvm

Thanks for your reply!

Hi! I’ve also run into the problem with “likely(…)” statements in tensorization. After enabling loop partitioning with “partition_const_loop” in the config, some of them disappeared, but not all. Is there any way to completely remove “likely(…)” statements before tensorization, without writing your own IR pass?
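For reference, what I enabled, shown on a toy schedule with the newer PassContext API (the older tvm.build_config(partition_const_loop=True) form should be equivalent):

```python
import tvm
from tvm import te

A = te.placeholder((100,), name="A")
B = te.compute((100,), lambda i: A[i] + 1.0, name="B")
s = te.create_schedule(B.op)
xo, xi = s[B].split(B.op.axis[0], factor=8)  # 100 % 8 != 0, so likely(...) guards appear

# Ask LoopPartition to also partition loops with constant bounds; this peels
# the tail iterations and removes the guards from the main body.
with tvm.transform.PassContext(
        config={"tir.LoopPartition": {"partition_const_loop": True}}):
    print(tvm.lower(s, [A, B]))
```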