Given that some of the discussions are about the design of TVM’s IR, how about we start a new thread on that?
Polyhedral optimization (or at least the ability to easily apply polyhedral-like analysis) might be attractive for ASICs though; it could help to build a smarter tensorizer.
It’s true. Handcrafting doesn’t scale when the number of ASICs increases.
Hmm, I don’t think TVM really has a bigger hand-crafting problem (see my comment on the next quote). I also think every ASIC developer would have to commit to “at least” defining TVM scheduling rules. Getting that for free would obviously be nice, but I don’t think it’s realistic. That scaling in the number of ASICs would be completely transparent to the development of the TVM infrastructure.
There is some flexibility in TVM’s scheduling rules.
I mean given a certain layer-type with (or without) possible fusions, you can have more than one scheduling rule.
You would have a higher-level scheduling rule decision making module (which is purely SW) to actually pick which of the scheduling rules to use. Yes the scheduling rules are then hand-crafted, but most likely somewhat templated so that at least to some degree you can generate diverse “flavours” (imagine the block sizes and ordering of loops) of the routine.
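To make the “templated rule plus decision module” idea concrete, here is a minimal sketch in plain Python (not TVM code). Everything in it is an assumption made for illustration: the names `tiled_matmul`, `flavours` and `pick_flavour`, the candidate block sizes, and the toy rule that three `block x block` tiles must fit in a scratchpad of a given size.

```python
from itertools import product

def tiled_matmul(A, B, n, block, order):
    """One schedule template: tile i, j, k by `block`; visit tiles in `order`."""
    C = [[0] * n for _ in range(n)]
    tile_starts = range(0, n, block)
    for starts in product(tile_starts, repeat=3):
        # `order` is a permutation of "ijk"; its first letter is the outermost tile loop.
        s = dict(zip(order, starts))
        for i in range(s["i"], min(s["i"] + block, n)):
            for j in range(s["j"], min(s["j"] + block, n)):
                for k in range(s["k"], min(s["k"] + block, n)):
                    C[i][j] += A[i][k] * B[k][j]
    return C

def flavours(n):
    """Enumerate the (block size, loop order) instances of the template."""
    return [(b, o) for b in (1, 2, 4, 8) if b <= n for o in ("ijk", "ikj", "kij")]

def pick_flavour(n, scratchpad_elems):
    """Hypothetical decision module: largest block whose A, B and C tiles fit on chip."""
    fitting = [f for f in flavours(n) if 3 * f[0] * f[0] <= scratchpad_elems]
    return max(fitting, key=lambda f: f[0])
```

Every flavour computes the same result; only the iteration order and tile size differ, which is what the picker is free to choose between. A real picker would use a cost model or measurements rather than this single capacity check.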
I am no expert in polyhedral scheduling, but that sounds like a very complex problem to solve (at least fully automated).
Polyhedral would technically not require these templates, but would require the scheduling algorithm to conform to the capabilities of the ASIC datapaths, address generation patterns, accelerator system resources (possible scratchpad usage), etc. This holds for any kind of operator fusion. Here I would guess that some templated schedules or constraints would again be handcrafted.
The set of loop optimizations that TVM natively supports is a subset of all those possible with polyhedral, so it would be interesting to know which are not available (not even through a mix of TVM scheduling primitives). The only one I can think of is loop skewing (to generate a SW pipeline), but even then I have a mental sketch of how it could still be realized without any extension of the TVM primitives.
If someone is a poly expert and totally against what I say, please contribute to the thread or contact me!!!
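For readers unfamiliar with loop skewing, here is a small plain-Python sketch of the transformation itself. It is a generic textbook-style example, not the TVM realization alluded to above; the function names and the recurrence are assumptions chosen for illustration. Each point depends on its upper and left neighbours, so neither original loop is parallel, but after skewing with `t = i + j` all points on one wavefront `t` are independent.

```python
def original(n, m):
    """Reference loop nest: a[i][j] depends on a[i-1][j] and a[i][j-1]."""
    a = [[1] * m for _ in range(n)]
    for i in range(1, n):
        for j in range(1, m):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a

def skewed(n, m):
    """Same computation after skewing: outer loop over wavefronts t = i + j."""
    a = [[1] * m for _ in range(n)]
    for t in range(2, n + m - 1):  # t runs from 1+1 up to (n-1)+(m-1)
        # Bounds on i follow from 1 <= i <= n-1 and 1 <= t-i <= m-1.
        for i in range(max(1, t - (m - 1)), min(n - 1, t - 1) + 1):
            j = t - i
            # Both operands lie on wavefront t-1, so this inner loop could run in parallel.
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a
```

The inner loop bounds are where the integer set reasoning shows up: they are the projection of the original rectangular domain onto the new `(t, i)` coordinates.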
There is one thing which I think TVM could do better and would probably fit into the MLIR vision, and that is allowing the NNVM/Relay fusion rules of nodes to be an input from ASIC backend developers.
Obviously one path is to turn off all fusion and then implement “glue fusion” routines which are more target dependent (each ASIC developer would have to do this), but I am not sure if it would break some of the reusability of TVM code (e.g. TVM routines to visit nodes in a graph or something like that). I guess another path would be to override some layer type definitions (e.g. if I want to fuse conv and pool, then define pool as an element-wise operation; again every ASIC developer would have to do this), but then again I have no idea what extra problems that brings down the road.
A good tensorizer is an open problem that we all need to solve. Poly has neither an advantage nor a disadvantage on this problem. This is a technical direction we should push to solve in TVM.
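As a rough illustration of “fusion rules as an input from the backend”, here is a plain-Python sketch. It does not use or mirror any NNVM/Relay API; `fuse_chain`, the rule sets and the operator names are all hypothetical, and the graph is simplified to a linear chain of ops.

```python
def fuse_chain(ops, fusable_pairs):
    """Greedily group a linear chain of ops using backend-supplied (producer, consumer) rules."""
    groups = []
    for op in ops:
        if groups and (groups[-1][-1], op) in fusable_pairs:
            groups[-1].append(op)  # the backend allows fusing op into its producer's group
        else:
            groups.append([op])    # otherwise start a new fused group
    return groups

# Hypothetical default rules versus rules supplied by one ASIC backend.
generic_rules = {("conv2d", "relu"), ("dense", "relu")}
asic_rules = generic_rules | {("relu", "max_pool2d")}

ops = ["conv2d", "relu", "max_pool2d", "dense", "relu"]
generic_groups = fuse_chain(ops, generic_rules)
asic_groups = fuse_chain(ops, asic_rules)
```

With the generic rules the pooling op stays in its own group; with the ASIC rules it joins the conv group, without touching the graph-visiting code. A real implementation would work on a DAG and need legality checks that this sketch skips.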
The common part between Poly and TVM is the usage of integer and integer set analysis. I believe that is the direction where MLIR and TVM might collectively improve and learn from each other.
So the key idea here is to apply integer set analysis, which we could call polyhedral or hypercube analysis.
Good discussions here. The design principle of the TVM stack is to “be intelligent and pragmatic”. This means we want as much automation as possible, but also provide ways to make use of human domain information such as schedule templates and tensorized micro-kernels when necessary. We will likely continue to use this principle.
Actually the current MLIR document says that the polyhedral IR is an experimental dialect of MLIR. I find it a bit odd that they would call it “experimental”.
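A very small instance of this kind of analysis is interval (“hypercube”) bound inference over affine index expressions. The sketch below is plain Python and only an assumption about what such an analysis could look like in its simplest form; the `Interval` class is hypothetical and is not TVM’s or MLIR’s implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: int
    hi: int  # inclusive upper bound

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def scale(self, c):
        """Multiply by an integer constant, swapping bounds if c is negative."""
        lo, hi = self.lo * c, self.hi * c
        return Interval(min(lo, hi), max(lo, hi))

    @property
    def extent(self):
        return self.hi - self.lo + 1

# A loop of 16 iterations split by a factor of 4: i = io * 4 + ii.
io, ii = Interval(0, 3), Interval(0, 3)
i_range = io.scale(4) + ii

# Within one tile (io fixed), a stencil reads a[i-1], a[i], a[i+1].
halo = ii + Interval(-1, 1)
```

Here `i_range` recovers the original bounds `[0, 15]`, and `halo.extent` says a 4-wide tile needs 6 elements of scratchpad for a 3-point stencil. Intervals can over-approximate when the same variable appears more than once in an expression; polyhedral sets track such relations exactly, at higher cost.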
BTW I presented polyhedral compilation of ML graphs at C4ML … and I think that polyhedral and functional approaches like Relay IR are the way to go… though I think Relay goes too far on the functional side… (e.g. recursion and lists)… but that is not bad, just more work needs to be done there…
Let me stress that MLIR doesn’t only offer a different IR, it also offers a different approach to scheduling via its Polyhedral dialect. For example, I see affine transformations as types in the standard.
No, I think that automation is not a necessary property of the polyhedral approach. See for example the Loopy project (https://github.com/nimit-singhania/loopy), where scheduling rules are explicit and authors need only one step to include their grammar in the source language, like Halide does now.
In my opinion, it is TVM which may (and should!) benefit from polyhedral approaches. I see Relay as a different story; it may or may not use TVM as a backend.
This is what I meant in my post
Since the loop transformations which TVM does are a subset of all possible with the polyhedral modelling, I guess we would be ok.
Obviously, TVM could offload part of scheduling to MLIR and invoke the polyhedral dialect from there.
That, I think, is part of the goal of MLIR: the “right” dialect is used for the right part of the whole compilation task.
TVM could offload part of scheduling to MLIR and invoke the polyhedral dialect from there.
Yep it will be interesting to see how we could offload parts of Relay IR to different third-party IRs, including MLIR, TensorRT, etc.
There is another project called loo.py (https://github.com/inducer/loopy) which does loop transformations for CPUs and GPUs.
It’s in TVM’s acknowledgement list, though.