Google's latest work: MLIR Primer

tqchen · April 8, 2019, 11:38pm

Given that some of the discussions are about the design of TVM’s IR, how about we start a new thread on that?

yzhliu · April 9, 2019, 6:41am

Polyhedral optimization (or at least the ability to easily apply polyhedral-like analysis) might be attractive for ASICs, though; it could help to build a smarter tensorizer.

junrushao · April 9, 2019, 7:23am

It’s true. Handcrafting doesn’t scale as the number of ASICs increases.

aca88 · April 9, 2019, 9:32am

Hmm, I don’t think TVM really has a bigger problem with hand-crafting (see my comment on the next quote). I also think every ASIC developer would have to commit to “at least” defining TVM scheduling rules. Getting that for free would obviously be nice, but I don’t think it’s realistic. That scaling in the number of ASICs would be completely transparent to the development of the TVM infrastructure.

There is some flexibility in TVM’s scheduling rules.
I mean that, given a certain layer type with (or without) possible fusions, you can have more than one scheduling rule.
You would have a higher-level decision-making module (which is purely SW) to actually pick which of the scheduling rules to use. Yes, the scheduling rules are then hand-crafted, but most likely somewhat templated, so that at least to some degree you can generate diverse “flavours” of the routine (imagine different block sizes and loop orderings).
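To make the “templated rule plus decision module” idea concrete, here is a minimal sketch in plain Python. It does not use any TVM API: `schedule_flavours`, `toy_cost`, `pick_schedule`, the block sizes and the scratchpad size are all hypothetical names and numbers chosen only for illustration.

```python
from itertools import permutations, product

def schedule_flavours(block_sizes=(8, 16, 32), axes=("i", "j", "k")):
    """Enumerate every (block size, loop order) flavour of one hypothetical template."""
    return [
        {"block": b, "order": order}
        for b, order in product(block_sizes, permutations(axes))
    ]

def toy_cost(flavour, scratchpad_elems=1024):
    """Hypothetical cost model (lower is better); a real one would be target specific."""
    b = flavour["block"]
    footprint = 3 * b * b  # one b x b tile each for the two inputs and the output
    if footprint > scratchpad_elems:
        return float("inf")  # flavour does not fit in the assumed scratchpad
    # Prefer larger tiles, and prefer the reduction axis "k" as the innermost loop.
    penalty = 0 if flavour["order"][-1] == "k" else 1
    return 1.0 / b + penalty

def pick_schedule(flavours, cost=toy_cost):
    """The purely-software decision module: choose the cheapest flavour."""
    return min(flavours, key=cost)

flavours = schedule_flavours()
best = pick_schedule(flavours)
print(len(flavours), best)
```

The template itself stays hand-crafted; only the parameters (block size, loop order) are searched, and the cost function is where an ASIC developer would encode hardware constraints.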

I am no expert in polyhedral scheduling, but that sounds like a very complex problem to solve (at least fully automatically).

Polyhedral would technically not require these templates, but it would require the scheduling algorithm to conform to the capabilities of the ASIC datapaths, address generation patterns, accelerator system resources (possible scratchpad usage), etc. This holds for any kind of operator fusion. Here I would guess that some templated schedules or constraints would again be handcrafted.
The set of loop optimizations that TVM natively supports is a subset of those possible with polyhedral techniques, so it would be interesting to know which ones are not available (not even through a mix of TVM scheduling primitives). The only one I can think of is loop skewing (to generate a SW pipeline), but even then I have a mental sketch of how it could still be realized without any extension of the TVM primitives.
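For readers unfamiliar with loop skewing, a minimal plain-Python sketch of the transformation follows. It is not TVM code and makes no claim about how TVM primitives would express it; the functions `original` and `skewed` and the toy recurrence are assumptions picked for illustration. The loop nest is re-indexed by t = i + j, so that all iterations sharing the same t are independent of each other.

```python
def original(n, m):
    """Reference loop nest: each cell depends on its upper and left neighbours."""
    a = [[1] * m for _ in range(n)]
    for i in range(1, n):
        for j in range(1, m):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a

def skewed(n, m):
    """Same computation after skewing: the outer loop walks wavefronts t = i + j."""
    a = [[1] * m for _ in range(n)]
    for t in range(2, n + m - 1):
        # Keep 1 <= i <= n - 1 and 1 <= j = t - i <= m - 1.
        for i in range(max(1, t - (m - 1)), min(n - 1, t - 1) + 1):
            j = t - i
            # Both operands lie on wavefront t - 1, so this inner loop
            # carries no dependence and could run in parallel or be pipelined.
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a

print(original(4, 5) == skewed(4, 5))
```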

If someone is a poly expert and totally disagrees with what I say, please contribute to the thread or contact me!!!

@tqchen
There is one thing which I think TVM could do better and would probably fit into the MLIR vision, and that is allowing the NNVM/Relay fusion rules of nodes to be an input from ASIC backend developers.
Obviously one path is to turn off all fusion and then implement “glue fusion” routines which are more target-dependent (each ASIC developer would have to do this), but I am not sure whether it would break some of the reusability of TVM code (e.g. TVM routines to visit nodes in a graph or something like that). I guess another path would be to overwrite some layer type definitions (e.g. if I want to fuse conv and pool, then define pool as an element-wise operation; again, every ASIC developer would have to do this), but then again I have no idea what extra problems that brings down the road.

tqchen · April 10, 2019, 1:32pm

A good tensorizer is an open problem that we all need to solve. Poly has neither an advantage nor a disadvantage on this problem. This is a technical direction we should push to solve in TVM.

The common part between Poly and TVM is the use of integer and integer set analysis. I believe that is the direction in which MLIR and TVM might collectively improve and learn from each other.

So the key idea here is to apply integer set analysis, which we could call polyhedral or hypercube analysis.
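As a rough illustration of the “hypercube” flavour of integer set analysis, the sketch below bounds an affine index expression when each loop variable ranges over an interval. It is plain Python, not the analysis implemented in TVM or MLIR; `affine_bounds` and the tiled-loop example are hypothetical.

```python
def affine_bounds(coeffs, const, ranges):
    """Bounds of const + sum(coeffs[v] * v), with v in ranges[v] = (lo, hi), inclusive."""
    lo = hi = const
    for var, c in coeffs.items():
        vlo, vhi = ranges[var]
        # A negative coefficient swaps which end of the interval gives the minimum.
        lo += c * (vlo if c >= 0 else vhi)
        hi += c * (vhi if c >= 0 else vlo)
    return lo, hi

# A loop of extent 64 split by 16: i = 16 * io + ii, io in [0, 3], ii in [0, 15].
ranges = {"io": (0, 3), "ii": (0, 15)}
bounds = affine_bounds({"io": 16, "ii": 1}, 0, ranges)

# The access buf[i] is provably in bounds for a buffer of 64 elements.
in_bounds = 0 <= bounds[0] and bounds[1] < 64
print(bounds, in_bounds)
```

Interval bounds treat every variable independently, which is why the result is a box (hypercube); a polyhedral representation can additionally track constraints that relate several variables to each other.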

tqchen · April 9, 2019, 8:02pm

Good discussions here. The design principle of the TVM stack is to “be intelligent and pragmatic”. This means we want as much automation as possible, but we also provide ways to make use of human domain knowledge, such as schedule templates and tensorized micro-kernels, when necessary. We will likely continue to follow this principle.

vinodgro · April 10, 2019, 12:48am

Actually, the current MLIR document says that polyhedral IR is an experimental dialect of MLIR. I find it a bit odd that they would call it “experimental”.

BTW I presented polyhedral compilation of ML graphs at C4ML … and I think that polyhedral and functional approaches like Relay IR are the way to go… though I think Relay goes too far on the functional side… (e.g. recursion and lists)… but that is not bad, it just means more work needs to be done there…

grwlf · April 11, 2019, 8:52am

Let me draw attention to the fact that MLIR doesn’t only offer a different IR; it also offers a different approach to scheduling via its Polyhedral dialect. For example, I see affine transformations as types in the standard.

grwlf · April 11, 2019, 9:01am

No, I think that automation is not a necessary property of the polyhedral approach. See for example the Loopy project (https://github.com/nimit-singhania/loopy), where scheduling rules are explicit and authors need only one step to include their grammar in the source language, like Halide does now.

grwlf · April 11, 2019, 9:22am

In my opinion, it is TVM which may (and should!) benefit from polyhedral approaches. I see Relay as a different story; it may or may not use TVM as a backend.

aca88 · April 11, 2019, 11:00am

This is what I meant in my post

Since the loop transformations which TVM does are a subset of those possible with polyhedral modelling, I guess we would be ok.
Obviously, TVM could offload part of scheduling to MLIR and invoke the polyhedral dialect from there.
That, I think, is part of the goal of MLIR: that the “right” dialect is used for the right part of the whole compilation task.

junrushao · April 11, 2019, 7:46pm

TVM could offload part of scheduling to MLIR and invoke the polyhedral dialect from there.

Yep it will be interesting to see how we could offload parts of Relay IR to different third-party IRs, including MLIR, TensorRT, etc.

vinodgro · April 29, 2019, 12:34am

There is another project called loo.py (https://github.com/inducer/loopy), which does loop transformations for CPUs and GPUs.

junrushao · April 29, 2019, 12:23am

It’s in TVM’s acknowledgement list, though.

sdll · July 1, 2020, 9:16am

Any updates on this? Since this is the primary thread discussing how MLIR and TVM relate to each other, would love to see a link posted here.

FrozenGene · July 3, 2020, 2:51am

One paper has done this (it compares different DL compilers): https://arxiv.org/abs/2002.03794

xhliu · June 30, 2022, 8:45am

+1 for this. It would be good to see which one is better.

kuladeepmarupalli · March 13, 2023, 3:53am

@tqchen Now that OpenXLA has been open-sourced, what are your thoughts on moving forward with regard to interoperability?

Thanks & Regards,

Kuladeep.

tqchen · March 13, 2023, 1:11pm

Indeed, StableHLO would be a great bridge to interoperate and bring into TVM Unity.

kuladeepmarupalli · March 14, 2023, 4:36am

@tqchen thanks for the reply. Are there any plans already in motion for bringing StableHLO into Unity?

Best regards,

Kuladeep.