Configurable fusion and heterogeneous execution

Let’s say I have multiple devices on a single piece of hardware that can accelerate various computations. Is there a way to configure which Relay nodes can be fused for each device without having to write an IR pass? Since IR passes are already in place that fuse nodes based on default rules, it seems like this could be extended to something more general and easily configurable (a config file perhaps).
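To make the idea concrete, here is a minimal sketch of what such a configuration could look like. Everything in it is hypothetical: `FusionRule`, `FUSION_CONFIG`, `greedy_fuse`, the device names, and the op lists are illustrative assumptions, not an existing TVM or Relay API. It only handles a linear chain of op names, whereas a real pass would work on the Relay dataflow graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionRule:
    """Hypothetical per-device fusion rule (not a TVM API)."""
    fusable_ops: frozenset   # op names this device may fuse together
    max_fused_nodes: int     # upper bound on ops per fused group


# Hypothetical configuration; in practice this could be loaded from a file.
FUSION_CONFIG = {
    "npu": FusionRule(frozenset({"nn.conv2d", "nn.bias_add", "nn.relu"}), 3),
    "cpu": FusionRule(frozenset({"add", "multiply", "nn.relu"}), 2),
}


def greedy_fuse(ops, device, config=FUSION_CONFIG):
    """Group a linear chain of op names into fused groups for one device.

    Consecutive fusable ops are merged until the device's size limit is
    reached; an op the device cannot fuse ends up in a group of its own.
    """
    rule = config[device]
    groups, current = [], []
    for op in ops:
        if op in rule.fusable_ops and len(current) < rule.max_fused_nodes:
            current.append(op)
            continue
        # Close the group being built before handling this op.
        if current:
            groups.append(current)
            current = []
        if op in rule.fusable_ops:
            current = [op]        # start a new group (size limit was hit)
        else:
            groups.append([op])   # unfusable op stays alone
    if current:
        groups.append(current)
    return groups


chain = ["nn.conv2d", "nn.bias_add", "nn.relu", "nn.softmax"]
npu_groups = greedy_fuse(chain, "npu")
cpu_groups = greedy_fuse(["add", "multiply", "nn.relu"], "cpu")
```

With this config, the NPU rule fuses the first three ops into one group and leaves `nn.softmax` alone, while the CPU rule splits its chain after two ops because of the size limit.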

This could open up a search space at the high level, much as AutoTVM does at the low level. Devices could have many overlapping sets of fusable nodes, and finding the best combination may be difficult.
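As a rough illustration of how quickly that space grows, the sketch below enumerates every way to place a short chain of ops on devices with overlapping support. The `SUPPORTED` table, the device names, and the "count device switches" cost are all assumptions made up for this example; a real searcher would need a measured or learned cost model.

```python
from itertools import product

# Hypothetical table of which ops each device can run.
SUPPORTED = {
    "npu": {"nn.conv2d", "nn.relu"},
    "dsp": {"nn.relu", "add"},
    "cpu": {"nn.conv2d", "nn.relu", "add"},
}


def candidate_placements(ops, supported=SUPPORTED):
    """Yield every assignment of ops to devices that support them."""
    choices = [
        [dev for dev in sorted(supported) if op in supported[dev]]
        for op in ops
    ]
    yield from product(*choices)


def device_switches(placement):
    """Placeholder cost: number of boundaries between different devices."""
    return sum(a != b for a, b in zip(placement, placement[1:]))


ops = ["nn.conv2d", "nn.relu", "add"]
placements = list(candidate_placements(ops))
best = min(placements, key=device_switches)
```

Even this three-op chain yields 2 x 3 x 2 = 12 candidate placements, and the count multiplies with every additional op, which is why an exhaustive search stops being practical on real graphs.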

I agree that the current default rules for op fusion are not enough. Configurable op fusion, instead of fixed rules, would be a very useful feature for accelerators, 3rd party codegen, and potentially training in the future.

I wonder if you’d be interested in working on an RFC or design that allows the op fusion pass to take in a set of predefined rules. Then we can move on to developing some ML-based searchers to find the best fusion strategy based on these rules.

also cc @jroesch @zhiics @tqchen @MarisaKirisame @vinx13

+1 for making it configurable. Our current fusion is actually hardware agnostic. As @haichen mentioned, some accelerators might have their own fusion rules. In addition, we also hard-coded the maximum number of fused nodes allowed in the pass.

+1 for having an RFC to make op fusion more configurable. Along those lines, I think that we should reuse this framework for multiple things, like quantization and some graph rewrite operations. In short, the idea would be to have a set of rules that define how a graph should be pre-processed so that it runs efficiently in a heterogeneous fashion.

@thierry, I agree with extending a configuration framework wherever possible.