[DISCUSS] Semantics and Documentation of the AlterLayout Pass

This is a question raised in https://github.com/dmlc/tvm/pull/2806 but is not specific to that PR. AlterLayout was a design decision made during NNVMv1, and we migrated it into Relay.

By reading the current code documentation, it is a bit unclear how exactly the pass is going to behave. Specifically, the outermost part depends on FTVMAlterOpLayout, which is a rewrite rule, but somehow we also make use of FInferCorrectLayout as part of the implementation.

There is also confusion about why mutation on the params field is necessary (see https://github.com/dmlc/tvm/pull/2806/files#diff-79faf8272ddd59d3f0425cb363dc7ccaR28), which is quite dangerous.

I hope to use this discuss thread to have everyone provide some clarifications. The goal is that after perhaps one week we can be on the same page and produce documentation for this pass. Since layout optimization is quite important, it is worth the effort to get it clear and right.

cc @ajtulloch @eqy @yzhliu @merrymercy

Yeah, this tripped me up a bit yesterday working on some Relay AlterLayout stuff (specifically, adding input/kernel/output layout options to the Dense op to allow the weight matrix to be laid out in different ways depending on AutoTVM). cf https://github.com/ajtulloch/tvm/blob/ebef0efa1b125b94069f9d0076f59887ffc28c0e/topi/python/topi/x86/dense.py#L131-L157, https://github.com/ajtulloch/tvm/blob/ebef0efa1b125b94069f9d0076f59887ffc28c0e/src/relay/op/nn/nn.cc#L83-L170. In practice I just ended up copy-pasting/looking at the convolution implementation to figure out the right approach.

At a high level, my understanding is the following, let me know if this is wrong:

  1. Individual operators can optionally have layout attributes, which specify the layout of the various input and output parameters. If they don’t have these attributes, tensors are assumed to be in the “natural” NCHW or NC layout.
  2. If an operator wants to accept inputs with different layouts (so it can pre-pack weights, use fast paths for different layouts, etc.), it can do the following:

a) Add the layout fields for all inputs/outputs it wants to configure to the Attrs class.
b) Extend the operator’s TypeRel implementation so it correctly handles type inference when the layout attrs are non-standard (usually via BijectiveLayoutNode::Forward/Backward).
c) Implement the FInferCorrectLayout attr on the op, which allows an op to inspect the ‘pre-existing’ layout of input/output nodes and override/modify the operator’s previously specified attrs, in cases where it might want to avoid adding new layout transformation ops, etc.
d) Add an entry in op_attrs.py for the op attr types
e) Register an implementation of register_alter_op_layout(op_ty) in _nn.py, which is invoked in the AlterLayout pass to allow the individual operator to inspect AutoTVM tuning results and ‘re-instantiate’ itself as a new op with modified layout attrs (a sketch of this hook is given after this list).
f) After this function has been invoked by the AlterLayout pass, the pass inserts the needed layout_transform ops into the graph between nodes whose input/output or kernel layouts have been changed. A common pattern is inserting a layout transform between a weight node and its usage in a dense/conv op (i.e. Conv(X, layout_transform(W))), and the PrecomputeGraph pass will typically precompute layout_transform(W) to avoid paying this cost on each iteration.
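
To make (e) concrete, the existing registration for conv2d looks roughly like the sketch below (paraphrased from memory from python/tvm/relay/op/nn/_nn.py around this time; the exact import paths and the signature of the TOPI generic function should be checked against the source). The Relay-side hook only forwards to TOPI, and the target-specific TOPI implementation is where the AutoTVM config is inspected and a re-instantiated op (or None, to keep the original) is returned:

```python
# Paraphrased sketch of the existing conv2d hook in
# python/tvm/relay/op/nn/_nn.py; check the file for the authoritative version.
import topi
from tvm import relay
from tvm.relay.op import op as reg

@reg.register_alter_op_layout("nn.conv2d")
def alter_op_layout_conv2d(attrs, inputs, tinfos):
    """Alter the layout of conv2d according to the target / AutoTVM config."""
    # The last argument is the op namespace used to rebuild the expression,
    # so the same TOPI logic can serve both the NNVM and Relay frontends.
    return topi.nn.conv2d_alter_layout(attrs, inputs, tinfos, relay.op)
```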

Does that make sense as a summary?

Thanks @ajtulloch for the summary. I’m actually thinking of refactoring the pass. Currently AlterOpLayout tries to do too many things at the same time, including inferring layouts and correcting layouts, and moreover it tries to propagate shapes on the fly (even though an operator might be replaced during the pass). This results in the limitation at https://github.com/dmlc/tvm/blob/master/src/relay/pass/alter_op_layout.cc#L319 and the dangerous mutation that @tqchen mentioned.

I plan to open an RFC this week.

@yzhliu You are right. At that time, we thought AlterOpLayout did not have dependency problems and could be done in a single forward pass, so we tried to do a lot of things in one pass, including operator substitution, layout inference, and layout-transformation insertion. I agree that some semantics of this pass are confusing.

There are several use cases for this pass:

  1. Replace an operator with a new data flow graph
    By registering @register_alter_op_layout, users can replace an operator with a data flow graph. But there is a constraint that the new data flow graph must have the same number of inputs as the original operator.

    Weight pre-packing for dense and Winograd conv2d falls into this category (see the first sketch after this list).

  2. Alter the layout of a conv2d
    In the registered @register_alter_op_layout function, we can return a conv2d operator with the same attributes except for a different layout; this pass will then insert the necessary layout transforms. The tricky part is inserting a minimal number of transformations. The current implementation uses a greedy method that propagates the layout of conv2d to pooling, broadcast, and elemwise operators (see the second sketch after this list).
    The layout correction rule is expressed by registering FInferCorrectLayout, and there are two kinds of implementations. Conv2d and dense are “active” operators: layout correction starts from the layout in their attributes. Pooling, broadcast, and elemwise operators are “passive” operators: they need to modify themselves to fit the input layout (i.e. they propagate the layout).
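
As a hedged illustration of use case 1, a Winograd-style rewrite inside the alter-layout hook might look like the sketch below. The contrib op names and the tile_size plumbing are written from memory of the Relay API around this time, so treat it as the shape of the rewrite rather than the exact TOPI code:

```python
from tvm import relay

# Sketch of use case 1: replace nn.conv2d with a small data-flow graph
# (weight transform + winograd conv). The new graph still consumes the same
# two inputs (data, weight) as the original operator, which is the constraint
# mentioned above. tile_size would normally come from an AutoTVM config;
# 4 is just a placeholder here.
def _rewrite_conv2d_to_winograd(attrs, inputs, tinfos, tile_size=4):
    data, weight = inputs
    new_attrs = {k: attrs[k] for k in attrs.keys()}
    transformed_weight = relay.nn.contrib_conv2d_winograd_weight_transform(
        weight, tile_size=tile_size)
    return relay.nn.contrib_conv2d_winograd_without_weight_transform(
        data, transformed_weight, tile_size=tile_size, **new_attrs)
```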
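
And for use case 2, a toy end-to-end example in the style of tests/python/relay/test_pass_alter_op_layout.py might look like the following. The hard-coded NCHW16c layout stands in for what an AutoTVM config would normally decide, and level=101 is only there so the toy hook overrides the built-in conv2d registration:

```python
from tvm import relay
from tvm.relay import ir_pass
from tvm.relay.op import register_alter_op_layout

# Toy hook for use case 2: same op, same attrs, only the data layout changes.
@register_alter_op_layout("nn.conv2d", level=101)
def _alter_conv2d_layout(attrs, inputs, tinfos):
    data, weight = inputs
    new_attrs = {k: attrs[k] for k in attrs.keys()}
    new_attrs["data_layout"] = "NCHW16c"
    return relay.nn.conv2d(data, weight, **new_attrs)

x = relay.var("x", shape=(1, 64, 56, 56))
w = relay.var("w", shape=(64, 64, 3, 3))
y = relay.nn.relu(relay.nn.conv2d(x, w, channels=64, kernel_size=(3, 3),
                                  padding=(1, 1)))
func = relay.Function([x, w], y)
func = ir_pass.infer_type(func)
func = ir_pass.alter_op_layout(func)
# Roughly expected after the pass: a layout_transform (NCHW -> NCHW16c) on x,
# conv2d and relu running in NCHW16c (relu is a "passive" op that propagates
# the layout), and a transform back to NCHW at the output.
print(ir_pass.infer_type(func))
```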

Why is mutation on const required?
Because I want to provide a minimal interface for users to register FInferCorrectLayout, whose inputs and outputs are only layouts. However, during layout correction (or layout propagation), some “passive” operators like pooling have to modify their attributes, because their attributes have a layout field. So I ended up modifying the const attributes in the FInferCorrectLayout function.
If we want to eliminate this mutation, ideally we should use FTVMAlterOpLayout for these operators. But registering entire FTVMAlterOpLayout rewrite rules for all operators is very redundant: during rewrite, an operator can even be replaced by a data flow graph, so the rewrite rule is not straightforward and does not depend only on the single op, and the FTVMAlterOpLayout implementations would share too many common parts.

In terms of refactoring, maybe we can decouple these two usages and implement them as separate passes. I am happy to follow up on your RFC.