[NNVM] layout_transform ops could be merged in residual blocks

I ran into a suboptimal case in graph fusion. In conv2d, we can sometimes use a packed layout (e.g. NCHW4c) if the number of channels is a multiple of 4; otherwise we fall back to the default NCHW layout.
Typically, we use NCHW for the first layer of a CNN (with shape 3x224x224) and then convert to the packed layout for subsequent layers (in alter_op_layout, we check whether the input can be channel-packed; see the sketch below).
In a residual block, the layout_transform nodes in the two branches are not merged.
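
For context, here is a minimal sketch of the kind of alter_op_layout hook I mean. It is not the actual TOPI implementation: the level=11 override and the packed kernel_layout string are illustrative assumptions.

```python
import nnvm.symbol as sym
from nnvm.top import registry as reg
from topi.util import get_const_tuple

# Sketch of an alter_op_layout hook that packs channels by 4 when possible.
# level=11 is assumed to override the default conv2d registration.
@reg.register_alter_op_layout("conv2d", level=11)
def _alter_conv2d_layout(attrs, inputs, tinfos):
    _, in_channels, _, _ = get_const_tuple(tinfos[0].shape)  # data is NCHW
    if in_channels % 4 != 0:
        return None  # input cannot be channel-packed; keep default NCHW
    new_attrs = {k: attrs[k] for k in attrs.keys()}
    new_attrs["layout"] = "NCHW4c"
    new_attrs["kernel_layout"] = "OIHW4i4o"  # hypothetical packed kernel layout
    return sym.conv2d(*inputs, **new_attrs)
```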

Here is an example.

Network:

```
data (3x224x224) -> conv2d1 (16x224x224) -> conv2d2
                       |                     | +
                       -----------------------
```

Current graph after alter_op_layout:

```
data -> conv2d1 -> layout_transform (NCHW->NCHW4c) -> conv2d2
           |                                           | +
           ----- layout_transform (NCHW->NCHW4c) ------|
```

We have these groups of nodes after fusion: conv2d1, layout_transform, conv2d2, layout_transform+broadcast_add.

I would expect the two layout_transform nodes to be merged.

A quick fix is to set out_layout of the first conv2d layer to NCHW4c in alter_op_layout. But this does not seem like a good solution, since at that point we don't know what the next layer is or whether it can use the NCHW4c layout.
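
For reference, the quick fix would amount to something like this inside the hook sketched above (hedged; in_channels and new_attrs are the same illustrative names as before):

```python
# Quick-fix sketch: when the input of a conv2d is still NCHW but its output
# could be packed, emit the packed layout directly from the conv instead of
# relying on a later layout_transform node.
out_channels = attrs.get_int("channels")
if in_channels % 4 != 0 and out_channels % 4 == 0:
    new_attrs = {k: attrs[k] for k in attrs.keys()}
    new_attrs["out_layout"] = "NCHW4c"
    return sym.conv2d(*inputs, **new_attrs)
```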

cc @masahi @tqchen

Agreed. The reason layout transform is not fused into conv is that layout transform is registered as an “injective” op, and currently injective ops and conv ops are not supposed to be fused.

But I don’t know why fusing injective ops with conv ops would be a bad idea. Most of the ops registered as injective, such as layout transform, reshape, flatten, etc., are just one-to-one memory permutations, so I think they could be fused just like elemwise ops.

@vinx13 can you try modifying the OpPattern of the injective op here to OpPattern.ELEMWISE and see if it gives what you want?
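
For reference, assuming layout_transform’s pattern can be overridden from Python through nnvm.top.registry (and that a higher level takes precedence over the default registration), the change would look roughly like:

```python
from nnvm.top import registry as reg

# Re-register layout_transform as ELEMWISE instead of INJECTIVE so the fusion
# pass treats it like an elemwise op (level=11 assumed to override the default).
reg.register_pattern("layout_transform", reg.OpPattern.ELEMWISE, level=11)
```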

I changed it to ELEMWISE but the graph remains the same. By the way, is there a way to visualize or print the graph after fusion?

You can use `print(graph.ir())` to see the graph structure, or, if you want to see the pseudo code after fusion, add `logging.basicConfig(level=logging.DEBUG)` before calling `nnvm.compiler.build`.
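
For example, a sketch of that workflow on the residual-style network from the first post (the shapes and opt_level=3 for AlterOpLayout are assumptions):

```python
import logging
import nnvm.compiler
import nnvm.symbol as sym

# Enable DEBUG logging before build to dump the lowered pseudo code
# of each fused group.
logging.basicConfig(level=logging.DEBUG)

# A minimal residual-style network matching the example above.
data = sym.Variable("data")
body = sym.conv2d(data, channels=16, kernel_size=(3, 3), padding=(1, 1),
                  name="conv2d1")
out = sym.conv2d(body, channels=16, kernel_size=(3, 3), padding=(1, 1),
                 name="conv2d2")
out = sym.broadcast_add(out, body)

with nnvm.compiler.build_config(opt_level=3):
    graph, lib, params = nnvm.compiler.build(
        out, target="llvm", shape={"data": (1, 3, 224, 224)})

print(graph.ir())  # textual graph structure after fusion
```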

I see why the two layout transforms are not fused into the parent conv op.

See this PR. Currently, fusion support for multiple child branches is limited. My PR enables fusion when the node where the children meet is an elemwise or broadcast op. In your case, the two layout_transform nodes meet at the second conv2d node, so my enhancement doesn’t work there.