I ran into a suboptimal case in graph fusion. In conv2d, we can sometimes use a packed layout (e.g. NCHW4c) if the number of channels is a multiple of 4. Otherwise we use the default layout, NCHW.
Typically, we use NCHW in the first layer of a CNN (with input shape 3x224x224), and then convert to the packed layout for subsequent layers (in alter_op_layout, we check whether the input can be channel-packed).
In a residual block, the layout_transform nodes in the two branches are not merged.
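To make the packing condition concrete, here is a minimal pure-Python sketch of what NCHW -> NCHW4c means for shapes and indices. The helper names (`can_pack`, `packed_shape`, `packed_index`) are made up for illustration and are not TVM functions.

```python
def can_pack(channels, factor=4):
    """Return True if the channel count splits evenly into blocks of `factor`."""
    return channels % factor == 0


def packed_shape(shape, factor=4):
    """Map an NCHW shape to the packed shape (N, C // factor, H, W, factor)."""
    n, c, h, w = shape
    if not can_pack(c, factor):
        raise ValueError(f"{c} channels cannot be packed by {factor}")
    return (n, c // factor, h, w, factor)


def packed_index(index, factor=4):
    """Map the index of one NCHW element to its position in the packed tensor."""
    n, c, h, w = index
    # The channel axis is split into an outer block index and an inner offset.
    return (n, c // factor, h, w, c % factor)
```

With these helpers, `can_pack(3)` is false for the 3-channel input, while `can_pack(16)` is true for the output of conv2d1, which is why the layout conversion shows up only after the first layer.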
Here is an example.
Network:
data (3x224x224) -> conv2d1 (16x224x224) -> conv2d2
                              |                     | +
                              -----------------------
Current graph after alter_op_layout:
data -> conv2d1 -> layout_transform (NCHW->NCHW4c) -> conv2d2
                 |                                           | +
                 ----- layout_transform (NCHW->NCHW4c) ------|
We have these groups of nodes after fusion: conv2d1, layout_transform, conv2d2, layout_transform+broadcast_add.
I would expect the two layout_transform nodes to be merged into one, since both apply the same NCHW->NCHW4c conversion to the output of conv2d1.
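To illustrate the merge I have in mind, here is a minimal sketch on a toy graph. It is a generic common-subexpression-style deduplication, not TVM's actual pass or API. The `Node` class, the node names (`lt_a`, `lt_b`, `w1`, `w2`) and `merge_duplicates` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Toy graph node (hypothetical, not a TVM class)."""
    op: str
    inputs: tuple = ()
    attrs: tuple = ()


def merge_duplicates(nodes):
    """Merge nodes that have the same op, attrs and (already merged) inputs.

    `nodes` maps name -> Node and must be in topological order.
    Returns the merged graph and a map from old names to surviving names.
    """
    canonical = {}  # structural key -> name of the node that is kept
    rename = {}     # original name -> surviving name
    merged = {}
    for name, node in nodes.items():
        inputs = tuple(rename[i] for i in node.inputs)
        key = (node.op, inputs, node.attrs)
        if key in canonical:
            rename[name] = canonical[key]  # duplicate: reuse the earlier node
        else:
            canonical[key] = name
            rename[name] = name
            merged[name] = Node(node.op, inputs, node.attrs)
    return merged, rename


to_packed = (("src", "NCHW"), ("dst", "NCHW4c"))
graph = {
    "data": Node("var", attrs=(("name", "data"),)),
    "w1": Node("var", attrs=(("name", "w1"),)),
    "w2": Node("var", attrs=(("name", "w2"),)),
    "conv2d1": Node("conv2d", ("data", "w1")),
    "lt_a": Node("layout_transform", ("conv2d1",), to_packed),
    "lt_b": Node("layout_transform", ("conv2d1",), to_packed),
    "conv2d2": Node("conv2d", ("lt_a", "w2")),
    "add": Node("broadcast_add", ("conv2d2", "lt_b")),
}
merged, rename = merge_duplicates(graph)
```

After the merge, `lt_b` is gone and `add` reads from `lt_a`, so only one layout_transform remains between conv2d1 and its two consumers.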
A quick fix is to set the out_layout of the first conv2d layer to NCHW4c in alter_op_layout. But this does not seem like a good solution, since we don't know what the next layer is or whether it can use the NCHW4c layout.