Hi, I noticed that the FoldScaleAxis pass is not enabled for CUDA Winograd convolution with a precomputed weight transform, or for x86 NCHWc convolution. As a result, the “broadcast_mul” op from batch norm doesn’t go away after compiling. Is this intended? @merrymercy @yzhliu
To enable this, we need to add registrations for

- `_contrib_conv2d_NCHWc`
- `_contrib_conv2d_winograd_without_weight_transform`

following these lines here
Also, I think the AlterLayout pass needs to run after the FoldScaleAxis pass.
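For context, here is a minimal pure-Python sketch of the identity that, as I understand it, FoldScaleAxis relies on: multiplying each output channel of a convolution by a constant gives the same result as multiplying that channel's filter by the constant beforehand, so the “broadcast_mul” can be absorbed into the weights. This is not TVM code. The helper names (`conv2d`, `scale_output_channels`, `fold_scale_into_weights`) and the plain NCHW-style nested lists are assumptions made for illustration only.

```python
# Illustration only: a naive convolution on nested lists, not the TVM implementation.

def conv2d(x, w):
    """Naive 'valid' convolution. x: [C][H][W], w: [O][C][KH][KW] -> [O][H'][W']."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    O, KH, KW = len(w), len(w[0][0]), len(w[0][0][0])
    out = []
    for o in range(O):
        plane = []
        for i in range(H - KH + 1):
            row = []
            for j in range(W - KW + 1):
                acc = 0
                for c in range(C):
                    for di in range(KH):
                        for dj in range(KW):
                            acc += x[c][i + di][j + dj] * w[o][c][di][dj]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def scale_output_channels(y, scale):
    """The unfolded form: a per-output-channel multiply applied after the conv."""
    return [[[v * scale[o] for v in row] for row in plane]
            for o, plane in enumerate(y)]


def fold_scale_into_weights(w, scale):
    """The folded form: multiply every weight of output channel o by scale[o]."""
    return [[[[v * scale[o] for v in krow] for krow in kplane] for kplane in filt]
            for o, filt in enumerate(w)]


# Small integer example so the comparison is exact.
x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
     [[9, 8, 7], [6, 5, 4], [3, 2, 1]]]          # C=2, 3x3 input
w = [[[[1, 0], [0, 1]], [[1, 1], [1, 1]]],
     [[[2, -1], [0, 3]], [[0, 1], [-1, 0]]]]     # O=2, C=2, 2x2 kernels
scale = [3, -2]                                  # per-output-channel scale

unfolded = scale_output_channels(conv2d(x, w), scale)
folded = conv2d(x, fold_scale_into_weights(w, scale))
assert unfolded == folded
```

The same identity holds regardless of the data layout or convolution algorithm, which is why I would expect the fold to be applicable to the NCHWc and Winograd variants as well, provided the scale is applied to the weights in whatever form those ops expect.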