Batch Norm Folding

This post is to get a sense of whether we are facing issues because of batch norm (referred to as bn below). Currently, bn goes through the SimplifyInference pass, which lowers it to a sequence of Relay operators. I have run into a few problems with this approach in layout handling, and others have hit issues because batch_norm has multiple outputs. Typically, other compilers fold bn into the preceding conv + bias_add when possible (not possible when bn is the first layer in the graph). Folding removes bn entirely and makes the graph simpler. So, I am trying to understand whether folding would be useful here, or whether the current state is good enough.
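
For context, folding works because at inference time bn reduces to a per-channel affine transform, which can be absorbed into the preceding conv's weight and bias. A minimal NumPy sketch of the arithmetic (the function name and the OIHW kernel layout are my assumptions, not anything in TVM):

```python
import numpy as np

def fold_batch_norm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold an inference-time batch_norm into the preceding conv + bias_add.

    weight: conv kernel in OIHW layout (out-channels first) -- an assumption
    bias:   per-output-channel bias from the conv's bias_add
    gamma, beta, mean, var: bn parameters, one value per output channel
    """
    # bn(y) = gamma * (y - mean) / sqrt(var + eps) + beta is affine per
    # channel, so bn(W*x + b) = (scale*W)*x + scale*(b - mean) + beta.
    scale = gamma / np.sqrt(var + eps)              # per-channel multiplier
    folded_weight = weight * scale.reshape(-1, 1, 1, 1)
    folded_bias = (bias - mean) * scale + beta
    return folded_weight, folded_bias
```

After this rewrite the graph would contain only conv2d + bias_add with updated constants, so neither the layout handling nor the multiple-output issue for bn arises.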