If I don't want to unpack batch normalization

I have a question regarding batch normalization in NNVM.
By default (with opt_level=0), it seems that NNVM unpacks batch normalization into several unit operators such as add_scalar, sqrt, rdiv_scalar, and elemwise_mul.
If I want to keep batch normalization as a single operator and fuse operators like @tvm_op(…, …, func_name="fuse_conv2d_batch_norm_relu_…", …), what steps do I need to take?
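
For context, the unpacking appears to correspond to folding the inference formula y = gamma * (x - mean) / sqrt(var + eps) + beta into elementwise primitives. A rough NumPy sketch of that decomposition (my own illustration, not the exact rewrite NNVM performs):

```python
import numpy as np

def unpacked_batch_norm(x, gamma, beta, moving_mean, moving_var, eps=1e-5):
    # Fused form: y = gamma * (x - moving_mean) / sqrt(moving_var + eps) + beta
    # Unpacked into the primitives that show up in the compiled graph:
    denom = moving_var + eps                 # add_scalar
    denom = np.sqrt(denom)                   # sqrt
    inv_std = 1.0 / denom                    # rdiv_scalar (scalar / tensor)
    scale = gamma * inv_std                  # elemwise_mul
    shift = beta - moving_mean * scale       # elemwise_mul + subtraction
    # broadcast the per-channel scale/shift over an NCHW tensor
    return x * scale.reshape(1, -1, 1, 1) + shift.reshape(1, -1, 1, 1)
```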

So far I have noticed that:

  1. The computation of batch normalization is implemented in Python (./tvm/topi/python/topi/nn/batch_norm.py) as well as in C++ (./tvm/topi/include/topi/nn/batch_norm.h).
  2. The computation and schedule are not registered to NNVM (see ./python/nnvm/top/nn.py), whereas they are registered to TVM (see ./tvm/topi/src/topi.cc); a registration sketch follows after this list.
  3. The operator pattern for batch normalization is BROADCAST, as seen in the tag scope of batch_norm_inference() in batch_norm.py.
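
Based on these observations, the registration I attempted looks roughly like the sketch below, modelled on the other ops registered in ./python/nnvm/top/nn.py. The "epsilon" attribute name, the exact signature of topi.nn.batch_norm_inference, and the single-output assumption are my guesses:

```python
# Sketch of the registration, placed alongside the other ops in
# ./python/nnvm/top/nn.py (absolute imports used so the snippet stands alone).
import tvm
import topi
from nnvm.top import registry as reg
from nnvm.top.registry import OpPattern

@reg.register_compute("batch_norm")
def compute_batch_norm(attrs, inputs, _):
    """Inference-only compute for batch_norm."""
    eps = attrs.get_float("epsilon")   # attribute name is my guess
    data, gamma, beta, moving_mean, moving_var = inputs
    # NOTE: if NNVM's batch_norm op declares more than one output
    # (e.g. saved mean/var), this has to return a list of matching length.
    return topi.nn.batch_norm_inference(data, gamma, beta,
                                        moving_mean, moving_var, eps)

@reg.register_schedule("batch_norm")
def schedule_batch_norm(_, outs, target):
    with tvm.target.create(target):
        return topi.generic.schedule_injective(outs)

# Matches the BROADCAST tag scope observed in batch_norm_inference()
reg.register_pattern("batch_norm", OpPattern.BROADCAST)
```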

I registered a batch normalization op to NNVM along those lines and executed the code, but it eventually fails in GraphFuseCompile(), specifically when fcompute (essentially the lambda defining the batch normalization compute in batch_norm.py) is called from compute() in ./tvm/python/tvm/api.py. Since fcompute is a lambda, I'm not sure how I could efficiently debug or trace it.
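
The only workaround I have found for tracing is to temporarily reproduce the compute with a named function in a standalone script, along these lines (old 0.x-style TVM API, shapes made up):

```python
import tvm

# Made-up shapes, just to reproduce the compute definition outside NNVM.
N, C, H, W = 1, 16, 8, 8
eps = 1e-5
data = tvm.placeholder((N, C, H, W), name="data")
gamma = tvm.placeholder((C,), name="gamma")
beta = tvm.placeholder((C,), name="beta")
mean = tvm.placeholder((C,), name="moving_mean")
var = tvm.placeholder((C,), name="moving_var")

# A named fcompute instead of a lambda: easy to drop in prints or pdb.
def bn_fcompute(n, c, h, w):
    expr = (data[n, c, h, w] - mean[c]) / tvm.sqrt(var[c] + eps) \
           * gamma[c] + beta[c]
    # print(expr)                      # or: import pdb; pdb.set_trace()
    return expr

out = tvm.compute((N, C, H, W), bn_fcompute, name="bn", tag="broadcast")
s = tvm.create_schedule(out.op)
print(tvm.lower(s, [data, gamma, beta, mean, var, out], simple_mode=True))
```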

Do you have any comments or suggestions? Thanks!

Won

Unfortunately, it is hard to do so. You could mark batchnorm as OPAQUE and provide a manual schedule for it.
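
Roughly something like the following, assuming a compute for batch_norm has been registered as in your sketch; the registry calls mirror python/nnvm/top/nn.py, and the level-based override is an assumption on my part:

```python
import tvm
import topi
from nnvm.top import registry as reg
from nnvm.top.registry import OpPattern

# OPAQUE: the graph fuser leaves the op alone instead of fusing through it.
reg.register_pattern("batch_norm", OpPattern.OPAQUE, level=15)

@reg.register_schedule("batch_norm", level=15)
def schedule_batch_norm_manual(attrs, outs, target):
    # A hand-written schedule (or a call into a hardware batchnorm
    # intrinsic / vendor library) would go here; injective is only a
    # placeholder so the sketch stays self-contained.
    with tvm.target.create(target):
        return topi.generic.schedule_injective(outs)
```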

But note that TVM automatically fuses things together again, so unpacking batchnorm actually makes follow-up optimizations easier.
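
For example, the scale/shift that batch_norm is unpacked into should end up fused into the preceding conv2d once fusion is enabled. A small sketch to observe this (shapes and the exact printed IR are illustrative):

```python
import nnvm.symbol as sym
import nnvm.compiler

data = sym.Variable("data")
net = sym.conv2d(data, channels=16, kernel_size=(3, 3), padding=(1, 1),
                 name="conv")
net = sym.batch_norm(net, name="bn")
net = sym.relu(net)

# opt_level >= 1 enables operator fusion after SimplifyInference has
# unpacked batch_norm into scale/shift ops.
with nnvm.compiler.build_config(opt_level=1):
    graph, lib, params = nnvm.compiler.build(
        net, target="llvm", shape={"data": (1, 3, 32, 32)})

# The fused kernels (fuse_conv2d_..._relu-style names) show up in the graph.
print(graph.ir())
```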

Can I ask why you want a single batchnorm op? Do you have a special implementation for it?

I imagine there is a use case where a fully optimized batch norm intrinsic or instruction is provided by a particular hardware platform.

Hmm… that would be interesting. Well, if that is the case, maybe we could have an argument or a system environment variable that tells NNVM not to unpack a specific operator (for now, only batchnorm and dropout are unpacked).

Hi,

Yes, we ran into a similar situation here.
We are trying to use NNVM+TVM for an accelerator like NVDLA, where batchnorm is one of the "instruction set" primitives, and for optimal pipeline operation it can be fused with conv2d.

Right now we sort of hack the core library to work around this limitation, but it would be great if this were supported cleanly by the NNVM framework.

Regards.
–kahho