I know that TVM's automatic fusion of graph operations is currently somewhat limited.
Conv operations are output-fusible, so if an activation layer follows, the two can be fused.
But if a pooling layer follows the activation, it will not be fused automatically.
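To make that fusion boundary concrete, here is a toy sketch of the grouping rule in plain Python. This is not TVM's actual fusion code; the op names and pattern kinds are my own illustration of the behavior described above: an output-fusible op absorbs the elementwise ops that follow it, while a pooling op starts a new group.

```python
# Toy illustration of fusion grouping (not the real TVM algorithm).
# An OUT_FUSIBLE op opens a group that absorbs following ELEMWISE ops;
# any other op (e.g. pooling, marked OPAQUE here) starts a fresh group.

OUT_FUSIBLE, ELEMWISE, OPAQUE = "out_fusible", "elemwise", "opaque"

def fuse_groups(ops):
    """ops: list of (name, kind) pairs in topological order of a straight-line graph."""
    groups, current, open_fusible = [], [], False
    for name, kind in ops:
        if open_fusible and kind == ELEMWISE:
            current.append(name)            # fuse into the open group
        else:
            if current:
                groups.append(current)      # close the previous group
            current = [name]
            open_fusible = (kind == OUT_FUSIBLE)
    if current:
        groups.append(current)
    return groups

net = [("conv2d", OUT_FUSIBLE), ("relu", ELEMWISE), ("max_pool2d", OPAQUE)]
print(fuse_groups(net))  # [['conv2d', 'relu'], ['max_pool2d']]
```

So conv2d and relu end up in one subgraph, and max_pool2d in another, which is exactly the situation the question is about.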
I can imagine that some accelerators have hardware support for common pooling routines (like 2x2 max pooling). My question is: how would something like that be integrated into TVM?
The lowering part of TVM allows user-defined passes to be injected, but all of this happens after the automatic fusion of operators, meaning the conv-relu and the pooling will already sit in two different subgraphs. So doing this at the lowering phase is not ideal; it would have to happen before lowering.
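One way to picture "before lowering" is a graph rewrite that runs ahead of fusion and collapses a hardware-supported pattern into a single op, which lowering can then map to the accelerator routine. The following is only a plain-Python sketch of that idea; `fused_relu_maxpool` and the list-of-names graph representation are hypothetical, not a TVM or NNVM API.

```python
# Toy pre-lowering rewrite: collapse a relu -> max_pool2d pattern into a
# single node so a later lowering step could map it to one HW routine.
# "fused_relu_maxpool" is a made-up op name for illustration only.

def rewrite_relu_maxpool(ops):
    """ops: list of op names in topological order of a straight-line graph."""
    out, i = [], 0
    while i < len(ops):
        if i + 1 < len(ops) and ops[i] == "relu" and ops[i + 1] == "max_pool2d":
            out.append("fused_relu_maxpool")  # one op covering both
            i += 2
        else:
            out.append(ops[i])
            i += 1
    return out

print(rewrite_relu_maxpool(["conv2d", "relu", "max_pool2d"]))
# ['conv2d', 'fused_relu_maxpool']
```

The open question is where a rewrite like this could be hooked into the NNVM pipeline so it runs after NNVM's own early optimizations but before fusion splits the pattern apart.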
The thing is that I don't really see an obvious way to inject user-defined passes in the NNVM part of the compilation process.
In the VTA example they add some passes by calling them before the NNVM build, which I guess is one way of doing it. Nonetheless, doing it before the NNVM build means those passes cannot take advantage of the optimizations NNVM performs first.