Fusing conv-act-pooling

aca88 · February 25, 2019, 3:24pm

Hello all,

I know that TVM's automatic fusion of graph operators is currently somewhat limited.
Conv operators are marked as out-elemwise-fusable (`kOutEWiseFusable`), so an activation layer that follows can be fused into them.
But if a pooling layer follows the activation, it will not be fused automatically.
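
To make the behaviour above concrete, here is a toy sketch of that fusion rule in plain Python. It is not TVM's actual implementation: the pattern names only loosely mirror NNVM's operator patterns, the `PATTERNS` table and `fuse_linear_chain` are hypothetical, and pooling is simply modelled as "not elementwise" for illustration.

```python
# Toy model of the fusion rule described above; NOT TVM's real algorithm.
ELEMWISE = "elemwise"
OUT_ELEMWISE_FUSABLE = "out_elemwise_fusable"
NOT_ELEMWISE = "not_elemwise"  # assumption: stands in for any non-elementwise op

# Hypothetical pattern table for the three ops discussed in this post.
PATTERNS = {
    "conv2d": OUT_ELEMWISE_FUSABLE,
    "relu": ELEMWISE,
    "max_pool2d": NOT_ELEMWISE,
}


def fuse_linear_chain(ops):
    """Greedily group a linear chain of op names into fused kernels."""
    groups = []
    for op in ops:
        pattern = PATTERNS[op]
        # Only an elementwise op may be absorbed into the preceding group,
        # and only if that group starts with a fusable or elementwise op.
        can_absorb = (
            bool(groups)
            and pattern == ELEMWISE
            and PATTERNS[groups[-1][0]] in (ELEMWISE, OUT_ELEMWISE_FUSABLE)
        )
        if can_absorb:
            groups[-1].append(op)
        else:
            groups.append([op])
    return groups


groups = fuse_linear_chain(["conv2d", "relu", "max_pool2d"])
print(groups)  # [['conv2d', 'relu'], ['max_pool2d']]
```

In this model the pooling op always starts a new group, which is the split between conv-relu and pooling that I am asking about.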

I can imagine that some accelerators have HW support for common pooling routines (such as 2x2 max pooling). My question is how something like that would be integrated into TVM.

The lowering part of TVM allows injection of user-defined passes, but all of this happens after the automatic fusion of operators, meaning that the conv-relu and the pooling will already be in two different subgraphs. So doing this at the lowering phase does not seem ideal; it would have to happen before lowering.
The thing is that I don't really see an obvious way to inject user-defined passes into the NNVM part of the compilation process.

In the VTA example they add some passes by calling them before the NNVM build, which I guess is one way of doing it. Nonetheless, doing it before NNVM build means it could not take advantage of the first optimizations done by NNVM.
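
As a rough illustration of the kind of HW-dependent graph pass I have in mind, here is a toy rewrite in plain Python that would run before fusion. Everything in it is hypothetical: the graph is just a list of `(name, attrs)` tuples, and `hw_conv_relu_pool` and `supported_pool_sizes` are made-up names, not NNVM or TVM APIs.

```python
def merge_conv_relu_pool(ops, supported_pool_sizes=((2, 2),)):
    """Rewrite conv2d -> relu -> max_pool2d triples into one composite op.

    `ops` is a linear chain given as a list of (name, attrs) tuples.
    The composite op name "hw_conv_relu_pool" is hypothetical.
    """
    out = []
    i = 0
    while i < len(ops):
        window = ops[i:i + 3]
        names = [name for name, _ in window]
        if names == ["conv2d", "relu", "max_pool2d"]:
            pool_size = tuple(window[2][1].get("pool_size", ()))
            # Only merge when the (assumed) accelerator supports this pooling.
            if pool_size in supported_pool_sizes:
                attrs = {"conv": window[0][1], "pool": window[2][1]}
                out.append(("hw_conv_relu_pool", attrs))
                i += 3
                continue
        # No match at this position: keep the op unchanged.
        out.append(ops[i])
        i += 1
    return out


chain = [
    ("conv2d", {"kernel_size": (3, 3)}),
    ("relu", {}),
    ("max_pool2d", {"pool_size": (2, 2)}),
]
rewritten = merge_conv_relu_pool(chain)
print([name for name, _ in rewritten])  # ['hw_conv_relu_pool']
```

A chain with an unsupported pooling size (say 3x3) is left untouched by this sketch, so the generic fusion would still apply to it afterwards.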

Any ideas?

aca88 · February 28, 2019, 10:53am

@tqchen @thierry
So what I am asking is: are there ongoing efforts to add a HW-dependent graph optimization phase?
I feel like the NNVM graph optimizations are “too general”, while TVM’s lowering only applies to a kernel that has already been identified. So there seems to be something missing in between.

I looked at the v0.6 roadmap and can’t really pinpoint anything related.

Thanks