If my hardware can only execute a composite op such as conv+relu, I still have to write it as two compute stages, because tvm.sum must be at the top level of a compute.
But tensorize cannot handle composite ops yet.
So what can we do in this situation?
Is your question similar to the following one?
In the tensorize tutorial, if the hardware supports gemv+relu, how can we use that hardware feature instead of doing gemv and relu separately?
Yes, and the tutorial only shows how to tensorize a non-composite op.
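To make the gap concrete, here is a minimal sketch in plain Python rather than TVM code. All function names are made up for illustration, and `fused_gemv_relu` only stands in for what a hardware gemv+relu instruction would compute; it is not a TVM API.

```python
# Plain-Python sketch (no TVM): gemv + relu as two stages vs. one fused op.
# `fused_gemv_relu` is a hypothetical stand-in for the hardware instruction.

def gemv(A, x):
    # Stage 1: the reduction, which has to be its own compute stage.
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def relu(y):
    # Stage 2: the elementwise op applied to the reduction result.
    return [max(v, 0.0) for v in y]

def fused_gemv_relu(A, x):
    # What the hardware would do in one step: reduce each row, then clamp,
    # without materializing the intermediate vector.
    return [max(sum(a * b for a, b in zip(row, x)), 0.0) for row in A]

A = [[1.0, -2.0], [3.0, 4.0]]
x = [1.0, 1.0]
two_stage = relu(gemv(A, x))   # how the computation has to be written today
fused = fused_gemv_relu(A, x)  # what the hardware actually offers
```

Both paths give the same result, but the pattern to replace spans two stages, whereas the tutorial only matches a single stage against an intrinsic.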
I am also looking at a similar use case. Is there any solution or update?
+1 for my team. Our current workaround is to use pragmas as hints about fusion at the codegen stage.