If my hardware can only execute a composite op such as conv+relu, I have to write it as two compute stages, because tvm.sum must be at the top level of a compute.
But tensorize cannot handle composite ops yet.
So what can we do in this situation?
Is your question similar to the following one?
In the tensorize tutorial, if the hardware supports gemv+relu, how can we use that hardware feature instead of doing gemv and relu separately?
Yes, and the tutorial only shows how to tensorize a non-composite op.
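To make the problem concrete, here is a minimal sketch in plain Python (deliberately not TVM code, so it does not depend on any TVM version). It models gemv+relu as the two stages the compute definition forces, next to a single fused call. The function names `gemv`, `relu` and `hw_gemv_relu` are made up for illustration; `hw_gemv_relu` stands in for whatever fused instruction the hardware exposes.

```python
# Plain-Python model of the two-stage form; no TVM APIs are used here.

def gemv(A, x):
    # Stage 1: the reduction. In TVM this is the compute whose
    # top-level expression is the sum over the reduction axis.
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def relu(y):
    # Stage 2: the elementwise op, which ends up in its own compute.
    return [max(v, 0.0) for v in y]

def hw_gemv_relu(A, x):
    # Hypothetical fused hardware instruction: one call covers both
    # stages, and the intermediate gemv result is never exposed.
    out = []
    for row in A:
        acc = 0.0
        for a, b in zip(row, x):
            acc += a * b
        out.append(acc if acc > 0.0 else 0.0)
    return out

A = [[1.0, -2.0], [3.0, 4.0]]
x = [1.0, 1.0]

two_stage = relu(gemv(A, x))   # what the two computes describe
fused = hw_gemv_relu(A, x)     # what the hardware can actually run
```

Both forms give the same values, but the fused call spans two stages, while the tutorial only replaces the body of a single stage with an intrinsic. That mismatch is the gap being asked about.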
I am also looking at a similar use case. Is there any solution or update?
+1 for my team. Our current workaround is to use pragmas as hints about fusion at the codegen stage.
Hi, can you describe your scheme a bit more thoroughly? I encountered this problem as well.
@xwrock For now I think the proper way to work around this is to go the BYOC route until this part of TVM becomes more flexible for custom accelerators.
@xwrock Another way is to bypass tensorize like the people working on VTA did. It requires some manipulation of the low-level AST, but I guess the end effect is the same.
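The general idea of rewriting the lowered program can be shown with a toy example. This is only a sketch under heavy assumptions: the tuple-based IR, the pass `fuse_gemv_relu` and the intrinsic name `hw_gemv_relu` are all hypothetical, and none of it is TVM's or VTA's actual IR or pass API.

```python
# Toy IR: a program is a list of (op_name, output, inputs) tuples.
# Illustration only; this is not TVM's IR or pass infrastructure.

def fuse_gemv_relu(program):
    """Replace each gemv that feeds the very next relu with one fused call.

    Assumes the gemv result has no other consumer; a real pass would
    have to check that before dropping the intermediate buffer.
    """
    fused = []
    i = 0
    while i < len(program):
        op = program[i]
        nxt = program[i + 1] if i + 1 < len(program) else None
        if (nxt is not None and op[0] == "gemv" and nxt[0] == "relu"
                and nxt[2] == (op[1],)):
            # Emit a single call to the hypothetical fused intrinsic,
            # reading gemv's inputs and writing relu's output.
            fused.append(("hw_gemv_relu", nxt[1], op[2]))
            i += 2
        else:
            fused.append(op)
            i += 1
    return fused

program = [
    ("gemv", "t0", ("A", "x")),
    ("relu", "y", ("t0",)),
    ("add", "z", ("y", "b")),
]
lowered = fuse_gemv_relu(program)
```

Here the two-stage pattern is matched after the computes are written separately, so the restriction on where the reduction may appear never comes into play.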