How to schedule fused ops?

For example, I have a conv followed by a relu, and they will be fused in the graph, so execution only reaches the conv's schedule func, where I can find the conv op by name. But what if I want to schedule the relu as well? How do I know which op follows the conv? Do I have to copy the relu schedule here?

You can still use op.name to find relu: https://github.com/dmlc/tvm/blob/6a4d71ff40915611bd42b62994992b879e6be610/topi/python/topi/cuda/conv2d.py#L151
outs[0].op is the last op after fusion
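For reference, a minimal sketch of that traverse pattern (old dmlc/tvm-era API; _schedule_conv2d is a placeholder for the real conv schedule at the link):

import tvm
from topi import tag

def schedule_conv2d(outs):
    s = tvm.create_schedule([x.op for x in outs])

    def traverse(op):
        # inline fused elemwise/broadcast ops (relu etc.) unless they are
        # the final output stage
        if tag.is_broadcast(op.tag):
            if op not in s.outputs:
                s[op].compute_inline()
            for tensor in op.input_tensors:
                if isinstance(tensor.op, tvm.tensor.ComputeOp):
                    traverse(tensor.op)
        # recognize the conv stage by its tag (checking op.name works similarly)
        elif 'conv2d_nchw' in op.tag:
            _schedule_conv2d(s, op)  # placeholder for the real conv schedule

    traverse(outs[0].op)  # outs[0].op is the last op after fusion
    return s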

Thank you for the reply!
I know that method, but does it mean I need to write schedule branches for every op that might be fused, copying their schedules here? Is there another solution?

What do you mean by scheduling relu? Relu is an elemwise op, which will be fused into the loop body of conv2d.
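For reference, newer topi versions wrap that inlining in a traverse_inline helper, so the schedule func only has to handle the anchor op; a sketch, with _schedule_conv as a hypothetical stand-in for the actual conv schedule:

import tvm
from topi.util import traverse_inline

def schedule_conv2d_fused(outs):
    s = tvm.create_schedule([x.op for x in outs])

    def _callback(op):
        if 'conv2d' in op.tag:
            # traverse_inline inlines the fused elemwise ops (relu etc.)
            # as it walks the graph, so only the conv stage and the final
            # output tensor outs[0] need scheduling here
            _schedule_conv(s, op.output(0), outs[0])  # hypothetical helper

    traverse_inline(s, outs[0].op, _callback)
    return s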

In fact, I need to tensorize in my schedule, so the intrin func will be different for relu, prelu, and so on. Relu is just a simple example: what if a complex op that needs a complex schedule is fused?

We currently inline all fused elemwise ops. If you need to tensorize the schedule of elemwise ops, I'm afraid you have to handle each of them manually (perform the computation of the elemwise ops inside your intrin func).
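A rough sketch of what that manual handling could look like: dispatch on the fused output op and pick an intrinsic whose epilogue performs the matching elemwise computation. Everything here is an assumption about your own code, not topi API; the three intrinsic constructors are hypothetical tensor intrinsics you would declare with tvm.decl_tensor_intrin().

def tensorize_conv(s, conv, outs):
    # conv: the conv2d tensor found during traversal;
    # outs[0].op: the last op after fusion
    def select_intrin(out_op):
        # identify the fused elemwise op by name, as suggested above;
        # check 'prelu' first, since 'relu' is a substring of it
        if 'prelu' in out_op.name:
            return conv_prelu_intrin()  # epilogue applies the prelu slope
        if 'relu' in out_op.name:
            return conv_relu_intrin()   # epilogue also computes max(x, 0)
        return conv_intrin()            # plain conv intrinsic, no epilogue
    s[conv].tensorize(conv.op.axis[-1], select_intrin(outs[0].op))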

Got it, thank you!
May I ask why TVM only fuses compute and does not fuse schedules? If a complex op is fused, does this mean its schedule has to be abandoned, and we can only write a new schedule (or use traverse_inline) there?

I think it is difficult to define the expected result of fusing schedules. Take conv2d on CUDA as an example: after scheduling conv2d, you can fuse any elemwise ops into it, because conv2d has a stage that copies from registers to global memory, where the elemwise computation can be performed. However, we can't directly schedule conv2d+relu unless you have tensor intrinsics backed by an external implementation.
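A fragment sketching that structure (names and the split factor are illustrative, assuming conv and outs come from a traverse like the ones above, not the tuned topi schedule):

output = outs[0]
if conv.op not in s.outputs:
    s[conv].set_scope('local')   # conv results accumulate in registers

fused = s[output].fuse(*output.op.axis)
bx, tx = s[output].split(fused, factor=64)
s[output].bind(bx, tvm.thread_axis('blockIdx.x'))
s[output].bind(tx, tvm.thread_axis('threadIdx.x'))

# the output stage is the register-to-global copy; since the fused
# elemwise op (e.g. relu) is that stage, its math runs in the copy loop
s[conv].compute_at(s[output], tx)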

Hi, can you show the critical schedule statement that fuses the conv2d op into its following elementwise op, e.g. topi.nn.relu?

I found that it should use compute_at, but it seems a little hard to use well.

B = conv2d(A)
C = topi.nn.relu(B)
..
schedule_for_conv2d(s[B], ..)
s[B].compute_at(s[C], C.op.axis[what is reasonable?])

Thank you, I found that s[conv].set_scope('local') must be set if we want to fuse the following injective ops after a computation that has a reduce axis, like convolution.
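For later readers, a minimal end-to-end sketch putting this together (CUDA-style; the shapes, the split factor, and the conv2d_nchw signature, which gained a dilation argument in later TVM versions, are assumptions):

import tvm
import topi

A = tvm.placeholder((1, 64, 56, 56), name='A')
W = tvm.placeholder((64, 64, 3, 3), name='W')
B = topi.nn.conv2d_nchw(A, W, stride=1, padding=1, dilation=1)
C = topi.nn.relu(B)

s = tvm.create_schedule(C.op)

# inline the padding stage conv2d_nchw creates, if any
if isinstance(B.op.input_tensors[0].op, tvm.tensor.ComputeOp):
    s[B.op.input_tensors[0]].compute_inline()

s[B].set_scope('local')            # conv accumulates in registers
fused = s[C].fuse(*C.op.axis)      # flatten the relu/output loops
bx, tx = s[C].split(fused, factor=64)
s[C].bind(bx, tvm.thread_axis('blockIdx.x'))
s[C].bind(tx, tvm.thread_axis('threadIdx.x'))
s[B].compute_at(s[C], tx)          # conv is fused into the output loop

print(tvm.lower(s, [A, W, C], simple_mode=True))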

But what if the schedule for convolution is not planned to use the local scope or a local cache, yet still needs to fuse its following injective ops? This use case comes from some shader languages that don't allow the user to define a local variable array; something like float result_local[1024]; is prohibited.

Hello, can you tell me how to correctly fuse ops? Thanks!