Operator fusion with AutoTVM on GPU

Is operator fusion supported when using AutoTVM with a GPU?

  1. From the paper it seems that operator fusion happens before AutoTVM (operator fusion is described in section 3, while AutoTVM is described in section 5).

  2. There is an answer from @eqy on a question regarding auto-tuning, from which I understand that for CUDA and OpenCL this is a very difficult task. Hence I assume such fusion is not supported.

  3. In the AutoTVM source (e.g. https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/conv2d.py#L119) the decorators appear to apply only to non-fused operations.

So how does operator fusion actually work here?

AutoTVM tunes non-fused operators such as conv2d or dense. After tuning, we fuse elemwise or broadcast ops (add, relu, etc.) into them.
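A minimal sketch of that workflow (not from this thread, assuming a recent TVM with Relay; the shapes and the log file name `conv2d.log` are placeholders, and the exact API differs between versions): only the conv2d becomes a tuning task, and the relu is fused into the tuned kernel at build time.

```python
import tvm
from tvm import relay, autotvm

# conv2d followed by an elementwise relu; only conv2d becomes an AutoTVM task
data = relay.var("data", shape=(1, 3, 224, 224))
weight = relay.var("weight", shape=(64, 3, 7, 7))
conv = relay.nn.conv2d(data, weight, strides=(2, 2), padding=(3, 3))
mod = tvm.IRModule.from_expr(relay.Function([data, weight], relay.nn.relu(conv)))

# task extraction picks up conv2d only -- relu is not part of any task
tasks = autotvm.task.extract_from_program(mod["main"], params={}, target="cuda")

# ... tune `tasks` with e.g. XGBTuner, logging results to "conv2d.log" (placeholder) ...

# at build time the best conv2d schedule is looked up, and the fusion pass
# fuses the relu into the tuned conv2d kernel
with autotvm.apply_history_best("conv2d.log"):
    lib = relay.build(mod, target="cuda")
```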

@vinx13 Could you provide an example showing that it tunes only non-fused operations?

Because I’m not quite sure, e.g. in the x86 scheduler you can access the relu operation.
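For context on why relu shows up there (a rough sketch of the common TOPI pattern, not the actual x86 code; the function name is illustrative): the schedule function is invoked on the fused output group at compile time, so it traverses and inlines elemwise ops like relu, while the AutoTVM config it reads still corresponds to the bare conv2d workload.

```python
from tvm import te, topi

def schedule_conv2d_fused(cfg, outs):
    """Illustrative schedule: `outs` may end in a fused relu, not conv2d itself."""
    s = te.create_schedule([x.op for x in outs])

    def traverse(op):
        # elemwise/broadcast ops such as relu are simply inlined
        if topi.tag.is_broadcast(op.tag) or topi.tag.is_injective(op.tag):
            if op not in s.outputs:
                s[op].compute_inline()
            for t in op.input_tensors:
                if isinstance(t.op, te.ComputeOp):
                    traverse(t.op)
        elif "conv2d" in op.tag:
            pass  # apply the cfg-driven conv2d schedule here (tiling, unroll, ...)

    traverse(outs[0].op)
    return s
```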

The topic was also mentioned here: How to fuse conv2d and following elemwise op?

This is the task being tuned: https://github.com/dmlc/tvm/blob/master/python/tvm/autotvm/task/topi_integration.py#L171-L179
It doesn’t include relu (of course, you can create your own task with relu included so that you can tune the fused operator).
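A hedged sketch of what such a custom task could look like, assuming the newer `@autotvm.template(name)` API (older versions register templates slightly differently); the template name, shapes, and the single tiling knob are all illustrative, and a real CUDA template would additionally need thread bindings:

```python
from tvm import te, topi, autotvm

@autotvm.template("custom/conv2d_relu")        # illustrative name, not a TOPI task
def conv2d_relu(N, CI, H, W, CO, KH, KW, stride, padding):
    data = te.placeholder((N, CI, H, W), name="data")
    kernel = te.placeholder((CO, CI, KH, KW), name="kernel")

    conv = topi.nn.conv2d_nchw(data, kernel, stride, padding, dilation=1)
    out = topi.nn.relu(conv)                   # relu is now part of the tuned compute

    s = te.create_schedule(out.op)
    cfg = autotvm.get_config()

    # toy search space: split the output-channel axis of the fused output
    n, co, h, w = s[out].op.axis
    cfg.define_split("tile_co", co, num_outputs=2)
    co_o, co_i = cfg["tile_co"].apply(s, out, co)
    s[out].reorder(n, co_o, h, w, co_i)
    return s, [data, kernel, out]

# create and tune it like any other task (llvm target here for brevity):
# task = autotvm.task.create("custom/conv2d_relu",
#                            args=(1, 64, 56, 56, 64, 3, 3, 1, 1), target="llvm")
```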