Potential fusion bug


#1

With the following test:

    import tvm
    from tvm import relay

    # Stand-in for the Relay test-suite helper (assumes the 0.6-era
    # relay.Module API): runs type inference on the function.
    def run_infer_type(expr):
        mod = relay.Module.from_expr(expr)
        mod = relay.transform.InferType()(mod)
        return mod["main"]

    def test():
        data = relay.var("data", shape=(2, 1, 2, 4), dtype="int8")
        w = relay.var("w", shape=(3, 1, 2, 2), dtype="int8")
        conv1 = relay.nn.conv2d(data, w, out_dtype="int32", kernel_size=(2, 2))
        zero = relay.full(relay.const(0, "int32"), shape=(2, 3, 1, 3), dtype="int32")
        gt = conv1 >= zero
        one = relay.full(relay.const(1, "int32"), shape=(2, 3, 1, 3), dtype="int32")
        two = relay.full(relay.const(2, "int32"), shape=(2, 3, 1, 3), dtype="int32")
        where = relay.where(gt, one, two)
        # conv1 is consumed twice: by the comparison and by the final add.
        add = relay.add(conv1, where)
        func = relay.Function(relay.analysis.free_vars(add), add)
        func = run_infer_type(func)

        with relay.build_config(opt_level=1):
            graph, lib, params = relay.build(func, "llvm", params=None)

    test()

The resulting error (raised a second time while the first exception is being handled) is:

    TVMError: Check failed: found_attach || stage_attach.size() == 0: Invalid Schedule, cannot find the producer compute(conv, 0x1e0dad0) along the loop nest specified by compute_at of consumer compute(T_greater_equal, 0x1e14130)

The error goes away if I mark the `where` operator as Opaque so it is excluded from fusion, e.g. with something like the sketch below.
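
A minimal sketch of that override, assuming the 0.6-era `register_pattern` API and assuming that re-registering at a level above the default 10 supersedes the built-in entry:

    from tvm.relay.op import register_pattern, OpPattern

    # Assumption: re-registering TOpPattern at level > 10 overrides the
    # default registration, making `where` opaque to the fusion pass.
    register_pattern("where", OpPattern.OPAQUE, level=15)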

@zhiics @vinx13 @tqchen


#2

The graph looks like this; note that the conv output %0 is consumed both by greater_equal (%2) and by the final add:

    v0.0.1
    fn (%p0: Tensor[(2, 1, 2, 4), int8], %p1: Tensor[(3, 1, 2, 2), int8], __dict__=meta[StrMap][0]) -> Tensor[(2, 3, 1, 3), int32] {
      %0 = nn.conv2d(%p0, %p1, kernel_size=[2, 2], out_dtype="int32") /* ty=Tensor[(2, 3, 1, 3), int32] */
      %1 = full(0 /* ty=int32 */, shape=[2, 3, 1, 3], dtype="int32") /* ty=Tensor[(2, 3, 1, 3), int32] */
      %2 = greater_equal(%0, %1) /* ty=Tensor[(2, 3, 1, 3), bool] */
      %3 = full(1 /* ty=int32 */, shape=[2, 3, 1, 3], dtype="int32") /* ty=Tensor[(2, 3, 1, 3), int32] */
      %4 = full(2 /* ty=int32 */, shape=[2, 3, 1, 3], dtype="int32") /* ty=Tensor[(2, 3, 1, 3), int32] */
      %5 = where(%2, %3, %4) /* ty=Tensor[(2, 3, 1, 3), int32] */
      add(%0, %5) /* ty=Tensor[(2, 3, 1, 3), int32] */
    }

#3

I reduced this to a simpler case:

    import topi
    import tvm
    from tvm.contrib import util

    def test():
        data = tvm.placeholder((2, 1, 2, 4), 'int8', 'data')
        w = tvm.placeholder((3, 1, 2, 2), 'int8', 'w')
        conv1 = topi.nn.conv2d(data, w, 1, 0, 1, out_dtype='int32')
        zeros = topi.full((2, 3, 1, 3), 'int32', tvm.const(0, dtype='int32'))
        gt = topi.greater_equal(conv1, zeros)
        one = topi.full((2, 3, 1, 3), 'int32', tvm.const(1, dtype='int32'))
        two = topi.full((2, 3, 1, 3), 'int32', tvm.const(2, dtype='int32'))
        where = topi.where(gt, one, two)
        add = topi.add(conv1, where)
        # Schedule only the final output, as the fused Relay subgraph does.
        outs = [add]
        s = topi.generic.schedule_conv2d_nchw(outs)
        print(util.get_lower_ir(s))

    with tvm.target.create('llvm'):
        test()

This is equivalent to what the Relay compiler does internally.

Changing `outs` from `[add]` to `[conv1, add]` fixes the issue, as in the sketch below. But I'm not sure what causes it, since we only need `add` as the output.
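
For completeness, here is that workaround applied to the snippet above; only the output list changes (the comment is my guess at why it helps, not a confirmed explanation):

    # Workaround: also expose conv1 as an output. The topi traverse
    # schedules skip ops that are in s.outputs, which seems to stop the
    # conv stage from being compute_at-attached under T_greater_equal.
    outs = [conv1, add]
    s = topi.generic.schedule_conv2d_nchw(outs)
    print(util.get_lower_ir(s))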