'Fuzzy' tensorization

From what I can tell, tensorize expects to match the compute body exactly; it appears to do this with a structural equality check. However, there are cases where I’d like to tensorize in a slightly ‘fuzzy’ way. As a trivial example, consider an operation that just adds a constant value to every element of a tensor:

import tvm
from tvm import te

def test_const_match():
    def _add(value):
        a = te.placeholder((1, 224, 224, 3), name="a")
        return te.compute(a.shape, lambda i, j, k, l: a[i, j, k, l] + value, name="add2"), a

    def _add_intrin():
        val = te.var("val")
        out, data = _add(val)
        data_buf = tvm.tir.decl_buffer(
            data.shape, data.dtype,
            name="data_buf",
            offset_factor=1,
            strides=[te.var("ds1"), te.var("ds2"), te.var("ds3"), te.var("ds4")],
        )
        out_buf = tvm.tir.decl_buffer(
            out.shape, out.dtype,
            name="out_buf",
            offset_factor=1,
            strides=[te.var("ob1"), te.var("ob2"), te.var("ob3"), te.var("ob4")],
        )
        def _intrin_func(ins, outs):
            ib = tvm.tir.ir_builder.create()
            aa = ins[0]
            cc = outs[0]
            ib.emit(tvm.tir.call_extern("int32", "intrin_add", *aa.shape,
                                        cc.access_ptr("w"),
                                        aa.access_ptr("r"),
                                        *aa.strides))
            return ib.get()
        return te.decl_tensor_intrin(out.op, _intrin_func, binds={data: data_buf, out: out_buf})

    out, a = _add(4)
    s = te.create_schedule([out.op])
    s[out].tensorize(out.op.axis[0], _add_intrin())
    print(tvm.lower(s, [a]))

This fails with:

TVMError: Check failed: expr_equal(lhs, rhs): Failed to match the compute with TensorIntrin tensor_intrin's declaration  provided= (a[0, j, k, l] + 4f), intrin=  (a[0, j, k, l] + float32(val))

This makes sense given the test is for structural equality. However, it means the only way to match this operator is to know in advance what the value of the constant is. This comes up in less trivial cases with operators such as ‘pad’, which pad by a constant value that is written directly into the te.compute.

Are there any known workarounds for this, or will it require an extension to the capabilities of tensorize?

Many thanks in advance


CC: @were @Hzfengsy @spectrometerHBH

For now, you can pass the constant as a parameter, _add_intrin(val), and tensorize with:

s[out].tensorize(out.op.axis[0], _add_intrin(4))

Also, we are currently working on a new schedule and tensorize algorithm with new APIs. We will take your helpful suggestion into consideration.


@Hzfengsy,

May I have your opinion on the need for a 4th “Store” rule? See: Store only rule for tensorize intrinsic?

I would submit a PR if it makes sense.