As the title says. A simple example:
import tvm

def schedule(A, B):
    s = tvm.create_schedule(B.op)
    x, = s[B].op.axis
    xo, xi = s[B].split(x, factor=8)
    xoo, xoi = s[B].split(xo, factor=2)
    s[B].vectorize(xi)
    return s

def test():
    A = tvm.placeholder((128,), name="A")
    B = tvm.compute((128,), lambda i: A[i] + 1, name="B")
    device = "llvm -mcpu=core-avx2"
    ctx = tvm.context(device, 0)
    with tvm.target.create(device):
        s = schedule(A, B)
    print(tvm.lower(s, [A, B], simple_mode=True))
    func = tvm.build(s, [A, B], device, name="test")

if __name__ == "__main__":
    test()
When the split factor of xo is 2, the IR code looks like:
produce B {
  for (i.outer.outer, 0, 8) {
    for (i.outer.inner, 0, 2) {
      B[ramp(((i.outer.outer*16) + (i.outer.inner*8)), 1, 8)] = (A[ramp(((i.outer.outer*16) + (i.outer.inner*8)), 1, 8)] + x8(1f))
    }
  }
}
which shows that the axis xi is vectorized. However, if we change the xo split factor to 3, we get:
produce B {
  for (i.outer.outer, 0, 6) {
    for (i.outer.inner, 0, 3) {
      for (i.inner.s, 0, 8) {
        if (likely(((((i.outer.outer*24) + (i.outer.inner*8)) + i.inner.s) < 128))) {
          if (likely((((i.outer.outer*3) + i.outer.inner) < 16))) {
            if (likely(((((i.outer.outer*24) + (i.outer.inner*8)) + i.inner.s) < 128))) {
              B[(((i.outer.outer*24) + (i.outer.inner*8)) + i.inner.s)] = (A[(((i.outer.outer*24) + (i.outer.inner*8)) + i.inner.s)] + 1f)
            }
          }
        }
      }
    }
  }
}
and we see that the axis xi is no longer vectorized.
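If I work through the split arithmetic, the guards seem to come from the bounds no longer dividing evenly. Here is a quick sketch of my understanding (this is just plain Python illustrating what I think the split does, assuming ceil-division semantics for the outer extent):

```python
import math

# After the first split (factor=8), xo has extent 128 / 8 = 16.
extent_xo = 128 // 8

for factor in (2, 3):
    outer = math.ceil(extent_xo / factor)  # extent of xoo after the second split
    covered = outer * factor               # iterations the loop nest actually runs
    exact = covered == extent_xo           # False -> likely() guards get inserted
    print(factor, outer, covered, exact)
# factor=2: 8 * 2 = 16, covers the range exactly -> xi stays vectorized
# factor=3: 6 * 3 = 18 > 16, overshoots -> guards appear and block vectorization
```

This matches the IR above: with factor=3, i.outer.outer has extent 6 and the `likely(... < 16)` condition guards the overshoot.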
Is this the intended behavior in TVM? What if we really want to use a factor like 3 in this case and still have the code vectorized correctly?
Thanks in advance!