Strange expansion of tensors' bounds


#1

Consider the following code:

import tvm

def show_lowered(outputs, inputs):
    sout = tvm.create_schedule([o.op for o in outputs])
    mout = tvm.lower(sout, outputs + inputs, simple_mode=True)
    print(mout)

A = tvm.compute((10, 2), lambda i, j: i + j)
show_lowered([A], [])
B = tvm.compute((10,), lambda i: A[i, i + (0 - i)])
show_lowered([B], [])

For A it produces the following lowered code, as expected:

produce compute {
  for (i, 0, 10) {
    for (j, 0, 2) {
      compute[((i*2) + j)] = (i + j)
    }
  }
}

But for B it produces the following code, expanding the size of A:

// attr [compute] storage_scope = "global"
allocate compute[int32 * 10 * 19]
produce compute {
  for (i, 0, 10) {
    for (j, 0, 19) {
      if (likely((9 <= j))) {
        if (likely((j < 11))) {
          compute[((i*19) + j)] = ((i + j) + -9)
        }
      }
    }
  }
}
produce compute {
  for (i, 0, 10) {
    compute[i] = compute[((i*19) + 9)]
  }
}

I guess, TVM don’t simplify i + (0 - i), so it infers the bounds for the second dimension of A too conservatively. My question is: why does it change the size of A at all? I think, adding a check or an assertion inside of B would be a better solution than expanding A, but maybe I’m missing something?


#2

I was able to reproduce this error on recent master c48812d

Edit
Here we see that TVM applies some memory optimization algorithm. While I can’t comment on this particular algorithm, I see that it is applied without any scheduling instruction from DSL. I think it may be a sign of higher-order design problem. I expected the default code to be of a much simpler form like

allocate A[int32 * 10 * 2]
allocate B[int32 * 10]
produce A { ... }
produce B { ... }

#3

FYI https://github.com/dmlc/tvm/pull/2104