[RFC] Promote indices data type to i64

hzfan · July 8, 2020, 4:43am

I’d like to introduce a new pass that automatically promotes the datatype of IterVar. After this pass, the promoted type can be narrowed down again by the type narrowing pass introduced in #5092 (as discussed in 5643). It has two benefits:

With this pass, users don’t have to wrap an integer with IntImm(x, 'int64') in order to use i64 indices. They will be promoted to i64 automatically.
It helps avoid unexpected overflow.
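
To make the overflow concern concrete, here is a minimal pure-Python sketch (not TVM code) that emulates signed 32-bit wraparound while flattening a 2-D index. The helper names `wrap_i32`, `flat_index_i32` and `flat_index_i64`, as well as the tensor shape, are assumptions chosen for illustration only.

```python
def wrap_i32(x: int) -> int:
    """Emulate two's-complement wraparound of a signed 32-bit integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def flat_index_i32(i: int, j: int, ncols: int) -> int:
    """Row-major flat index where every intermediate is wrapped to i32."""
    return wrap_i32(wrap_i32(i * ncols) + j)


def flat_index_i64(i: int, j: int, ncols: int) -> int:
    """Same computation without wrapping (these values fit easily in i64)."""
    return i * ncols + j


# Hypothetical tensor of shape (65536, 65536), i.e. 2**32 elements.
i, j, ncols = 40000, 7, 65536
wide = flat_index_i64(i, j, ncols)
narrow = flat_index_i32(i, j, ncols)
print(wide, narrow)  # the i32 result is negative: the index has wrapped around
```

With i32 indices the flattened offset silently becomes negative, which is the kind of surprise that automatic promotion to i64 is meant to prevent.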

Strategy

One can certainly promote everything to i64 without ruining the correctness of the program. But unrestricted data type promotion hinders the narrowing afterwards. The more fine-grained our type promotion strategy is, the more narrowing we can do afterwards.

In order to make the promotion as reversible as possible, we use the following rule to determine which expression is to be promoted:

Unbounded variables like n = te.var('n') are not promoted, because var.i32 and var.i64 represent different things: the former ranges within i32 while the latter ranges within i64.
If an expression contains unbounded variables, its sub-expressions are not promoted. 2.i32 * var.i32 and 2.i64 * i64(var.i32) represent different things: the former suggests that the variable fits in i32 and that the expression as a whole also fits in i32, while the latter suggests that the variable fits in i32 but the expression as a whole does not.

In my implementation

an expression is bounded if it does not contain unbounded variables. Otherwise it’s unbounded.
a variable is bounded if its range (both its min and its extent) consists of bounded expressions. Otherwise it’s unbounded.
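
The two mutually recursive definitions above can be sketched on a toy expression tree. This is plain Python, not the TVM implementation: the classes `Const`, `Var` and `BinOp` and the function `is_bounded` are hypothetical stand-ins, and a free symbolic variable such as te.var('n') is modelled as a `Var` with no domain.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Var:
    name: str
    # (min, extent) for loop-like variables; None models a free symbolic variable.
    dom: Optional[Tuple["Expr", "Expr"]] = None


@dataclass(frozen=True)
class BinOp:
    op: str
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Const, Var, BinOp]


def is_bounded(e: Expr) -> bool:
    """An expression is bounded iff it contains no unbounded variable."""
    if isinstance(e, Const):
        return True
    if isinstance(e, Var):
        if e.dom is None:  # free variable: unbounded by definition
            return False
        lo, extent = e.dom  # bounded iff both min and extent are bounded
        return is_bounded(lo) and is_bounded(extent)
    return is_bounded(e.lhs) and is_bounded(e.rhs)


n = Var("n")                                        # like te.var('n')
i = Var("i", (Const(0), Const(10)))                 # range depends on constants only
k = Var("k", (Const(0), BinOp("+", Const(10), n)))  # extent depends on n

print(is_bounded(BinOp("*", i, Const(2))))  # candidate for promotion
print(is_bounded(k), is_bounded(BinOp("-", i, n)))  # both left untouched
```

Under these rules only expressions built entirely from constants and bounded variables would be promoted; anything that touches n, directly or through a range, is left as is.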

Phase

P0: Relay

I find it difficult to do it in relay. Some relay ops are implemented in python (mostly ops with autotvm). These ops unpack relay shapes (from c++, represented as an array of IntImm) to python integers, perform some operations, then send them back to c++. In this process, we lose the datatype of IntImm when converting to python integers.
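
The round trip described above can be illustrated with a small stand-in. The `IntImm` dataclass below is a toy model, not the real TVM class, and `to_python` / `from_python` are hypothetical helpers; the int32 default mirrors the behavior discussed in this thread rather than any specific function.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IntImm:
    """Toy stand-in for an integer immediate that carries a dtype."""
    value: int
    dtype: str


def to_python(shape: List[IntImm]) -> List[int]:
    # Unpacking keeps only the value; the dtype is dropped here.
    return [int(d.value) for d in shape]


def from_python(values: List[int], default_dtype: str = "int32") -> List[IntImm]:
    # Assumption: plain Python integers are re-wrapped with a default dtype.
    return [IntImm(v, default_dtype) for v in values]


shape = [IntImm(1024, "int64"), IntImm(3, "int64")]
py_shape = to_python(shape)
new_shape = from_python([py_shape[0] // 2, py_shape[1]])  # some op on the shape

print([d.dtype for d in shape])      # dtypes before the round trip
print([d.dtype for d in new_shape])  # dtypes after the round trip
```

Once the values have passed through plain Python integers, nothing remains that could tell the second conversion the shape was originally i64.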

P1: TE Operation Level.

API:

Operation PromoteIterVarType(Operation op);

Take the compute operation as an example:

def compute():
  # original code...
  ret = PromoteIterVarType(ret)
  return ret

There are two possible problems

Variables can be introduced at schedule level. For example, the factor of split can be a variable. In this case, data type promotion should not be performed, but we fail to detect this variable at operation level, before schedules are given.
Variables can be introduced by inline. We cannot detect them at operation level, before the inline pass.

from tvm import te

n = te.var('n')
a = te.placeholder((10, 2), name='a')
b = te.placeholder((n, 2), name='b')
c = te.compute((10, 2), lambda i, j: a[i, j])
d = te.compute((10 + n, 2), lambda i, j: te.select(i >= n, c[i - n, j], b[i, j]))

In this case, by our rules, c is promoted, and d is not. After inlining c into d, we have c[i - n, j] = a[i64(i - n), i64(j)] = a[i64(i - n) * i64(2) + i64(j)], which cannot be narrowed to (i - n) * 2 + j.
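
One way to see why a range-based narrowing would give up here is a simple interval calculation. This is a pure-Python sketch under stated assumptions, not the actual narrowing pass: intervals are (lo, hi) tuples, n is assumed to span the whole i32 range, and for d the only thing assumed about i is that it is a non-negative i32 value. The helper names are hypothetical.

```python
I32_MIN, I32_MAX = -(2 ** 31), 2 ** 31 - 1


def add(a, b):
    """Interval of x + y for x in a, y in b."""
    return (a[0] + b[0], a[1] + b[1])


def sub(a, b):
    """Interval of x - y for x in a, y in b."""
    return (a[0] - b[1], a[1] - b[0])


def scale(a, c):
    """Interval of x * c for x in a and a non-negative constant c."""
    assert c >= 0
    return (a[0] * c, a[1] * c)


def fits_i32(r):
    return I32_MIN <= r[0] and r[1] <= I32_MAX


j = (0, 1)

# c on its own: i in [0, 9], index i * 2 + j has a small, known range.
c_index = add(scale((0, 9), 2), j)

# d after inlining c: index (i - n) * 2 + j with an unbounded n.
n = (I32_MIN, I32_MAX)  # assumption: n may be any i32 value
i_d = (0, I32_MAX)      # assumption: i is only known to be a non-negative i32
d_index = add(scale(sub(i_d, n), 2), j)

print(c_index, fits_i32(c_index))
print(d_index, fits_i32(d_index))
```

For c alone the index range is tiny, so narrowing back to i32 is provably safe. Once n enters the index, the conservative range exceeds i32 and the i64 casts introduced by promotion cannot be removed on range information alone.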

P2: TE Schedule Level, between `normalize` and `InferBound`.

API:

Schedule Schedule::promote_iter_var_type();

A possible problem is that the buffer stride datatype will be promoted along with it. But there is no guarantee that the stride is not involved with any variable indices in operations that take this tensor as inputs.

P3: TIR Level, between StorageFlatten and NarrowDataType

I haven’t dived deep into this. I guess it’s similar to P2, except that in P2 IterVar is promoted, while here For is promoted.

It does not have the buffer stride datatype problem because it is performed after the creation of buffers.

A possible problem is that TIR is more expressive than TE. So P3 may be a bit more complex than P2. For example, at TE level, a nested loop with i in (0, 10), j in (0, i) is not permitted (except for reduction axes), but TIR permits it.

It would be great if you could share some thoughts. @junrushao @yzhliu @tqchen @ziheng @vinx13

tqchen · July 8, 2020, 10:43pm

This would be a non-traditional pass, given that promoting to i64 changes the behavior of the program (if the intent was i32).

I still think a better solution would be to move the integer defaults to i64 as much as possible, and migrate most of the relay shape constructs to use i64.

hzfan · July 11, 2020, 12:35pm

@tqchen Agreed. I will work towards this.

First, I will get the tests introduced in #5219 fixed. I suppose this is where the discussion starts.

Then, to default the shape to i64, I plan to work in two steps:

Remove the data type i32 cast at https://github.com/apache/incubator-tvm/blob/master/src/relay/backend/compile_engine.cc#L83. Then, when relay shapes default to i64, they will not be overwritten back to i32.
Default the shape parameter of relay.var to i64. To be more specific, when the shape is given as python integers, they get converted to IntImm with the i64 datatype instead of i32.

hzfan · July 27, 2020, 4:06am

@zhiics @tqchen Update:

#5219 has been fixed by #5235.

As the next step, I will remove the data type i32 cast at https://github.com/apache/incubator-tvm/blob/master/src/relay/backend/compile_engine.cc#L83