I’m experimenting with some modification of how TVM generates code. What I’m trying to do is to initially generate realize
(in ScheduleOps) in such a way that the extent of its bounds are non-const expressions (i.e., extent depends on loop variables above realize). Then, I’d like to apply LoopPartition
afterwards to partition a loop (say, loop j) outside this realize. The result of the LoopPartition
will be two (or more) loops each with its own copy of realize. Now with the new range of j, we can simplify the non-const expression in each realize extent to a constant value.
As an example, consider this python script:
import tvm
N = 65 #tvm.var("N")
A = tvm.placeholder((N,N), "float32", "A")
C = tvm.compute((N, N), lambda i,j: tvm.select(tvm.all(i>0, i < N-1),A[i,j] + A[i+1,j], tvm.const(0. , "float32")), name='C')
s = tvm.create_schedule(C.op)
yo, xo, yi, xi = s[C].tile(C.op.axis[0], C.op.axis[1], 16, 16)
AA = s.cache_read(A, "local", [C])
s[AA].compute_at(s[C], xo)
print(tvm.lower(s, [A,C], simple_mode=True))
The output after phase 0 will be
// attr [compute(C, 0x1818f00)] realize_scope = ""
realize C([0, 65], [0, 65]) {
produce C {
for (i.outer, 0, 5) {
for (j.outer, 0, 5) {
// attr [compute(A.local, 0x18f4850)] realize_scope = "local"
realize A.local([(i.outer*16), 17], [(j.outer*16), 16]) {
produce A.local {
for (ax0, 0, 17) {
for (ax1, 0, 16) {
if (likely(((ax0 + (i.outer*16)) < 65))) {
if (likely(((ax1 + (j.outer*16)) < 65))) {
A.local((ax0 + (i.outer*16)), (ax1 + (j.outer*16))) =A((ax0 + (i.outer*16)), (ax1 + (j.outer*16)))
}
}
}
}
}
for (i.inner, 0, 16) {
for (j.inner, 0, 16) {
if (likely(((i.inner + (i.outer*16)) < 65))) {
if (likely(((i.inner + (i.outer*16)) < 65))) {
if (likely(((j.inner + (j.outer*16)) < 65))) {
if (likely(((j.inner + (j.outer*16)) < 65))) {
C((i.inner + (i.outer*16)), (j.inner + (j.outer*16))) =select((((i.inner + (i.outer*16)) > 0) && ((i.inner + (i.outer*16)) < 64)), (A.local((i.inner + (i.outer*16)), (j.inner + (j.outer*16))) + A.local(((i.inner + (i.outer*16)) + 1), (j.inner + (j.outer*16)))), 0.000000f)
}
}
}
}
}
}
}
}
}
}
}
For realize
of A.local
, I would like to initially generate this:
realize A.local([(i.outer*16), min(17,65-16*i.outer)], [(j.outer*16), min(16,65 - 16*j.outer)]) {
produce A.local {
for (ax0, 0, 17) {
for (ax1, 0, 16) {
if (likely(((ax0 + (i.outer*16)) < 65))) {
if (likely(((ax1 + (j.outer*16)) < 65))) {
A.local((ax0 + (i.outer*16)), (ax1 + (j.outer*16))) =A((ax0 + (i.outer*16)), (ax1 + (j.outer*16)))
}
}
}
}
Note the new extents of the new realize
. Rest is unchanged. The extents are calculated from the conditions of the if statements.
Now my questions:
- What is a good time to build these
realize
s? Is it more appropriate to modify them after they are initially created (i.e., afterScheduleOps
) or to generate them like above from the beginning (i.e., inBuildRealize
)? - I started to implement it in
BuildRealize
as follows: I callMakeLoopNest
andMakeBoundCheck
in the beginning ofBuildRealize
to get the list of those predicates. I got stuck because theIterVar
s of theComputeOpNode.axis
were not the same ones used in those predicates. Therefore I can’t figure out which predicates to use for each axis ofrealize
. I was thinking that if there was a many-to-one map from theIterVar
s used in the predicate toIterVar
s inComputeOpNode.axis
, I could finish the code but I couldn’t find such mapping and I am not even sure whether such a mapping is theoretically possible in general. Any advice is very much appreciated.
Cheers!