How to fuse two computes?


#1

For example, I have two loops:

for (i = 0; i < n; i++) {
    C[i] = A[i] + B[i];
}

for (i = 0; i < n; i++) {
    F[i] = D[i] + E[i];
}

How do I get this?

for (i = 0; i < n; i++) {
    C[i] = A[i] + B[i];
    F[i] = D[i] + E[i];
}
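To make the question concrete, here is the same fusion written out in plain Python (a sketch with made-up data; the point is only that the two separate loops and the single fused loop compute identical results):

```python
# Illustrative data; any arrays of equal length n would do.
n = 4
A, B = [1, 2, 3, 4], [10, 20, 30, 40]
D, E = [5, 6, 7, 8], [50, 60, 70, 80]

# Two separate loops.
C = [A[i] + B[i] for i in range(n)]
F = [D[i] + E[i] for i in range(n)]

# One fused loop: both statements share a single iteration over i.
C2, F2 = [0] * n, [0] * n
for i in range(n):
    C2[i] = A[i] + B[i]
    F2[i] = D[i] + E[i]

# Fusion does not change the results, only the loop structure.
assert C == C2 and F == F2
```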


#2

Try this

C, F = tvm.compute(shape, lambda i: (A[i] + B[i], D[i] + E[i]), name='C')
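The idea is that tvm.compute accepts a lambda returning a tuple and then produces multiple output tensors filled in one shared loop. Since the real call builds a tensor expression rather than running eagerly, here is a toy pure-Python model of that behavior (the `compute` helper below is illustrative, not TVM's implementation):

```python
# Toy model of a multi-output compute: one lambda returning a tuple
# yields several output arrays, all written inside a single shared loop.
def compute(shape, f):
    n = shape[0]
    outs = None
    for i in range(n):
        vals = f(i)                              # tuple of values for index i
        if outs is None:
            outs = tuple([0] * n for _ in vals)  # one buffer per tuple element
        for out, v in zip(outs, vals):
            out[i] = v
    return outs

A, B = [1, 2], [3, 4]
D, E = [5, 6], [7, 8]
C, F = compute((2,), lambda i: (A[i] + B[i], D[i] + E[i]))
assert C == [4, 6] and F == [12, 14]
```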

#3

You might also want to look at this documentation.
It shows an example similar to yours, but using the compute_at primitive.

I am not 100% sure when it's better to do it one way or the other; maybe @lixiaoquan can clarify if I say something wrong.

  • I think it depends: if you are defining a new operation and know that some statements should always be fused, then you do it the way @lixiaoquan described.

    • Be advised, I am not sure you can split these statements apart later in TVM.
  • I think if you are reusing TOPI operations and want to fuse them, your easiest option is compute_at().


#4

Since C and F are completely independent, I think compute_at won’t work. compute_at can only be used when there is a producer-consumer relationship between the two computes.
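The producer-consumer distinction can be sketched in plain Python. Below, B is produced from A and then consumed by C, so B's computation can be moved into C's loop (which is what compute_at does in a schedule); the C and F of the original question have no such link, so there is no consumer loop to attach either one to (illustrative arrays, not TVM code):

```python
n = 4
A = [1, 2, 3, 4]

# Separate loops: produce all of B first, then consume it to build C.
B = [A[i] + 1 for i in range(n)]         # producer
C = [B[i] * 2 for i in range(n)]         # consumer

# Effect of computing B "at" C's loop: each B[i] is produced right
# where C[i] consumes it, so the two stages share one loop.
C2 = []
for i in range(n):
    b_i = A[i] + 1                       # producer inlined into consumer's loop
    C2.append(b_i * 2)

assert C == C2
```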


#5

Thank you very much, I didn’t know that a compute could return a tuple. Neither the docs nor the tutorials mention it.
tvm.compute


#6

For some hardware it would be useful.
My first idea was compute_at as well, but it doesn’t work for my real code.


#7

Probably because of what masahi said: basically, there is no producer/consumer relationship between the two.
Thanks for the question, it made me learn something which I thought I had already understood :wink:


#8

Have you solved the problem? I have the same problem as you.


#9

Does TVM have any pass for that? I think this kind of optimization (merging independent loops) might be helpful for VLIW compilers.


#10

Hello @eqy @yzhliu,

I saw you two have been quite active in the past week and thought I would try a shout out and see if you could help us with this matter.

When we call C_F = tvm.compute((n,), lambda i: (A[i] + B[i], D[i] + E[i]), name='C_F'), the output is a tuple of tensors (so C_F[0] and C_F[1]).
The thing is that the schedules given by

s_cf0 = tvm.create_schedule(C_F[0].op)
s_cf1 = tvm.create_schedule(C_F[1].op)

are identical (except for their address in memory).

Weirdly enough, when we print(tvm.lower(s_cf0, [A, B, C, D, E, F], simple_mode=True)) we get:

// attr [C_F.v0] storage_scope = "global"
allocate C_F.v0[float32 * 1024]
produce C_F {
  for (i, 0, 1024) {
    C_F.v0[i] = (A[i] + B[i])
    C_F.v0[i] = (D[i] + E[i])
  }
}
// print(tvm.lower(s_cf1, [A, B, C, D, E, F], simple_mode=True)) gives the same output

So even inside s_cf0’s schedule there seems to be a notion that two variables are being output, but the naming does not match. I would have expected:

produce C_F {
  for (i, 0, 1024) {
    C_F.v0[i] = (A[i] + B[i])
    C_F.v1[i] = (D[i] + E[i])
  }
}
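For reference, the expected IR above corresponds to the following plain-Python semantics: one loop writing two distinct buffers (v0 and v1), rather than v0 being overwritten twice per iteration (sketch with illustrative data, mirroring the n = 1024 bound from the dump):

```python
n = 1024
A = list(range(n)); B = [1] * n
D = [2] * n;        E = list(range(n))

# One shared loop, two distinct output buffers -- the behavior the
# expected "C_F.v0 / C_F.v1" IR describes.
v0, v1 = [0] * n, [0] * n
for i in range(n):
    v0[i] = A[i] + B[i]   # C_F.v0[i]
    v1[i] = D[i] + E[i]   # C_F.v1[i]
```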