How to fuse two computes?


#1

For example, I have two loops:

for (i = 0; i < n; i++) {
    C[i] = A[i] + B[i];
}

for (i = 0; i < n; i++) {
    F[i] = D[i] + E[i];
}

How do I get this?

for (i = 0; i < n; i++) {
    C[i] = A[i] + B[i];
    F[i] = D[i] + E[i];
}
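To make the question concrete, here is the same fusion written out in plain Python (a sketch with made-up data; the point is only that the two separate loops and the single fused loop compute identical results):

```python
# Illustrative data; any arrays of equal length n would do.
n = 4
A, B = [1, 2, 3, 4], [10, 20, 30, 40]
D, E = [5, 6, 7, 8], [50, 60, 70, 80]

# Two separate loops.
C = [A[i] + B[i] for i in range(n)]
F = [D[i] + E[i] for i in range(n)]

# One fused loop: both statements share a single iteration over i.
C2, F2 = [0] * n, [0] * n
for i in range(n):
    C2[i] = A[i] + B[i]
    F2[i] = D[i] + E[i]

# Fusion does not change the results, only the loop structure.
assert C == C2 and F == F2
```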


#2

Try this

C, F = tvm.compute(shape, lambda i: (A[i] + B[i], D[i] + E[i]), name='C')
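The idea is that tvm.compute accepts a lambda returning a tuple and then produces multiple output tensors filled in one shared loop. Since the real call builds a tensor expression rather than running eagerly, here is a toy pure-Python model of that behavior (the `compute` helper below is illustrative, not TVM's implementation):

```python
# Toy model of a multi-output compute: one lambda returning a tuple
# yields several output arrays, all written inside a single shared loop.
def compute(shape, f):
    n = shape[0]
    outs = None
    for i in range(n):
        vals = f(i)                              # tuple of values for index i
        if outs is None:
            outs = tuple([0] * n for _ in vals)  # one buffer per tuple element
        for out, v in zip(outs, vals):
            out[i] = v
    return outs

A, B = [1, 2], [3, 4]
D, E = [5, 6], [7, 8]
C, F = compute((2,), lambda i: (A[i] + B[i], D[i] + E[i]))
assert C == [4, 6] and F == [12, 14]
```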

#3

You might also want to look at this documentation.
It shows an example similar to yours, but using the compute_at primitive.

I am not 100% sure when it's better to do it one way or the other; maybe @lixiaoquan can clarify if I say something wrong.

  • I think it depends: if you are defining a new operation and know that some statements should always be fused, then you do it the way @lixiaoquan described.

    • Be advised, I am not sure you can split these statements apart later in TVM.
  • I think if you are reusing TOPI operations and want to fuse them, your easiest option is compute_at().


#4

Since C and F are completely independent, I think compute_at won’t work. compute_at can only be used when there is a producer-consumer relationship between the two computes.
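The producer-consumer distinction can be sketched in plain Python. Below, B is produced from A and then consumed by C, so B's computation can be moved into C's loop (which is what compute_at does in a schedule); the C and F of the original question have no such link, so there is no consumer loop to attach either one to (illustrative arrays, not TVM code):

```python
n = 4
A = [1, 2, 3, 4]

# Separate loops: produce all of B first, then consume it to build C.
B = [A[i] + 1 for i in range(n)]         # producer
C = [B[i] * 2 for i in range(n)]         # consumer

# Effect of computing B "at" C's loop: each B[i] is produced right
# where C[i] consumes it, so the two stages share one loop.
C2 = []
for i in range(n):
    b_i = A[i] + 1                       # producer inlined into consumer's loop
    C2.append(b_i * 2)

assert C == C2
```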


#5

Thank you very much, I didn’t know that a compute could return a tuple. Neither the docs nor the tutorials mention it.
tvm.compute


#6

For some hardware it would be useful.
My first idea was compute_at as well, but it doesn’t work for my real code.


#7

Probably because of what masahi said: basically, there is no producer/consumer relationship between the two.
Thanks for the question, it made me learn something which I thought I had already understood :wink:


#8

Have you solved the problem? I have the same problem as you.


#9

Does TVM have any pass for that? I think this kind of optimization (merging independent loops) might be helpful for VLIW compilers.


#10

Hello @eqy @yzhliu,

I saw you two have been quite active in the past week and thought I would try a shout out and see if you could help us with this matter.

When we call C_F = tvm.compute((n,), lambda i: (A[i] + B[i], D[i] + E[i]), name='C_F'), the output is a tuple of tensors (so C_F[0] and C_F[1]).
The thing is that the schedules given by

s_cf0 = tvm.create_schedule(C_F[0].op)
s_cf1 = tvm.create_schedule(C_F[1].op)

are identical (except for their address in memory).

Weirdly enough, when we print(tvm.lower(s_cf0, [A, B, C, D, E, F], simple_mode=True)) we get:

// attr [C_F.v0] storage_scope = "global"
allocate C_F.v0[float32 * 1024]
produce C_F {
  for (i, 0, 1024) {
    C_F.v0[i] = (A[i] + B[i])
    C_F.v0[i] = (D[i] + E[i])
  }
}
// print(tvm.lower(s_cf1, [A, B, C, D, E, F], simple_mode=True)) gives the same output

So even inside s_cf0’s schedule there seems to be a notion that two variables are being output, but the naming does not match. I would have expected:

produce C_F {
  for (i, 0, 1024) {
    C_F.v0[i] = (A[i] + B[i])
    C_F.v1[i] = (D[i] + E[i])
  }
}
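For reference, the expected IR above corresponds to the following plain-Python semantics: one loop writing two distinct buffers (v0 and v1), rather than v0 being overwritten twice per iteration (sketch with illustrative data, mirroring the n = 1024 bound from the dump):

```python
n = 1024
A = list(range(n)); B = [1] * n
D = [2] * n;        E = list(range(n))

# One shared loop, two distinct output buffers -- the behavior the
# expected "C_F.v0 / C_F.v1" IR describes.
v0, v1 = [0] * n, [0] * n
for i in range(n):
    v0[i] = A[i] + B[i]   # C_F.v0[i]
    v1[i] = D[i] + E[i]   # C_F.v1[i]
```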