For example, I have two loops:

for (i = 1, n) {
    C[i] = A[i] + B[i]
}
for (i = 1, n) {
    F[i] = D[i] + E[i]
}

How can I fuse them into a single loop?

for (i = 1, n) {
    C[i] = A[i] + B[i]
    F[i] = D[i] + E[i]
}
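For reference, the transformation being asked about is classic loop fusion: because both loops iterate over the same range and write to disjoint arrays, merging them cannot change the results. A minimal plain-Python sketch (not TVM code; array names follow the pseudocode above):

```python
def unfused(A, B, D, E):
    n = len(A)
    C = [0] * n
    F = [0] * n
    # two separate loops over the same range
    for i in range(n):
        C[i] = A[i] + B[i]
    for i in range(n):
        F[i] = D[i] + E[i]
    return C, F

def fused(A, B, D, E):
    n = len(A)
    C = [0] * n
    F = [0] * n
    # one fused loop: both statements share the loop index
    for i in range(n):
        C[i] = A[i] + B[i]
        F[i] = D[i] + E[i]
    return C, F

A, B, D, E = [1, 2], [3, 4], [5, 6], [7, 8]
assert unfused(A, B, D, E) == fused(A, B, D, E)
```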
Try this:

C, F = tvm.compute(shape, lambda i: (A[i] + B[i], D[i] + E[i]), name='C')
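To spell out what the tuple form computes, here is a plain-Python emulation (not the TVM API itself, just an illustration of its semantics): a compute with a tuple-valued lambda produces one loop nest whose body evaluates every element of the tuple, so the two output buffers are filled in a single loop.

```python
# Plain-Python emulation of a tuple-valued compute, mirroring
#   C, F = tvm.compute(shape, lambda i: (A[i] + B[i], D[i] + E[i]), name='C')
def tuple_compute(shape, f):
    n = shape[0]
    # probe the lambda once to learn how many outputs it produces
    num_outputs = len(f(0))
    outs = [[None] * n for _ in range(num_outputs)]
    for i in range(n):              # a single fused loop
        values = f(i)
        for k in range(num_outputs):
            outs[k][i] = values[k]  # one buffer per tuple element
    return outs

A, B = [1, 2, 3], [10, 20, 30]
D, E = [100, 200, 300], [1000, 2000, 3000]
C, F = tuple_compute((3,), lambda i: (A[i] + B[i], D[i] + E[i]))
# C == [11, 22, 33], F == [1100, 2200, 3300]
```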
You might also want to look at this documentation. It shows an example similar to yours, but using the compute_at primitive.
I am not 100% sure when it's better to do it one way or the other; maybe @lixiaoquan can clarify if I say something wrong.
I think that if you are defining a new operation and know that some statements should always be fused, you do it the way lixiaoquan said.
If you are reusing TOPI operations and want to fuse them, I think your easiest option would be compute_at().
Since C and F are completely independent, I think compute_at won't work. compute_at applies when there is a producer-consumer relationship between the two computes.
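To illustrate the producer-consumer point with a hand-rolled analogy (plain Python, not TVM): compute_at makes sense when one stage feeds another, because the producer's computation can then be moved inside the consumer's loop. When the two stages are independent, as C and F are here, there is no consumer loop to nest into.

```python
# Producer-consumer case: B consumes A, so A's computation
# can be moved (compute_at-style) inside B's loop.
def separate(X):
    A = [x * 2 for x in X]   # producer stage: its own loop
    B = [a + 1 for a in A]   # consumer stage: reads A
    return B

def inlined(X):
    n = len(X)
    B = [0] * n
    for i in range(n):
        a_i = X[i] * 2       # producer computed at the consumer's loop
        B[i] = a_i + 1
    return B

assert separate([1, 2, 3]) == inlined([1, 2, 3])  # both give [3, 5, 7]
```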
Thank you very much. I didn't know a compute could return a tuple; the docs and tutorials don't mention it either.
tvm.compute
For some hardware this would be useful.
My first idea was compute_at as well, but it doesn't work for my real code, probably for the reason masahi gave: there is no producer/consumer relationship between the two computes.
Thanks for the question; it made me learn something I thought I had already understood.
Have you solved the problem? I have the same problem as you.
Does TVM have a pass for that? I think this kind of optimization (merging independent loops) could be helpful for VLIW compilers.
I saw you two have been quite active in the past week and thought I would try a shout out and see if you could help us with this matter.
When we call C_F = tvm.compute((n,), lambda i: (A[i] + B[i], D[i] + E[i]), name='C_F'),
the output is a pair of tensors (C_F[0] and C_F[1]).
The thing is that the schedules given by

s_cf0 = tvm.create_schedule(C_F[0].op)
s_cf1 = tvm.create_schedule(C_F[1].op)

are identical (except for their addresses in memory).
Weirdly enough, print(tvm.lower(s_cf0, [A, B, C, D, E, F], simple_mode=True)) gives:
// attr [C_F.v0] storage_scope = "global"
allocate C_F.v0[float32 * 1024]
produce C_F {
  for (i, 0, 1024) {
    C_F.v0[i] = (A[i] + B[i])
    C_F.v0[i] = (D[i] + E[i])
  }
}
The same output is printed by print(tvm.lower(s_cf1, [A, B, C, D, E, F], simple_mode=True)).
So even inside s_cf0's schedule there seems to be a notion that two values are being output, but the naming does not match. I would have expected:
produce C_F {
  for (i, 0, 1024) {
    C_F.v0[i] = (A[i] + B[i])
    C_F.v1[i] = (D[i] + E[i])
  }
}