Can TVM accumulate into a register without storing every time?

In a reduce operation, especially a vectorized one, I want this:

func(a, b):
    vec_reg = (0, 0, 0, 0)
    for i in 0...N
        vec_reg += load(b[i])
    store(a, vec_reg)

However, I can only get:

func(a, b):
    store(a, (0, 0, 0, 0))
    for i in 0...N
        store(a, load(a) + load(b[i]))

which means an unnecessary load and store on every iteration. Is there any way to avoid this? For example, how can I declare something like vec_reg?

Most low-level code generators (such as LLVM) will promote a small, constant-size array to registers, so we get that for free.
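As a rough illustration of the point above, here is a minimal C sketch (not TVM output; the function names `reduce_local_acc` and `reduce_in_place` and the lane width of 4 are assumptions made up for this example). The first function accumulates into a small constant-size local array, which an optimizing compiler can typically keep entirely in registers because its address never escapes. The second mirrors the store-every-iteration form from the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: accumulates into a small constant-size local array.
 * With optimization enabled, compilers can usually keep acc in registers
 * and write to memory only once, at the end.
 * Assumes n is a multiple of 4 (illustrative lane width). */
void reduce_local_acc(float *a, const float *b, size_t n) {
    assert(n % 4 == 0);
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (size_t i = 0; i < n; i += 4) {
        for (int lane = 0; lane < 4; ++lane) {
            acc[lane] += b[i + lane];
        }
    }
    for (int lane = 0; lane < 4; ++lane) {
        a[lane] = acc[lane];
    }
}

/* Hypothetical kernel: same result, but written as a load-add-store on the
 * output buffer in every iteration, like the second snippet in the question. */
void reduce_in_place(float *a, const float *b, size_t n) {
    assert(n % 4 == 0);
    for (int lane = 0; lane < 4; ++lane) {
        a[lane] = 0.0f;
    }
    for (size_t i = 0; i < n; i += 4) {
        for (int lane = 0; lane < 4; ++lane) {
            a[lane] = a[lane] + b[i + lane];
        }
    }
}
```

Both functions compute the same four partial sums. Whether the in-place version also ends up accumulating in registers may depend on whether the compiler can prove that `a` and `b` do not overlap, so inspecting the generated assembly is the reliable way to check.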

Thank you. Is this also true of compilers such as gcc and nvcc?

Yes, it should be true for most compilers.
