We have a new NPU and want to use tensorize to extend TVM to support it. Unfortunately, we have run into a problem. This is the lowered schedule after tensorize:
for (co.init, 0, 4) {
  tvm_call_packed("tvm.contrib.bmcv.memset", 0, tvm_address_of(C_buf[(co.init*16)]))
}
for (ko, 0, 4) {
  for (co, 0, 4) {
    tvm_call_packed("tvm.contrib.bmcv.gemm.forward", (uint1)0, (uint1)1, 1, 16, 16, 1.000000f, tvm_address_of(A[(ko*16)]), 0, tvm_address_of(B[((co*1024) + (ko*256))]), 0, 0.000000f, tvm_address_of(C_buf[(co*16)]), 0)
  }
}
We can see that tvm_address_of(A[(ko*16)]) applies the offset to the address of A on each loop iteration. For our NPU, however, A's address is not an NPU address: A is a struct that contains the NPU address. So A[(ko*16)] does not change the NPU address; it only changes the address of the struct. Is there a way to apply the offset to the address stored in A's member instead of to A's own address? Can anyone help?
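To make the difference concrete, here is a minimal C sketch of the two behaviours. It is only an illustration under assumptions: the struct NpuBuffer, its fields, and both helper functions are hypothetical names made up for this post, not part of TVM or of our NPU runtime.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side handle (assumed layout): the host pointer we
 * pass to TVM as "A" points at this struct, not at NPU memory. */
typedef struct {
    uint64_t device_addr; /* base address of the data in NPU memory */
    size_t size_bytes;    /* size of the NPU allocation */
} NpuBuffer;

/* What the generated code does today: the element offset is added to
 * the host pointer itself, so the result no longer points at a valid
 * handle and device_addr is never touched. */
static uintptr_t offset_host_pointer(const NpuBuffer *a,
                                     size_t elem_offset, size_t elem_bytes) {
    return (uintptr_t)a + elem_offset * elem_bytes;
}

/* What we would like instead: keep the handle as is and add the
 * offset to the NPU address stored inside it. */
static uint64_t offset_device_addr(const NpuBuffer *a,
                                   size_t elem_offset, size_t elem_bytes) {
    return a->device_addr + (uint64_t)(elem_offset * elem_bytes);
}
```

In other words, the packed function would need to receive the unmodified handle together with the element offset (ko*16 for A) as a separate argument, so that the addition can happen on the NPU address.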