VTA instruction set architecture


#1

Does vta.lower() generate the VTA ISA? If so, is this the proper format for representing the ISA in VTA?
    // attr [C_buf] storage_scope = "local.acc_buffer"
    // attr [A_buf] storage_scope = "local.inp_buffer"
    // attr [B_buf] storage_scope = "local.wgt_buffer"
    produce C_buf {
      // attr [iter_var(vta, , vta)] coproc_scope = 2
      // attr [iter_var(vta, , vta)] coproc_uop_scope = "VTAPushGEMMOp"
      VTAUopLoopBegin(16, 1, 0, 0)
      VTAUopPush(0, 1, 0, 0, 0, 0, 0, 0)
      VTAUopLoopEnd()
      vta.coproc_dep_push(2, 1)
      for (ko, 0, 16) {
        // attr [iter_var(vta, , vta)] coproc_scope = 1
        vta.coproc_dep_pop(2, 1)
        produce A_buf {
          VTALoadBuffer2D(tvm_thread_context(VTATLSCommandHandle()), A, ko, 1, 1, 1, 0, 0, 0, 0, 0, 2)
        }
        produce B_buf {
          VTALoadBuffer2D(tvm_thread_context(VTATLSCommandHandle()), B, ko, 1, 16, 16, 0, 0, 0, 0, 0, 1)
        }
        vta.coproc_dep_push(1, 2)
        // attr [iter_var(vta, , vta)] coproc_scope = 2
        vta.coproc_dep_pop(1, 2)
        // attr [iter_var(vta, , vta)] coproc_uop_scope = "VTAPushGEMMOp"
        VTAUopLoopBegin(16, 1, 0, 1)
        VTAUopPush(0, 0, 0, 0, 0, 0, 0, 0)
        VTAUopLoopEnd()
        vta.coproc_dep_push(2, 1)
      }
      vta.coproc_dep_push(2, 3)
      vta.coproc_dep_pop(2, 1)
    }
    // attr [iter_var(vta, , vta)] coproc_scope = 3
    vta.coproc_dep_pop(2, 3)
    produce C {
      VTAStoreBuffer2D(tvm_thread_context(VTATLSCommandHandle()), 0, 4, C, 0, 16, 1, 16)
    }
    vta.coproc_sync()


#2

@pinnacle (you may want to fix the formatting of your post which seems to be broken): vta.lower() lowers the TVM schedule into calls to the VTA runtime that in turn will generate the VTA ISA.

You’ll see that these calls (e.g. VTAUopPush) are defined in the runtime, which assembles the VTA binary on the fly: https://github.com/dmlc/tvm/blob/master/vta/src/runtime.cc

Consequently, when we lower the TVM schedule, we are still generating an ARM binary. That binary JIT-compiles the VTA binary when it is deployed on the ARM host processor of the FPGA we target.
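To make the "runtime calls assemble the binary on the fly" idea concrete, here is a toy sketch in plain Python. It is not the VTA runtime: the ToyRuntime class, its method names, and the text "instructions" it emits are all hypothetical, and it only loosely mirrors the shape of the lowered program in post #1 (one micro-op kernel up front, then loads and a micro-op kernel per loop iteration, then a store).

```python
# Toy illustration only. ToyRuntime and its methods are hypothetical
# stand-ins; they are NOT the VTA runtime API and emit made-up text.
from typing import List, Tuple


class ToyRuntime:
    """Records host-side calls and appends instructions as they arrive."""

    def __init__(self) -> None:
        self.insns: List[str] = []            # assembled instruction stream
        self._uops: List[Tuple[int, ...]] = []  # micro-ops of the open kernel
        self._extent = 1                      # loop extent of the open kernel

    def uop_loop_begin(self, extent: int) -> None:
        self._extent = extent

    def uop_push(self, *fields: int) -> None:
        self._uops.append(fields)

    def uop_loop_end(self) -> None:
        # Closing the loop emits one instruction that refers to the
        # micro-ops recorded since uop_loop_begin.
        self.insns.append(f"GEMM extent={self._extent} uops={len(self._uops)}")
        self._uops = []
        self._extent = 1

    def load(self, name: str, index: int) -> None:
        self.insns.append(f"LOAD {name}[{index}]")

    def store(self, name: str) -> None:
        self.insns.append(f"STORE {name}")


def lowered_program(rt: ToyRuntime) -> List[str]:
    """Host code that only makes runtime calls, like a lowered schedule."""
    rt.uop_loop_begin(16)
    rt.uop_push(0, 1)
    rt.uop_loop_end()
    for ko in range(16):
        rt.load("A", ko)
        rt.load("B", ko)
        rt.uop_loop_begin(16)
        rt.uop_push(0, 0)
        rt.uop_loop_end()
    rt.store("C")
    return rt.insns


stream = lowered_program(ToyRuntime())
print(len(stream), "instructions, first:", stream[0], "last:", stream[-1])
```

The point of the sketch is that the accelerator instruction stream does not exist until the host program actually runs; the lowered code itself contains only the calls.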


#3

@thierry, thanks for the reply. Is there a code snippet in the documentation that shows how to go one step further and generate the ISA?


#4

If you’d like to debug and print the ISA (both the task-level ISA and the micro-op ISA), there are debug flags for that.

See https://github.com/dmlc/tvm/blob/master/vta/include/vta/runtime.h#L40-L41

In your top-level Python script, you can replace this line with "with vta.build_config(debug_flag=2):" if you want to print the instructions. They will be dumped to the stdout of the Pynq RPC server session if you’re running on hardware or, if you’re running in simulation, to the terminal from which you run the Python script.
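As a rough sketch of how the debug_flag value can be composed: the constants below assume that the two flags in runtime.h are bit masks at bit positions 1 and 2 (which is consistent with debug_flag=2 printing the instructions). That is an assumption about the header at the time of this thread, so please check the values in your own checkout. The helper name make_debug_flag is made up for this example.

```python
# Assumed values, mirroring the debug flags in vta/include/vta/runtime.h.
# Verify them against your checkout before relying on them.
DUMP_INSN = 1 << 1  # assumed: print the task-level instructions
DUMP_UOP = 1 << 2   # assumed: print the micro-ops


def make_debug_flag(dump_insn: bool = True, dump_uop: bool = False) -> int:
    """Combine the selected debug bits into one integer (hypothetical helper)."""
    flag = 0
    if dump_insn:
        flag |= DUMP_INSN
    if dump_uop:
        flag |= DUMP_UOP
    return flag


debug_flag = make_debug_flag(dump_insn=True, dump_uop=True)
print("debug_flag =", debug_flag)

# With VTA installed, the value would be passed where the script opens
# the build config (not executed here because it needs tvm and vta):
#
#   import vta
#   with vta.build_config(debug_flag=debug_flag):
#       ...  # build the module exactly as the tutorial does
```

Passing only dump_insn=True gives 2, the value suggested above.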


#5

Thanks for the reply, Thierry. It certainly gave me some pointers. However, it would be very helpful if you could answer the following question.

I need to run runtime.cc with the changes you mentioned. However, I don’t understand how that function (build_config()) can be called manually from the matrix multiplication example to get the ISA generated for this computation.