Hello!
I have a question about vta.cc, specifically the compute module.
if (insn.generic.opcode == VTA_OPCODE_FINISH) {
  // Set done flag if we reach a FINISH instruction
  done = 1;
} else if (insn.generic.opcode == VTA_OPCODE_LOAD) {
  // Initialize indices
  memop_sram_T sram_idx = insn.mem.sram_base;
  memop_dram_T dram_idx = insn.mem.dram_base;
  if (insn.mem.memory_type == VTA_MEM_ID_UOP) {
    // Perform data transfer
    memcpy(&uop_mem[sram_idx],
           (const uop_T*) &uops[dram_idx],
           insn.mem.x_size * sizeof(uop_T));
  } else if (insn.mem.memory_type == VTA_MEM_ID_ACC) {
    // Perform data transfer from DRAM
    load_2d<bus_T, ACC_MAT_AXI_RATIO, VTA_ACC_ELEM_BYTES>(
        biases,
        acc_mem,
        sram_idx,
        dram_idx,
        insn.mem.y_size,
        insn.mem.x_size,
        insn.mem.x_stride);
  }
} else if (insn.generic.opcode == VTA_OPCODE_GEMM) {
  gemm(raw_copy, uop_mem, acc_mem, inp_mem, wgt_mem, out_mem);
} else if (insn.generic.opcode == VTA_OPCODE_ALU) {
  alu(raw_copy, uop_mem, acc_mem, inp_mem, wgt_mem, out_mem);
}
What I want to do is check how much the load and compute operations overlap on actual hardware. (I’m using VTA ported to the ZCU104, thanks to the updated pynq version v0.0.1.)
However, I encountered two problems.
First, it is difficult for me to understand how the C code translates into the module shown in the picture.
Second, I don’t know how to measure the overlapped cycles between the load module and the compute module.
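To make the second point concrete, here is a rough sketch of the quantity I am after. It assumes that each module's busy periods could somehow be recorded as start/end cycle counts (for example from a free-running counter). That recording mechanism is an assumption on my side, not something vta.cc provides, and the names BusyInterval and overlapped_cycles are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record: one busy period of a module, in clock cycles,
// covering the half-open range [start, end).
struct BusyInterval {
  uint64_t start;
  uint64_t end;
};

// Returns the number of cycles during which both modules are busy.
// Assumes each vector is sorted by start and that intervals within
// the same vector do not overlap each other.
uint64_t overlapped_cycles(const std::vector<BusyInterval>& load,
                           const std::vector<BusyInterval>& compute) {
  uint64_t total = 0;
  std::size_t i = 0;
  std::size_t j = 0;
  while (i < load.size() && j < compute.size()) {
    // Intersection of the current load and compute intervals.
    uint64_t lo = std::max(load[i].start, compute[j].start);
    uint64_t hi = std::min(load[i].end, compute[j].end);
    if (lo < hi) {
      total += hi - lo;
    }
    // Advance whichever interval finishes first.
    if (load[i].end < compute[j].end) {
      ++i;
    } else {
      ++j;
    }
  }
  return total;
}
```

With load busy during cycles [0, 10) and [20, 30) and compute busy during [5, 25), this would report 10 overlapped cycles. What I do not know is how to obtain such intervals from the real hardware.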
Thank you in advance for your help.