[SOLVED][Bug][VTA] VTA DRAM Have A Logic Issue May Cause GEMM Output Is Wrong

Hi There,

VTA Have a logic problem in simulator DRAM part code, sometime it would cause the GEMM compute output wrong. this issue happen after change
“LOG_BLOCK_IN” and “LOG_BLOCK_OUT” into 7.

another symptom of the said bug is VTA crash and report following error.
“Check failed: phy_addr != 0 (0 vs. 0) : trying to get address that is nullptr”

After go through VTA code( simulator) , I found that this issue caused by VTA kPageSize logic, in this logic simulator hardcode kPageSize into 1<<12, and physical address calculate based on this size, when doing “insn->dram_base” calculation , because GetElemBytes(dst_memory_type) larger than page size, different physcial address may get same dram_base, than caused logic issue and finally trigger GEMM out put is wrong.

to fix such issue , I worked for a patch as following which would do parameter check and alert once the size mismatch issue happen.

Please kindly let me know how you think for this issue and patch.

Regards
Hua

Issue:

VTA GEMM Output
[[ -5 36 -58 … -31 32 3]
[ -5 36 -58 … -31 32 3]
[ -5 36 -58 … -31 32 3]

[ -5 36 -58 … -31 32 3]
[ -5 36 -58 … -31 32 3]
[ -5 36 -58 … -31 32 3]]
numpy.dot output
[[-15 -98 40 … -68 122 -6]
[-15 -98 40 … -68 122 -6]
[-15 -98 40 … -68 122 -6]

[-15 -98 40 … -68 122 -6]
[-15 -98 40 … -68 122 -6]
[-15 -98 40 … -68 122 -6]]

Patch:
— a/tvm/vta/include/vta/hw_spec.h
+++ b/tvm/vta/include/vta/hw_spec.h
@@ -347,6 +347,9 @@ extern “C” {
/*! GEMM Micro-op end position of the inp_idx field */
#define VTA_UOP_ALU_1_1 (VTA_UOP_ALU_1_0 + VTA_LOG_INP_BUFF_DEPTH - 1)

+/*! DRAM PAGE SIZE /
+#define VTA_PAGE_BITS 12
+#define VTA_PAGE_BYTES (1<<VTA_PAGE_BITS)
/
! \brief VTA generic instruction */
typedef struct {
uint64_t word_0 : 64;

diff --git a/tvm/vta/src/sim/sim_driver.cc b/tvm/vta/src/sim/sim_driver.cc
index 6064581…5c90ba9 100644
— a/tvm/vta/src/sim/sim_driver.cc
+++ b/tvm/vta/src/sim/sim_driver.cc
@@ -177,7 +177,7 @@ class DRAM {

private:
// The bits in page table
-static constexpr vta_phy_addr_t kPageBits = 12;
+static constexpr vta_phy_addr_t kPageBits = VTA_PAGE_BITS;
// page size, also the maximum allocable size 16 K
static constexpr vta_phy_addr_t kPageSize = 1 << kPageBits;
/*! \brief A page in the DRAM */

diff --git a/tvm/vta/src/runtime.cc b/tvm/vta/src/runtime.cc
index b2d5abf…e384ed4 100644
— a/tvm/vta/src/runtime.cc
+++ b/tvm/vta/src/runtime.cc
@@ -922,7 +922,9 @@ class CommandQueue {
insn->memory_type = dst_memory_type;
insn->sram_base = dst_sram_index;
DataBuffer* src = DataBuffer::FromHandle(src_dram_addr);
-insn->dram_base = src->phy_addr() / GetElemBytes(dst_memory_type) + src_elem_offset;
+uint32_t elem_bytes = GetElemBytes(dst_memory_type);
+CHECK_GE(VTA_PAGE_BYTES, elem_bytes);
+insn->dram_base = src->phy_addr() / elem_bytes + src_elem_offset;
insn->y_size = y_size;
insn->x_size = x_size;
insn->x_stride = x_stride;

@hijang, do you mind creating a PR for this bug fix? You can then tag me so I can review it. Thank you

@thierry, thanks for the reply, sure, I would do that.