Dear community members:
I have read through the VTA paper (arXiv 1807.04188v3) and would like to ask a few questions about hardware customization for a new FPGA.
----- quote -----
p.1: a microcode-ISA which implements a wide variety of operators with single-cycle tensor-tensor operations.
p.5: Architectural knobs include GEMM hardware intrinsic shape, data types, number of parallel arithmetic units in the tensor ALU, ALU operations, BRAM distribution between on-chip memories.
----- questions ----
- Can I make the GEMM / ALU cores multi-cycle, so that they can be pipelined internally to increase Fmax?
- The paper says the number of parallel ALUs can be customized. What about the number of parallel GEMM cores? Suppose I have a very large FPGA with one million LUT-6s and thousands of hardware MACs: how can I maximize the utilization of those resources by implementing multiple parallel GEMM cores?
- How can I change the source code to use a new type of off-chip memory, e.g. DDR3 -> GDDR6? How do I specify the DDR controller and DDR memory characteristics, such as bus throughput and latency?
- Should I change the tsim code as well to reflect the changes for the new FPGA, so that the tsim simulation accurately represents the actual hardware performance and function?
Thanks very much!
Kevin