Hi @vegaluis,
I just ran test_vta_insn.py and saw performance numbers reported in cycles, for example 8579 cycles for GEMM.
One question: consider the task-level pipeline parallelism scenario, where the load module and the compute module run in parallel and the store module runs after compute finishes. If we assume
the load module takes 4000 cycles, the compute module takes 4500 cycles, and the store module takes 79 cycles, is the overall cycle count simply the sum of the three, as if all three modules ran serially?
Or can TSIM handle the parallel scenario, accounting for the synchronization cost and accurately tracking when each module is waiting or running, even when the simulator thread gets swapped out by the process scheduler? Could you give some details on how TSIM keeps the performance numbers accurate in the parallel-module scenario?
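To make the two interpretations concrete, here is a minimal sketch of the arithmetic I have in mind. The per-module numbers are my assumed values from above, not measurements, and the overlapped case is idealized: it assumes load and compute overlap fully and ignores any synchronization overhead.

```python
# Assumed per-module cycle counts from the question (hypothetical, not measured).
load_cycles = 4000
compute_cycles = 4500
store_cycles = 79

# Interpretation 1: the modules are counted as if they ran one after another.
serialized_total = load_cycles + compute_cycles + store_cycles

# Interpretation 2: load and compute overlap fully, and store starts
# only after compute finishes. Synchronization cost is ignored here.
overlapped_total = max(load_cycles, compute_cycles) + store_cycles

print(serialized_total)  # 8579
print(overlapped_total)  # 4579
```

With these assumed numbers the serialized sum equals the 8579 figure, which is why I am asking which of the two the reported cycle count corresponds to.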
Regards
Hua