uTVM standalone runtime

apps/bundle_bundle

where is this? mind providing a link?

@tgall_foo We need further effort to ensure compliance; specifically, we need to add the MISRA checker in cppcheck as part of the sanity checks in CI.

@aca88 See apps/bundle_deploy


@liangfu is the static demo in apps/bundle_deploy truly ready for a bare metal system? Are there other ongoing efforts for the standalone uTVM?

@liangfu, have you ever measured the efficiency of crt_backend_api.c, crt_runtime_api.c and the graph_runtime APIs in a bare-metal runtime implementation? If yes, what are the metrics?

I have not yet run them on actual bare-metal hardware, though I assume they would only require newlib to run on bare-metal systems. The only metric I have is code size, which is significantly smaller than that of the other runtimes in TVM (approx. 45 KiB, vs. approx. 100 KiB for the standalone uTVM runtime and approx. 500 KiB for the default runtime). This makes the CRT ideal for systems with limited resources.

@liangfu, I set target="c" to dump out the fused operator code from a graph (the graph itself doesn't matter), as shown below:


#ifdef __cplusplus
extern "C"
#endif
TVM_DLL int32_t fused_nn_dense_nn_bias_add_nn_relu(void* args, void* arg_type_ids, int32_t num_args, void* out_ret_value, void* out_ret_tcode) {
  void* arg0 = (((TVMValue*)args)[0].v_handle);
 ...

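  // TVMBackendAllocWorkspace(device_type, device_id, nbytes, dtype_code_hint, dtype_bits_hint);
  // here device_type 1 is kDLCPU, the request is 1024 bytes, and (2, 32) hints a float32 dtype.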
  void* compute = TVMBackendAllocWorkspace(1, dev_id, (uint64_t)1024, 2, 32);
  if (compute == NULL) {
    return -1;
  }
  for (int32_t y_outer_x_outer_fused = 0; y_outer_x_outer_fused < 16; ++y_outer_x_outer_fused) {
    void* compute1 = TVMBackendAllocWorkspace(1, dev_id, (uint64_t)1024, 2, 32);
    if (compute1 == NULL) {
      return -1;
    }
...
    if (TVMBackendFreeWorkspace(1, dev_id, compute1) != 0) {
      return -1;
    }
  }
 ...
  if (TVMBackendFreeWorkspace(1, dev_id, compute) != 0) {
    return -1;
  }

I noticed dynamic memory allocation and freeing inside the fused_op call, which is very inefficient IMO. It would be better if these buffers could be pre-allocated before calling the fused_op, or at least if the allocation/free operations could be pipelined with the compute schedule.

Dynamic memory allocation doesn't actually happen in the CRT (see the workspace allocation code in the CRT sources): it simply returns an address within a pre-allocated static memory region.
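To illustrate the idea only (this is a hypothetical sketch, not the actual CRT source; names such as g_workspace_arena and StaticAllocWorkspace are made up for this example), a workspace "allocator" backed by a static arena can hand out addresses with simple pointer arithmetic:

/* Hypothetical sketch of a static-arena workspace allocator.
 * Not the actual TVM CRT code; shown only to illustrate how
 * "allocation" can reduce to handing out addresses inside a
 * statically reserved buffer. */
#include <stddef.h>
#include <stdint.h>

#define WORKSPACE_ARENA_BYTES (64 * 1024)  /* arbitrary budget for the example */

static uint8_t g_workspace_arena[WORKSPACE_ARENA_BYTES];
static size_t g_workspace_top = 0;

/* Bump-pointer "allocation": returns an address inside the static arena. */
void* StaticAllocWorkspace(size_t nbytes) {
  size_t aligned = (nbytes + 7u) & ~(size_t)7u;  /* round up to 8-byte alignment */
  if (g_workspace_top + aligned > WORKSPACE_ARENA_BYTES) {
    return NULL;  /* arena exhausted */
  }
  void* ptr = &g_workspace_arena[g_workspace_top];
  g_workspace_top += aligned;
  return ptr;
}

/* Matching "free": with the LIFO alloc/free pattern seen in the generated
 * code above, releasing a workspace is just rolling the top pointer back;
 * nothing is ever returned to a heap. */
int StaticFreeWorkspace(void* ptr) {
  g_workspace_top = (size_t)((uint8_t*)ptr - g_workspace_arena);
  return 0;
}

With a scheme like this, the TVMBackendAllocWorkspace / TVMBackendFreeWorkspace calls in the generated fused_op cost little more than pointer arithmetic, so the concern about heap allocation inside the operator body does not apply on a CRT-style target.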

Got it. What is the TLB for? Are you considering migrating the standalone TVM deployment to support virtual memory?