uTVM standalone runtime

apps/bundle_bundle

where is this? mind providing a link?

@tgall_foo We need further effort to ensure compliance; specifically, we need to add the MISRA checker in cppcheck as part of the sanity checks in CI.

@aca88 See apps/bundle_deploy


@liangfu is the static demo in apps/bundle_deploy truly ready for a bare metal system? Are there other ongoing efforts for the standalone uTVM?

@liangfu, have you ever measured the efficiency of crt_backend_api.c, crt_runtime_api.c and the graph_runtime APIs in a bare-metal runtime implementation? If yes, what are the metrics?

I have not yet run them on actual bare-metal hardware, though I assume they would only require newlib to run on bare-metal systems. The only metric I have is code size, which is significantly smaller than that of the other runtimes in TVM (approx. 45 KiB, vs. approx. 100 KiB for the standalone uTVM runtime and approx. 500 KiB for the default runtime). This makes the CRT ideal for systems with limited resources.

@liangfu, I set target="c" to dump out the fused operator code from a graph (the graph itself doesn't matter), as shown below:


#ifdef __cplusplus
extern "C"
#endif
TVM_DLL int32_t fused_nn_dense_nn_bias_add_nn_relu(void* args, void* arg_type_ids, int32_t num_args, void* out_ret_value, void* out_ret_tcode) {
  void* arg0 = (((TVMValue*)args)[0].v_handle);
 ...

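  // TVMBackendAllocWorkspace(device_type, device_id, nbytes, dtype_code_hint, dtype_bits_hint);
  // here device_type 1 is kDLCPU, the request is 1024 bytes, and (2, 32) hints a float32 dtype.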
  void* compute = TVMBackendAllocWorkspace(1, dev_id, (uint64_t)1024, 2, 32);
  if (compute == NULL) {
    return -1;
  }
  for (int32_t y_outer_x_outer_fused = 0; y_outer_x_outer_fused < 16; ++y_outer_x_outer_fused) {
    void* compute1 = TVMBackendAllocWorkspace(1, dev_id, (uint64_t)1024, 2, 32);
    if (compute1 == NULL) {
      return -1;
    }
...
    if (TVMBackendFreeWorkspace(1, dev_id, compute1) != 0) {
      return -1;
    }
  }
 ...
  if (TVMBackendFreeWorkspace(1, dev_id, compute) != 0) {
    return -1;
  }

I noticed dynamic memory allocation and freeing inside the fused_op call, which is very inefficient IMO. It would be better if these buffers could be pre-allocated before calling the fused_op, or at least if the allocation/free operations could be pipelined with the compute schedule.

Dynamic memory allocation doesn't actually happen in the CRT (see the workspace allocation code in the CRT sources): it simply returns an address within a pre-allocated static memory region.
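To illustrate the idea only (this is a hypothetical sketch, not the actual CRT source; names such as g_workspace_arena and StaticAllocWorkspace are made up for this example), a workspace "allocator" backed by a static arena can hand out addresses with simple pointer arithmetic:

/* Hypothetical sketch of a static-arena workspace allocator.
 * Not the actual TVM CRT code; shown only to illustrate how
 * "allocation" can reduce to handing out addresses inside a
 * statically reserved buffer. */
#include <stddef.h>
#include <stdint.h>

#define WORKSPACE_ARENA_BYTES (64 * 1024)  /* arbitrary budget for the example */

static uint8_t g_workspace_arena[WORKSPACE_ARENA_BYTES];
static size_t g_workspace_top = 0;

/* Bump-pointer "allocation": returns an address inside the static arena. */
void* StaticAllocWorkspace(size_t nbytes) {
  size_t aligned = (nbytes + 7u) & ~(size_t)7u;  /* round up to 8-byte alignment */
  if (g_workspace_top + aligned > WORKSPACE_ARENA_BYTES) {
    return NULL;  /* arena exhausted */
  }
  void* ptr = &g_workspace_arena[g_workspace_top];
  g_workspace_top += aligned;
  return ptr;
}

/* Matching "free": with the LIFO alloc/free pattern seen in the generated
 * code above, releasing a workspace is just rolling the top pointer back;
 * nothing is ever returned to a heap. */
int StaticFreeWorkspace(void* ptr) {
  g_workspace_top = (size_t)((uint8_t*)ptr - g_workspace_arena);
  return 0;
}

With a scheme like this, the TVMBackendAllocWorkspace / TVMBackendFreeWorkspace calls in the generated fused_op cost little more than pointer arithmetic, so the concern about heap allocation inside the operator body does not apply on a CRT-style target.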

Got it. What is the TLB for? Are you considering migrating the standalone TVM deployment to support virtual memory?