uTVM standalone runtime

Hi,

From reading what I could find about microTVM, my understanding is that a standalone runtime is in the works but not available yet. However, browsing the source code, I saw that a standalone directory and code exist under the uTVM tree.

How do I go about starting to use it?

Thanks :slight_smile:

Hi,

I think @weberlo is the right person to answer this question. I am also curious about the current efforts on the standalone uTVM runtime.

Hi folks,

The runtime in the standalone directory was developed by @ajtulloch in this PR, so he would be the best person to ask. Keep in mind that his runtime is currently not integrated into the µTVM infrastructure.

I personally haven’t tried using it yet, as I’ve been focusing on the AutoTVM-oriented (i.e., host-driven) runtime, where I’ve been experimenting with kernel optimizations. I plan to write a blog post about some of these optimizations in early March (I originally wanted to write the blog post in December, but I guess that’s Hofstadter’s law at work :sweat_smile:). After that, I plan on looking into standalone runtimes.

2 Likes

Hi @weberlo,

Regarding the host-driven variant of uTVM: is this mode something that can be implemented on a device that has an ARM core running Linux and a programmable accelerator?

BTW, I am looking forward to your blog post! Keep us posted! :slight_smile:

Hi all,

I would like to express my interest in the uTVM standalone runtime as well. Here are a couple of questions to shed light on its purpose and current development status - I hope that @weberlo and @ajtulloch can help clarify these:

  1. Is the standalone runtime meant for (Linux-less) minimal bare-metal devices? Or is it rather a lightweight runtime implementation that still requires e.g. threading or other more complex functionality?

  2. The standalone runtime is already integrated into master; however, it doesn’t seem to be complete yet. What is the current development status?

  3. How is the standalone runtime meant to be used? Could anyone please provide at least a minimal example or tutorial?

I’m looking forward to your answers! Thank you in advance!

@robeastbme See my comment above for (2) and (3).

@tico @robeastbme The standalone runtime will be geared towards bare-metal, and the end goal is to be able to compile an entire model (control logic + operators) into a single binary blob (perhaps an archive file) that can be loaded into a system running an RTOS (e.g., Zephyr or Mbed).

1 Like

Aside from the standalone runtime, you probably need newlib to be compiled with your toolchain in order to support environments without an operating system like Linux. As for an RTOS like Zephyr, it has already integrated newlib as its C library (see this).

2 Likes

Hi @weberlo , I am currently working on code that can be deployed onto the RISC-V Spike simulator using uTVM. Is there a tutorial available for this? Also, has the Python API for micro.Session changed? I got the C code out of Relay, but I am not able to pass it through the microTVM session. Please let me know if there are any documents regarding this that I can make use of.

Hey @cv_ruddha!

The API has indeed changed, and unfortunately, there is not a tutorial for how to use it yet. I can give some guidance here though.

For RISC-V Spike, you’ll want to use this function to generate a device configuration. The base_addr param is whatever memory address you’ve configured Spike’s address space to start at. Then server_addr and server_port are used to connect to the OpenOCD session you’ve attached to Spike.

You’ll likely want to configure the memory layout to something other than what’s given by the result of default_config, depending on your use case. You can achieve this by modifying the “mem_layout” field of the device config, since the config is just a dictionary. Admittedly, hand-tuning section start addresses and sizes isn’t very ergonomic, and I’d like to make some quality-of-life improvements in a future PR.

Once you have a suitable device config, you can look at the tests for example usage.
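To make the steps above concrete, a device config along these lines could be hand-edited as a plain dictionary. Every field name, address, and size below is an illustrative assumption, not the exact schema - check what default_config actually returns in your TVM checkout:

```python
# Hypothetical sketch of a Spike device config. All field names, addresses,
# and sizes are assumptions for illustration only.
dev_config = {
    'base_addr': 0x10000000,     # where Spike's address space is configured to start
    'server_addr': '127.0.0.1',  # OpenOCD session attached to Spike
    'server_port': 6666,
    'mem_layout': {
        'text':   {'start': 0x10000000, 'size': 0x08000},
        'rodata': {'start': 0x10008000, 'size': 0x04000},
        'data':   {'start': 0x1000c000, 'size': 0x01000},
        'bss':    {'start': 0x1000d000, 'size': 0x01000},
    },
}

# Since the config is just a dictionary, hand-tuning a section is a plain
# dict update, e.g. enlarging the text section for larger kernels:
dev_config['mem_layout']['text']['size'] = 0x20000
```

The point is simply that no special API is needed: once you have the dict from default_config, ordinary dictionary mutation is enough to adjust the memory layout.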

Hope this helps!

1 Like

@ajtulloch, you said below in your PR description:

“This is an alternative implementation of a subset of the TVM runtime API (and graph runtime) that focuses entirely on reducing code size, at the expense of functionality (no tvm.extern(…) calls via PackedFunc, CPU only, etc).”

Does this mean an application running on a bare-metal device has to be written in C/C++ to directly call the runtime APIs? Is Python still doable on device? I’m new to TVM. Thanks.

If we enabled Python support, the code size would definitely grow, which runs against the design principle of the uTVM standalone runtime. In addition, Python itself relies on a lot of system calls to the operating system, making it not doable on bare metal.

@jinchenglee You might be interested in the demo in <tvmroot>/apps/bundle_deploy.

Wonderful! Thanks for the example code in bundle_deploy, which is what I’m looking for now.

I’m trying to deploy to a bare-metal device, say a RISC-V CPU. I was looking for TVM to emit C source code, but just realized that only gpu/opencl/… are supported today on the device side; the host side is LLVM IR only (not even C code). This fact leads me to uTVM.

I tried to build the uTVM standalone unit test at /tests/cpp/utvm_runtime_standalone_test.cc, which supports macOS only for now. I built it on macOS, only to find that it failed with a ‘Segmentation Fault’. :frowning:

I faced a similar failure when I ran the unit test for the uTVM standalone runtime. I’m trying to deprecate the uTVM standalone runtime and replace it with the MISRA-C runtime; see #5060 .

Hi @liangfu, it is not yet entirely clear to me whether the current uTVM standalone runtime is able to run on bare-metal systems. My understanding is that a proper uTVM standalone runtime for bare metal is a WIP? Is the MISRA-C runtime aiming at being a uTVM standalone runtime for bare metal?

@weberlo, I tried to modify the uTVM demo Python code from Colab to something like the code below. USE_MICRO is turned on when building the library.

import tvm
from tvm import relay, micro
from tvm.contrib import graph_runtime
from mxnet.gluon.model_zoo.vision import get_model

def get_resnet():
  block = get_model('resnet18_v1', pretrained=True)
  module, params = relay.frontend.from_mxnet(
      block, shape={'data': RESNET_INPUT_IMG_SHAPE})
  func = module['main']
  return func, params
...
resnet, params = get_resnet()

with tvm.target.build_config(disable_vectorize=True):
  graph, resnet_c_mod, params = relay.build(resnet,
                                            target='c',
                                            params=params)

device_config = tvm.micro.device.host.default_config()
with micro.Session(device_config):
  micro_mod = micro.create_micro_mod(resnet_c_mod, device_config)  # <= Failed here
  ctx = tvm.micro_dev(0)
  module = graph_runtime.create(graph, micro_mod, ctx)

It failed as below.

MicroSession::LoadBinary() 
	 AllocateInSection(kText, 1080)
	 AllocateInSection(kRodata, 0)
	 AllocateInSection(kData, 0)
	 AllocateInSection(kBss, 88)
MicroSession::LoadBinary() 
	 AllocateInSection(kText, 174704)
Traceback (most recent call last):

  File "utvm.py", line 308, in <module>
    micro_mod = micro.create_micro_mod(resnet_c_mod, device_config)

  File "/work/git_repo/tvm/python/tvm/micro/base.py", line 162, in create_micro_mod
    micro_mod = tvm.runtime.load_module(lib_obj_path)

  File "/work/git_repo/tvm/python/tvm/runtime/module.py", line 404, in load_module
    return _ffi_api.ModuleLoadFromFile(path, fmt)

  File "/work/git_repo/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 213, in __call__
    raise get_last_ffi_error()

tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (7) /work/git_repo/tvm/build/libtvm.so(TVMFuncCall+0x65) [0x7f9de4bd4fd5]
  [bt] (6) /work/git_repo/tvm/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::TypedPackedFunc<tvm::runtime::Module (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>::AssignTypedLambda<tvm::runtime::Module (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>(tvm::runtime::Module (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&))::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0x86) [0x7f9de4bf1ec6]
  [bt] (5) /work/git_repo/tvm/build/libtvm.so(tvm::runtime::Module::LoadFromFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x4fe) [0x7f9de4bef77e]
  [bt] (4) /work/git_repo/tvm/build/libtvm.so(+0xc7b82f) [0x7f9de4c8482f]
  [bt] (3) /work/git_repo/tvm/build/libtvm.so(tvm::runtime::MicroSession::LoadBinary(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)+0x1a1) [0x7f9de4c89be1]
  [bt] (2) /work/git_repo/tvm/build/libtvm.so(tvm::runtime::MicroSession::AllocateInSection(tvm::runtime::SectionKind, unsigned long)+0x4d) [0x7f9de4c8998d]
  [bt] (1) /work/git_repo/tvm/build/libtvm.so(tvm::runtime::MicroSectionAllocator::Allocate(unsigned long)+0xe4) [0x7f9de4c90784]
  [bt] (0) /work/git_repo/tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x43) [0x7f9de4398a93]
  File "/work/git_repo/tvm/src/runtime/micro/micro_section_allocator.h", line 64
TVMError: Check failed: size_ + size < capacity_: cannot alloc 174704 bytes in section with start_addr 0x7f9d8d71c000

Apparently, it tries to allocate memory for the ResNet code but exceeds the capacity_ limit. I tried to enlarge the ‘text’ section size in the dict returned by tvm.micro.device.host.default_config(). It doesn’t help – same failure, just with a different number of bytes in the ‘cannot alloc xxx bytes’ error message.

I wonder how I can change the memory layout mapping appropriately - anyone? Thanks.

@tico With its dependency on newlib, I think the current uTVM standalone runtime could run on bare metal, though I have not yet tried this myself.

Yes, the MISRA-C runtime is aiming at running on critical systems (e.g. self-driving cars), which is a similar target to that of the uTVM standalone runtime. The WIP PR #5124 tries to enable the MISRA-C runtime to run on bare metal, even without newlib.

@liangfu, it seems the current bundle_deploy runtime still relies on the standard C++ library and on dlopen() calls, which implies a Linux OS running on the device.

How are you going to address that in your “bare-metal” solution? Thanks.

@jinchenglee Yeah. This is very annoying with the current version of µTVM. You’ll need to keep making the text section larger until the error stops. There are some changes I’m hoping to mainline soon that will at least replace the “cannot alloc 174704 bytes in section with start_addr 0x7f9d8d71c000” message with “cannot alloc 174704 bytes in text section”.

Eventually, it’d be nice if users didn’t need to think about memory layouts unless they wanted to. This would require some larger architectural changes, because we would need to delay the choice of a memory layout until the user decides they have all of the necessary modules.
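The trial-and-error loop described above can be sketched in plain Python. The config schema ('mem_layout' → section → 'size') follows the earlier discussion in this thread; the starting sizes are made up for illustration, and 174704 is the byte count from the error message pasted above:

```python
# Sketch: keep doubling the text section until the reported allocation fits.
# Field names and starting sizes are illustrative assumptions.
dev_config = {
    'mem_layout': {
        'text': {'start': 0x7f9d8d71c000, 'size': 20480},
        'bss':  {'start': 0x7f9d8d722000, 'size': 2048},
    },
}

needed = 174704  # bytes from the "cannot alloc" error above
while dev_config['mem_layout']['text']['size'] < needed:
    dev_config['mem_layout']['text']['size'] *= 2

# 20480 -> 40960 -> 81920 -> 163840 -> 327680, which now fits
```

In practice you would re-run create_micro_mod after each adjustment rather than looping on a known byte count, since the required size is only revealed by the error message.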

@jinchenglee You might be interested in the static demo in apps/bundle_deploy. The CRT-based static demo no longer depends on C++ libraries, and it doesn’t make any calls into the dl library.

Bit of a n00b question: for the standalone runtime, a key attribute is MISRA compliance. Is the idea that it would remain compliant? If so, how will that be achieved? I’m not familiar with how that is validated. Anyone know?