Question about RPC tutorial - how to configure LLVM on local machine


Hi all, I have a question regarding the RPC tutorial. I’ve got a TVM runtime built and running on my Hexagon DSP target, but I’m unclear on how to build and configure LLVM/TVM for the host machine in order to cross-compile kernels.

Am I supposed to build TVM using my target’s cross compiler or the host (x86) compiler? When I build with the former I get ELF errors when importing TVM, and I get ‘target not enabled’ errors when I do the latter.

For reference, I’m part of the team at Qualcomm working on the Hexagon backend, and my target is specified as ‘hexagon-unknown-elf -mcpu=hexagonv60 -mattr=+hvx,+hvx-length64b -mattr=-small-data’ when I run TVM entirely on the target. I’m trying to run the Cross Compilation and RPC tutorial found here:



You should build TVM on the host with the host compiler. I do not know the status of support for the Hexagon DSP, which could be related to why you see the ‘target not enabled’ error. If the DSP is supported by LLVM (which it looks like it is, based on the way you described your target specification), did you compile TVM on the host with the LLVM backend enabled?


This can be viewed as a follow-up to Offloading subgraphs to Hexagon

You will likely still need a Hexagon runtime first, one that supports offloading the generated functions (the ELFs; I am not sure what the current progress on that is). Once you have built that, you can build TVM on your x86 machine and use cross-compilation to get the ELF. Make sure that you use an LLVM with Hexagon support enabled.


Hi, thanks for the replies. Yes, I am building TVM with an LLVM that has an enabled Hexagon backend, but only a Hexagon backend, nothing else.

Should I be building an LLVM toolchain with both X86 and Hexagon backends enabled, and using this toolchain to build TVM for the host machine?


If you only need to generate Hexagon code, you likely only need an LLVM with Hexagon enabled. It might be helpful to also enable x86 in case you want to use TVM’s local x86 codegen for correctness verification, but as long as you do not do that, you only need the Hexagon backend.


I did try building TVM using the Hexagon cross-compiler, but when I attempt to run the RPC example I get errors when Python attempts to bind to the TVM APIs, because they are built for Hexagon, not X86.
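One way to confirm this kind of mismatch is to read the ELF header of the shared library that Python is trying to load and see which architecture it was built for. The following is a minimal sketch using only the Python standard library; the function names are made up for illustration, and the e_machine values (62 for x86-64, 164 for Hexagon, 183 for AArch64) are taken from the ELF specification.

```python
import struct

# e_machine values from the ELF specification.
EM_X86_64 = 62
EM_HEXAGON = 164  # listed as EM_QDSP6 in some elf.h headers
EM_AARCH64 = 183

MACHINE_NAMES = {
    EM_X86_64: "x86-64",
    EM_HEXAGON: "hexagon",
    EM_AARCH64: "aarch64",
}


def elf_machine(header: bytes) -> str:
    """Name the architecture encoded in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] is EI_DATA: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, right after e_type.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return MACHINE_NAMES.get(machine, f"unknown ({machine})")


def library_machine(path: str) -> str:
    """Read the ELF header of the file at `path` and name its architecture."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))
```

Calling `library_machine` on the TVM shared library that the Python package loads should report x86-64 on an x86 host; a result of hexagon would mean the library came from the cross-compiled build.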

After that, I was assuming the TVM on the host needed to be able to run on X86 but be able to generate Hexagon code. When using Autotune and RPC, does TVM on the host generate actual machine code or just some form of IR that gets passed down to the target, with the expectation that the TVM runtime on the target will take care of lowering it to machine code?


TVM uses a cross compiler to generate the machine code. But again, things may not work out of the box if there is no Hexagon runtime available yet.

In particular, we will need a HexagonDeviceAPI and HexagonModule, as discussed in Offloading subgraphs to Hexagon. Make sure that you can get the Hexagon device type on your board and that it is able to run the program correctly.

We also need to register the correct binary loader (currently it detects the module type by suffix, so if we save things as .so, it will be assumed to be a host module instead of a Hexagon one). Let us assume we registered the suffix “xyz.hexagon_lib” and we are able to load it locally; then the RPC will work, because remote.load_module(“xyz.hexagon_lib”) will call into the local Module::Load.
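To illustrate the idea of choosing a loader by file suffix, here is a small standalone sketch in plain Python. It is not TVM's actual loader code: the registry, the decorator, and both loader functions are hypothetical stand-ins, and the "hexagon_lib" suffix is just the made-up one from the example above.

```python
from pathlib import Path

# Hypothetical registry mapping a file suffix to a loader function.
_LOADERS = {}


def register_loader(suffix):
    """Decorator that registers a loader for files ending in '.<suffix>'."""
    def wrap(fn):
        _LOADERS[suffix] = fn
        return fn
    return wrap


@register_loader("so")
def load_host_library(path):
    # Stand-in for loading a shared object built for the host.
    return ("host", path)


@register_loader("hexagon_lib")
def load_hexagon_library(path):
    # Stand-in for handing the binary to a Hexagon-specific module loader.
    return ("hexagon", path)


def load_module(path):
    """Pick a loader from the text after the last '.' in the file name."""
    suffix = Path(path).suffix.lstrip(".")
    try:
        loader = _LOADERS[suffix]
    except KeyError:
        raise ValueError(f"no loader registered for suffix {suffix!r}") from None
    return loader(path)
```

With this dispatch, a file saved as kernel.so goes to the host loader, while xyz.hexagon_lib goes to the Hexagon one, which is why the choice of suffix matters when saving the compiled kernel.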


Hi Tianqi, thank you for the reply. I did read through the post you referenced before starting a new topic, but my issue is slightly different. In my case, I’m targeting a full Linux distro ported to Hexagon to enable auto-tune, whereas Krzysztof is targeting the much more minimal runtime environment that is currently productized. I am leveraging much of the work Krzysztof has done.

I was able to get past my initial problem by building a custom version of LLVM with X86 and Hexagon backends enabled, and X86 as the default target. I used this LLVM to configure and build TVM on the host, and I can now successfully (I think) build TVM and use it to compile a kernel for offload via RPC to Hexagon.
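As a quick sanity check before building TVM against a custom LLVM, one can ask that LLVM's llvm-config which backends it contains. The sketch below assumes llvm-config is on the PATH (or that its full path is passed in) and relies on the `--targets-built` option; the helper names are invented for this example. The set of backends is typically chosen when LLVM is configured, through the LLVM_TARGETS_TO_BUILD CMake variable.

```python
import subprocess


def targets_built(llvm_config="llvm-config"):
    """Ask llvm-config which backends this LLVM build contains."""
    out = subprocess.run(
        [llvm_config, "--targets-built"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    # The output is a whitespace-separated list such as "Hexagon X86".
    return set(out.split())


def missing_targets(built, required=("X86", "Hexagon")):
    """Return the required backends that are absent from `built`, in order."""
    built = set(built)
    return [name for name in required if name not in built]
```

For example, `missing_targets(targets_built("/path/to/llvm-config"))` returning an empty list would indicate that both backends are present in that LLVM build; a Hexagon-only LLVM would report X86 as missing.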

I have a different problem now, related to the RPC connection. I’m going to spend some time debugging, but I will start a separate thread if I can’t make progress. Thanks again for your help.

Problem uploading kernel shared object to remote device via RPC

I see. My original thought was to actually use the runtime env for autotune, in which case the RPC server starts on the board (possibly ARM) and “drives the DSP” through a device module, as is done for GPUs.

Enabling a Linux stack on the DSP itself will certainly work as well.