How to effectively retarget/port TVM to new AI accelerators?

I think BYOC assumes your new AI accelerator supports programming in C/C++. But if the new AI accelerator doesn’t support that, I think the following steps might be necessary (a rough sketch of the first two follows the list):

  1. set target=ext_dev and -device=new_AI_accelerator_name
  2. optionally, quantize the input model to a target precision that the new accelerator supports, e.g. int8 or bfloat16
  3. define the instruction layout (load, compute, store, etc.)
  4. implement the host-device memory interface (e.g. dma_copy: dram->sram, sram->dram)
  5. implement a runtime and driver for your target device, in order to properly handle instruction execution order and dependencies
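
A rough sketch of steps 1 and 2, assuming a hypothetical `-device=my_accel` name (VTA’s `ext_dev -device=vta` target string is the existing precedent) and purely illustrative qconfig values:

```python
import tvm
from tvm import relay

# mod, params: a Relay module and its parameters, e.g. from relay.frontend.

# Step 2 (optional): quantize the model to a precision the accelerator
# supports. The scales/dtypes below are illustrative, not recommendations.
with relay.quantize.qconfig(global_scale=8.0,
                            dtype_input="int8",
                            dtype_weight="int8",
                            dtype_activation="int32"):
    mod = relay.quantize.quantize(mod, params)

# Step 1: build for the generic external-device target. "my_accel" is a
# hypothetical device name that your backend would register.
target = tvm.target.create("ext_dev -device=my_accel")
graph, lib, params = relay.build(mod, target=target, target_host="llvm")
```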

Hi @comaniac, Thanks for the pointer to the RFC! I will go through the discussion :slight_smile:

Hi @liangfu,

Thanks for sharing the steps. I have actually been considering both programmable (C/C++) and non-programmable AI accelerators, so supporting non-programmable accelerators would be a very interesting and useful feature in TVM.

However, I thought BYOC could already be used to target non-programmable AI accelerators as long as there is a library that allows accessing them from C/C++?

That’s correct; hence, the library should at least contain the runtime and driver for the accelerator.
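
To illustrate what that can look like on the TVM side, here is a hedged sketch that wraps a hypothetical vendor driver with ctypes and exposes it as a packed function. Every name here (`libmyaccel.so`, `myaccel_run`, the registry key) is a placeholder; a real integration would typically implement this as a C++ `runtime.Module` instead:

```python
import ctypes
import tvm

# Placeholder for the vendor's driver library.
_driver = ctypes.CDLL("libmyaccel.so")

@tvm.register_func("myaccel.run")
def _run(subgraph_id, inp, out):
    # inp/out arrive as tvm.nd.NDArray; hand their DLTensor handles to the
    # driver, which sequences the accelerator's instructions and tracks
    # dependencies internally.
    _driver.myaccel_run(ctypes.c_int(subgraph_id), inp.handle, out.handle)
```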

@comaniac you mentioned in [RFC] Op based annotation for external codegen the following:

“Currently, we expect users to write a custom pass to annotate a Relay program and then send it for partitioning.”

Can you please point me to an example of this annotation pass in which I can specify the ops that should be implemented using the codegen from BYOC?

@mbaret mentioned above that the annotation mechanism is not there yet and is currently a painful manual process. So I am a bit confused about what I can do at this point to annotate my Relay IR to use BYOC.

Thanks

@tico See the test cases in my PR https://github.com/apache/incubator-tvm/pull/4741. This is exactly what the “custom annotation pass” + partitioning is about. It also demonstrates the “painful” process you mentioned.

The good thing is that anything is possible if you go the hard way :slight_smile: But the composite + composite-aware annotation should make my PR way simpler.
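
A minimal sketch of such an annotation pass, following the pattern in those tests (the `my_codegen` compiler name and the choice to offload `nn.conv2d` are just placeholders):

```python
from tvm import relay
from tvm.relay.expr_functor import ExprMutator
from tvm.relay.op.annotation import compiler_begin, compiler_end

class ConvAnnotator(ExprMutator):
    """Wrap every nn.conv2d call for a hypothetical "my_codegen" backend."""
    def visit_call(self, call):
        if call.op == relay.op.get("nn.conv2d"):
            # Annotate the inputs and output so PartitionGraph can cut here.
            args = [compiler_begin(self.visit(a), "my_codegen")
                    for a in call.args]
            return compiler_end(relay.Call(call.op, args, call.attrs),
                                "my_codegen")
        return super().visit_call(call)

# mod: a Relay module; annotate, then partition the offloaded regions.
mod["main"] = ConvAnnotator().visit(mod["main"])
mod = relay.transform.PartitionGraph()(mod)
```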


@masahi thanks for the pointer. This is what I was looking for to try the external codegen to offload ops to an accelerator until the op-based annotation is ready, which will hopefully make this process easier.

@comaniac @liangfu I managed to create an external codegen and annotate the Relay IR of my model using compiler_begin and compiler_end to offload specific ops.

However, I was wondering if there is any mechanism already in place to avoid expensive memcpys between TVM and the external library. In my case, the platform has a particular way to allocate memory so that it is shared between the host and the AI accelerator. One solution is that I could use that allocator in TVM for all tensors, so the memcpy wouldn’t be needed, but maybe there is a better solution in the context of BYOC.

@tico The ir_pass.py script in VTA might be helpful to you. It transforms the IR into dma_copy instructions to achieve a similar goal.
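
To sketch the idea: VTA’s schedules stage data through an on-chip buffer and tag the copy stage with a `dma_copy` pragma, which ir_pass.py later lowers into the accelerator’s DMA intrinsics. A minimal example (the `local.accel_sram` scope is hypothetical; like VTA’s buffers, it would need to be registered with TVM’s memory info before lowering):

```python
import tvm
from tvm import te

n = 1024
data = te.placeholder((n,), name="data")
# Identity compute stage acting as the dram -> sram staging copy.
data_buf = te.compute((n,), lambda i: data[i], name="data_buf")
out = te.compute((n,), lambda i: data_buf[i] + 1, name="out")

s = te.create_schedule(out.op)
# Place the staging buffer in (hypothetical) on-chip memory and mark the
# copy so a later pass can rewrite it into a DMA instruction instead of
# a host-side memcpy.
s[data_buf].set_scope("local.accel_sram")
s[data_buf].pragma(s[data_buf].op.axis[0], "dma_copy")
```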