[RFC] support cce target name in tvm


#1

Hi,

We want to add a cce target to TVM. Open sourcing the whole cce backend will take time, so can we send a PR for the cce target related changes first? We need to add a device_type in dlpack, and also add the target name and device_type in c_runtime_api.cc, build_module.cc and runtime_ctypes.py.
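To make the kind of change concrete, here is a minimal standalone sketch of a device-name to device-type table of the sort the Python runtime side would need to extend. It does not import TVM; the table names, the `parse_context` helper and the numeric code used for "cce" are all assumptions for illustration, not assigned values or actual TVM code. Only the cpu, gpu and opencl codes follow DLPack's existing enum values.

```python
# Sketch only: a standalone model of a device-name <-> device-type table.
# HYPOTHETICAL_CCE_DEVICE_TYPE is a placeholder, not an assigned code.
HYPOTHETICAL_CCE_DEVICE_TYPE = 99

# Device-type code -> canonical name (1, 2, 4 match DLPack's enum values).
MASK2STR = {
    1: "cpu",
    2: "gpu",
    4: "opencl",
    HYPOTHETICAL_CCE_DEVICE_TYPE: "cce",
}

# Reverse table, plus an alias that maps to an existing code.
STR2MASK = {name: code for code, name in MASK2STR.items()}
STR2MASK["cuda"] = 2


def parse_context(spec):
    """Parse 'name' or 'name:index' into (device_type, device_id)."""
    name, _, index = spec.partition(":")
    if name not in STR2MASK:
        raise ValueError("unknown device name: %r" % name)
    return STR2MASK[name], int(index) if index else 0
```

With a table like this, `parse_context("cce:1")` resolves the new name the same way as the existing ones; the C side would need a matching entry so that both agree on the code.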

Thanks


#2

Please provide a bit more background, as not everyone in the community knows what cce is.


#3

Sure. Similar to CUDA C, cce C is a programming language for Huawei’s AI chip, built on the DaVinci IP core.

We have built a cce backend on top of TVM with the help of the community; thanks go to the community members.

The Ascend AI IP and chip series (with a unified DaVinci core inside) was released today at HC 2018. You can check it out here: https://www.huawei.com/en/press-events/news/2018/10/huawei-hc-2018-eric-xu-ai


#4


@tqchen I have sent a PR to dlpack.


#5

Given the current status of CCE support, maybe it makes sense to bring kDLCCE to the tvm repo first, with some background info. Once we have some running examples, we can upstream the change to DLPack.


#6

"bring kDLCCE to tvm repo first": tvm uses dlpack as a submodule, so how can I do that?


#7

Add the flag definition to https://github.com/dmlc/tvm/blob/master/include/tvm/runtime/c_runtime_api.h#L63


#8

Great! I will close the PR on dlpack and send another one to TVM.


#9

And please also provide background information and, hopefully, a rough timeline for when the community can start to use the CCE backend :slight_smile:


#10

It’s great to enable the programming model for the new AI chip. I think the community will need time to get familiar with the new era of ASIC-based accelerators. @xqdan can you provide more information on the following details regarding CCE C programming and the DaVinci chip?

  • CCE C Programming (Technical Specification, Programming Syntax, Programming Interface, Optimization Guidelines)
  • DaVinci chip (Computational Capacity, Availability of Development Boards)

Without those specs, it would be hard for the community to adopt the new AI chip. There are already many AI chips that don’t have any developer-friendly programming interface.


#11

@liangfu thanks for your attention. Actually, what we’ve been doing with TVM is trying to reduce the developer’s burden of learning this detailed low-level information. Imagine that you just write the TVM DSL and don’t need to take care of the things you mentioned above.


#12

@xqdan One thing that might be nice is to understand the set of hardware intrinsics that TVM should lower a schedule down to (for instance, are we using tensorization intrinsics, or different types of DMA loads/stores?). In addition, it might be good to understand how a programmer can expose more parallelism for the chip to take advantage of. For instance, with the VTA reference design we used virtual threads that would be lowered to low-level dataflow-like synchronization operations to uncover task-level parallelism within the chip.

Highlighting these challenges when targeting the DaVinci chip would be nice, and perhaps contrasting it with VTA so that programmers can understand how it relates in terms of challenges.

Overall this is very exciting stuff!