Implementing new operators for TensorFlow


#1

Hello,

After spending some time in the forums, I see there is some interest in implementing unsupported TensorFlow operators. However, I have been unable to find a concise description of how to do that. My current understanding is as follows:

  1. The operations are defined in the TensorFlow front end (python/tvm/relay/frontend/tensorflow.py) lines 1315-1425 based on a set of existing primitives. These define the Graph-level front end.

  2. If the operation cannot be expressed through the existing primitives, then new code is added in that file. This is covered in the Adding an Operator to Relay document. Is this correct?

  3. Finally, the new operator may also need to be defined in TOPI (topi/python/topi) to describe the schedule. I assume that the schedule is defined for each hardware target.

    Since I need to implement quite a few operators, I volunteer to document the entire process so we can add a tutorial on this topic.
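
As a rough illustration of points 1 and 2, the sketch below shows the general pattern of a front end that maps operator names to converter functions built from existing primitives. It does not use TVM at all: `PRIMITIVES`, `CONVERT_MAP`, `convert`, and the `Softplus` converter are hypothetical stand-ins chosen for this example, and the real front end works on Relay expressions rather than Python floats.

```python
import math

# Hypothetical stand-ins for the "existing primitives".
# In TVM these would be Relay operators, not plain Python functions.
PRIMITIVES = {
    "exp": math.exp,
    "log": math.log,
    "add": lambda a, b: a + b,
}


def _softplus(inputs, attr):
    """Converter composed only of existing primitives: log(exp(x) + 1)."""
    x = inputs[0]
    return PRIMITIVES["log"](PRIMITIVES["add"](PRIMITIVES["exp"](x), 1.0))


# Table from TensorFlow operator name to converter function (illustrative).
CONVERT_MAP = {
    "Softplus": _softplus,
}


def convert(op_name, inputs, attr=None):
    """Look up the converter for op_name; fail loudly if none is registered."""
    if op_name not in CONVERT_MAP:
        raise NotImplementedError("Operator %s is not supported" % op_name)
    return CONVERT_MAP[op_name](inputs, attr or {})
```

In this picture, point 1 corresponds to adding an entry to the table, and point 2 is the case where no combination of entries in `PRIMITIVES` can express the operator.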

Any help is really appreciated.

Thanks!


#2

Hi,

Awesome! It is great that you are planning to help improve the documentation of the entire process of adding new operators. Currently it is very challenging to get started with adding new operators to TVM, and the current documentation is quite limited in this sense. It is therefore quite hard to contribute new operators, since there is no clear description of how to do this and the steps seem to be inconsistent from one operator to another. I have seen other people expressing this concern as well.

As you mentioned, my understanding is also that there are three main areas when it comes to new operators: Frontends, Relay and TOPI. I am aware that there is some documentation on Relay and TOPI, but the process as a whole is not documented. Moreover, this documentation should be general enough to go beyond the TensorFlow frontend. In general, the TVM community should put more effort into maintaining good documentation, since there are other areas that are not yet documented, like the recently merged uTVM, which is a major new feature.

@yongwww @tqchen @thierry Any thoughts or comments on this?


#3

Thank you @alopez_13 for volunteering to help improve our documentation. I totally agree with you and @tico that the learning curve can be challenging; documentation will definitely help!

One recommendation is to look at previous PRs: this will be a good starting point on how to add new operators. Here’s an example of a PR that was merged recently: https://github.com/dmlc/tvm/pull/3614/files which might cover (1) and (2).

For (3) this will require writing an operator schedule in TOPI; here’s an example https://github.com/dmlc/tvm/commit/84590063225e71eb12e90cf625aa54c3d790c620#diff-29432e8bfe4b3f9297dd74c6efc802c5


#4

I hope this gives you a starting point!


#5

Thanks @thierry and @tico. I’ll go over those examples and will document the process. I’ll keep posting in this thread. Once the documentation is good enough we can put it somewhere else.


#6

Here is my initial draft of the procedure:


This document lists the sequence of steps needed to add new operators to TVM using the TensorFlow front end. Other front ends should be similar, but the focus is on TensorFlow.

TVM takes as input a TensorFlow model and produces a computational graph that is optimized before generating an intermediate representation (IR). The IR is also optimized before it is passed to the backend (for example LLVM). The proposed sequence for adding new operators is as follows:

  1. The computational graph operations are defined in the TensorFlow front end (python/tvm/relay/frontend/tensorflow.py) lines 1315-1425 based on a set of existing primitives. It may be possible to express the new operator as a combination of these primitives.

  2. If the operation cannot be expressed through the existing primitives, then the new operator needs to be declared in: include/tvm/expr_operator.h

    • Depending on the type of operator, the definition is added to: python/tvm/intrin.py

    • If the operator works on a special data type, the new data type should be added to: python/tvm/datatype.py

    • The new operator is added to the TensorFlow front end: python/tvm/relay/frontend/tensorflow.py lines 1315-1425

  3. The operator needs to be registered into Relay through the following files:

    • python/tvm/relay/op/_tensor.py // Need to register the schedule
    • python/tvm/relay/op/_tensor_grad.py // Only for some operators
    • python/tvm/relay/op/tensor.py // Wrapper function definition

  4. The operator is then added to the code generation phase of TVM. Note that there are many backends, and the operator may need to be declared in more than one of them:

    • src/codegen/intrin_rule.cc // Default intrinsic rules for the operator
    • src/codegen/intrin_rule_cuda.cc // CUDA specific rules
    • src/codegen/llvm/intrin_rule_llvm.cc // LLVM specific rules
    • Then depending on the type of operator the code that implements the operation is in:
      • src/relay/op/tensor/unary.cc
      • src/relay/op/tensor/binary.cc
      • src/relay/op/tensor/reduce.cc
      • src/relay/op/tensor/transform.cc

  5. The operator is also added into the test framework:

    • tests/python/frontend/tensorflow/test_forward.p
    • tests/python/frontend/tensorflow/test_control_flow.py // Only for control flow operators
    • tests/python/relay/test_op_grad_level1.py
    • tests/python/relay/test_op_level1.py

  6. Finally, the new operator is added in TOPI:

    • First the operator is added to the definitions according to its type:
      • topi/include/topi/elemwise.h
      • topi/include/topi/reduction.h
      • topi/include/topi/transform.h
    • The code of the operator is defined depending on its type:
      • topi/python/topi/math.py
      • topi/python/topi/reduction.py
      • topi/python/topi/transform.py
    • The operator is registered into TOPI:
    • To complete the procedure, the operator is added to the TOPI test framework:
      • topi/tests/python/test_topi_basic.py
      • topi/tests/python/test_topi_math.py // Depending on the operator, other options possible
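
Since the file layout changes between TVM versions, it may help to check which of the files listed above actually exist in a given checkout before starting. The following is a minimal sketch under that assumption: `FILES_BY_STEP` and `missing_files` are hypothetical names, and the table holds only a subset of the paths from the draft, copied as written there.

```python
from pathlib import Path

# A subset of the paths from the draft, grouped by step.
# These are assumptions about one TVM version and may not match yours.
FILES_BY_STEP = {
    "front end": ["python/tvm/relay/frontend/tensorflow.py"],
    "relay registration": [
        "python/tvm/relay/op/_tensor.py",
        "python/tvm/relay/op/tensor.py",
    ],
    "topi": [
        "topi/python/topi/math.py",
        "topi/include/topi/elemwise.h",
    ],
}


def missing_files(repo_root, files_by_step=FILES_BY_STEP):
    """Return {step: [relative paths not found under repo_root]}."""
    root = Path(repo_root)
    report = {}
    for step, paths in files_by_step.items():
        absent = [p for p in paths if not (root / p).is_file()]
        if absent:
            report[step] = absent
    return report
```

Calling `missing_files("/path/to/tvm")` on a local checkout returns an empty dictionary when every listed file is present. Any step that does appear in the result points to a part of the procedure that has probably moved.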

Sorry if the formatting is not very clear; I had some issues with the posting tool.


#7

Tangentially related to the above post, I have some general observations about new operators:

  1. Some of the unsupported operators that I have found are, for lack of a better term, “support” operators. For example, ‘IteratorV2’, ‘IteratorGetNext’, ‘SaveV2’, ‘RestoreV2’, ‘Assign’, and ‘Assert’. I know that those operators can be avoided by changing the model, but in some cases we are just given a pb file or checkpoint without the original code, and doing “surgery” on the graph may become problematic.

  2. There are some operators that seem to be extensions of existing ones and thus easier to add. Some examples: ‘TensorArrayGatherV3’, ‘TensorArrayReadV3’, ‘NonMaxSuppressionV3’, ‘TensorArrayV3’, ‘TensorArrayScatterV3’, ‘Where’, ‘TensorArrayWriteV3’, ‘TensorArraySizeV3’.

  3. There are new operators that perform actual computation, do require support, and are deemed “essential”.

I don’t know if there is a forum for discussing what should be considered an “essential” operator vs a “support” one. Personally, I don’t see a real need to optimize code for “support” operators, but their presence in the models needs to be addressed somewhere in the compiler flow. Any thoughts on this?
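
To make the distinction concrete, here is a small sketch of how the unsupported operators of a graph could be triaged. The `SUPPORT_OPS` set comes from the examples in point 1; the function name `triage`, the category labels, and the example inputs are hypothetical, and the list of operators in the graph is assumed to have been extracted from the model beforehand.

```python
# "Support" operators, taken from the examples in point 1 above.
SUPPORT_OPS = {
    "IteratorV2", "IteratorGetNext", "SaveV2",
    "RestoreV2", "Assign", "Assert",
}


def triage(graph_ops, supported_ops):
    """Split the operators a front end cannot convert into two groups."""
    unsupported = sorted(set(graph_ops) - set(supported_ops))
    return {
        # Could be skipped or handled without an optimized implementation.
        "support": [op for op in unsupported if op in SUPPORT_OPS],
        # Perform computation and would need a real implementation.
        "needs_implementation": [
            op for op in unsupported if op not in SUPPORT_OPS
        ],
    }


# Example with a made-up graph and a made-up set of supported operators.
result = triage(["Conv2D", "SaveV2", "Where", "Assert"], {"Conv2D"})
```

A table like this would at least let the compiler report the two groups separately, even before deciding how the “support” operators should be handled.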


#8

@alopez_13 thanks for the nice initial draft! I hope other experienced members can also contribute by describing each of those steps in more detail. What makes this difficult is not only the number of files that have to be modified, but also that the process differs from one operator to another in ways that are not easy to grasp.

BTW, do all operators need to be added to TOPI, or only some, depending on the type?