Implementing new operators for TensorFlow

Hello,

After spending some time in the forums, there seems to be some interest in implementing unsupported operators in TensorFlow. However, I have been unable to find a concise description of how to do that. My current understanding is as follows:

  1. The operations are defined in the TensorFlow front end (python/tvm/relay/frontend/tensorflow.py, lines 1315-1425) based on a set of existing primitives. These define the graph-level front end.

  2. If the operation cannot be expressed through the existing primitives, then new code is added in that file. This is covered in the Adding an Operator to Relay document. Is this correct?

  3. Finally, the new operator may also need to be defined in TOPI (topi/python/topi) to describe its schedule. I assume that the schedule is defined for each HW target.

    Since I need to implement quite a few operators, I volunteer to document the entire process so we can add a tutorial on this topic.

Any help is really appreciated.

Thanks!


Hi,

Awesome! It's great that you are planning to help improve the documentation of the entire process of adding new operators. It is currently very challenging to get started with adding new operators to TVM, and the existing documentation is quite limited in this sense. As a result, it is hard to contribute new operators: there is no clear description of how to do it, and the steps seem inconsistent from one operator to another. I have seen other people express this concern as well.

As you mentioned, my understanding is also that there are three main areas when it comes to new operators: frontends, Relay, and TOPI. I am aware that there is some documentation on Relay and TOPI, but the process as a whole is not documented. Moreover, this documentation should be general enough to cover more than just the TensorFlow frontend. In general, the TVM community should put more effort into keeping the documentation up to date, since there are other areas that are not yet documented, like the recently merged uTVM, which is a major new feature.

@yongwww @tqchen @thierry Any thoughts or comments on this?

Thank you @alopez_13 for volunteering to help improve our documentation. I totally agree with you and @tico that the learning curve can be challenging; documentation will definitely help!

One recommendation is to look at previous PRs: these are a good starting point for learning how to add new operators. Here’s an example of a PR that was just merged recently: https://github.com/dmlc/tvm/pull/3614/files, which might fulfill (1) and (2).

For (3), you will need to write an operator schedule in TOPI; here’s an example: https://github.com/dmlc/tvm/commit/84590063225e71eb12e90cf625aa54c3d790c620#diff-29432e8bfe4b3f9297dd74c6efc802c5


I hope this gives you a starting point!

Thanks @thierry and @tico, I’ll go over those examples and document the process. I’ll keep posting in this thread. Once the documentation is good enough, we can put it somewhere else.


Here is my initial draft of the procedure:


This document lists the sequence of steps needed to add new operators to TVM using the TensorFlow front end. Other front ends should be similar, but the focus is on TensorFlow.

TVM takes as input a TensorFlow model and produces a computational graph that is optimized before generating an intermediate representation (IR). The IR is also optimized before it is passed to the backend (for example, LLVM). The proposed sequence for adding new operators is as follows:

  1. The computational graph operations are defined in the TensorFlow front end (python/tvm/relay/frontend/tensorflow.py, lines 1315-1425) based on a set of existing primitives. It may be possible to express the new operator as a combination of these primitives (a sketch of such a converter is given after this list).

  2. If the operation cannot be expressed through the existing primitives, then the new operator needs to be declared in: include/tvm/expr_operator.h

    • Depending on the type of operator, the definition is added to: python/tvm/intrin.py

    • If the operator works on a special data type, the new datatype should be added to: python/tvm/datatype.py

    • The new operator is added to the TensorFlow front end: python/tvm/relay/frontend/tensorflow.py lines 1315-1425

  3. The operator needs to be registered into Relay through the following files:

    • python/tvm/relay/op/_tensor.py // Need to register the schedule
    • python/tvm/relay/op/_tensor_grad.py // Only for some operators
    • python/tvm/relay/op/tensor.py // Wrapper function definition
  4. The operator is then added to the code generation phase of TVM; note that there are many backends, and the operator may need to be declared in more than one:

    • src/codegen/intrin_rule.cc // Default intrinsic rules for the operator
    • src/codegen/intrin_rule_cuda.cc // CUDA specific rules
    • src/codegen/llvm/intrin_rule_llvm.cc // LLVM specific rules
    • Then, depending on the type of operator, the code that implements the operation goes in:
      • src/relay/op/tensor/unary.cc
      • src/relay/op/tensor/binary.cc
      • src/relay/op/tensor/reduce.cc
      • src/relay/op/tensor/transform.cc
  5. The operator is also added into the test framework:

    • tests/python/frontend/tensorflow/test_forward.py
    • tests/python/frontend/tensorflow/test_control_flow.py // Only for control flow operators
    • tests/python/relay/test_op_grad_level1.py
    • tests/python/relay/test_op_level1.py
  6. Finally, the new operator is added in the TOPI:

  • First, the operator is added to the definitions according to its type:
    • topi/include/topi/elemwise.h
    • topi/include/topi/reduction.h
    • topi/include/topi/transform.h
  • The code of the operator is defined depending on its type:
    • topi/python/topi/math.py
    • topi/python/topi/reduction.py
    • topi/python/topi/transform.py
  • The operator is registered into TOPI.
  • To complete the procedure, tests are added to the TOPI test framework:
    • topi/tests/python/test_topi_basic.py
    • topi/tests/python/test_topi_math.py // Depending on the operator, other options possible
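
As a concrete illustration of step 1, below is a minimal sketch of a frontend converter that expresses a new TensorFlow op through existing Relay primitives. The (inputs, attr, params) converter signature follows python/tvm/relay/frontend/tensorflow.py as it looked at the time of this thread; the op itself (Softsign, computing x / (1 + |x|)) is only an example, and the exact helper names may differ in your checkout.

    from tvm import relay
    from tvm.relay import op as _op  # the frontend imports this relatively as `from .. import op as _op`

    def _softsign():
        def _impl(inputs, attr, params):
            # softsign(x) = x / (1 + |x|), built entirely from existing Relay ops
            abs_out = _op.abs(inputs[0])
            denom = _op.add(abs_out, relay.const(1, attr["T"].name))
            return _op.divide(inputs[0], denom)
        return _impl

    # The converter is then wired into the frontend's conversion table:
    # _convert_map = {
    #     ...
    #     'Softsign': _softsign(),
    # }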

Sorry if the formatting is not as clear as it could be; I had some issues with the posting tool.


Tangentially related to the above post, I have some general observations about new operators:

  1. Some of the unsupported operators that I have found are, for lack of a better term, “support” operators. For example: ‘IteratorV2’, ‘IteratorGetNext’, ‘SaveV2’, ‘RestoreV2’, ‘Assign’, and ‘Assert’. I know that those operators can be avoided by changing the model, but in some cases we are just given a pb file or checkpoint without the original code, and doing “surgery” on the graph may become problematic.

  2. There are some operators that seem to be extensions of existing ones and thus easier to add. Some examples: ‘TensorArrayGatherV3’, ‘TensorArrayReadV3’, ‘NonMaxSuppressionV3’, ‘TensorArrayV3’, ‘TensorArrayScatterV3’, ‘Where’, ‘TensorArrayWriteV3’, ‘TensorArraySizeV3’.

  3. Finally, there are operators that perform actual computation, do require support, and could be deemed “essential”.

I don’t know if there is a forum for discussing what should be considered an “essential” operator vs. a “support” one. Personally, I don’t see a real need to optimize code for “support” operators, but their presence in the models needs to be addressed somewhere in the compiler flow. Any thoughts on this?

@alopez_13 thanks for the nice initial draft! I hope other experienced members can also contribute to describing each of those steps in more detail. What is difficult about this is not only the number of files that have to be modified, but also that the process differs from one operator to another in ways that are not easy to pick up.

BTW, do all operators need to be added to TOPI, or only some, depending on the type?

Hi All, I am also new to TVM and trying to add support for some operators. Here is my understanding, in addition to what @alopez_13 has already documented. I will try to keep it short for now. For any operator that you want to support in TVM and its corresponding front-end DL framework, there are three layers where we have to make changes:

  1. Relay layer
  2. TOPI layer
  3. Front-end layer (TensorFlow, Keras, etc.)

Relay layer - Explained already, but the crux is that here you describe the semantics of the operator: its inputs, its attributes (remember that the number of inputs need not be the same as the number of attributes), the relations between the input types, and so on.

The Relay layer is organized in two parts: Python and C++; the Python hooks call their C++ counterparts. For some operators (many of the older existing ones, if I may say), the actual implementation is present entirely in Relay. It might sometimes be done in Python itself, but for performance reasons you will most often find the implementation in C++. The Relay layer is also used to register the operator, describe it, and associate different information with it: attribute relations, the compute function declaration, and so on. Usually the associated compute function (in C++) calls into the TOPI C++ layer. A sketch of the Python side of this registration is shown below.
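
Here is a minimal sketch of the Python half of that registration, modeled on how simple unary ops were wired up in the TVM 0.6-era codebase this thread discusses. These helpers have since been reorganized (around op strategies), so treat the names as illustrative rather than current.

    # python/tvm/relay/op/tensor.py -- the user-facing wrapper; _make is the
    # FFI namespace that exposes the C++ op constructor to Python:
    def log(data):
        """Compute the elementwise natural logarithm of data."""
        return _make.log(data)

    # python/tvm/relay/op/_tensor.py -- associate a generic schedule with the
    # op so the compile engine knows how to lower it (register_schedule and
    # schedule_broadcast come from the surrounding module's imports):
    register_schedule("log", schedule_broadcast)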

TOPI layer - Here you register the operator in TOPI, the TVM operator inventory. The TOPI layer is again organized into Python and C++ parts, but the Relay layer (C++) always calls the corresponding TOPI implementation for the actual compute, and the TOPI Python layer also internally calls the TOPI C++ layer for the actual operator implementation code.
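
To make that concrete, here is a sketch of a TOPI-style compute definition using the tensor-expression API, for the same hypothetical softsign operator as above. Real TOPI code pairs the compute with tuned schedules per hardware target; the naive schedule below is only enough to build and sanity-check it on CPU.

    import tvm
    from tvm import te  # in older TVM these lived at the top level, e.g. tvm.compute

    def softsign(x):
        # Elementwise definition: softsign(x) = x / (1 + |x|)
        return te.compute(x.shape,
                          lambda *i: x(*i) / (te.abs(x(*i)) + 1.0),
                          name="softsign")

    # Build with a naive schedule to check the compute:
    x = te.placeholder((1024,), name="x", dtype="float32")
    y = softsign(x)
    s = te.create_schedule(y.op)
    mod = tvm.build(s, [x, y], target="llvm")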

Front-end layer - For every front-end framework supported in TVM you will find a corresponding file, which @alopez_13 has already explained brilliantly. Still, let me talk briefly about it. We write our binding/conversion (DL front-end operator to TVM operator) in the respective front-end file (e.g. python/tvm/relay/frontend/tensorflow.py). We then write test cases that compare our TVM operator's output/accuracy against the DL front-end operator in the test_forward.py for that frontend (e.g. tests/python/frontend/tensorflow/test_forward.py); a typical test follows the pattern sketched below. The TOPI and Relay layers also have their own test files, similar to the front-end testing.
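
In outline, a frontend test looks like the following: build a small TF 1.x graph, then hand it to compare_tf_with_tvm, a helper defined in test_forward.py, to run both TensorFlow and TVM and compare the results. The Softsign op is again only an example.

    import numpy as np
    import tensorflow as tf

    def test_forward_softsign():
        ishape = (1, 3, 10, 10)
        inp_array = np.random.uniform(-5, 5, size=ishape).astype(np.float32)
        with tf.Graph().as_default():
            in1 = tf.placeholder(shape=ishape, dtype=inp_array.dtype, name="input")
            tf.nn.softsign(in1)
            # compare_tf_with_tvm runs the graph in TensorFlow, converts it with
            # relay.frontend.from_tensorflow, runs it in TVM, and asserts the
            # outputs match.
            compare_tf_with_tvm(inp_array, "input:0", "Softsign:0")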

Hope this is a little useful. Sorry I did not take much care formatting it, but please feel free to ask any questions you may have.

Thank you. Regards, Deepak

Hello,

As you said: “For example, ‘IteratorV2’, ‘IteratorGetNext’, ‘SaveV2’, ‘RestoreV2’, ‘Assign’, and ‘Assert’. I know that those operators can be avoided by changing the model.”

I am training an NCF model using the TensorFlow models NCF code, and the two operators ‘IteratorV2’ and ‘IteratorGetNext’ end up in the final frozen.pb file.

How could I remove these two operators?

Most of the operators I mentioned are removed when you freeze the graph. As for IteratorV2 and IteratorGetNext, I think those are preprocessing steps that you can move out of the model and add back when you do inference. Look for loops that feed data in or do some pre-processing. One way to cut them out of an already-frozen graph is sketched below.
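
This sketch re-imports the GraphDef with input_map, replacing the iterator's output tensor with a placeholder you feed at inference time; the ops upstream of it are left unconnected and get pruned when you extract the inference subgraph. The tensor name and shape below are hypothetical, so inspect your own graph for the real ones.

    import tensorflow as tf

    # Load the frozen GraphDef.
    graph_def = tf.compat.v1.GraphDef()
    with open("frozen.pb", "rb") as f:
        graph_def.ParseFromString(f.read())

    with tf.Graph().as_default() as graph:
        # New placeholder that will stand in for the data pipeline.
        inp = tf.compat.v1.placeholder(tf.float32, shape=(None, 128), name="input")
        # Remap the iterator's output tensor to the placeholder, so IteratorV2
        # and IteratorGetNext no longer feed the model.
        tf.import_graph_def(graph_def, input_map={"IteratorGetNext:0": inp}, name="")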

Hope that helps.

@alopez_13 Hi, I am currently working on implementing new operators for PyTorch/TensorFlow. I have a custom implementation of a softmax layer for which I need to implement the Relay operator and, eventually, make TVM recognize and compile it. I found your initial draft to be extremely useful, but I’m afraid there have been changes in the locations of the scripts and the paths to the APIs over the past three years. For example, I no longer see a file called expr_operator.h under include/tvm. I think the paths, and even the filenames, have changed.

Can you please provide an update on the paths to the files where the implementations for our custom layers/nodes must be written? Also, if the implementation procedure itself has undergone changes, can someone please mention them here? It would greatly accelerate the process, compared with finding the files/scripts one by one. I have been having a hard time locating them so far. TIA

Regards, Krishna

Hello!

Before jumping into the actual files, I would recommend going over the new documentation on how to add new operators, and then looking at some of the unit tests (/tvm/tests/python/), since you may be able to implement your operator based on existing ones.

Hope it helps…


@alopez_13 Thank you for the reply!

I took a look at the unit tests and saw that a lot of tests are performed along with scheduling, but I could not understand what was being done there. I read the documentation you linked, but the idea still remains vague to me. Can you please tell me what exactly I should be looking for in the unit test directory? Thanks.

It’s just that the documentation suggests how you could test the new operator (tests/python/relay/test_op_level3.py). I usually look at the examples in those tests to better understand how TVM works in general; a minimal version of that pattern is sketched below.
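
Roughly: build a one-operator Relay function, execute it, and compare against a numpy reference. relay.abs stands in for whatever operator you are adding, and the .numpy() call assumes a reasonably recent TVM (older versions used .asnumpy()).

    import numpy as np
    import tvm
    from tvm import relay

    # A Relay function wrapping a single operator.
    x = relay.var("x", shape=(4,), dtype="float32")
    func = relay.Function([x], relay.abs(x))

    # Execute it and check the result against the numpy reference.
    data = np.array([-1.0, 2.0, -3.0, 4.0], dtype="float32")
    result = relay.create_executor(kind="graph").evaluate(func)(data)
    np.testing.assert_allclose(result.numpy(), np.abs(data))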

Hope this helps.