[Training] Relay vs NNVM

I read in this topic:

that TVM NNVM supports training in some way. Does Relay also support training?

It is in development.

Thank you for the answer.
Could you elaborate on that? Are there any features in TVM's repository that I could use to try training a simple model with Relay?

There is a reverse-mode automatic differentiation algorithm sitting in the repository. I am developing a full training framework for TVM/Relay that manages parameters, parameter initialization, and PyTorch interop, which a single algorithm can't do.

Recently, I tried to add a simple training framework on top of Relay (enough to train a simple CNN, not a generic training framework). My idea is to capture and save the input, output, and parameters of each hidden layer during the network's inference pass, and then run gradient descent on those parameters to update them. However, I lack experience and have not read the source code of other training frameworks. TVM's inference framework uses a computational graph to represent the input layer operators. My question is: is it necessary to build a computational graph when implementing a training framework? Thank you!

You need to find some way to turn the weights of the neural network into function parameters, which will then be recognized by the AD algorithm. There are many possible approaches to this.
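To make the idea concrete, here is a minimal sketch in plain Python, not Relay code. It assumes a toy scalar reverse-mode AD; the names `Var`, `backward`, and `loss_fn` are hypothetical and exist only for illustration. The point is that once the weights `w` and `b` are ordinary function parameters, the AD pass computes their gradients just like for any other input.

```python
class Var:
    """A scalar node that records how it was computed (toy reverse-mode AD)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuples of (parent_node, local_derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every reachable node."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)  # post-order: parents come before children

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # children are processed before parents
        for parent, local in node.parents:
            parent.grad += node.grad * local


def loss_fn(x, y, w, b):
    """Squared error of a linear model; w and b are plain parameters."""
    err = w * x + b - y
    return err * err


x, y = Var(2.0), Var(7.0)
w, b = Var(3.0), Var(0.5)
loss = loss_fn(x, y, w, b)
backward(loss)
print(loss.value, w.grad, b.grad)
```

The recorded `parents` links form a small computational graph as a side effect of running the forward pass, which is one common way to get the structure that reverse-mode AD needs.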

High-level frameworks such as TensorFlow build a graph that contains the AD subgraph, so why implement the AD algorithm in Relay? Do you mean doing AD at the Relay level, so that someone can define an inference network in Relay and have the training part generated automatically?

The difference between the training graph and the inference graph in TensorFlow is that the training graph has the stateful op (VariableV2) and related ops (ApplyXXX, Assign, Save/Restore, …), while the inference graph doesn't. If Relay can represent these primitives, TVM can work in training mode.

Are there any other items blocking TVM from supporting training mode? I am a rookie in TVM. :grinning:

Given a Relay network, it is indeed possible to use the AD algorithm to generate the reverse mode. However, Relay does not have a notion of global variables: it has references, but they are not persistent across multiple executions. (In C++ terms, it has no static variable that you can mutate.) As of today, you have to manage the weights of an NN yourself if you want to train in TVM. It is possible, and people (well, including me) have had some success. I am experimenting with adding NN parameters to Relay so that they interact nicely with other things like AD, passes, and codegen.
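As a rough illustration of what "managing the weights yourself" can look like, here is a small plain-Python sketch, not TVM or Relay code. It assumes a hypothetical pure function `loss_and_grads` that stands in for a compiled forward-plus-gradient function, and a hypothetical `train` loop on the caller's side that owns the weights and threads them through every call.

```python
def loss_and_grads(w, b, xs, ys):
    """Pure function: mean squared error and its gradients w.r.t. w and b.

    It keeps no state between calls; everything it needs is passed in.
    """
    n = len(xs)
    loss = dw = db = 0.0
    for x, y in zip(xs, ys):
        err = w * x + b - y
        loss += err * err / n
        dw += 2.0 * err * x / n
        db += 2.0 * err / n
    return loss, dw, db


def train(xs, ys, steps=1000, lr=0.05):
    """The caller owns the weights and updates them after each call."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        _, dw, db = loss_and_grads(w, b, xs, ys)
        w, b = w - lr * dw, b - lr * db  # plain gradient descent step
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = train(xs, ys)
print(w, b)
```

Nothing inside `loss_and_grads` persists between executions, which mirrors the constraint described above: the persistent state lives entirely in the training loop.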

I see, thanks.

Can you show me your detailed design or code? I don't know at which level you want to implement it. A new IR node? @MarisaKirisame

Unfortunately, I am busy and do not have anything to show just yet.