[RFC] Bring in Tensor Expression Autodiff

@sgrechanik-h has done a great job implementing autodiff on top of tensor expressions. However, we didn’t manage to merge the work, as there was unfinished discussion on the infrastructure and code organization. Since the TVM codebase has evolved and been refactored considerably since then, this RFC is to discuss whether and how to merge the feature into the current repository.

Previous RFCs: Tensor Expression level automatic differentiation, Automatic differentiation on the level of TVM IR

Previous Pull Requests: Autodiff, Zero Elimination

I will briefly describe the motivation, the approach it adopts, the user APIs, and how it can coordinate with the Integer Arithmetic infrastructure, and also summarize some of the concerns.

Motivation and Scope

The goal is to reduce the effort of manually implementing gradient computation for each operator. It works for operators written with tvm.compute; operators written in hybrid script would not be able to benefit.

Approach

I will briefly describe the approach so people can better understand the code organization we would like to propose.

It applies reverse-mode autodiff to Tensor.op.body, i.e.,

adjoint(x) = \sum_{y_d \,\text{is a consumer of}\, x} adjoint(y_d) \cdot Jacobian(y_d, x).

The simplest implementation results in large intermediate tensors: Jacobian(y[M], x[N]) generates a tensor of shape MxN. However, for most operators, y[i] only depends on a sub-vector of x. For element-wise operators such as relu, y[i] depends only on x[i], which implies that only the diagonal of the MxN Jacobian is non-zero.
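For illustration, here is a minimal relu written with te.compute (a sketch using current tvm.te names); the shapes make the sparsity concrete:

```python
# Sketch: element-wise relu written as a tensor expression. Naive reverse
# mode would materialize Jacobian(y, x) with shape (N, N), even though only
# its diagonal is non-zero.
import tvm
from tvm import te

N = 128
x = te.placeholder((N,), name="x")
y = te.compute((N,), lambda i: te.max(x[i], tvm.tir.const(0, x.dtype)), name="relu")

# J[i, j] = dy[i]/dx[j] is an N x N tensor, but J[i, j] == 0 whenever i != j.
```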

Thus a zero-elimination pass is applied afterwards to eliminate the unnecessary intermediate tensor allocation. The method described in the paper handles the general scenario where the non-zero indices of the Jacobian are a linear combination of the indices of x:

A a = b, in which the vector a is the (multi-dimensional) index of the non-zero Jacobian entries, and the vector b is the (multi-dimensional) index of the input x.

a can either be expressed directly in terms of b (a = A^{-1} b if A is full-rank), or the solution contains free variables other than b (when A is not full-rank), which we name z; len(b) + len(z) should be much smaller than MxN.

We also need to deduce the bounds of z.

With that we would be able to create a much smaller intermediate tensor (Jacobian) using these deduced variables and their bounds. We also have a chance to reduce the number of for-loops.
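As a concrete illustration (my own example, not from the original RFC), compare an element-wise op, where the non-zero Jacobian index is fully determined by the input index, with a broadcast, where one free variable remains and its bounds must be deduced:

```latex
% Illustrative only. Element-wise relu, y[i] = max(x[i], 0): the Jacobian
% index a = (i, j) is non-zero only when i - j = 0, so given the input
% index b = (j) we get a = (j, j); a is fully determined by b, z = ().
\[
\text{relu:}\quad a = (i, j)\ \text{non-zero} \iff i - j = 0
\;\Rightarrow\; a = (j, j),\qquad z = ().
\]
% Broadcast, y[i, k] = x[i]: the Jacobian index a = (i, k, j) is non-zero
% only when i - j = 0, so a = (j, k, j) with k unconstrained; z = (k), and
% its bounds 0 <= k < K must be deduced.
\[
\text{broadcast:}\quad a = (i, k, j)\ \text{non-zero} \iff i - j = 0
\;\Rightarrow\; a = (j, k, j),\qquad z = (k),\ 0 \le k < K.
\]
```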

User APIs and Code Organization

Autodiff

Create tvm.te.autodiff and implement naive reverse-mode autodiff under that namespace. We expose two APIs, tvm.te.Jacobian(f_d, x) and tvm.te.gradient(output, x) (or tvm.te.differentiate), to calculate the Jacobian and the adjoint (i.e., the gradient of the output with respect to the input) respectively.
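A minimal usage sketch under the proposed API (the exact signature, e.g. whether it takes a list of inputs, is still to be settled) might look like this:

```python
# Sketch of the proposed user-facing API; tvm.te.gradient / tvm.te.Jacobian
# follow this RFC and are not merged yet, so the calls are shown as comments.
import tvm
from tvm import te

x = te.placeholder((32, 10), name="x")
w = te.placeholder((10, 8), name="w")
k = te.reduce_axis((0, 10), name="k")
y = te.compute((32, 8), lambda i, j: te.sum(x[i, k] * w[k, j], axis=k), name="y")

# Proposed: adjoints of y with respect to the listed inputs,
#   dx, dw = tvm.te.gradient(y, [x, w])
# Each adjoint would itself be a te.Tensor, so it can be scheduled and
# built with the usual te/tir pipeline.
```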

Optimization (zero-elimination)

  • LinearSystem data structure to describe linear equations and inequalities. It contains relations (=, <, >, etc.), variables, and their ranges. We implement the data structure in include/tvm/arith/linear_system.
  • LinearEquationSolver to solve A a = b. This is essentially an SVD. We can have it in src/arith/linear_solver.cc (see the sketch after this list).
  • LinearInequalitySolver to deduce bounds for the free variables z in the solutions of the linear equation above. We can implement it in src/arith/linear_solver.cc.
  • Other helper functions and transformations, e.g., re-constructing the Tensor Compute from the new expressions, can stay in zero_elimination.h & zero_elimination.cc.
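To make the intended flow concrete, below is a hypothetical sketch; LinearSystem, LinearEquationSolver, and LinearInequalitySolver are the names proposed above and do not exist in the codebase yet, so those calls are shown as pseudocode comments:

```python
# Hypothetical sketch of the zero-elimination flow for the relu Jacobian.
# The arith.* names below mirror the proposal in this RFC and are NOT an
# existing TVM API; only the variable construction is real code.
import tvm
from tvm import te

N = 128
i = te.var("i")
j = te.var("j")

# Non-zero condition of the relu Jacobian: i == j, with 0 <= i, j < N.
#   sys      = arith.LinearSystem(variables=[i, j],
#                                 ranges={i: (0, N), j: (0, N)},
#                                 relations=[i == j])
#   solution = arith.LinearEquationSolver(sys).solve()      # i expressed via j
#   bounds   = arith.LinearInequalitySolver(sys).deduce()   # ranges of free vars z
# Here there are no free variables, so the Jacobian collapses from N x N
# to a single loop over j when it is contracted with the adjoint.
```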

Alternatives

As discussed in the thread, @MarisaKirisame pointed out that the Jacobian should not be calculated explicitly, as it will be multiplied by the adjoint anyway. There exist autodiff packages (e.g., Halide's) that do so. I suggest doing that later and first bringing this approach in, so we can quickly have something workable.
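In equation form (my paraphrase of the suggestion), the alternative fuses the two steps and never materializes the full Jacobian:

```latex
% Compute the contraction directly instead of building Jacobian(y, x) first:
\[
\operatorname{adjoint}(x)_j \;=\; \sum_i \operatorname{adjoint}(y)_i \,
\frac{\partial y_i}{\partial x_j},
\]
% so the M x N Jacobian is folded into the reduction and never stored.
```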

@sgrechanik-h @MarisaKirisame @tqchen @junrushao @haichen @hzfan


I’m planning to get Halide-style autodiff in during the summer.


Glad to see this work continued!

Attaching a link to a comment where we report on the original Autodiff vs. Relay compatibility: https://github.com/apache/incubator-tvm/issues/2562#issuecomment-461814676
