If I understand correctly, add_to(a, b) increments a by b. During this process, the value of a is changed in place.
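To make the semantics concrete, here is a minimal sketch of what such an in-place add_to would look like (using NumPy arrays as stand-in tensors; the function name is from the proposal, the body is my assumption):

```python
import numpy as np

def add_to(a, b):
    # In-place update: the buffer behind `a` is overwritten.
    # Every other name that aliases `a` now sees the new value too.
    a += b

x = np.array([1.0, 2.0])
add_to(x, np.array([2.0, 2.0]))
# x was changed in place; no new tensor was returned.
```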
The second approach is, imho, a RED FLAG idea that I don't think we should do.
If add_to is implemented as above, it will greatly complicate Operator Fusion, Gradient, Partial Evaluation, FoldScaleAxis, etc. Basically every existing pass will be in great danger.
The problem is that mutation introduces two issues: during optimization, you don't know whether a tensor has been changed somewhere else, and you don't know whether changing a tensor will change the behavior of code in other places (because of aliasing).
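The aliasing problem can be shown in a couple of lines (again with NumPy arrays as stand-in tensors):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = a          # alias: `b` and `a` refer to the same buffer
b += 1.0       # mutate through the alias
# `a` changed even though no code ever wrote to the name `a` directly.
# An optimizer looking only at uses of `a` cannot see this happening.
```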
In particular, see the relevant section of Automatic differentiation in PyTorch - they implement a delicate data structure (which is hard to do as a source code transformation in Relay without polluting the runtime), and they reject some cases. Right now we have no such problem because we don't have mutation for tensors.
As another example, the effectiveness of ConstantFolding will tank to the point of being useless, because it is hard to know whether a constant is really constant or has been mutated somewhere. The pass will either do nothing or risk being wrong. The same story goes for Fusion.
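Here is a toy illustration of why folding becomes unsafe once mutation exists (the names are hypothetical, not Relay API):

```python
c = [2.0]              # looks like a compile-time constant

def add_to(a, b):       # in-place update, as in the proposal
    a[0] += b

def program(x):
    add_to(c, x)        # silently mutates the "constant"
    return c[0] * 2.0

# A constant folder that trusts c's initial value would rewrite
# `c[0] * 2.0` to 4.0 at compile time. But at runtime:
result = program(1.0)   # c is now [3.0], so the true result is 6.0
```

So to stay correct, the pass must conservatively assume any tensor reachable by add_to may change, which means it folds almost nothing.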
Tensors should continue to have no mutation, and Approach 1 looks way better.
If Approach 1 cannot describe all cases, we can still wrap every tensor as a reference to a tensor, and have every operator unwrap/wrap accordingly. This will tank the performance of most passes, and some will probably fail outright - but that is exactly what would happen to every single program if we allowed mutation on tensors.
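A minimal sketch of that wrapping idea - the Ref/read/write names are hypothetical, but the shape matches how mutation is usually isolated into reference cells while tensors themselves stay immutable:

```python
class Ref:
    """A mutable cell holding an immutable tensor value."""
    def __init__(self, value):
        self.value = value

def read(ref):
    return ref.value            # unwrap: get the current tensor

def write(ref, value):
    ref.value = value           # wrap: install a new tensor

# add_to expressed without mutating any tensor: build a fresh
# tensor and swap it into the cell. Only the Ref is effectful,
# so passes can stay aggressive everywhere Refs don't appear.
def add_to(ref_a, b):
    write(ref_a, read(ref_a) + b)

r = Ref(1.0)
add_to(r, 2.0)
```

This quarantines the cost: programs that use Refs pay for them, while pure programs keep full optimization.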