[DISCUSS] Use Object Protocol for Module/NDArray

The unified object protocol is now used by almost all of our runtime objects. However, there are still two exceptions: Module and NDArray. This RFC discusses the possibility of unifying them under the object protocol.

The advantage of unification is clear. After the unification, we can benefit from additional features in the object system, e.g. using ADT to directly store NDArrays. We can also remove vm::Tensor, which was created to work around this limitation.

However, the change can also introduce problems, depending on how we do it.
We discuss the design choices in this RFC.

Considerations

The current PackedFunc C API treats NDArray and Module differently from Objects.
We call NDArrayAlloc to allocate an array and NDArrayFree to release it.
When we pass an NDArray to a PackedFunc, we set the type code to be kArrayHandle.

This explicit runtime API is simple to implement:
we do not have to follow the explicit object protocol when implementing a minimum runtime.
We can implement the NDArray in a way that does not have to deal with ref counting and deleter setup.
Moreover, the explicit type_code makes RPC’s implementation easy,
as we do not have to deal with the second-level dispatching that reads the type index.

Another key property of the current NDArray::Container is the compatibility of its pointer with DLTensor*.
We can directly take an NDArrayHandle allocated from the C API and treat it as a DLTensor*.
This property may no longer hold if we make NDArray::Container a sub-class of Object.
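
To make the layout concern concrete, here is a minimal sketch. All types are simplified, hypothetical stand-ins (FakeDLTensor, LegacyContainer, FakeObject, ObjectContainer), not the real DLPack or TVM definitions. It only illustrates why the handle address stops being the tensor address once an object header comes first, as it does on common ABIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for DLTensor (the real struct lives in dlpack.h).
struct FakeDLTensor {
  void* data;
  int ndim;
  std::int64_t* shape;
};

// Current style (simplified): the tensor is the first member, so the
// container address and the tensor address are the same.
struct LegacyContainer {
  FakeDLTensor dl_tensor;
  void* manager_ctx;
  int ref_counter;
};

// Hypothetical object header, standing in for the Object base class.
struct FakeObject {
  std::uint32_t type_index;
  std::int32_t ref_counter;
};

// Object style: the header fields come first, so the tensor is
// placed after them on common ABIs.
struct ObjectContainer : public FakeObject {
  FakeDLTensor dl_tensor;
};

// Byte distance from the start of the container to its tensor field.
inline std::ptrdiff_t LegacyTensorOffset(LegacyContainer* c) {
  assert(c != nullptr);
  return reinterpret_cast<char*>(&c->dl_tensor) - reinterpret_cast<char*>(c);
}

inline std::ptrdiff_t ObjectTensorOffset(ObjectContainer* c) {
  assert(c != nullptr);
  return reinterpret_cast<char*>(&c->dl_tensor) - reinterpret_cast<char*>(c);
}
```

With the legacy layout the offset is zero, so a handle can be reinterpreted as a tensor pointer; with the object layout the offset is non-zero and a plain cast would point at the header instead.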

Finally, we need to think about how to handle sub-classes of NDArray. Right now we
introduce a second tag field in the NDArray::Container. We need to decide whether
to reuse Object’s type hierarchy or keep the old approach.

We discuss two options below in terms of calling convention.

Option1: Move to Object Calling Convention

The first option is to simply force NDArray and Module to use the new Object calling convention.
That means that we no longer pass kArrayHandle in the type code and instead pass kObjectHandle.
We can also directly use ObjectFree to de-allocate the NDArray.

This option will force us to change the ABI of the PackedFunc calling convention
and needs major updates to all frontend runtimes.
It will also complicate the implementation of RPC a bit, as we cannot exchange
every object through RPC and need to handle Module and NDArray specially there.
Finally, we will also break the compatibility of NDArrayHandle with DLTensor*.

Option2: Use the New Object Protocol but Keep the Original Calling Convention

In this case, when we assign an ObjectRef to TVMArgs, we specially check whether
the reference points to an NDArray or a Module and set the type_code to
kArrayHandle or kModuleHandle, respectively.
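
The check described above could look roughly like the sketch below. The enums, the Obj struct, and TypeCodeFor are hypothetical names for illustration only; the real runtime would read the object's type index rather than a Kind field, and the real type codes have fixed numeric values that are not reproduced here.

```cpp
#include <cassert>

// Hypothetical type codes; the names mirror the ones used in this RFC.
enum class TypeCode { kArrayHandle, kModuleHandle, kObjectHandle };

// Hypothetical stand-in for the runtime type index of an object.
enum class Kind { kNDArray, kModule, kOther };

// Hypothetical minimal object: only carries its kind.
struct Obj {
  Kind kind;
};

// Pick the type code to store when an object reference is assigned
// to an argument slot: NDArray and Module keep their original codes,
// everything else goes through the generic object code.
inline TypeCode TypeCodeFor(const Obj* obj) {
  assert(obj != nullptr);
  switch (obj->kind) {
    case Kind::kNDArray:
      return TypeCode::kArrayHandle;
    case Kind::kModule:
      return TypeCode::kModuleHandle;
    default:
      return TypeCode::kObjectHandle;
  }
}
```

The extra branch runs only when an object reference is passed, which is why the cost stays small.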

We further use a special argument passing convention for NDArray: we
pass the address of the DLTensor and recover the container address (by pointer arithmetic) when converting back to an NDArray. This allows us to keep the backward compatibility between NDArrayHandle and DLTensor*,
although it does mean we can no longer use ObjectFree to de-allocate NDArrays.
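
The address recovery can be sketched as follows. This assumes a simplified container in which a hypothetical header precedes the tensor; FakeDLTensor, FakeObjectHeader, Container, and the two helper functions are illustrative names, not the actual TVM implementation. Composition is used instead of inheritance here so that offsetof is well-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for DLTensor.
struct FakeDLTensor {
  void* data;
  int ndim;
};

// Hypothetical stand-in for the object header (type index, ref count).
struct FakeObjectHeader {
  std::uint32_t type_index;
  std::int32_t ref_counter;
};

// Simplified container: header first, tensor second.
struct Container {
  FakeObjectHeader header;
  FakeDLTensor dl_tensor;
};

// Passing direction: hand out the address of the embedded tensor,
// so the receiver can keep treating the handle as a tensor pointer.
inline FakeDLTensor* ToDLTensorHandle(Container* c) {
  assert(c != nullptr);
  return &c->dl_tensor;
}

// Receiving direction: step back by the field offset to recover the
// container that owns the tensor.
inline Container* FromDLTensorHandle(FakeDLTensor* t) {
  assert(t != nullptr);
  return reinterpret_cast<Container*>(
      reinterpret_cast<char*>(t) - offsetof(Container, dl_tensor));
}
```

The recovery is only valid for tensors that really are embedded in such a container; a standalone tensor assembled by a frontend has no header in front of it, so it must be told apart before this arithmetic is applied.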

One potential drawback of this approach is the additional checks in the PackedFunc call when an ObjectRef is involved. They are very cheap, though, and we can reduce the cost further by
avoiding such checks when we have more static type information. This approach keeps the ABI backward compatible, and no changes are needed in the frontends.

Discussion

It would be great if we could discuss which option we would like to take.
Of course, we can also do nothing and continue to support NDArray and Module differently.

cc @ajtulloch @yzhliu @liangfu @nhynes It would be great to get your perspectives, as you implemented the runtime variants.

My main concern is the DLTensor incompatibility in Opt1. How large is the overhead of converting between NDArrayHandle and DLTensor*? If it is small enough, I prefer Opt1, as it is clean and easy to understand.

Thanks for bringing this up!

This might be off topic, but I do think it is important that our calling convention at least accepts DLTensors, as required by DGL (bridging PyTorch and MXNet) and by zero-copy from numpy (I assembled a DLTensor in the frontend).

The conversion overhead is low (just adding an offset to a pointer).

The only concern with enforcing option1 is that it adds a bit of complexity to minimum versions of the runtime. For example, we can no longer implement NDArray by just creating a DLTensor without ref-counting (which is possible in the current case).

DLTensor* will also be supported, for the reasons mentioned by @junrushao1994.

Note that we can always start with option2 and then move to option1.

Thanks everyone for helpful discussions. I have moved it to a formal RFC https://github.com/apache/incubator-tvm/issues/4286