Design problems for Relay to support NLP models

On my way to supporting NLP models in TVM and Relay, I encountered some problems that probably require some fundamental changes or a redesign in Relay, so I want to discuss them on the forum.

The problem is how to represent the following examples in Relay.

# Suppose data is a Tensor of shape L x N, where L is sequence length, and N is hidden size
length = data.shape_array()[0]
x = arange(length)

Another example, which I found online:

inputs_ = tf.placeholder(tf.float32, shape=(None, None, None, None))
depth = tf.shape(inputs_)[-1]
with tf.control_dependencies([
        tf.Assert(
            tf.logical_or(tf.equal(depth, 3), tf.equal(depth, 1)), [depth])
]):
    inputs = tf.cond(
        tf.equal(tf.shape(inputs_)[-1], 3), lambda: inputs_,
        lambda: tf.image.grayscale_to_rgb(inputs_))

We can find that both examples need to extract shape values from a tensor and use them in further computation. This might be trivial when the input shape is fully known in the type, since we can use type inference and constant folding to solve it. But a more interesting and common case is when the input shape is unknown at compilation time. In order to represent these examples, certain things are missing in Relay (see the sketch after this list):

  • Convert a Relay type node into a value node, and potentially a value node back into a type node
  • Be able to use a Relay expr in the attributes (for the first example)
  • (minor) Extract one element from a tensor into a scalar
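For concreteness, here is a rough sketch of what the first example could look like if these pieces existed; the shape_of / take / arange operators and the Any dimension used below are assumptions for illustration, not existing Relay API:

from tvm import relay

# Hypothetical sketch: Any stands for a dimension unknown at compile time;
# shape_of / take / arange are assumed to accept dynamically produced values.
data = relay.var("data", shape=(relay.Any(), 512), dtype="float32")  # L x N
shape = relay.shape_of(data)                      # runtime tensor [L, N]
length = relay.take(shape, relay.const(0))        # extract L as a 0-rank tensor
x = relay.arange(relay.cast(length, "float32"))   # [0, 1, ..., L-1]
func = relay.Function([data], x)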

I think these changes are necessary if we want to support more general RNN models and dynamic shapes in TVM and Relay. I’d like to hear what the community thinks about this.

cc @tqchen @jroesch @junrushao @zhiics @wweic


All three of these are very meaningful issues, which may come up very often in the implementation of future deep learning models. We should take them seriously.


I would like to comment point by point:

  • Extracting one element from a tensor into a scalar should already be supported, as we can add a getitem operator that gives back a 0-rank tensor.
  • Turning a Relay type into a value expression needs some runtime support and should not be too hard.
  • Using an expression in the attributes is something that can introduce quite a lot of dynamism. While Relay already supports tvm::Expr in the attributes, additional runtime support would be needed. However, we might be able to get around it in many cases by rewriting, since the dimension is usually known.

The current design trade-off we make is to best support computation where the number of dimensions of the tensor is known, but the specific shape values are not necessarily known, which should be the common case.

Shape Function Generation

In particular, we would need to add shape function generation to Relay, which generates a PackedFunc that carries out shape inference when the values are not known. This can be done by passing generic shape vars [x, y, z] and getting back the output shape and dtype relation. Note that not all TypeRelations are written to support this (because some rely on constants) and would need to be modified to work for the general case. Similarly, when some operators allow attributes to contain tvm::Expr, the TypeRelations provided so far may not handle them and would need to be upgraded (or a new Op attribute created to do so).
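For intuition, here is a minimal plain-Python sketch (not the actual TVM API) of what a generated shape function conceptually computes, using a dense/matmul-style op as an example: given the runtime shapes of the inputs, it returns the output shape so that storage can be allocated before the kernel runs.

def dense_shape_func(data_shape, weight_shape):
    # (batch, in) x (out, in) -> (batch, out); shapes are plain tuples at runtime
    assert data_shape[1] == weight_shape[1], "reduction dimensions must match"
    return (data_shape[0], weight_shape[0])

assert dense_shape_func((32, 512), (1024, 512)) == (32, 1024)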


Understood that type-to-value is relatively easy in Relay, though it may require some runtime support.

Can you clarify whether the section “Shape Function Generation” corresponds to value-to-type?

Here is an example that Haichen and I found:

np.arange([start, ]stop, [step, ]) blurs the boundary between attrs and exprs. If its inputs are all exprs, its shape probably cannot be determined beforehand.

Such cases may include: Boolean mask, unique, etc.

So I think there are a few issues at play.

For dynamically computing the shape of a tensor, we can write a single operator that handles computing shape_of(expr) and returns a vector of size d, where d is the number of dimensions. We can simplify away the cases that are known statically using a simplification pass, or by modifying the registered tvm::Expr to compute it in such a way that it can be scheduled inline. In the unknown cases this will be a dynamic property of the tensor, and the generated TVM code should handle computing the shape of the tensor. I advocate for an approach that generates a tvm::Expr, as this enables further optimization of fused kernels.
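As an illustration (assuming a hypothetical shape_of operator and an Any dimension, neither of which exists in Relay yet), the simplification pass would fold the static case to a constant and leave the dynamic case to the generated code:

from tvm import relay

x_static = relay.var("x", shape=(4, 8), dtype="float32")
x_dyn = relay.var("y", shape=(relay.Any(), 8), dtype="float32")

s1 = relay.shape_of(x_static)  # fully static: foldable to the constant [4, 8]
s2 = relay.shape_of(x_dyn)     # first dim unknown: must be computed at runtime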

On the subject of value to type, the only transformation that makes sense is something like TypeOf(expr), which produces a Type.

On the topic of passing Relay values to operators, the problem is that we have not clearly articulated the difference between attributes and arguments.

Relay’s attributes are designed to be information known at compile time that can be used to type check and optimize operators. This is in contrast to an operator’s arguments, which can be arbitrary Relay expressions and do not parametrize code generation or type checking (we could try to change this).

My feeling is we should not pass relay.Expr as attributes (it doesn’t really make sense) and instead move towards writing more dynamic operators which take arguments at runtime instead of compile time.

For example, the current topi::strided_slice is written in such a way that it can never be called dynamically, regardless of how we pass the attributes. We need to take arguments other than the data dynamically in order to support programs that perform actions like calling strided_slice in a loop body.
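As a plain-numpy illustration (not TVM code) of the pattern this blocks: the slice boundaries are computed inside the loop body, so they cannot be compile-time attributes and have to be runtime arguments.

import numpy as np

def sliding_windows(data, window):
    outs = []
    for i in range(data.shape[0] - window + 1):
        # begin/end depend on the loop variable i, i.e. they are only known at runtime
        outs.append(data[i:i + window])
    return outs

print(sliding_windows(np.arange(10), 3)[0])  # [0 1 2]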

First, as @tqchen said, we can easily get a scalar out of a tensor today; the problem is that if the scalar is produced dynamically, it cannot be passed as an attribute anyway because it won’t be known until runtime.

Finally, to respond to @junrushao: we need to support fully dynamic tensors, which is orthogonal to these other questions.

My current feeling is that we should extend the type system with a notion of an Any shape. This is different from a variable shape, as it is completely dependent on execution and cannot be statically related to any other dimension.

For example in this world we could support:

Broadcast((10, 1), (10, 10), (10, 10))

Broadcast((10, Any), (10, 10), (10, Any))
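A minimal sketch of how the broadcast relation could treat Any at compile time (Any here is just a Python sentinel for illustration, not the eventual Relay implementation): an Any dimension is compatible with anything and stays Any in the result.

ANY = object()  # stands for a dimension only known at execution time

def broadcast_rel(lhs, rhs):
    out = []
    for a, b in zip(lhs, rhs):
        if a is ANY or b is ANY:
            out.append(ANY)  # cannot be statically related, stays Any
        elif a == 1:
            out.append(b)
        elif b == 1 or a == b:
            out.append(a)
        else:
            raise TypeError("dimensions %s and %s do not broadcast" % (a, b))
    return tuple(out)

assert broadcast_rel((10, 1), (10, 10)) == (10, 10)
assert broadcast_rel((10, ANY), (10, 10)) == (10, ANY)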

I’m chatting with the other Relay type system designers (@MarisaKirisame and Steven) about this and we will post some more detailed proposals soon.


I feel that using relay.Expr is a superset of using compile time attributes. So why not replace all attrs with relay.Expr?

I don’t think there is a reason to have attributes if they do not need to be static.

The only reason we introduced them is to separate the argument list into static and dynamic arguments.

We can still support keyword arguments, but this distinction is important, as the attributes are not available at runtime; to me it complicates things to take a fragment of a Relay program as an argument in order to statically code-generate the operator.

I agree with @jroesch that we probably don’t want to support fully dynamic attributes, since that complicates many things.

But one question I have is whether we should use tvm.Expr or relay.Expr for scalar expressions. The current design prohibits passing a tvm.Expr where a relay.Expr is expected, and vice versa.

Take the arange(x.shape[0]) example again. Suppose the value of x.shape[0] is a generic tvm shape var. Should we use a Relay scalar type as a wrapper for it?

On second thought, I realize that certain ops like arange require dynamic attributes. During shape inference, the op only has access to its attributes. Therefore, when the output shape depends on dynamic values rather than on input shapes, such as start/stop/step in arange, we have to put those dynamic values in the attributes.
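To illustrate why (a plain-Python sketch, not an actual TVM interface): the output shape of arange is a function of the runtime values of start/stop/step, not of any input tensor’s shape.

import math

def arange_out_shape(start, stop, step):
    # number of elements arange(start, stop, step) will produce
    return (max(0, int(math.ceil((stop - start) / step))),)

print(arange_out_shape(0.0, 10.0, 3.0))  # (4,)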

This is an interesting discussion - was there ever a resolution to this? I am looking at operators like boolean_mask. In TensorFlow, this contains a “where” operator, which returns the coordinates at which “condition” evaluates to true. The shape cannot be known at compile time.
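For example, in plain numpy the size of the result depends on the data itself, so no amount of static shape information can determine it:

import numpy as np

x = np.array([3, -1, 4, -1, 5])
print(x[x > 0])           # [3 4 5] -> length depends on the values in x
print(np.nonzero(x > 0))  # (array([0, 2, 4]),) -> "where"-style coordinates, also data-dependent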