[Relay] Shape inference with partial input shape information


#1

I recently started learning Relay by comparing it with NNVM. When Relay does shape inference, it requires complete shape information for all input tensors to succeed. For example, the following three cases fail in Relay but run through in NNVM. Is this because the add operator is actually a broadcast_add, so the incomplete shape info cannot be inferred, or has Relay's design deliberately forbidden shape inference with partial input shape information? If it's the latter, what is the consideration behind this design?

# NNVM: no shape information provided in x1, treated as unknown shape and would be inferred from x2
# Relay: x1 is treated as a scalar
x1 = relay.var('x1')
x2 = relay.var('x2', shape=(10, 20))
y = relay.add(x1, x2)

# NNVM: zero ndim means unknown shape and will be inferred from x2
# Relay: x1 is treated as a scalar
x1 = relay.var('x1', shape=())
x2 = relay.var('x2', shape=(10, 20))
y = relay.add(x1, x2)

# NNVM: 0 dim size means unknown
# Relay: there is no such indicator
x1 = relay.var('x1', shape=(10, 0))
x2 = relay.var('x2', shape=(10, 20))
y = relay.add(x1, x2)

#2

In Relay, you have to explicitly state that a dimension is variably sized by passing in a tvm.var. You can check out the tests for examples.

Note that the “allocate memory” pass will fail if one of the output storages depends on a variable-sized dimension. This somewhat defeats the purpose, but is otherwise inevitable. It seems like we want to put an upper bound on the size, but in that case padding likely works just as well.
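To make the variable-dimension idea concrete, here is a toy, plain-Python sketch (a hypothetical helper, not Relay's actual implementation) of broadcast shape inference where some dimensions are symbolic names standing in for a tvm.var; a mismatch between two concrete dimensions is rejected, while a symbolic dimension unifies with a 1:

```python
def broadcast_shapes(a, b):
    """Unify two shapes under numpy-style broadcast rules.

    Dimensions are either ints or strings naming symbolic variables
    (a stand-in for tvm.var). This is an illustrative sketch only.
    """
    a, b = tuple(a), tuple(b)
    n = max(len(a), len(b))
    # Align from the trailing dimensions, padding the shorter shape with 1s.
    a = (1,) * (n - len(a)) + a
    b = (1,) * (n - len(b)) + b
    out = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            out.append(da)
        elif da == 1:
            out.append(db)
        else:
            raise ValueError("incompatible dims %r and %r" % (da, db))
    return tuple(out)

# A symbolic batch dimension survives broadcasting:
print(broadcast_shapes(("n", 20), (1, 20)))   # -> ('n', 20)
print(broadcast_shapes((20,), ("n", 20)))     # -> ('n', 20)
```

A symbolic dimension in the output shape is exactly the situation where the memory-allocation pass has no concrete size to work with, which is the failure mode described above.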


#3

Partial shape inference still works where it is needed. The examples above fail only because we adopt numpy semantics for add: a scalar broadcasts against any shape, so there is no constraint from which to infer x1's shape.

See, for example, https://github.com/dmlc/tvm/blob/master/tests/python/relay/test_op_level2.py#L13 — there the shape of w is unknown and gets inferred by conv2d.
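As a rough illustration of the conv2d case (a hypothetical helper, not Relay's actual type relation): given a fully known NCHW data shape and the operator's channels and kernel_size attributes, the OIHW weight shape is fully determined, so an unannotated w can be inferred.

```python
def infer_conv2d_weight_shape(data_shape, channels, kernel_size):
    """Sketch of conv2d weight-shape inference.

    data_shape is NCHW; the weight layout is OIHW, i.e.
    (out_channels, in_channels, kernel_h, kernel_w).
    """
    n, in_channels, h, w = data_shape
    kh, kw = kernel_size
    return (channels, in_channels, kh, kw)

# A 3x3 conv with 2 output channels over (1, 10, 224, 224) data
# fully determines the weight shape:
print(infer_conv2d_weight_shape((1, 10, 224, 224),
                                channels=2,
                                kernel_size=(3, 3)))  # -> (2, 10, 3, 3)
```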


#4

Thank you, @tqchen and @nhynes. I understand the example of inferring an unknown shape within a single operator given the input data. But since all Relay element-wise ops support broadcast semantics, I couldn’t think of an example of inferring unknown shapes across different operators in a network.

Another question: how is a partially known shape, such as (10, 3, 0, 0) in NNVM (where 0 indicates an unknown dim size), represented in Relay?