From_onnx conversion issue with MatMul+Add


#1

I am trying to import an onnx graph into tvm and I’m hitting a dimension-matching assert:

onnx_model = onnx.load('xor.onnx')
xor_sym, xor_params = nnvm.frontend.from_onnx(onnx_model)

Investigating xor_params:

import pprint
pprint.pprint(xor_params, indent=3)

OUT:

{  'B': <tvm.NDArray shape=(2,), cpu(0)>
array([0., 0.], dtype=float32),
   'B1': <tvm.NDArray shape=(2,), cpu(0)>
array([0., 0.], dtype=float32),
   'W': <tvm.NDArray shape=(2, 2), cpu(0)>
array([[-0.5194443 ,  0.92808163],
       [ 0.68033445, -0.04199219]], dtype=float32),
   'W1': <tvm.NDArray shape=(2, 2), cpu(0)>
array([[-0.9073467 ,  0.44402432],
       [ 0.77458537,  0.17634082]], dtype=float32)}

I then try to compile the graph:

import nnvm.compiler
target = 'llvm'

shape_dict = {'dense_1_input_0': (1,2)}

with nnvm.compiler.build_config(opt_level=0):
    graph, lib, params = nnvm.compiler.build(xor_sym, target, shape_dict, params=xor_params)

OUT:

...
NNVMError: Error in operator elemwise_add0: [09:55:05] /home/agsim/repos/tvm/nnvm/src/top/nn/../elemwise_op_common.h:38: Check failed: assign(&dattr, (*vec)[i]) Incompatible attr in node elemwise_add0 at 1-th input: expected [1,2], got [2]
...

I think the cause is that when we call nnvm.frontend.from_onnx(onnx_model) on this network, we get the following in the symbolic representation of the graph:

...
Op:matmul, Name=matmul0
Inputs:
	arg[0]=dense_1_input_0(0) version=0
	arg[1]=W(0) version=0
Variable:B
--------------------
Op:elemwise_add, Name=elemwise_add0
Inputs:
	arg[0]=matmul0(0)
	arg[1]=B(0) version=0
--------------------
...

compared to the super resolution which has conv followed by add…

...
--------------------
Op:conv2d, Name=conv2d0
Inputs:
	arg[0]=1(0) version=0
	arg[1]=2(0) version=0
Attrs:
	channels=64
	dilation=(1, 1)
	groups=1
	kernel_size=(5, 5)
	padding=(2, 2)
	strides=(1, 1)
	use_bias=False
Variable:3
--------------------
Op:expand_dims, Name=expand_dims0
Inputs:
	arg[0]=3(0) version=0
Attrs:
	axis=1
	num_newaxis=2
--------------------
Op:broadcast_add, Name=broadcast_add0
Inputs:
	arg[0]=conv2d0(0)
	arg[1]=expand_dims0(0)
--------------------
...

Is there any reason Conv -> Add in onnx is converted to BROADCAST_ADD(CONV_OUT, EXPAND_DIMS(BIAS)) and MatMul -> Add is converted to ELEMENTWISE_ADD(MATMUL_OUT, BIAS)?

Seems like I could fix this by hacking the xor_params bias tensors to shape (1, 2), but that’s basically doing the expand_dims call myself.
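For what it’s worth, that workaround can be sketched in plain NumPy (the dict below is a stand-in for the actual xor_params, which holds tvm.NDArray values; the shapes match the printout above):

```python
import numpy as np

# Stand-in for the parameters returned by from_onnx (shapes from the post).
xor_params = {
    'B':  np.zeros(2, dtype=np.float32),       # bias, shape (2,)
    'B1': np.zeros(2, dtype=np.float32),       # bias, shape (2,)
    'W':  np.zeros((2, 2), dtype=np.float32),  # weight, shape (2, 2)
    'W1': np.zeros((2, 2), dtype=np.float32),  # weight, shape (2, 2)
}

# The hack: reshape each 1-D bias to (1, N) so its rank matches the
# (1, 2) matmul output -- effectively doing expand_dims by hand.
for name in ('B', 'B1'):
    xor_params[name] = xor_params[name].reshape(1, -1)

print(xor_params['B'].shape)  # (1, 2)
```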

Note: the xor.onnx was exported from pytorch, and its Add does not have any attributes, whereas super_resolution.onnx's Add has broadcast=1, axis=1. Was something special done when exporting the onnx graph for super_resolution to generate these attributes?


#2

The change appears to be caused by ONNX PR https://github.com/onnx/onnx/pull/907: the explicit broadcast and axis attributes were removed from all ops, and numpy-style implicit broadcasting is used instead.
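To illustrate what the new semantics expect, here is the xor network's MatMul + Add written with NumPy's implicit broadcasting (weights are just example values): a (1, 2) matmul result and a (2,) bias add cleanly, because the trailing dimensions align and the bias's missing leading axis is treated as size 1. This is exactly the add that elemwise_add rejects.

```python
import numpy as np

x = np.array([[0.0, 1.0]], dtype=np.float32)                      # input, shape (1, 2)
w = np.array([[-0.52, 0.93], [0.68, -0.04]], dtype=np.float32)    # weight, shape (2, 2)
b = np.zeros(2, dtype=np.float32)                                 # bias, shape (2,)

# numpy-style implicit broadcasting: (1, 2) + (2,) -> (1, 2)
out = x @ w + b
print(out.shape)  # (1, 2)
```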


#3

I have a commit that fixes this problem by converting all of the elementwise ops in from_onnx to their broadcast versions (the Elemwise class was removed, and Add/Sub/Mul/Div were converted to renamers to the broadcast_* variants). I would be happy to submit this as a PR, but I’m not sure how to handle backwards compatibility with older versions of onnx where the broadcast and axis attributes were still used. If someone could advise how this could be handled, I’m all ears!