TVM TF Converter Bug with Inception_v4 network

I was trying to compile the pre-trained inception_v4 model with TVM, but I hit this bug in the middle of compilation:

Traceback (most recent call last):
  File "benchmark.py", line 72, in <module>
    main()
  File "benchmark.py", line 43, in main
    model_dirname, s3_dirname)
  File "/home/ubuntu/benchmark/model.py", line 610, in construct_model
    return zoos[zoo](model, framework, backend, dirname, s3_dirname)
  File "/home/ubuntu/benchmark/model.py", line 428, in __init__
    sym, params = nnvm.frontend.from_tensorflow(graph_def)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/nnvm-0.8.0-py3.6.egg/nnvm/frontend/tensorflow.py", line 1372, in from_tensorflow
    sym, params = g.from_tensorflow(graph, layout)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/nnvm-0.8.0-py3.6.egg/nnvm/frontend/tensorflow.py", line 1146, in from_tensorflow
    op = self._convert_operator(node.op, inputs, attr, graph)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/nnvm-0.8.0-py3.6.egg/nnvm/frontend/tensorflow.py", line 1333, in _convert_operator
    sym = convert_map[op_name](inputs, attrs, self._params)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/nnvm-0.8.0-py3.6.egg/nnvm/frontend/tensorflow.py", line 615, in _impl
    out_shape = _infer_out_shapes(out, params)[0]
  File "/home/ubuntu/.local/lib/python3.6/site-packages/nnvm-0.8.0-py3.6.egg/nnvm/frontend/tensorflow.py", line 536, in _infer_out_shapes
    _, out_shapes = graph_util.infer_shape(g, **shape_dict)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/nnvm-0.8.0-py3.6.egg/nnvm/compiler/graph_util.py", line 31, in infer_shape
    graph = graph.apply("InferShape")
  File "/home/ubuntu/.local/lib/python3.6/site-packages/nnvm-0.8.0-py3.6.egg/nnvm/graph.py", line 234, in apply
    check_call(_LIB.NNGraphApplyPasses(self.handle, npass, cpass, ctypes.byref(ghandle)))
  File "/home/ubuntu/.local/lib/python3.6/site-packages/nnvm-0.8.0-py3.6.egg/nnvm/_base.py", line 75, in check_call
    raise NNVMError(py_str(_LIB.NNGetLastError()))
nnvm._base.NNVMError: Error in operator import/InceptionV4/InceptionV4/Conv2d_1a_3x3/Conv2D: [21:48:55] /home/ubuntu/tvm/nnvm/src/top/nn/convolution.cc:120: Operator conv2d(layout=NHWC, use_bias=False, strides=(2, 2), padding=[0, 0], kernel_layout=HWIO, kernel_size=(3, 3), channels=32, dilation=(1, 1), name=import/InceptionV4/InceptionV4/Conv2d_1a_3x3/Conv2D) expects data's shape to be [4294967295,299,299,3], but got [-1,299,299,3]

Note that 4294967295 = 2^32 - 1. Is the dimension somehow stored as an unsigned number, so that negative dimensions can't be handled? Thanks!
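The arithmetic behind that number is easy to check: reinterpreting the bit pattern of the signed 32-bit value -1 as an unsigned 32-bit value gives exactly 2^32 - 1. The sketch below uses only Python's standard `struct` module; the helper name `as_uint32` is hypothetical, and the sketch only illustrates the wraparound. It does not confirm where in NNVM the dimension passes through an unsigned field.

```python
import struct


def as_uint32(value: int) -> int:
    """Reinterpret a signed 32-bit integer's bit pattern as unsigned (hypothetical helper)."""
    # Pack as little-endian signed int32, then unpack the same 4 bytes as uint32.
    return struct.unpack("<I", struct.pack("<i", value))[0]


# An unknown batch dimension of -1 turns into the value from the error message.
print(as_uint32(-1))   # 4294967295
print(2**32 - 1)       # 4294967295
print(as_uint32(299))  # 299, non-negative values are unchanged
```

If that reading is right, any dimension left as -1 in the frozen graph (typically the batch size) would surface this way.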

I know the probable cause of this behavior.
I will share a PR soon.


Thanks a lot for your help!
I also discovered a bug with MobileNet V2:
Traceback (most recent call last):
  File "sanity.py", line 105, in <module>
    main()
  File "sanity.py", line 78, in main
    sym, params = nnvm.frontend.from_tensorflow(tf_graph.as_graph_def(add_shapes=True))
  File "/home/ubuntu/.local/lib/python3.6/site-packages/nnvm-0.8.0-py3.6.egg/nnvm/frontend/tensorflow.py", line 1371, in from_tensorflow
    sym, params = g.from_tensorflow(graph, layout)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/nnvm-0.8.0-py3.6.egg/nnvm/frontend/tensorflow.py", line 1145, in from_tensorflow
    op = self._convert_operator(node.op, inputs, attr, graph)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/nnvm-0.8.0-py3.6.egg/nnvm/frontend/tensorflow.py", line 1332, in _convert_operator
    sym = convert_map[op_name](inputs, attrs, self._params)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/nnvm-0.8.0-py3.6.egg/nnvm/frontend/tensorflow.py", line 249, in _impl
    in_h = input_shape[1]
IndexError: list index out of range
Could you take a look at that problem too? Thanks!
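The `IndexError` by itself only says that `input_shape` had fewer than two entries when the converter read `input_shape[1]`. One way that can happen is a tensor whose rank is unknown, so its shape arrives as an empty list. A minimal sketch of a defensive check, using a hypothetical helper `get_spatial_dims` (not part of NNVM) and assuming a 4-D NHWC or NCHW input:

```python
def get_spatial_dims(input_shape, layout="NHWC"):
    """Hypothetical helper: return (height, width) or fail with a clear message.

    Assumes a 4-D input; an unknown-rank tensor would show up as an empty list.
    """
    if len(input_shape) != 4:
        raise ValueError(
            "expected a 4-D input shape for layout %s, got %r; "
            "the tensor's rank may be unknown in the graph" % (layout, input_shape)
        )
    if layout == "NHWC":
        return input_shape[1], input_shape[2]
    if layout == "NCHW":
        return input_shape[2], input_shape[3]
    raise ValueError("unsupported layout: %s" % layout)


print(get_spatial_dims([1, 224, 224, 3]))           # (224, 224)
print(get_spatial_dims([1, 3, 224, 224], "NCHW"))  # (224, 224)
```

Failing with an explicit message instead of a bare `IndexError` makes it clear that the problem is missing shape information rather than the operator itself.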

This should resolve all the shape-related issues described here.

Thanks a lot for your fix; looking forward to it!

Ref. Teach NNVM recognize tensorflow model

Inception_V4 works fine now.

I downloaded the Inception v4 checkpoint from TensorFlow-Slim and froze the graph as usual, following the scripts in the TensorFlow docs. However, when I run tune_relay_cuda.py from the tutorial, the following error occurs:

Traceback (most recent call last):
  File "tune_relay_cuda_zk.py", line 215, in <module>
    net, params = relay.frontend.from_tensorflow(graph_def, layout=layout, shape=shape_dict)
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/relay/frontend/tensorflow.py", line 1537, in from_tensorflow
    sym, params = g.from_tensorflow(graph, layout, shape, outputs)
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/relay/frontend/tensorflow.py", line 1314, in from_tensorflow
    out_type = ir_pass.infer_type(node_output[0])
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/relay/ir_pass.py", line 45, in infer_type
    return _ir_pass.infer_type(expr, mod)
  File "tvm/_ffi/_cython/./function.pxi", line 286, in tvm._ffi._cy3.core.FunctionBase.__call__
  File "tvm/_ffi/_cython/./function.pxi", line 221, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./function.pxi", line 210, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 151, in tvm._ffi._cy3.core.CALL
tvm._ffi.base.TVMError: [02:25:18] /usr/tvm/src/relay/ir/error.cc:112:
Error(s) have occurred. We have annotated the program with them:
In main:
fn () {
free_var %InceptionV4/Logits/PreLogitsFlatten/flatten/Shape: Tensor[(4,), int32]
%0 = strided_slice(%InceptionV4/Logits/PreLogitsFlatten/flatten/Shape, begin=[0], end=[1], strides=[1]) #
%1 = reshape(%0, newshape=[]) #
%2 = expand_dims(%1, axis=0) #
free_var %InceptionV4/Logits/PreLogitsFlatten/flatten/Reshape/shape/1: Tensor[(1,), int32]
%3 = expand_dims(%InceptionV4/Logits/PreLogitsFlatten/flatten/Reshape/shape/1, axis=0) #
%4 = (%2, %3)
%5 = concatenate(%4) # an internal invariant was violdated whiletypechecking your program[02:25:18] /usr/tvm/src/relay/op/tensor/transform.cc:188: Check failed: e_ndim == ndim (2 vs. 1) relay.concatenate requires all tensors have the same ndim
Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0x73261d) [0x7f35f43df61d]
[bt] (1) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0x73326d) [0x7f35f43e026d]
[bt] (2) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0xb732ca) [0x7f35f48202ca]
[bt] (3) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0xad9e34) [0x7f35f4786e34]
[bt] (4) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0xc4a197) [0x7f35f48f7197]
[bt] (5) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0xc31b03) [0x7f35f48deb03]
[bt] (6) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::InferType(tvm::relay::Function const&, tvm::relay::Module const&, tvm::relay::GlobalVar const&)+0x32b) [0x7f35f48df86b]
[bt] (7) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0xa98680) [0x7f35f4745680]
[bt] (8) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0xa99266) [0x7f35f4746266]
[bt] (9) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::InferType(tvm::relay::Expr const&, tvm::relay::Module const&)+0x41d) [0x7f35f48df1ad]
;
%5
}
Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0x73261d) [0x7f35f43df61d]
[bt] (1) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0x73326d) [0x7f35f43e026d]
[bt] (2) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0xa6c1e5) [0x7f35f47191e5]
[bt] (3) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0xc31d3a) [0x7f35f48ded3a]
[bt] (4) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::InferType(tvm::relay::Function const&, tvm::relay::Module const&, tvm::relay::GlobalVar const&)+0x32b) [0x7f35f48df86b]
[bt] (5) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0xa98680) [0x7f35f4745680]
[bt] (6) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0xa99266) [0x7f35f4746266]
[bt] (7) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::InferType(tvm::relay::Expr const&, tvm::relay::Module const&)+0x41d) [0x7f35f48df1ad]
[bt] (8) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0xc323f7) [0x7f35f48df3f7]
[bt] (9) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(TVMFuncCall+0x5e) [0x7f35f4aefa8e]
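The annotated program shows how the failure arises. The first operand of `concatenate` starts as a scalar (`reshape(%0, newshape=[])`) and becomes shape (1,) after `expand_dims`, while `Reshape/shape/1` is typed as `Tensor[(1,), int32]`, so `expand_dims(..., axis=0)` turns it into shape (1, 1). Concatenating a 1-D and a 2-D tensor violates the same-ndim check. The pure-Python sketch below models only those shape rules (it is not TVM code; the function names are hypothetical) and reproduces the "(2 vs. 1)" message. The last line assumes, without confirming, that the constant was meant to be a scalar.

```python
def expand_dims_shape(shape, axis):
    """Shape after inserting a length-1 axis at position `axis`."""
    new_shape = list(shape)
    new_shape.insert(axis, 1)
    return tuple(new_shape)


def concatenate_shape(shapes, axis=0):
    """Shape of concatenating tensors along `axis`; all inputs must share ndim."""
    ndim = len(shapes[0])
    for shape in shapes:
        e_ndim = len(shape)
        if e_ndim != ndim:
            raise ValueError("e_ndim == ndim (%d vs. %d)" % (e_ndim, ndim))
    for dim in range(ndim):
        if dim != axis and any(s[dim] != shapes[0][dim] for s in shapes):
            raise ValueError("dimension %d must match across inputs" % dim)
    out = list(shapes[0])
    out[axis] = sum(s[axis] for s in shapes)
    return tuple(out)


batch = ()                         # %1: scalar from reshape(newshape=[])
shape_1 = (1,)                     # free_var Reshape/shape/1: Tensor[(1,), int32]
a = expand_dims_shape(batch, 0)    # %2 -> (1,)
b = expand_dims_shape(shape_1, 0)  # %3 -> (1, 1)
try:
    concatenate_shape([a, b])
except ValueError as err:
    print(err)                     # e_ndim == ndim (2 vs. 1)

# If the second operand were a scalar too, both inputs would be 1-D.
print(concatenate_shape([a, expand_dims_shape((), 0)]))  # (2,)
```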

@lanyastar, can you try the patch below on the latest code?

You can refer to the script below to check most of the TensorFlow models.

Thanks for sharing the code. Most models work fine with the latest TVM 0.6 code, but inception_v2_resnet and inception_v4 fail with the following error:

2019-03-21 07:16:24.257240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484]     Adding visible gpu devices: 0, 1, 2, 3
2019-03-21 07:16:25.696906: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-21 07:16:25.696985: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0 1 2 3 
2019-03-21 07:16:25.697001: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N Y N N 
2019-03-21 07:16:25.697039: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 1:   Y N N N 
2019-03-21 07:16:25.697058: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 2:   N N N Y 
2019-03-21 07:16:25.697081: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 3:   N N Y N 
2019-03-21 07:16:25.698738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 21544 MB memory) -> physical GPU (device: 0, name: Tesla P40, pci bus id: 0000:02:00.0, compute capability: 6.1)
2019-03-21 07:16:26.085429: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 529 MB memory) -> physical GPU (device: 1, name: Tesla P40, pci bus id: 0000:04:00.0, compute capability: 6.1)
2019-03-21 07:16:26.096568: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 21544 MB memory) -> physical GPU (device: 2, name: Tesla P40, pci bus id: 0000:83:00.0, compute capability: 6.1)
2019-03-21 07:16:26.475924: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 21544 MB memory) -> physical GPU (device: 3, name: Tesla P40, pci bus id: 0000:84:00.0, compute capability: 6.1)
Traceback (most recent call last):
  File "test_forward_tf_vision_slim.py", line 628, in <module>
    compile_test_tvm_tf()
  File "test_forward_tf_vision_slim.py", line 614, in compile_test_tvm_tf
    tvm_output = run_tvm_graph(graph_def, data, in_node_name, target='llvm')
  File "test_forward_tf_vision_slim.py", line 551, in run_tvm_graph
    outputs=out_names)
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/relay/frontend/tensorflow.py", line 1719, in from_tensorflow
    sym, params = g.from_tensorflow(graph, layout, shape, outputs)
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/relay/frontend/tensorflow.py", line 1455, in from_tensorflow
    op = self._convert_operator(node.op, inputs, attr, graph)
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/relay/frontend/tensorflow.py", line 1691, in _convert_operator
    sym = convert_map[op_name](inputs, attrs, self._params)
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/relay/frontend/tensorflow.py", line 526, in _impl
    shape_arg = params.pop(pop_node.name_hint)
  File "tvm/_ffi/_cython/./node.pxi", line 64, in tvm._ffi._cy3.core.NodeBase.__getattr__
AttributeError: '<class 'tvm.relay.expr.Call'>' object has no attribute 'name_hint'
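The `AttributeError` suggests that the Reshape converter expected its shape argument to be a named variable whose value it could pop from `params` by `name_hint`, but received a `Call` node instead, i.e. a shape computed by other operators inside the graph. The sketch below uses hypothetical stand-in classes `Var` and `Call` (not the real `tvm.relay` classes) and a hypothetical `static_shape_arg` helper to show the kind of type check needed before reading `name_hint`; it does not reproduce the actual fix in the patch.

```python
class Var:
    """Stand-in for a graph variable that carries a name (hypothetical)."""

    def __init__(self, name_hint):
        self.name_hint = name_hint


class Call:
    """Stand-in for an operator call; like a Relay Call, it has no name_hint."""

    def __init__(self, op, args):
        self.op = op
        self.args = args


def static_shape_arg(shape_node, params):
    """Return the constant new shape, or None if it is computed in the graph."""
    if isinstance(shape_node, Var) and shape_node.name_hint in params:
        return params.pop(shape_node.name_hint)
    # A Call means the shape is produced by other ops, so it cannot be looked
    # up in params; a converter would need another strategy for this case.
    return None


params = {"reshape/shape": [1, -1]}
print(static_shape_arg(Var("reshape/shape"), params))  # [1, -1]
print(static_shape_arg(Call("concatenate", []), params))  # None
```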

@lanyastar, did you check with the patch above?

I tried the patch above, and all the models listed in the Python script work well now. Thanks for sharing.
