TVM TF Converter Bug with Inception_v4 network

haojin2 · October 2, 2018, 10:20pm

I was trying to compile the pre-trained inception_v4 model with tvm, but I encountered this bug in the middle of compilation:

Traceback (most recent call last):
  File "benchmark.py", line 72, in <module>
    main()
  File "benchmark.py", line 43, in main
    model_dirname, s3_dirname)
  File "/home/ubuntu/benchmark/model.py", line 610, in construct_model
    return zoos[zoo](model, framework, backend, dirname, s3_dirname)
  File "/home/ubuntu/benchmark/model.py", line 428, in __init__
    sym, params = nnvm.frontend.from_tensorflow(graph_def)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/nnvm-0.8.0-py3.6.egg/nnvm/frontend/tensorflow.py", line 1372, in from_tensorflow
    sym, params = g.from_tensorflow(graph, layout)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/nnvm-0.8.0-py3.6.egg/nnvm/frontend/tensorflow.py", line 1146, in from_tensorflow
    op = self._convert_operator(node.op, inputs, attr, graph)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/nnvm-0.8.0-py3.6.egg/nnvm/frontend/tensorflow.py", line 1333, in _convert_operator
    sym = convert_map[op_name](inputs, attrs, self._params)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/nnvm-0.8.0-py3.6.egg/nnvm/frontend/tensorflow.py", line 615, in _impl
    out_shape = _infer_out_shapes(out, params)[0]
  File "/home/ubuntu/.local/lib/python3.6/site-packages/nnvm-0.8.0-py3.6.egg/nnvm/frontend/tensorflow.py", line 536, in _infer_out_shapes
    _, out_shapes = graph_util.infer_shape(g, **shape_dict)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/nnvm-0.8.0-py3.6.egg/nnvm/compiler/graph_util.py", line 31, in infer_shape
    graph = graph.apply("InferShape")
  File "/home/ubuntu/.local/lib/python3.6/site-packages/nnvm-0.8.0-py3.6.egg/nnvm/graph.py", line 234, in apply
    check_call(_LIB.NNGraphApplyPasses(self.handle, npass, cpass, ctypes.byref(ghandle)))
  File "/home/ubuntu/.local/lib/python3.6/site-packages/nnvm-0.8.0-py3.6.egg/nnvm/_base.py", line 75, in check_call
    raise NNVMError(py_str(_LIB.NNGetLastError()))
nnvm._base.NNVMError: Error in operator import/InceptionV4/InceptionV4/Conv2d_1a_3x3/Conv2D: [21:48:55] /home/ubuntu/tvm/nnvm/src/top/nn/convolution.cc:120: Operator conv2d(layout=NHWC, use_bias=False, strides=(2, 2), padding=[0, 0], kernel_layout=HWIO, kernel_size=(3, 3), channels=32, dilation=(1, 1), name=import/InceptionV4/InceptionV4/Conv2d_1a_3x3/Conv2D) expects data's shape to be [4294967295,299,299,3], but got [-1,299,299,3]

One thing to notice here is that 4294967295 = 2^32 - 1. Is the dimension somehow stored as an unsigned number, so that negative dimensions cannot be handled? Thanks!
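To make the arithmetic above concrete, here is a minimal sketch of how -1 turns into 4294967295 when it is reinterpreted as a 32-bit unsigned integer. The helper name `as_uint32` is made up for illustration; this only shows the number conversion and is not a claim about where NNVM performs it.

```python
import ctypes


def as_uint32(value: int) -> int:
    """Reinterpret a Python int as a 32-bit unsigned value (two's complement)."""
    return value & 0xFFFFFFFF


unknown_batch = -1  # the batch dimension reported in the error message

# Both conversions give the value seen in the error: 2**32 - 1.
print(as_uint32(unknown_batch))              # 4294967295
print(ctypes.c_uint32(unknown_batch).value)  # 4294967295
print(as_uint32(unknown_batch) == 2**32 - 1)  # True
```

So the "expected" shape [4294967295,299,299,3] and the "got" shape [-1,299,299,3] carry the same bit pattern in the first dimension, just read with different signedness.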

srkreddy1238 · October 3, 2018, 3:49am

I know the probable cause for this effect.
I will share a PR soon.

haojin2 · October 3, 2018, 6:57pm

Thanks a lot for your help!
I also discovered a bug with MobileNet V2:
Traceback (most recent call last):
  File "sanity.py", line 105, in <module>
    main()
  File "sanity.py", line 78, in main
    sym, params = nnvm.frontend.from_tensorflow(tf_graph.as_graph_def(add_shapes=True))
  File "/home/ubuntu/.local/lib/python3.6/site-packages/nnvm-0.8.0-py3.6.egg/nnvm/frontend/tensorflow.py", line 1371, in from_tensorflow
    sym, params = g.from_tensorflow(graph, layout)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/nnvm-0.8.0-py3.6.egg/nnvm/frontend/tensorflow.py", line 1145, in from_tensorflow
    op = self._convert_operator(node.op, inputs, attr, graph)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/nnvm-0.8.0-py3.6.egg/nnvm/frontend/tensorflow.py", line 1332, in _convert_operator
    sym = convert_map[op_name](inputs, attrs, self._params)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/nnvm-0.8.0-py3.6.egg/nnvm/frontend/tensorflow.py", line 249, in _impl
    in_h = input_shape[1]
IndexError: list index out of range
Could you take a look at that problem too? Thanks!
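For readers following along: the failing line, `in_h = input_shape[1]`, indexes the recorded input shape as if it were a rank-4 list. The sketch below assumes (this is a guess, not confirmed by the traceback) that the shape arrived as an empty list, and shows how a small check would give a clearer message. `spatial_dims_nhwc` is a hypothetical helper, not part of NNVM.

```python
def spatial_dims_nhwc(input_shape):
    """Return (height, width) from an NHWC shape, or raise a descriptive error."""
    if len(input_shape) != 4:
        raise ValueError(
            f"expected a rank-4 NHWC shape, got {list(input_shape)!r}"
        )
    return input_shape[1], input_shape[2]


# A fully known NHWC shape works as expected.
print(spatial_dims_nhwc([1, 224, 224, 3]))  # (224, 224)

# Indexing an empty shape list reproduces the error from the traceback.
try:
    [][1]
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range
```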

srkreddy1238 · October 18, 2018, 9:42am

should resolve all shape related issues described here.

haojin2 · October 19, 2018, 6:08pm

Thanks a lot for your fix, looking forward to it!

srkreddy1238 · October 24, 2018, 11:55am

Ref. Teach NNVM recognize tensorflow model

Inception_V4 works fine now.

lanyastar · March 5, 2019, 4:51am

I downloaded the Inception v4 ckpt model from TensorFlow Slim and froze the graph as usual, following the scripts from the TensorFlow docs. However, when I run tune_relay_cuda.py from the tutorial, an error occurs:

Traceback (most recent call last):
  File "tune_relay_cuda_zk.py", line 215, in <module>
    net, params = relay.frontend.from_tensorflow(graph_def, layout=layout, shape=shape_dict)
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/relay/frontend/tensorflow.py", line 1537, in from_tensorflow
    sym, params = g.from_tensorflow(graph, layout, shape, outputs)
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/relay/frontend/tensorflow.py", line 1314, in from_tensorflow
    out_type = ir_pass.infer_type(node_output[0])
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/relay/ir_pass.py", line 45, in infer_type
    return _ir_pass.infer_type(expr, mod)
  File "tvm/_ffi/_cython/./function.pxi", line 286, in tvm._ffi._cy3.core.FunctionBase.call
  File "tvm/_ffi/_cython/./function.pxi", line 221, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./function.pxi", line 210, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 151, in tvm._ffi._cy3.core.CALL
tvm._ffi.base.TVMError: [02:25:18] /usr/tvm/src/relay/ir/error.cc:112:
Error(s) have occurred. We have annotated the program with them:
In main:
fn () {
free_var %InceptionV4/Logits/PreLogitsFlatten/flatten/Shape: Tensor[(4,), int32]
%0 = strided_slice(%InceptionV4/Logits/PreLogitsFlatten/flatten/Shape, begin=[0], end=[1], strides=[1]) #
%1 = reshape(%0, newshape=[]) #
%2 = expand_dims(%1, axis=0) #
free_var %InceptionV4/Logits/PreLogitsFlatten/flatten/Reshape/shape/1: Tensor[(1,), int32]
%3 = expand_dims(%InceptionV4/Logits/PreLogitsFlatten/flatten/Reshape/shape/1, axis=0) #
%4 = (%2, %3)
%5 = concatenate(%4) # an internal invariant was violdated whiletypechecking your program[02:25:18] /usr/tvm/src/relay/op/tensor/transform.cc:188: Check failed: e_ndim == ndim (2 vs. 1) relay.concatenate requires all tensors have the same ndim
Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0x73261d) [0x7f35f43df61d]
[bt] (1) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0x73326d) [0x7f35f43e026d]
[bt] (2) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0xb732ca) [0x7f35f48202ca]
[bt] (3) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0xad9e34) [0x7f35f4786e34]
[bt] (4) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0xc4a197) [0x7f35f48f7197]
[bt] (5) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0xc31b03) [0x7f35f48deb03]
[bt] (6) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::InferType(tvm::relay::Function const&, tvm::relay::Module const&, tvm::relay::GlobalVar const&)+0x32b) [0x7f35f48df86b]
[bt] (7) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0xa98680) [0x7f35f4745680]
[bt] (8) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0xa99266) [0x7f35f4746266]
[bt] (9) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::InferType(tvm::relay::Expr const&, tvm::relay::Module const&)+0x41d) [0x7f35f48df1ad]
;
%5
}
Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0x73261d) [0x7f35f43df61d]
[bt] (1) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0x73326d) [0x7f35f43e026d]
[bt] (2) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0xa6c1e5) [0x7f35f47191e5]
[bt] (3) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0xc31d3a) [0x7f35f48ded3a]
[bt] (4) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::InferType(tvm::relay::Function const&, tvm::relay::Module const&, tvm::relay::GlobalVar const&)+0x32b) [0x7f35f48df86b]
[bt] (5) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0xa98680) [0x7f35f4745680]
[bt] (6) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0xa99266) [0x7f35f4746266]
[bt] (7) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::InferType(tvm::relay::Expr const&, tvm::relay::Module const&)+0x41d) [0x7f35f48df1ad]
[bt] (8) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0xc323f7) [0x7f35f48df3f7]
[bt] (9) /usr/local/lib/python3.5/dist-packages/tvm-0.5.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(TVMFuncCall+0x5e) [0x7f35f4aefa8e]
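Reading the annotated program in the error above, the "2 vs. 1" mismatch can be traced by hand: %2 ends up with one dimension while %3 ends up with two. The following is a rough sketch of that shape bookkeeping in plain Python. The helpers `expand_dims_shape` and `check_concat_ndim` are made up for illustration and only mimic the shape rules; they are not Relay APIs.

```python
def expand_dims_shape(shape, axis):
    """Shape after inserting a length-1 axis at position `axis`."""
    new_shape = list(shape)
    new_shape.insert(axis, 1)
    return tuple(new_shape)


def check_concat_ndim(shapes):
    """Raise if the inputs to a concatenate do not all have the same rank."""
    ndims = [len(s) for s in shapes]
    if len(set(ndims)) != 1:
        raise ValueError(f"concatenate requires the same ndim, got {ndims}")


shape_0 = (1,)                             # %0: strided_slice of a (4,) tensor, one element
shape_1 = ()                               # %1: reshape(%0, newshape=[]) gives a scalar
shape_2 = expand_dims_shape(shape_1, 0)    # %2: (1,)
shape_3 = expand_dims_shape((1,), 0)       # %3: the (1,) free_var becomes (1, 1)

print(shape_2, shape_3)  # (1,) (1, 1)

try:
    check_concat_ndim([shape_2, shape_3])  # %5 = concatenate((%2, %3))
except ValueError as err:
    print(err)  # concatenate requires the same ndim, got [1, 2]
```

In other words, the second operand is already a rank-1 tensor before `expand_dims` is applied, so it arrives at the concatenate with one dimension too many.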

srkreddy1238 · March 20, 2019, 1:10pm

@lanyastar, can you try the patch below on the latest code?

You can refer to the script below to check most of the TensorFlow models.

github.com/srkreddy1238/dmlc_data/blob/master/work/tf/relay/test_forward_tf_vision_slim.py

#!/usr/bin/python3

import os
import sys
import tarfile,sys

# tvm
import tvm
import tvm.relay as relay

# os and numpy
import numpy as np
import os.path

# Tensorflow imports
import tensorflow as tf
from tensorflow.core.framework import graph_pb2
from tensorflow.python.framework import dtypes
from tensorflow.python.framework import tensor_util

This file has been truncated.

lanyastar · March 21, 2019, 7:21am

Thanks for sharing the code. Most models work fine with the latest code of tvm 0.6, but inception_v2_resnet and inception_v4 fail with the following errors:

2019-03-21 07:16:24.257240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484]     Adding visible gpu devices: 0, 1, 2, 3
2019-03-21 07:16:25.696906: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-21 07:16:25.696985: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0 1 2 3 
2019-03-21 07:16:25.697001: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N Y N N 
2019-03-21 07:16:25.697039: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 1:   Y N N N 
2019-03-21 07:16:25.697058: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 2:   N N N Y 
2019-03-21 07:16:25.697081: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 3:   N N Y N 
2019-03-21 07:16:25.698738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 21544 MB memory) -> physical GPU (device: 0, name: Tesla P40, pci bus id: 0000:02:00.0, compute capability: 6.1)
2019-03-21 07:16:26.085429: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 529 MB memory) -> physical GPU (device: 1, name: Tesla P40, pci bus id: 0000:04:00.0, compute capability: 6.1)
2019-03-21 07:16:26.096568: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 21544 MB memory) -> physical GPU (device: 2, name: Tesla P40, pci bus id: 0000:83:00.0, compute capability: 6.1)
2019-03-21 07:16:26.475924: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 21544 MB memory) -> physical GPU (device: 3, name: Tesla P40, pci bus id: 0000:84:00.0, compute capability: 6.1)
Traceback (most recent call last):
  File "test_forward_tf_vision_slim.py", line 628, in <module>
    compile_test_tvm_tf()
  File "test_forward_tf_vision_slim.py", line 614, in compile_test_tvm_tf
    tvm_output = run_tvm_graph(graph_def, data, in_node_name, target='llvm')
  File "test_forward_tf_vision_slim.py", line 551, in run_tvm_graph
    outputs=out_names)
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/relay/frontend/tensorflow.py", line 1719, in from_tensorflow
    sym, params = g.from_tensorflow(graph, layout, shape, outputs)
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/relay/frontend/tensorflow.py", line 1455, in from_tensorflow
    op = self._convert_operator(node.op, inputs, attr, graph)
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/relay/frontend/tensorflow.py", line 1691, in _convert_operator
    sym = convert_map[op_name](inputs, attrs, self._params)
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/relay/frontend/tensorflow.py", line 526, in _impl
    shape_arg = params.pop(pop_node.name_hint)
  File "tvm/_ffi/_cython/./node.pxi", line 64, in tvm._ffi._cy3.core.NodeBase.__getattr__
AttributeError: '<class 'tvm.relay.expr.Call'>' object has no attribute 'name_hint'
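The last frame shows `params.pop(pop_node.name_hint)` being applied to a `tvm.relay.expr.Call`, which has no `name_hint`. Below is a minimal sketch of that failure mode using stand-in classes. `Var`, `Call` and `pop_constant_shape` here are hypothetical plain-Python imitations written for this illustration, not the real TVM classes or frontend code.

```python
class Var:
    """Stand-in for a named variable node; it carries a name_hint."""

    def __init__(self, name_hint):
        self.name_hint = name_hint


class Call:
    """Stand-in for an operator call node; it has no name_hint."""

    def __init__(self, op, args):
        self.op = op
        self.args = args


def pop_constant_shape(node, params):
    """Look up a shape stored in params, failing clearly for non-variables."""
    if not hasattr(node, "name_hint"):
        raise TypeError(
            f"shape input is a {type(node).__name__}, not a named constant"
        )
    return params.pop(node.name_hint)


params = {"reshape/shape": [1, -1]}
print(pop_constant_shape(Var("reshape/shape"), params))  # [1, -1]

try:
    pop_constant_shape(Call("concatenate", []), params)
except TypeError as err:
    print(err)  # shape input is a Call, not a named constant
```

The point is only that a shape argument computed by other operators arrives as a call expression rather than as a named entry in `params`, so a lookup by name cannot work on it.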

srkreddy1238 · March 21, 2019, 9:38am

@lanyastar

Did you check with the patch above?

lanyastar · March 21, 2019, 12:50pm

I tried the patch above, and all the models listed in the py script work well now. Thanks for sharing.