Teach NNVM to recognize TensorFlow models

weibo · June 8, 2018, 9:52am

Is it possible to import a TF model into MXNet and then convert it to NNVM? Are there any problems with this kind of conversion? Thanks.

srkreddy1238 · June 8, 2018, 11:05am

https://github.com/dmlc/tvm/pull/1188

A PR for the TensorFlow frontend is under review.

It works well for InceptionV1 and V3.
MobileNet also works for me with a few changes that I have not yet submitted as a PR.

jjiang2cal · September 6, 2018, 3:02am

Hi,

I found your discussion of resnet50 performance on TensorFlow and TVM at https://github.com/dmlc/nnvm/issues/440.

Could you share how you converted the TensorFlow resnet50 model to NNVM? I tried but failed, as described in "Anyone successful converting tensorflow resnet50 model to nnvm?". If you would not mind sharing your advice, I would appreciate it very much.

Thanks!

FrozenGene · September 7, 2018, 3:11am

We have a fork of TVM in which we have done a lot of work on the CoreML frontend. We convert TF resnet50 to CoreML, then convert CoreML to NNVM.

jjiang2cal · September 14, 2018, 6:18pm

Did you encounter problems converting the resnet50 TF model to CoreML? I got the error messages described in https://github.com/tf-coreml/tf-coreml/issues/210, but I see you reported that error on a different model. Any advice on how to solve the resnet50 conversion issue? Thanks a lot.

FrozenGene · September 17, 2018, 9:06am

I did not hit this error when converting resnet50. You could try the approach I mentioned in that issue.

jjiang2cal · September 18, 2018, 12:00am

@FrozenGene

I followed your advice for converting TF resnet50 to CoreML and hit the problems above.

My resnet50 model is the pretrained ResNet-50 v1 or ResNet-50 v2 model from https://github.com/tensorflow/models/tree/master/official/resnet. The model is in saved_model format. I froze the model with

python freeze_graph.py --input_saved_model_dir=saved_model_dir --output_graph=frozen_model.pb --output_node_names=ArgMax --clear_devices

Freezing was successful. But both models raise
ValueError: Length of the 'dim' parameter must be equal to 4
when converted to coreml.

Should I use a different TF resnet50 model? Could you share which TF resnet50 model you use and how you freeze it, if this is public information?

Thank you very much.

FrozenGene · September 18, 2018, 8:09am

It is just the official resnet50 model provided by TensorFlow, nothing special: https://github.com/tensorflow/models/tree/master/research/slim

jjiang2cal · September 18, 2018, 11:41pm

@FrozenGene

Did you use v1 or v2 on that page? How did you freeze it?

I downloaded v1, and used
python freeze_graph.py --input_graph=resnet_v1_50_inf_graph.pb --input_checkpoint=resnet_v1_50.ckpt --input_binary=true --output_graph=frozen_resnet_v1_50_slim.pb --output_node_names=resnet_v1_50/predictions/Reshape_1
to freeze it. But I got an error:

File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1759, in restore
    err, "a mismatch between the current graph and the graph")
tensorflow.python.framework.errors_impl.InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Assign requires shapes of both tensors to match. lhs shape= [1,1,2048,1001] rhs shape= [1,1,2048,1000]

FrozenGene · September 19, 2018, 2:56am

I use the resnet v2 model provided there. As for your error, I remember that export_inference_graph.py has a parameter that controls it. You could look into it.

srkreddy1238 · September 19, 2018, 3:42am

@jjiang2cal

You may try this initial version of changes, with which I could compile Resnet_v2 via the TensorFlow frontend.

I am planning to submit it as a PR soon.

jjiang2cal · September 19, 2018, 9:55pm

Yes, setting the --labels_offset=1 flag when exporting the inference graph solves this problem. Thanks.
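The class-count mismatch behind this fix can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming that the slim export script builds the network with the dataset's class count minus labels_offset, and that slim's ImageNet label set has 1001 entries (including a background class). The helper names below are hypothetical and are not part of TF-Slim.

```python
# Hypothetical helpers that illustrate the class-count arithmetic.
# They are not part of TF-Slim or TensorFlow.

# Assumption: slim's ImageNet label set has 1001 entries (background + 1000 classes).
IMAGENET_SLIM_NUM_CLASSES = 1001


def exported_num_classes(dataset_num_classes, labels_offset=0):
    """Class count of the exported inference graph (assumed: dataset count minus offset)."""
    return dataset_num_classes - labels_offset


def logits_kernel_shape(num_classes, in_channels=2048):
    """Shape of the final 1x1 convolution kernel, in the form shown in the error message."""
    return [1, 1, in_channels, num_classes]


# "rhs shape" from the restore error, i.e. what the checkpoint contains.
checkpoint_shape = [1, 1, 2048, 1000]

default_shape = logits_kernel_shape(exported_num_classes(IMAGENET_SLIM_NUM_CLASSES))
offset_shape = logits_kernel_shape(
    exported_num_classes(IMAGENET_SLIM_NUM_CLASSES, labels_offset=1)
)

print("default graph matches checkpoint:", default_shape == checkpoint_shape)
print("labels_offset=1 matches checkpoint:", offset_shape == checkpoint_shape)
```

Under these assumptions the default graph has a 1001-way classifier, which is the "lhs shape" in the error, and an offset of 1 brings it in line with the 1000-way checkpoint.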

jjiang2cal · September 19, 2018, 11:08pm

@srkreddy1238

Thanks for the quick commit!

When I tried tf slim models of resnet 50 v1 and v2 (https://github.com/tensorflow/models/tree/master/research/slim), I got NotImplementedError: Please freeze the graph with add_shapes=True. I use the freeze script from https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py, and it does not have an add_shapes option. Is there any other freeze_graph I should use?

Sorry for bothering you so much

srkreddy1238 · September 20, 2018, 4:23am

freeze_graph.py --input_saved_model_dir=20180601_resnet_v2_imagenet_savedmodel/1527888387/ --output_graph=frozen_model-v2-fp16.pb --output_node_names=ArgMax --clear_devices

I use this command to freeze the model.

Ref.

The helper function shown below can be used to add shapes.

graph_def = nnvm.testing.tf.AddShapesToGraphDef('softmax')

jjiang2cal · September 20, 2018, 5:41pm

@srkreddy1238

This is the official model I first used. The input node and output node of this model (inspected by https://github.com/tf-coreml/tf-coreml/blob/master/utils/inspect_pb.py) are:

0: op name = import/input_tensor, op type = ( Placeholder ), inputs = , outputs = import/input_tensor:0
@input shapes:
@output shapes:
name = import/input_tensor:0 : (128, 224, 224, 3)
......
 702: op name = import/ArgMax, op type = ( ArgMax ), inputs = import/resnet_model/final_dense:0, import/ArgMax/dimension:0, outputs = import/ArgMax:0
@input shapes:
name = import/resnet_model/final_dense:0 : (128, 1001)
name = import/ArgMax/dimension:0 : ()
@output shapes:
name = import/ArgMax:0 : (128,)

Since NNVM/TVM does not support batch_size > 1, I set batch_size to 1.
With your patch, it compiled to NNVM successfully.
But I have questions about inference. The output of the graph is ArgMax, so it is the class number of the classification. If batch_size is 1, the output shape should be (1,). I used the code below for inference:

......
out = module.get_output(0)
tvm_out = out.asnumpy()
print(tvm_out)

It printed out [766 774 457 766 729 701 824]
while I expected 230. And I don’t understand why it is a 7-element vector.

Below is one picture I used for inference. It is from the ImageNet dataset, classified as 230: ‘Shetland sheepdog, Shetland sheep dog, Shetland’.

ILSVRC2012_val_00000003

Do you have any insight into how I should run the inference? Thanks a lot.
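The shape expectation in this post can be checked with a small NumPy sketch. It uses random placeholder logits rather than real model output, and check_argmax_output is a hypothetical helper introduced only for illustration.

```python
import numpy as np

# Placeholder logits for one image: shape (batch_size, num_classes) = (1, 1001).
# These are random values, not real model output.
rng = np.random.default_rng(0)
logits = rng.standard_normal((1, 1001)).astype("float32")

# ArgMax over the class axis leaves one class index per image.
predicted = np.argmax(logits, axis=1)


def check_argmax_output(tvm_out, batch_size):
    """Return True when an ArgMax output has exactly one entry per image."""
    return tuple(np.asarray(tvm_out).shape) == (batch_size,)


# The reference result has the expected shape (1,).
print(check_argmax_output(predicted, batch_size=1))

# The 7-element vector reported in the post does not.
reported = np.array([766, 774, 457, 766, 729, 701, 824])
print(check_argmax_output(reported, batch_size=1))
```

A length of 7 happens to match the 7x7 spatial size of the last ResNet-50 feature map for a 224x224 input, which might point to a reduction or squeeze running over the wrong axis. The thread does not confirm this.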

jjiang2cal · September 20, 2018, 6:05pm

@srkreddy1238

For the research slim model,
graph_def = nnvm.testing.tf.AddShapesToGraphDef('resnet_v2_50/predictions/Reshape_1')
does solve the add-shapes error. But conversion then fails:

File "from_tensorflow_slim_v2.py", line 124, in <module>
    graph, lib, params = nnvm.compiler.build(sym, target, shape_dict, params=params)
  File "/tvm/nnvm/python/nnvm/compiler/build_module.py", line 270, in build
    ishape, _ = graph_util.infer_shape(graph, **shape)
  File "/tvm/nnvm/python/nnvm/compiler/graph_util.py", line 31, in infer_shape
    graph = graph.apply("InferShape")
  File "/tvm/nnvm/python/nnvm/graph.py", line 234, in apply
    check_call(_LIB.NNGraphApplyPasses(self.handle, npass, cpass, ctypes.byref(ghandle)))
  File "/tvm/nnvm/python/nnvm/_base.py", line 75, in check_call
    raise NNVMError(py_str(_LIB.NNGetLastError()))
nnvm._base.NNVMError: Error in operator resnet_v2_50/SpatialSqueeze: [17:58:16] /tvm/nnvm/src/top/tensor/transform.cc:693: Check failed: shp[i] == 1 (7 vs. 1) The squeezed axis must have shape 1!Want to squeeze 2, which has shape7

The input, output and the resnet_v2_50/SpatialSqueeze nodes are as below:

0: op name = import/input, op type = ( Placeholder ), inputs = , outputs = import/input:0
@input shapes:
@output shapes:
name = import/input:0 : (?, 224, 224, 3)
......
1762: op name = import/resnet_v2_50/SpatialSqueeze, op type = ( Squeeze ), inputs = import/resnet_v2_50/logits/BiasAdd:0, outputs = import/resnet_v2_50/SpatialSqueeze:0
@input shapes:
name = import/resnet_v2_50/logits/BiasAdd:0 : (?, 1, 1, 1001)
@output shapes:
name = import/resnet_v2_50/SpatialSqueeze:0 : (?, 1001)
......
1767: op name = import/resnet_v2_50/predictions/Reshape_1, op type = ( Reshape ), inputs = import/resnet_v2_50/predictions/Softmax:0, import/resnet_v2_50/predictions/Shape:0, outputs = import/resnet_v2_50/predictions/Reshape_1:0
@input shapes:
name = import/resnet_v2_50/predictions/Softmax:0 : (?, 1001)
name = import/resnet_v2_50/predictions/Shape:0 : (2,)
@output shapes:
name = import/resnet_v2_50/predictions/Reshape_1:0 : (?, 1001)
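The rule that the failing check enforces can be illustrated with NumPy, which applies the same constraint: an axis can only be squeezed if its size is 1. This sketch does not reproduce the NNVM shape inference itself. The function name and the second input shape are hypothetical, chosen only to show the failure mode.

```python
import numpy as np


def spatial_squeeze(x):
    """Drop the two spatial axes of an NHWC tensor; both must have size 1."""
    return np.squeeze(x, axis=(1, 2))


# Shape reported by TensorFlow for logits/BiasAdd at batch size 1: squeezing works.
ok = spatial_squeeze(np.zeros((1, 1, 1, 1001), dtype="float32"))
print(ok.shape)

# Hypothetical input whose spatial axes were not reduced to 1: squeezing is rejected.
squeeze_failed = False
try:
    spatial_squeeze(np.zeros((1, 7, 7, 1001), dtype="float32"))
except ValueError as err:
    squeeze_failed = True
    print("squeeze rejected:", err)
```

If the shape that NNVM infers for the Squeeze input still carries a spatial extent of 7, the check in transform.cc would fail in the same way, even though TensorFlow itself reports (?, 1, 1, 1001) for that tensor.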

jjiang2cal · September 20, 2018, 6:37pm

@FrozenGene

Did you have

File "/tvm/nnvm/python/nnvm/frontend/coreml.py", line 182, in PoolingLayerParams
    raise NotImplementedError("Other convolution padding not implemented")
NotImplementedError: Other convolution padding not implemented

when converting coreml to nnvm? (The coreml model is converted from research slim resnet50 v2 tf model.)

FrozenGene · September 21, 2018, 5:35pm

I have done a lot of work on CoreML. For convolution, I have added support for SAME / VALID using 4-D padding (not yet contributed back to the community, but I will do so soon). Pooling padding is fully supported as well. So I really cannot pin down the error from this information alone. I suggest converting the .mlmodel to text format (you can Google how to do it) and then checking the details of this layer.

srkreddy1238 · September 22, 2018, 4:37am

@jjiang2cal

I know about the shape operator issue with resnet_v2_50 above. I will try to share the fix for it soon.

srkreddy1238 · October 24, 2018, 11:47am

With this, the TensorFlow frontend can support all models (Inception, Resnet, MobilenetV1/V2, Vgg) from research/slim.

Since not all models can be integrated into the TVM test cases, I have added some utils for validation at https://github.com/srkreddy1238/dmlc_data/tree/master/work/tf/samples for reference.