Problems when importing a BERT model from ONNX into Relay

When I try to import a BERT model from ONNX into Relay, I run into a number of problems.
First, it said The following operators are not supported for frontend ONNX: Expand. I fixed that according to https://discuss.tvm.ai/t/tvm-error-opnotimplemented-the-following-operators-are-not-supported-for-frontend-onnx-expand/2688 (but the fix is still not in the latest version).
Then it said Check failed: e_dtype == dtype (int64 vs. int32) : relay.concatenate requires all tensors have the same dtype. Someone had already reported this in https://discuss.tvm.ai/t/relay-onnx-load-resnet-onnx-to-relay-failed/2411, and a very kind person provided a fix in https://github.com/dmlc/tvm/pull/3230, but it is still not merged, so I tried modifying my code myself first.
After that, I ran into duplicate argument supplied in both positional args (at position: 0), and keyword argument (with name: 6), and it seems that no one has faced this problem before.
Could you kindly provide a tutorial for importing a BERT model from ONNX? Fine-tuning a BERT model is a very common workflow.
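For reference, this is roughly the import path I am using with a recent TVM (a minimal sketch; the model file name and the input names/shapes are placeholders for my setup, not taken from the errors above):

```python
# Rough sketch of the ONNX -> Relay import being attempted.
# "bert.onnx" and the input names/shapes are placeholders for my setup.
import onnx
import tvm
from tvm import relay

onnx_model = onnx.load("bert.onnx")

# Relay needs concrete input shapes; the names depend on how the model was exported.
shape_dict = {
    "input_ids": (1, 128),
    "attention_mask": (1, 128),
    "token_type_ids": (1, 128),
}

# The errors above are raised inside this conversion step.
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

# Once conversion succeeds, the module can be compiled, e.g. for CPU:
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)
```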

I went through the process of importing Google's BERT version (TensorFlow) into TVM. Generally, the process involved the following steps:

  1. Export and freeze BERT, as described here. This involves building TensorFlow, so it might take some time.
  2. Import the frozen BERT model into TVM (a minimal sketch of this step follows the list).
  3. Address unsupported operators. This involves implementing the required operators. In my case, I implemented dummy operators which returned a tensor of the correct shape/dtype, but not the correct values. You'll have to do more work to implement the operators correctly.
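For step 2, the import itself is only a few lines. This is a minimal sketch, assuming a frozen graph at bert_frozen.pb and the usual BERT input names, which will differ depending on your export:

```python
# Sketch of step 2: loading a frozen TensorFlow BERT graph into Relay.
# "bert_frozen.pb" and the input names/shapes are assumptions about the export.
import tensorflow as tf
import tvm
from tvm import relay

with tf.io.gfile.GFile("bert_frozen.pb", "rb") as f:
    graph_def = tf.compat.v1.GraphDef()
    graph_def.ParseFromString(f.read())

shape_dict = {
    "input_ids": (1, 128),
    "input_mask": (1, 128),
    "segment_ids": (1, 128),
}

# Step 3's unsupported-operator errors show up here, one operator at a time.
mod, params = relay.frontend.from_tensorflow(graph_def, shape=shape_dict)
```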

It sounds like the TensorFlow route is potentially less error-prone. I personally didn't encounter any problems other than having to implement the operators.

Good luck! If I'm able to open-source this code at some point, I'll post here.

UPDATE:
I'm not open-sourcing this code, but there's not much to be open-sourced anyway. The gist is this: build the freeze-model tool, load a version of BERT in TensorFlow and serialize it out to a .pb or .pbtxt, freeze that (using a checkpoint from Google's BERT GitHub to get the trained model parameters), and then you can load the frozen .pb into TVM.
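The freezing step can also be driven from Python instead of the bazel-built tool. This is only a sketch; the file paths and the output node name are placeholders, and the output node in particular depends on which BERT head you exported:

```python
# Sketch of the freezing step using TensorFlow's freeze_graph utility.
# All paths and the output node name below are placeholders.
from tensorflow.python.tools import freeze_graph

freeze_graph.freeze_graph(
    input_graph="bert_graph.pbtxt",       # serialized GraphDef (text format)
    input_saver="",
    input_binary=False,                   # set True if you wrote a binary .pb
    input_checkpoint="bert_model.ckpt",   # checkpoint from Google's BERT repo
    output_node_names="bert/pooler/dense/Tanh",  # placeholder; depends on the head
    restore_op_name="save/restore_all",
    filename_tensor_name="save/Const:0",
    output_graph="bert_frozen.pb",        # this is what gets loaded into TVM
    clear_devices=True,
    initializer_nodes="",
)
```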


@boood15 did you ever get this figured out? Which ops did you have to implement?

Importing BERT from a TensorFlow frozen PB is OK. Currently only the "StopGradient" and "Assert" ops in TF are not supported by the TF-to-Relay converter. However, it's easy to work around for inference.
(1) For StopGradient, convert it to Identity. (2) For Assert, I removed it from the PB (the same logic could be done in outer code).
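Roughly, both workarounds can be applied to the GraphDef before handing it to Relay. This is only a sketch: the helper name is mine, and it only handles the simple case where the removed Assert nodes are referenced through control dependencies:

```python
import tensorflow as tf

def patch_graph_for_inference(graph_def):
    """Rewrite StopGradient to Identity and drop Assert nodes (plus the
    control-dependency edges pointing at them) for inference-only use."""
    assert_names = {n.name for n in graph_def.node if n.op == "Assert"}

    patched = tf.compat.v1.GraphDef()
    for node in graph_def.node:
        if node.name in assert_names:
            continue  # drop the Assert node itself
        new_node = patched.node.add()
        new_node.CopyFrom(node)
        if new_node.op == "StopGradient":
            new_node.op = "Identity"  # same single data input, so a rename suffices
        # Control inputs look like "^node_name"; strip those that pointed at
        # a removed Assert node.
        del new_node.input[:]
        new_node.input.extend(
            inp for inp in node.input if inp.lstrip("^") not in assert_names
        )
    return patched
```

The patched GraphDef is what then goes into relay.frontend.from_tensorflow.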

Inference results are the same as TF's; however, performance is much poorer.

By "performance", do you mean the time or the accuracy?

Sorry for the ambiguous word.
By "performance" I mean inference latency; it is quite slow on an NVIDIA GPU. Accuracy is fine, the same as the TF PB.

Actually, I have been running TF BERT for a while now with great results, I was mostly asking about ONNX BERT :slight_smile: