Problems when importing a BERT model from ONNX into Relay

When I try to import a BERT model from ONNX into Relay, I run into a series of problems.
First, I got “The following operators are not supported for frontend ONNX: Expand”, which I fixed locally with a patch that is still not in the latest version.
Then I got “Check failed: e_dtype == dtype (int64 vs. int32) : relay.concatenate requires all tensors have the same dtype”. I found that someone had already reported this and a very kind person had provided a solution, but it is still not merged, so I modified my own code for now.
After that, I hit “duplicate argument supplied in both positional args (at position: 0), and keyword argument (with name: 6)”, and it seems that no one has faced this problem before.
Could you kindly provide a tutorial on importing a BERT model from ONNX? Fine-tuning a BERT model for downstream tasks is quite common.

I went through the process of importing Google’s BERT version (TensorFlow) into TVM. Generally, the process involved the following steps:

  1. Export and freeze BERT, as described here. This involves building TensorFlow, so it might take some time.
  2. Import frozen BERT model into TVM.
  3. Address unsupported operators. This involves implementing the required operators. In my case, I implemented dummy operators which returned a tensor of the correct shape/dtype, but not the correct values. You’ll have to do more work to implement the operators correctly.

It sounds like the TensorFlow route is potentially less error-prone. I personally didn’t encounter any problems other than having to implement the operators.

Good luck! If I’m able to open-source this code at some point, I’ll post here.

I’m not open-sourcing this code, but there’s not much to be open-sourced anyway. The gist is this: build the freeze-model tool, load a version of BERT in TensorFlow and serialize it out to a .pb or .pbtxt, freeze that (using a checkpoint from Google’s BERT GitHub to get the trained model parameters), and then you can load the frozen .pb into TVM.
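
To make the last step concrete, here is a minimal sketch of loading a frozen .pb into TVM with relay.frontend.from_tensorflow. It is not the code I used. The placeholder names (input_ids, input_mask, segment_ids), the default sequence length, and both helper names are assumptions for illustration; check the names in your own graph.

```python
def bert_input_shapes(batch_size, seq_len):
    """Shape dict for the three usual BERT inputs.

    The placeholder names below are assumptions; use the names from your graph.
    """
    names = ("input_ids", "input_mask", "segment_ids")  # hypothetical names
    return {name: (batch_size, seq_len) for name in names}


def import_frozen_bert(pb_path, output_names, batch_size=1, seq_len=128):
    """Load a frozen GraphDef from disk and convert it to a Relay module."""
    # Imported lazily so bert_input_shapes works without TensorFlow or TVM.
    import tensorflow as tf
    from tvm import relay

    # Parse the serialized, frozen graph.
    graph_def = tf.compat.v1.GraphDef()
    with open(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())

    # Convert to Relay; unsupported ops are reported at this point.
    mod, params = relay.frontend.from_tensorflow(
        graph_def,
        shape=bert_input_shapes(batch_size, seq_len),
        outputs=output_names,
    )
    return mod, params
```

Call import_frozen_bert with the path to your frozen file and the list of output node names from your own graph. Any operator the frontend does not support will be reported during the from_tensorflow call.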

@boood15 did you ever get this figured out? Which ops did you have to implement?

Importing BERT from a TensorFlow frozen PB works. Currently only the “StopGradient” and “Assert” ops in TF are unsupported by the TF-to-Relay converter. However, both are easy to work around for inference:
(1) For StopGradient, convert it to identity().
(2) For Assert, I removed it from the PB (the same check can be done in the outer code).
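
A rough sketch of both workarounds as a single pass over the GraphDef, in case it helps. This is not my exact code, and the function name is made up for illustration. It assumes Assert nodes are only referenced through control-dependency inputs (the "^name" form), which is how TensorFlow normally wires them.

```python
def patch_graph_for_inference(graph_def):
    """Rewrite StopGradient to Identity and drop Assert nodes, in place.

    graph_def is expected to behave like a TensorFlow GraphDef: a .node
    sequence whose items have .name, .op and .input fields.
    """
    assert_names = {n.name for n in graph_def.node if n.op == "Assert"}

    # Delete Assert nodes from the end so earlier indices stay valid.
    for i in reversed(range(len(graph_def.node))):
        if graph_def.node[i].op == "Assert":
            del graph_def.node[i]

    for node in graph_def.node:
        # StopGradient only matters for training; Identity takes the same attr.
        if node.op == "StopGradient":
            node.op = "Identity"

        # Strip inputs (including "^name" control dependencies) that point
        # at a removed Assert node.
        kept = [inp for inp in node.input if inp.lstrip("^") not in assert_names]
        if len(kept) != len(node.input):
            del node.input[:]
            node.input.extend(kept)

    return graph_def
```

Nodes that only fed the removed Assert nodes are left behind as dangling nodes. They should be harmless as long as you pass explicit output names when converting, since only the subgraph needed for those outputs matters.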

Inference results are the same as TF; however, performance is much poorer.

By “performance”, do you mean the time or the accuracy?

Sorry for the ambiguous wording.
By “performance” I mean inference latency, which is quite slow on NVIDIA GPUs. Accuracy is fine, the same as the TF PB.

Actually, I have been running TF BERT for a while now with great results; I was mostly asking about ONNX BERT :)