Problems when importing a BERT model from ONNX into Relay


#1

When I try to import a BERT model from ONNX into Relay, I run into a series of problems.
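
For reference, the import itself is the standard Relay ONNX path, roughly like this (the model path and input names/shapes are placeholders, not my exact script):

```python
import onnx
from tvm import relay

onnx_model = onnx.load("bert.onnx")  # placeholder path

# Placeholder input names/shapes; a BERT graph typically takes token ids,
# an attention mask, and segment ids.
shape_dict = {
    "input_ids": (1, 128),
    "input_mask": (1, 128),
    "segment_ids": (1, 128),
}

mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
```
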
First, it said `The following operators are not supported for frontend ONNX: Expand`, and I fixed it according to https://discuss.tvm.ai/t/tvm-error-opnotimplemented-the-following-operators-are-not-supported-for-frontend-onnx-expand/2688 (though the fix is still not in the latest version).
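
As I understand it, the gist of that fix is a small `Expand` converter along these lines (only a sketch: it assumes the target shape arrives as a constant initializer, and it ignores the bidirectional-broadcast corner cases of the full ONNX spec):

```python
# Meant to sit with the other converters in python/tvm/relay/frontend/onnx.py,
# where OnnxOpConverter and _op are already in scope.
class Expand(OnnxOpConverter):
    @classmethod
    def _impl_v8(cls, inputs, attr, params):
        # Read the target shape out of the graph's constant initializers.
        target_shape = params[inputs[1].name_hint].asnumpy().tolist()
        return _op.broadcast_to(inputs[0], shape=target_shape)

# Registered in the frontend's convert map as:
#     'Expand': Expand.get_converter(opset)
```
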
Then, it said `Check failed: e_dtype == dtype (int64 vs. int32) : relay.concatenate requires all tensors have the same dtype`. I found that someone had already reported this in https://discuss.tvm.ai/t/relay-onnx-load-resnet-onnx-to-relay-failed/2411, and a very kind person provided a fix in https://github.com/dmlc/tvm/pull/3230, but it still isn’t merged, so I tried to modify my code first.
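
For anyone hitting the same error while that PR is unmerged: one blunt workaround is to downcast the int64 initializers in the ONNX file before importing. A sketch (this is a hack and will break any op that genuinely needs int64):

```python
import numpy as np
import onnx
from onnx import numpy_helper, TensorProto

model = onnx.load("bert.onnx")  # placeholder path

# Downcast every int64 initializer to int32 so relay.concatenate
# sees a single dtype across its inputs.
for init in model.graph.initializer:
    if init.data_type == TensorProto.INT64:
        arr = numpy_helper.to_array(init).astype(np.int32)
        init.CopyFrom(numpy_helper.from_array(arr, init.name))

onnx.save(model, "bert_int32.onnx")
```
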
After that, I encountered `duplicate argument supplied in both positional args (at position: 0), and keyword argument (with name: 6)`, and it seems that no one has faced this problem before.
Could you kindly provide some kind of tutorial on importing a BERT model from ONNX? It is quite common to fine-tune a BERT model for downstream tasks.


#2

I went through the process of importing the TensorFlow version of Google’s BERT into TVM. Generally, the process involved the following steps:

  1. Export and freeze BERT, as described here. This involves building TensorFlow, so it might take some time.
  2. Import frozen BERT model into TVM.
  3. Address unsupported operators. This involves implementing the required operators. In my case, I implemented dummy operators which returned a tensor of the correct shape/dtype, but not the correct values. You’ll have to do more work to implement the operators correctly (a sketch of the dummy-operator idea follows this list).
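
To make step 3 concrete, a dummy operator is just a converter that ignores the op’s real semantics and returns zeros of the right shape/dtype. A sketch for an elementwise op, following the TensorFlow frontend’s converter convention (the op name `Erf` is only an illustration):

```python
from tvm.relay import op as _op

# Dummy converter: ignores what the op actually computes and returns
# zeros with the input's shape/dtype, which is enough to get the
# import through (but the outputs will be wrong).
def _erf_dummy():
    def _impl(inputs, attr, params):
        return _op.zeros_like(inputs[0])
    return _impl

# Hooked up by adding an entry to _convert_map in
# python/tvm/relay/frontend/tensorflow.py:
#     'Erf': _erf_dummy(),
```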

It sounds like the TensorFlow route is potentially less error-prone. I personally didn’t encounter any problems other than having to implement the operators.

Good luck! If I’m able to open-source this code at some point, I’ll post here.

UPDATE:
I’m not open-sourcing this code, but there’s not much to open-source anyway. The gist is this: build TensorFlow’s freeze_graph tool, load a version of BERT in TensorFlow and serialize it out to a .pb or .pbtxt, freeze that (using a checkpoint from Google’s BERT GitHub to get the trained model parameters), and then load the frozen .pb into TVM.
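
A rough sketch of those two steps (file names, output node names, and shapes are placeholders, and it uses the `freeze_graph` Python entry point rather than the bazel-built binary):

```python
import tensorflow as tf
from tensorflow.python.tools import freeze_graph
from tvm import relay

# 1. Freeze: bake the checkpoint's weights into the serialized GraphDef.
freeze_graph.freeze_graph(
    input_graph="bert_graph.pbtxt",      # serialized GraphDef (placeholder)
    input_saver="",
    input_binary=False,                  # True if you saved a binary .pb
    input_checkpoint="bert_model.ckpt",  # from Google's BERT GitHub release
    output_node_names="loss/output",     # placeholder; depends on your graph
    restore_op_name="save/restore_all",
    filename_tensor_name="save/Const:0",
    output_graph="frozen_bert.pb",
    clear_devices=True,
    initializer_nodes="",
)

# 2. Import the frozen graph into TVM.
with tf.gfile.GFile("frozen_bert.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

mod, params = relay.frontend.from_tensorflow(
    graph_def,
    shape={"input_ids": (1, 128)},  # placeholder input name/shape
)
```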