When I try to import a BERT model through the Relay ONNX frontend, I run into a whole series of problems.
First, it said "The following operators are not supported for frontend ONNX: Expand",
and I fixed it according to https://discuss.tvm.ai/t/tvm-error-opnotimplemented-the-following-operators-are-not-supported-for-frontend-onnx-expand/2688
(though the fix is still not in the latest version).
Then it said "Check failed: e_dtype == dtype (int64 vs. int32): relay.concatenate requires all tensors have the same dtype".
I found someone had already reported this in https://discuss.tvm.ai/t/relay-onnx-load-resnet-onnx-to-relay-failed/2411
and a very kind person provided a fix in https://github.com/dmlc/tvm/pull/3230,
but it still isn't merged, so I tried modifying my code first.
After that, I hit "duplicate argument supplied in both positional args (at position: 0), and keyword argument (with name: 6)",
and it seems no one has ever faced this problem before.
Could you kindly provide some kind of tutorial for importing a BERT model from ONNX? Fine-tuning a BERT model is a very common workflow.
I went through the process of importing Google's BERT (the TensorFlow version) into TVM. Generally, the process involved the following steps:
- Export and freeze BERT, as described here. This involves building TensorFlow, so it might take some time.
- Import frozen BERT model into TVM.
- Address unsupported operators. This involves implementing the required operators. In my case, I implemented dummy operators that returned a tensor of the correct shape/dtype, but not the correct values. You'll have to do more work to implement the operators correctly.
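The dummy-operator step could be sketched like this (my own scaffolding, not the author's actual code). It monkey-patches the converter table in TVM's TensorFlow frontend; the (inputs, attr, params) converter signature and the "_output_shapes" attribute match older TVM releases and are assumptions, so check tvm/relay/frontend/tensorflow.py for your version:

```python
def register_dummy_converter(op_name, dtype="float32"):
    # Stub out an unsupported TF op so the import can proceed.
    from tvm import relay
    from tvm.relay.frontend import tensorflow as tf_frontend

    def _dummy(inputs, attr, params):
        # Correct shape/dtype, wrong values: enough to get the graph
        # imported, but the op must be implemented properly before the
        # numerical results mean anything.
        shape = attr["_output_shapes"][0]
        return relay.zeros(shape=shape, dtype=dtype)

    tf_frontend._convert_map[op_name] = _dummy

# Hypothetical usage, before calling relay.frontend.from_tensorflow:
# register_dummy_converter("SomeUnsupportedOp")
```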
It sounds like the TensorFlow route is potentially less error-prone. I personally didn't encounter any problems other than having to implement the missing operators.
Good luck! If I'm able to open-source this code at some point, I'll post it here.
UPDATE:
I'm not open-sourcing this code, but there's not much to open-source anyway. The gist is this: build the freeze-model tool, load a version of BERT in TensorFlow and serialize it out to a .pb or .pbtxt, freeze that (using a checkpoint from Google's BERT GitHub to get the trained model parameters), and then you can load the frozen .pb into TVM.
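The final load step looks roughly like this. A sketch under assumptions: the frozen graph file name (frozen_bert.pb) and the input_ids / input_mask / segment_ids input names with a 128-token sequence are BERT's usual conventions, not something from the post, so adjust them to match your export:

```python
def load_frozen_bert(pb_path="frozen_bert.pb", batch=1, seq_len=128):
    # Parse the frozen GraphDef and hand it to TVM's TensorFlow importer.
    import tensorflow as tf
    from tvm import relay

    with tf.io.gfile.GFile(pb_path, "rb") as f:
        graph_def = tf.compat.v1.GraphDef()
        graph_def.ParseFromString(f.read())

    # Shapes for BERT's standard inputs; change these to match your export.
    shape_dict = {
        "input_ids": (batch, seq_len),
        "input_mask": (batch, seq_len),
        "segment_ids": (batch, seq_len),
    }
    mod, params = relay.frontend.from_tensorflow(graph_def, shape=shape_dict)
    return mod, params
```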
Importing BERT from a TensorFlow frozen PB works. Currently only the "StopGradient" and "Assert" ops in TF are unsupported by the TF-to-Relay converter. However, they're easy to work around for inference:
(1) For StopGradient, convert it to Identity. (2) For Assert, I removed it from the PB (the same logic could be done in outer code).
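Both workarounds can be applied to the GraphDef before handing it to the converter. A minimal sketch (patch_graph_for_inference is my own helper name; dropping the control-dependency edges that pointed at removed Assert nodes is my addition, needed so the remaining nodes don't reference deleted ones):

```python
import tensorflow as tf

def patch_graph_for_inference(graph_def):
    # Returns a copy of a TF GraphDef prepared for inference-only import
    # into TVM: StopGradient nodes become Identity (a plain pass-through),
    # Assert nodes are dropped, and control-dependency inputs ("^name")
    # that referenced a removed Assert are dropped with them.
    assert_names = {n.name for n in graph_def.node if n.op == "Assert"}
    out = tf.compat.v1.GraphDef()
    out.versions.CopyFrom(graph_def.versions)
    for node in graph_def.node:
        if node.op == "Assert":
            continue
        new_node = out.node.add()
        new_node.CopyFrom(node)
        if new_node.op == "StopGradient":
            new_node.op = "Identity"
        kept = [i for i in new_node.input if i.lstrip("^") not in assert_names]
        del new_node.input[:]
        new_node.input.extend(kept)
    return out
```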
Inference results are the same as TF; however, performance is much poorer.
By "performance", do you mean the time or the accuracy?
Sorry for the ambiguous word.
By "performance" I mean inference latency, which is quite slow on an NVIDIA GPU; accuracy is fine, the same as the TF PB.
Actually, I have been running TF BERT for a while now with great results; I was mostly asking about ONNX BERT.