[TFLite] OHWI kernel layout for 2D convolution

TFLite exports convolution weights in OHWI format, its default layout. In its current implementation, the TFLite parser transposes the kernel so that the layout becomes OIHW. Presumably this is because TVM doesn't fully support OHWI.

This becomes an issue for Arm Compute Library (and presumably other external codegen libraries), since it expects OHWI weights by default. One way to overcome this is to transpose the kernel back to OHWI format before passing it to ACL. This results in two conversions: OHWI->OIHW->OHWI. However, now that external codegen has landed, I don't believe it should be up to the parser to convert to a format supported specifically by TVM. Instead, it should be taken care of lower down the stack, after the graph has been annotated and partitioned. Are there any thoughts on this?
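The round trip above can be made concrete with axis permutations. A minimal plain-Python sketch, assuming numpy's `transpose` convention (output axis k is input axis `perm[k]`); the helpers `permute` and `compose` are hypothetical names for illustration. The parser-side OHWI->OIHW permutation followed by an ACL-side OIHW->OHWI permutation composes to the identity, i.e. the two conversions cancel out.

```python
def permute(layout: str, perm: tuple) -> str:
    """Apply a numpy-style axes permutation to a layout string."""
    return "".join(layout[i] for i in perm)


def compose(first: tuple, second: tuple) -> tuple:
    """Single permutation equivalent to transposing by `first`, then `second`."""
    return tuple(first[i] for i in second)


OHWI_TO_OIHW = (0, 3, 1, 2)  # parser side
OIHW_TO_OHWI = (0, 2, 3, 1)  # ACL side, undoing the parser's transpose

# Sanity check: permuting the layout names gives the expected target layouts.
assert permute("OHWI", OHWI_TO_OIHW) == "OIHW"
assert permute("OIHW", OIHW_TO_OHWI) == "OHWI"

round_trip = compose(OHWI_TO_OIHW, OIHW_TO_OHWI)
print(round_trip)  # (0, 1, 2, 3)
```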

Though we now have external codegen, TVM codegen should remain our main focus backend; we shouldn't put the cart before the horse.

In the TFLite parser, Relay requires the kernel layout for NHWC data to be HWIO, so we transpose TFLite's OHWI kernel to meet this requirement. Don't worry about the cost, though: the weight is a constant, so our transpose doesn't degrade runtime performance. Likewise, for ACL I don't think a weight transpose should degrade runtime performance. MKLDNN is another example: it transposes the weight layout from NCHW to NCHW16c.
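As a rough illustration of what this compile-time transpose does, here is a plain-Python sketch (no TVM or numpy; `shape_of`, `get` and `transpose4d` are hypothetical helpers) that applies the permutation (1, 2, 3, 0) to a small OHWI weight, producing HWIO. Because the weight is a constant, this work is done once, during compilation.

```python
from itertools import product


def shape_of(t):
    """Shape of a rectangular nested list."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)


def get(t, idx):
    """Read the element at a multi-dimensional index."""
    for i in idx:
        t = t[i]
    return t


def transpose4d(t, perm):
    """numpy-style transpose of a 4-D nested list: output axis k is input axis perm[k]."""
    in_shape = shape_of(t)
    out_shape = tuple(in_shape[p] for p in perm)
    out = [[[[None] * out_shape[3] for _ in range(out_shape[2])]
             for _ in range(out_shape[1])] for _ in range(out_shape[0])]
    for out_idx in product(*(range(n) for n in out_shape)):
        in_idx = [0] * 4
        for k, p in enumerate(perm):
            in_idx[p] = out_idx[k]  # output axis k reads input axis perm[k]
        a, b, c, d = out_idx
        out[a][b][c][d] = get(t, in_idx)
    return out


# OHWI weight with O=2, H=W=1, I=3; value encodes (o, i) as 10*o + i.
ohwi = [[[[10 * o + i for i in range(3)]]] for o in range(2)]
hwio = transpose4d(ohwi, (1, 2, 3, 0))  # OHWI -> HWIO
print(shape_of(hwio))  # (1, 1, 3, 2)
print(hwio[0][0])      # [[0, 10], [1, 11], [2, 12]]
```

A frontend would typically do this with a single array transpose on the constant weight; the explicit loop only makes the index mapping visible.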

If we don't do this transpose in the parser, we would have to modify many places to accommodate the OHWI kernel layout, or introduce a new pass to handle it. Either way, I don't think that is better. I think TVM should require that NCHW data pairs with an OIHW kernel layout and NHWC data pairs with an HWIO kernel layout. Apart from TFLite, TensorFlow's kernel layout is HWIO, the same as TVM's; MXNet's kernel layout is HWIO, and cuDNN's is HWIO too. So by requiring HWIO for NHWC, TVM meets many frameworks' requirements. That raises a question for ACL: if ACL accepts TensorFlow models but only supports OHWI, it still needs a transpose. If ACL supports HWIO, it is fine, because TVM's TFLite parser has already done the transpose for you.
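The data/kernel layout pairing proposed above fits in a small lookup table. The sketch below (plain Python; `KERNEL_LAYOUT_FOR`, `layout_perm` and `kernel_transpose_needed` are hypothetical names) computes which permutation, if any, a parser would have to apply to a source kernel layout so it matches the pairing: an OHWI kernel needs (1, 2, 3, 0) for NHWC data, while an HWIO kernel needs none.

```python
# The pairing proposed above: data layout -> kernel layout.
KERNEL_LAYOUT_FOR = {"NCHW": "OIHW", "NHWC": "HWIO"}


def layout_perm(src: str, dst: str) -> tuple:
    """numpy-style axes permutation converting layout `src` into `dst`."""
    if sorted(src) != sorted(dst):
        raise ValueError(f"{src} and {dst} do not name the same axes")
    return tuple(src.index(axis) for axis in dst)


def kernel_transpose_needed(data_layout: str, source_kernel_layout: str):
    """Permutation a parser must apply to the kernel, or None if it already matches."""
    target = KERNEL_LAYOUT_FOR[data_layout]
    perm = layout_perm(source_kernel_layout, target)
    return None if perm == tuple(range(len(perm))) else perm


print(kernel_transpose_needed("NHWC", "OHWI"))  # (1, 2, 3, 0)
print(kernel_transpose_needed("NHWC", "HWIO"))  # None
```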

So for TVM's TFLite parser, I think we should keep the current approach; TVM is still our main focus codegen backend. ACL's weight transpose shouldn't degrade performance. If ACL can accept TensorFlow models, it should support HWIO too, in which case the layout produced by TVM's TFLite parser can be used directly.

If I understand correctly, there are two key issues you are dealing with:

  • The framework parser is hardcoded to certain layouts for both data and kernel, and these layouts do not match the input layouts your external codegen wants.
  • Even if you insert layout transforms to match the external codegen layouts, you then have to deal with these extra transforms.

I think there are two ways to deal with this systematically:

  • Change the layout of the whole graph: instead of changing the data layout of each operator in the framework parser, use a Relay pass to convert the data layout of the whole network. The ConvertLayout Relay pass can do that (https://docs.tvm.ai/dev/convert_layout.html). However, we might need some more code to enable the use case that you want.

  • Two layout transforms on weights: as long as the transformed tensor is a weight (a constant), you can always run the FoldConstant Relay pass to get rid of those extra transforms.
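To illustrate why extra transforms on weights are harmless, here is a toy constant folder over a made-up three-node IR (`Const`, `Var`, `Transpose`). It is not Relay's IR or its FoldConstant implementation, only the same idea in miniature: two stacked transposes on a constant weight fold into a single constant equal to the original, so nothing is left to run at inference time, whereas a transpose on a runtime input (`Var`) is kept.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Tuple, Union


@dataclass
class Const:
    shape: Tuple[int, ...]
    data: Dict[Tuple[int, ...], float]  # index tuple -> value


@dataclass
class Var:
    name: str  # a runtime input, which cannot be folded


@dataclass
class Transpose:
    arg: "Expr"
    perm: Tuple[int, ...]  # numpy-style: output axis k is input axis perm[k]


Expr = Union[Const, Var, Transpose]


def transpose_const(c: Const, perm: Tuple[int, ...]) -> Const:
    """Evaluate a transpose on a constant at compile time."""
    shape = tuple(c.shape[p] for p in perm)
    data = {tuple(idx[p] for p in perm): v for idx, v in c.data.items()}
    return Const(shape, data)


def fold_constant(e: Expr) -> Expr:
    """Toy FoldConstant: replace any Transpose of a constant with its result."""
    if isinstance(e, Transpose):
        arg = fold_constant(e.arg)
        if isinstance(arg, Const):
            return transpose_const(arg, e.perm)
        return Transpose(arg, e.perm)
    return e


# OHWI weight with O=2, H=W=1, I=3.
weight = Const((2, 1, 1, 3),
               {(o, 0, 0, i): 10.0 * o + i for o, i in product(range(2), range(3))})
# Parser: OHWI -> OIHW, then backend preparation: OIHW -> OHWI.
expr = Transpose(Transpose(weight, (0, 3, 1, 2)), (0, 2, 3, 1))
folded = fold_constant(expr)
print(type(folded).__name__, folded == weight)  # Const True
```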

The way I think of external codegen integration is that the external codegen takes a Relay graph parsed by a framework parser (like the TFLite parser in your case) and then runs a couple of Relay passes to prepare it for the backend codegen. These passes can be traditional ones like FoldConstant and DeadCodeElimination, or very target-specific ones that modify the graph in a particular way (while staying in the Relay world). With this picture in mind, I think the two passes above should support your use case.

@janimesh - I am not convinced it’s as simple as that.

  • Changing the layout of the whole graph assumes that the whole graph is offloaded to an external code gen. While that would work for simple examples, I'm not sure it works in the general case, where some subset of the input graph goes to one external code gen framework while the rest falls back to TVM code generation. If only certain operators are supported by ACL, I'd like the rest to fall back to TVM code generation.

  • Should the pass manager that handles integration with the external code gen be aware of exactly which frontend was used? How would we get such information across frameworks, other than explicitly checking which framework was used?

Regards, Ramana

@ramana-arm Thanks for the extra background. For the first point, I think the idea still applies to subgraphs, but please correct me if I am oversimplifying.

Let's say the TFLite-Relay parser parses a TFLite network and produces a Relay graph. The annotator/partitioner looks at which ops ACL supports and breaks the graph into two subgraphs: one supported by ACL and the other falling back to TVM. Both are still Relay subgraphs.

Now, I am proposing a sort of ACL-Relay frontend that can run Relay passes. Relay calls the ACL-Relay frontend with only the ACL-supported subgraph. The ACL-Relay frontend, which knows what ACL needs, converts the layout of only this subgraph to ACL-supported layouts, while the TVM subgraph remains in its original format. FoldConstant can then be run after the layouts are converted.
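A rough sketch of this flow, using a made-up linear op list instead of a Relay graph: `partition` splits the ops into ACL and TVM regions, and a hypothetical `acl_prepare` pass rewrites conv kernel layouts to OHWI only inside ACL regions, recording the weight permutation for a later constant-folding step. The supported-op set and the op list are assumptions for illustration, not ACL's actual coverage, and real partitioning works on a dataflow graph rather than a chain.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

# Hypothetical: which ops the ACL annotator accepts (not ACL's real list).
ACL_SUPPORTED = {"conv2d", "max_pool2d"}
# Kernel layout ACL prefers for conv2d weights, as stated in this thread.
ACL_KERNEL_LAYOUT = "OHWI"


@dataclass(frozen=True)
class Op:
    name: str
    kernel_layout: Optional[str] = None                 # set only for convolutions
    weight_transpose: Optional[Tuple[int, ...]] = None  # filled in by acl_prepare


def layout_perm(src: str, dst: str) -> Tuple[int, ...]:
    """numpy-style axes permutation converting layout `src` into `dst`."""
    return tuple(src.index(axis) for axis in dst)


def partition(ops: List[Op]) -> List[Tuple[str, List[Op]]]:
    """Split a linear chain of ops into contiguous ('acl' | 'tvm', ops) regions."""
    regions: List[Tuple[str, List[Op]]] = []
    for op in ops:
        target = "acl" if op.name in ACL_SUPPORTED else "tvm"
        if regions and regions[-1][0] == target:
            regions[-1][1].append(op)
        else:
            regions.append((target, [op]))
    return regions


def acl_prepare(region: List[Op]) -> List[Op]:
    """Hypothetical ACL-side pass: rewrite conv kernels to ACL's layout.

    The recorded weight_transpose applies to a constant weight, so a later
    FoldConstant-style pass can evaluate it at compile time.
    """
    out = []
    for op in region:
        if op.kernel_layout is not None and op.kernel_layout != ACL_KERNEL_LAYOUT:
            perm = layout_perm(op.kernel_layout, ACL_KERNEL_LAYOUT)
            op = replace(op, kernel_layout=ACL_KERNEL_LAYOUT, weight_transpose=perm)
        out.append(op)
    return out


graph = [
    Op("conv2d", "HWIO"),            # layout as produced by the TFLite parser
    Op("max_pool2d"),
    Op("depthwise_conv2d", "HWOI"),  # hypothetically unsupported by ACL here
    Op("softmax"),
]
prepared = [(t, acl_prepare(r) if t == "acl" else r) for t, r in partition(graph)]
for target, region in prepared:
    print(target, [(op.name, op.kernel_layout) for op in region])
# acl [('conv2d', 'OHWI'), ('max_pool2d', None)]
# tvm [('depthwise_conv2d', 'HWOI'), ('softmax', None)]
```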

  • For the second point, is there any reason we need to keep frontend information in the subgraph?

The current TVM design tries to remove the frontend from the picture as soon as the framework-to-Relay parser has run. For the layouts in this discussion, for example, the Relay subgraph has the layouts annotated on each conv, and the ACL-Relay frontend can rewrite each conv with the layouts that ACL supports or prefers.

(I understand there might be some inefficiencies at the boundaries of subgraphs. Hopefully, this is not a bottleneck in the early phases of ACL integration.)

Many thanks for the suggestions @FrozenGene @janimesh

I agree that preparing a subgraph for a particular backend should work for ACL integration, and I can't foresee any issues yet. Performance isn't a concern since this all happens during compilation; my concern was more about the unnecessary conversion to one layout and back again.

My question was mainly about whether it is appropriate to transform the kernel layout in the TFLite parser and assume a default layout. Instead, would it be better to let lower levels of the stack decide what to do after partitioning? On the other hand, I do understand the need to support TVM as the primary backend.