[TF] Kernel Layout: HWIO vs. HWOI

In our NNVM/Relay model importers, we appear to be applying two different standards when it comes to the kernel layout:

It appears that we force the HWOI format when we encounter a depthwise_conv2d; however, the TF documentation makes it clear that the format should be [filter_height, filter_width, in_channels, channel_multiplier], as described at https://www.tensorflow.org/api_docs/python/tf/nn/depthwise_conv2d
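
For reference, here is a minimal snippet of my own (shapes are made up for illustration) showing that convention in TF; the number of output channels is in_channels * channel_multiplier:

```python
import tensorflow as tf

inp = tf.zeros([1, 32, 32, 16])   # NHWC input with 16 channels
filt = tf.zeros([3, 3, 16, 2])    # [filter_height, filter_width, in_channels, channel_multiplier]

out = tf.nn.depthwise_conv2d(inp, filt, strides=[1, 1, 1, 1], padding="SAME")
# out.shape == (1, 32, 32, 32), i.e. in_channels * channel_multiplier output channels
```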

I propose to fix this issue and enforce HWIO across the board by default when we import a TF model. This problem was hinted at in an earlier discuss forum post: Issue in depthwise convolution

@srkreddy1238 @tqchen @zhiics @yzhliu @grwlf @siju-samuel @apivovarov

Should be resolved by https://github.com/dmlc/tvm/pull/3676/files

And I think it should not be a problem. This is an issue in our implementation: before this PR we did not correctly handle the distinction between depthwise conv and grouped conv. In fact, TFLite also has two different kernel layouts for conv / depthwise conv. The special thing about the depthwise conv kernel is that we have KH KW C M, so strictly speaking the correct kernel layout should be HWOM.
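
To make that relationship concrete, here is a minimal sketch of mine (not code from the PR), assuming multiplier 1 and the NCHW/OIHW convention: in Relay, a depthwise conv is just a grouped conv2d with groups == in_channels.

```python
from tvm import relay

C = 16  # input channels

data = relay.var("data", shape=(1, C, 32, 32), dtype="float32")
# With multiplier 1, the OIHW depthwise kernel is (C, 1, KH, KW),
# i.e. O equals the input channel count.
weight = relay.var("weight", shape=(C, 1, 3, 3), dtype="float32")

out = relay.nn.conv2d(data, weight, kernel_size=(3, 3), channels=C,
                      groups=C, padding=(1, 1),
                      data_layout="NCHW", kernel_layout="OIHW")
```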

Thanks for the clarification @FrozenGene, but correct me if I'm wrong: shouldn't HWOI be interpreted, in the case of depthwise conv, as HWMC, which contradicts the TF layout requirements?

I found these lines: https://github.com/dmlc/tvm/blob/master/python/tvm/relay/frontend/tflite.py#L672-L674 to be in contradiction with these lines: https://github.com/dmlc/tvm/blob/master/python/tvm/relay/frontend/tflite.py#L656-L659

@thierry TFLite's depthwise conv kernel layout is not the same as TF's; they are different. TFLite's normal conv kernel layout is also different from TF's.

For TFLite, we follow the definition here: https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/depthwise_conv2d.py#L131. As you say, for NCHW (https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/depthwise_conv2d.py#L60) we always use OIHW. However, for depthwise convolution, the O value equals the input tensor's channel count, i.e. ic.

P.S.
I must apologize that my TFLite implementation of depthwise convolution is not correct when the depthwise convolution multiplier is greater than 1; TFLite's depthwise convolution kernel layout is in fact not what I commented there (the TFLite depthwise kernel layout is 1 x KH x KW x (input_channel * multiplier)). I became aware of this issue while working on https://github.com/dmlc/tvm/pull/3676. I will add support for a TFLite depthwise convolution multiplier greater than 1 within a few days. But again, TFLite's conv / depthwise convolution kernel layouts are not the same as TF's.
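
For anyone following along, here is a rough sketch (not the importer's actual code) of unpacking that layout; the assumption that the multiplier index varies fastest along the packed last axis is mine and should be checked against the TFLite reference kernels:

```python
import numpy as np

KH, KW, C, M = 3, 3, 16, 2

# TFLite stores the depthwise kernel as 1 x KH x KW x (C * M), as noted above.
tflite_kernel = np.zeros((1, KH, KW, C * M), dtype="float32")

# Drop the leading 1 and split the packed last axis into (C, M), giving HWIM.
kernel_hwim = tflite_kernel.reshape(KH, KW, C, M)
```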

Thank you for the explanation; I must apologize for not making it clear that I was referring initially to TFLite instead of TF; I did not realize that the layout requirements were different between the two frameworks.

And fixing the depthwise convolution would be great; I'll keep an eye out for the PR!

Also, perhaps as a way to prevent further confusion, what do you think of the following naming convention for depthwise convolution: HWIM and MIHW?

Yes, I think so. However, the question is whether it is worth doing, or whether we could reuse the current approach and add a comment explaining what is going on.

Hi, I’m a little bit confused about the decision to use HWOI (as opposed to HWIO) as a kernel layout for an NHWC depthwise convolution. For NCHW we use OIHW to represent TFLite’s layout of MIHW. However, for NHWC we use HWOI to represent TFLite’s layout of HWIM. As you can see in the case of NCHW we map TFLite’s representation of “M” to “O” and “I” to “I”, but in the case of NHWC we map “M” to “I” and “I” to “O”?

This also means that we are able to use the same kernel layout for both convolution and depthwise convolution in NCHW, but we use 2 different kernel layouts for NHWC?
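
To make the asymmetry I am describing concrete, here is a shape-only illustration; the axis letters follow my reading above rather than any authoritative definition:

```python
import numpy as np

C, M, KH, KW = 8, 2, 3, 3  # in_channels, multiplier, kernel height, kernel width

kernel_mihw = np.zeros((M, C, KH, KW))  # depthwise kernel used with NCHW data
kernel_hwim = np.zeros((KH, KW, C, M))  # depthwise kernel used with NHWC data

# Relay labels these OIHW and HWOI respectively, so the same physical axes
# end up with different letters:
#   MIHW -> OIHW : M is labelled "O", the input channel axis keeps the label "I"
#   HWIM -> HWOI : the input channel axis is labelled "O", M is labelled "I"
```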

Would it be possible to clarify?

Gentle bump. It would be really useful to understand this decision because it complicates interacting with 3rd party libraries. For example, ACL expects convolution weights in the format OHWI, and depthwise weights in the format HWI, with the depth multiplier value being passed separately.

Now, in ACL's case we currently use the convert layout pass to convert the weights to the expected format. We can specify our desired layout {'nn.conv2d':['NHWC', 'OHWI']}, although this inconsistency means that the format we actually get when converting a depthwise convolution is IHWM rather than the expected MHWI. In order to get this to work correctly I would need to specify {'nn.conv2d':['NHWC', 'IHWO']}, but that would then cause issues for a normal convolution. Currently the convert layout pass has no logic to distinguish between a 'normal' convolution and a depthwise one.
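
For context, this is roughly how we invoke the pass today (a simplified sketch of my own; the module and shapes are made up, and the desired layouts are the ones mentioned above):

```python
import tvm
from tvm import relay

# A tiny NHWC conv just to have something to run the pass on.
data = relay.var("data", shape=(1, 32, 32, 16), dtype="float32")
weight = relay.var("weight", shape=(3, 3, 16, 8), dtype="float32")  # HWIO
conv = relay.nn.conv2d(data, weight, kernel_size=(3, 3), channels=8,
                       padding=(1, 1), data_layout="NHWC", kernel_layout="HWIO")
mod = tvm.IRModule.from_expr(relay.Function(relay.analysis.free_vars(conv), conv))

# Desired layouts for ACL: NHWC data with OHWI kernels.
desired_layouts = {"nn.conv2d": ["NHWC", "OHWI"]}

seq = tvm.transform.Sequential([
    relay.transform.InferType(),
    relay.transform.ConvertLayout(desired_layouts),
])
with tvm.transform.PassContext(opt_level=3):
    mod = seq(mod)
```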

I'm trying to understand whether improvements need to be made to the convert layout pass so that it can distinguish between these types. Alternatively, I couldn't see any reason in the code that ties us to the OIHW format other than preference and the ease of checking for a depthwise convolution later on, but maybe I'm wrong about this.

Thanks, any pointers would be a great help :slight_smile: