Layout conversion pass

Right now we use AlterOpLayout pass that automatically decides which layout based on the target hw back ends.

Given that we also want to offer the flexibility to pragmatically add pass pipelines, and there has been increasing need for converting between layouts(e.g. NHWC to NCHW), we might want to also introduce a Layout conversion pass that a user can specify. This would provide additional optional flexibility that some of our current frontends need Example usage:

mod_nhwc = relay.from_tflite(model)
mod_nchw = relay.transform.ConvertLayout("NCHW", "NHWC")(mod_nhwc)

Let us use this thread to discuss the API choices, possible implementation problems.

cc @yzhliu @FrozenGene @merrymercy @zhiics

4 Likes

That would be very convenient for a lot of reasons (e.g. for now X86 and ARM is lacking NHWC layout support).

One question regarding the layout - what are the pros/cons of explicitly specifying the layout with a relay transform pass, v.s. inferring it from the layout of the input shape (when using most relay importers, we’ll need to pass a shape dict; would there be a benefit from enforcing the layout from the shape?)

This will be very useful feature. Not sure if its clean, but this can also bind in very well with target-specific AlterOpLayout pass, where we do not have to handle all different types of layout transforms.

I like the overall API that is mentioned. I think the implementation might be little more complicated that might require many individual passes (very similar to AlterOpLayout infra). For example,

conv(layout="NHWC")
sum(axis=3)
conv(layout="NHWC")
sum(axis=3)

needs a mechanism to mutate sum attributes so that we can have

transpose(NHWC->NCHW)
conv(layout="NCHW")
sum(axis=1)
conv(layout="NCHW")
sum(axis=1)
transpose(NCHW->NHWC)

Great discussions so far, would be great to see if we can have someone who can volunteer to lead the charge.

For the layout transformation, I think we should care one thing is how we do the work. For example, current TF frontend do this work just be:
insert transpose(NHWC->NCHW) -> conv(NCHW)-> insert transpose(NCHW->NHWC)
I think we don’t want this way because of transpose cost.

Next, I want to express some point in frontend / backend

  • Frontend

So if we want one Relay pass to do it , we should be careful how long the transpose path could be exist? The ideal path is:

mod_nchw = relay.transform.ConvertLayout("NCHW", "NHWC")(mod_nhwc)

->

input (NHWC) -> insert transpose(NHWC->NCHW) -> model ops -> insert transpose (NCHW->NHWC)->output
That is to say we just insert two transpose at the begin / end place so that the users could use the model like NHWC without any difference. We could discuss whether we should make output be back to NHWC.

However, we also must careful some ops, for example squeeze / reshape and so on, original model’s axis information is just NHWC.

  • Backend

We have different backend, such as x86, arm, NV GPU, Intel GPU, arm mali. NCHW doesn’t be the best layout for all backends. For example, we observe NHWC maybe one better layout for arm cpu. When we load TFLite model, how do we set the data layout? Proving one parameter like TF frontend (from_tensorflow) or what else?

exactly, the layout transformation pass need to do some heavy lifting to avoid doing excessive transformations.

For each frontend, I think we could start with a default layout used by the frontend, and allow user to pragmatically call the ConvertLayout pass to change the layout to the desired one.

I’m working on the layout inference pass (though the progress was interrupted time by time)
Regarding @thierry 's point of enforcing layout from the shape, I also would like to make it depend on InferShape pass, as in some cases shape are required, e.g., broadcast

Does this work for Keras? I’m trying to use the Relay IR to generate cifar10, but it always outputs the conv2d NCHW layers in NHWC format. To further complicate things, the Relay frontend only supports “channels_first (NCHW)” format while the tensorflow backend only supports “channels_last (NHWC)” so the implementation of wrapping a conv layer in transpose layers is not going to cut it.

@tqchen, I wrote a quick implementation here. Was wondering if you are also thinking in same direction

# FIXME - Make it a pass
def ConvertLayout(mod, src_layout, dst_layout):
    from tvm.relay.op import register_alter_op_layout
    def run_opt_pass(mod, passes):
       passes = passes if isinstance(passes, list) else [passes]
       seq = Sequential(passes)
       with PassContext(opt_level=3):
           mod = seq(mod)
       return mod

    # FIXME - How to deal with AlterOpLayout registry
    @register_alter_op_layout("nn.conv2d", level=114)
    def alter_conv2d(attrs, inputs, tinfos):
        data_layout = attrs['data_layout']
        kernel_layout = attrs['kernel_layout']
        data, weight = inputs
        if data_layout == 'NHWC' and kernel_layout == 'HWIO':
            new_attrs = dict(attrs)
            new_attrs['data_layout'] = 'NCHW'
            new_attrs['kernel_layout'] = 'OIHW'
            return relay.nn.conv2d(data, weight, **new_attrs)
        return None

    assert src_layout == "NHWC" and dst_layout == "NCHW", \
            "Supports only NHWC to NCHW layout transformation"

    # Use Sequential
    b_mod = run_opt_pass(mod, CanonicalizeOps())
    b_mod = run_opt_pass(b_mod, AlterOpLayout())
    b_mod = run_opt_pass(b_mod, FoldConstant())
    print("######## Before ###########")
    print(mod)
    print("######## After ###########")
    print(b_mod)
    return b_mod

I tried this on InceptionV3 tflite and it resulted in only 2 layout transforms. The only concern (and maybe big concern) is AlterOpLayout pass in the registry. In this pass, I am registering a new PackedFunc for nn.conv2d, it will overwrite all target specifics AlterOpLayout, preventing the layout optimizations when we call relay.build().

The reason that there are only 2 layout transforms in the whole network is that all the other operators like Pool, Reduce. Concatenate, Broadcast etc try to adapt to their input data layout. I made a couple of PRs last month to do that, which are already merged. So, if we just force the conv2d layout, most of the operators will automatically change layout.

1 Like

@tqchen Ping when you time :slight_smile:

I think as long as we have an explicit ConvertLayout pass that developers can call, we are good. It would be great to add coverage tests for most combinations of src_layout and dst_layout.

@yzhliu given that worked on this before, it would be great to get your thoughts

I think it’s good enough except the registry as @janimesh mentioned. would it be proper to introduce “deregister” approach so that things get “normal” once ConvertLayout finishes?

if the registration is temporary, I think we should do it in the other way around. Create a generic layout rewrite base utils that can be used by ConvertLayout and AlterOpLayout

PR - https://github.com/apache/incubator-tvm/pull/4335