Using Relay to Add Operations to Graph

I'm very interested in using Relay's graph quantization feature, but it currently can't be applied to my model because I need the last few convolution layers to stay in floating point. It's easy enough to prevent Relay from quantizing those layers by tweaking the annotate functions; however, the last quantized layer then outputs in NCHWc while the floating-point convolution requires NCHW. Slipping in a transpose and a reshape could solve this, but it can't be done during annotation since the shape conversion doesn't happen until quantization is realized. I'm curious what the easiest way to insert two operations into the graph is. Is the only solution to write a new IR pass that identifies where to insert the layout change?
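For concreteness, the kind of rewrite I have in mind is sketched below. This is only an illustration: needs_layout_fix and the hard-coded shapes are placeholders for however I end up matching the convolutions that stay in floating point.

    from tvm import relay

    class InsertLayoutFix(relay.ExprMutator):
        """Sketch only: convert the input of selected conv2d calls from
        NCHW4c back to NCHW by inserting a transpose and a reshape."""

        def visit_call(self, call):
            new_args = [self.visit(arg) for arg in call.args]
            if call.op.name == "nn.conv2d" and needs_layout_fix(call):  # placeholder predicate
                data = new_args[0]
                # (N, C//4, H, W, 4) -> (N, C//4, 4, H, W) -> (N, C, H, W)
                data = relay.transpose(data, axes=(0, 1, 4, 2, 3))
                data = relay.reshape(data, newshape=(1, 64, 224, 224))  # hard-coded for illustration
                new_args[0] = data
            return relay.Call(call.op, new_args, call.attrs, call.type_args)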

Is your original model in NCHW? During the AlterOpLayout pass, some ops will be converted to NCHWc, and a layout transformation will be inserted automatically wherever a conversion to or from NCHWc is needed.

Yes, it's a fully convolutional model with all layers in NCHW. It seems like AlterOpLayout is missing something, as I consistently get this error:

    TVMError: Check failed: false: Incompatible broadcast dims: 224 and 64 in: [1, 16, 224, 224, 4] and [1, 64, 1, 1]

This happens during graph_gen.codegen. Adding an additional AlterOpLayout pass after quantize realization doesn't help. Although it looks like the failure is in adding a bias, it's actually somewhat difficult to figure out exactly where the shape mismatch happens. Do you know the right way to inspect the graph after all IR passes have run?

You can use sym = relay.ir_pass.infer_type(sym) to infer the types and then print(sym).
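For example (a minimal sketch, assuming sym is the function you got back from quantization):

    from tvm import relay

    # re-run type inference so every expression is annotated with its
    # inferred type/shape, then pretty-print the whole function
    sym = relay.ir_pass.infer_type(sym)
    print(sym)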

Maybe there's something I'm missing here. I've printed this out after running quantize on the graph, but the shapes it produces don't appear to be in NCHWc. For example, here's a convolution from one of the inner layers:

    %185 = nn.conv2d(%184, meta[relay.Constant][30] /* ty=Tensor[(64, 64, 3, 3), int8] */ /* ty=Tensor[(64, 64, 3, 3), int8] */, channels=64, kernel_size=[3, 3], out_dtype="int32") /* ty=Tensor[(1, 64, 224, 224), int32]

To find the shape mismatch, I need to see what the graph looks like after the NCHWc conversion.

Quantization doesn't modify op layout. AlterOpLayout is only applied when you build with opt_level=3.
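Roughly (a sketch; target, sym, and params are whatever you already pass to build):

    from tvm import relay

    with relay.build_config(opt_level=3):   # opt_level=3 enables the AlterOpLayout pass
        graph, lib, params = relay.build(sym, target=target, params=params)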

Even when I print the graph directly before calling graph_gen.codegen (which happens during build), the shapes are still all in NCHW. Similarly, applying sym = relay.ir_pass.alter_op_layout(sym) after quantization doesn't seem to change the shapes either. Is there no way to see the shapes it produces from Python?

You can modify relay/build_module.py and print the shapes after the alter_op_layout pass.
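The exact surrounding code differs between TVM versions, but the idea is just to dump the IR right after the alter_op_layout call in the optimize path, roughly:

    # inside relay/build_module.py, right after the existing AlterOpLayout step
    func = ir_pass.alter_op_layout(func)
    print(ir_pass.infer_type(func))   # added: shows per-op types/shapes in NCHWc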

Here's a sample of what's printed after doing that:

    %59 = nn.conv2d(%58, meta[relay.Constant][10] /* ty=Tensor[(64, 64, 3, 3), int8] */ /* ty=Tensor[(64, 64, 3, 3), int8] */, channels=64, kernel_size=[3, 3], out_dtype="int32")
    %60 = add(%59, meta[relay.Constant][11] /* ty=Tensor[(1, 64, 1, 1), int32] */ /* ty=Tensor[(1, 64, 1, 1), int32] */)

Clearly this is an integer convolution, but the shape of the bias is still incompatible. Does this suggest the bias is just being missed by the AlterOpLayout pass?

It's possible some type info is missing after the pass. You can add sym = relay.ir_pass.infer_type(sym) before printing.

Adding an extra infer_type pass doesn't change the output. If everything were working, we should see the add being done with a tensor of shape [1, 16, 1, 1, 4], right?

Yes, if conv2d is in NCHWc
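For reference, the corresponding bias blocking would conceptually look like this (illustration only):

    from tvm import relay

    # illustration: how a (1, 64, 1, 1) bias maps onto the NCHW4c layout
    bias = relay.reshape(bias, newshape=(1, 16, 4, 1, 1))   # split C=64 into 16 blocks of 4
    bias = relay.transpose(bias, axes=(0, 1, 3, 4, 2))      # -> (1, 16, 1, 1, 4), broadcastable with (1, 16, 224, 224, 4)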

Maybe this is the source of the problem. I've been quantizing and building the graph inside the Int8Fallback scope, which sets cfg.template_key to int8. I was under the impression that this is how we tell TVM to use the NCHWc convolution where appropriate. However, I see in the example from your blog post that you don't actually set cfg anywhere. Does Relay make this conversion when template_key isn't set? You also mention that we do need to set it during autotuning. Why is it different in that case?

You put relay.build under the autotvm.apply_history_best(log_path) context. TOPI will then choose the template key that has the best performance according to the log.
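For example (a sketch; log_path, sym, params, and target are whatever you already use):

    from tvm import autotvm, relay

    with autotvm.apply_history_best(log_path):   # pick the best-performing records from the tuning log
        with relay.build_config(opt_level=3):
            graph, lib, params = relay.build(sym, target=target, params=params)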

During tuning, you need to set the template key to obtain logs for the NCHWc operators.
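One way to do that (illustration only; the exact task names and whether the int8 template key applies depend on your target and TVM version) is to re-create the extracted tasks with the int8 template key:

    from tvm import autotvm, relay

    # extract conv2d tasks from the quantized program, then re-create them with
    # the 'int8' template key so the tuner explores the NCHWc int8 schedules
    tasks = autotvm.task.extract_from_program(sym, target=target, params=params,
                                              ops=(relay.op.nn.conv2d,))
    tasks = [autotvm.task.create(t.name, t.args, target=target, template_key='int8')
             for t in tasks]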