AssertionError when compiling some quantized models on GPU at opt level 0

Issue description

Compiling some pretrained GluonCV models on GPU at opt_level 0 after quantization fails with the AssertionError “Number of output channels should be multiple of 4”.

Steps to reproduce the issue

  1. Prepare hardware and environment that meet the requirements for TVM on an NVIDIA GPU
  2. Install MXNet 1.5.1, GluonCV 0.7.0, and the latest MKL-DNN library
  3. Build TVM from master commit 0ea99698 with USE_MKLDNN ON
  4. Download pretrained model ssd_512_vgg16_atrous_voc from GluonCV with gluoncv.model_zoo.get_model()
  5. Convert the model to a TVM Relay graph with tvm.relay.frontend.from_mxnet()
  6. Quantize the model with tvm.relay.quantize.quantize()
  7. Compile the graph with tvm.relay.build() at opt_level 0
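The steps above can be sketched as the following script. This is a hypothetical reconstruction, not the exact script used: it assumes the TVM 0.6/0.7-era Python API present around commit 0ea99698 (`relay.build_config`, `relay.quantize.qconfig`), plus MXNet 1.5.1 and GluonCV 0.7.0; names and defaults may differ on other revisions.

```python
# Sketch of the reproduction steps (assumptions noted in the lead-in).
# The input shape below is the 512x512 NCHW input that
# ssd_512_vgg16_atrous_voc expects; batch size 1 is an assumption.
INPUT_SHAPE = (1, 3, 512, 512)

def reproduce():
    # Imports are deferred so the sketch can be read without TVM installed.
    import tvm
    from tvm import relay
    from gluoncv import model_zoo

    # Step 4: download the pretrained model from the GluonCV model zoo
    block = model_zoo.get_model("ssd_512_vgg16_atrous_voc", pretrained=True)

    # Step 5: convert the MXNet model to a TVM Relay module
    mod, params = relay.frontend.from_mxnet(block, shape={"data": INPUT_SHAPE})

    # Step 6: quantize with default settings
    with relay.quantize.qconfig():
        mod = relay.quantize.quantize(mod, params=params)

    # Step 7: compile for GPU at opt_level 0 -- this is where the
    # AssertionError is raised
    with relay.build_config(opt_level=0):
        graph, lib, built_params = relay.build(mod, target="cuda", params=params)

# Call reproduce() on a machine that meets the hardware and software
# requirements listed above.
```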

What’s the expected result?

  • Compilation succeeds

What’s the actual result?

  • Compilation fails with the following AssertionError:

    File "/usr/tvm/topi/python/topi/cuda/conv2d_int8.py", line 101, in conv2d_NCHWc_int8
        oc_block_factor)
    AssertionError: Number of output channels should be multiple of 4
    

Additional details

  • Confirmed for models ssd_512_vgg16_atrous_voc and yolo3_darknet53_voc
  • Confirmed for TVM commits 43dcbc6b and 0ea99698
  • The error occurs because the schedule conv2d_NCHWc_int8 expects the first dimension (the number of output channels) of the int8 convolution kernel to be a multiple of oc_block_factor = 4, but the relevant kernel tensor has shape (126, 1024, 3, 3)
  • For opt_level > 0 this tensor has shape (128, 1024, 3, 3) and therefore passes the check in conv2d_NCHWc_int8
  • In the quantized network before compilation the tensor already has shape (126, 1024, 3, 3)
  • For TVM master commits later than 0ea99698, quantization itself fails; see the separate issue “Quantization fails with recent master commits of TVM”
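The failing check can be illustrated in isolation. The snippet below is a simplified sketch of the assertion in topi/cuda/conv2d_int8.py (conv2d_NCHWc_int8), not the actual TVM code: the schedule packs the kernel into blocks of 4 output channels, so the first dimension of the kernel shape must be divisible by 4. The helper name `check_output_channels` is made up for illustration.

```python
# Simplified sketch of the check in conv2d_NCHWc_int8 that raises the
# AssertionError (helper name is hypothetical).
OC_BLOCK_FACTOR = 4  # block size used to pack int8 output channels

def check_output_channels(kernel_shape, oc_block_factor=OC_BLOCK_FACTOR):
    out_channels = kernel_shape[0]
    assert out_channels % oc_block_factor == 0, \
        "Number of output channels should be multiple of {}".format(oc_block_factor)

check_output_channels((128, 1024, 3, 3))    # shape seen at opt_level > 0: passes
# check_output_channels((126, 1024, 3, 3))  # shape seen at opt_level 0: AssertionError
```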

Suggested solutions

  • Fix compilation in TVM so that opt_level 0 on GPU is supported for the quantized pretrained GluonCV models ssd_512_vgg16_atrous_voc and yolo3_darknet53_voc (at opt_level > 0 the kernel tensor already arrives with 128 output channels, so one option may be to apply the same padding at opt_level 0)