Issue description
Compiling some pretrained GluonCV models for GPU at opt_level 0, after quantizing them, fails with the AssertionError “Number of output channels should be multiple of 4”.
Steps to reproduce the issue
- Prepare hardware and environment that meet the requirements for TVM on an NVIDIA GPU
- Install MXNet 1.5.1, GluonCV 0.7.0, and the latest MKL-DNN library
- Build TVM from master commit `0ea99698` with `USE_MKLDNN` ON
- Download the pretrained model `ssd_512_vgg16_atrous_voc` from GluonCV with `gluoncv.model_zoo.get_model()`
- Convert the model to a TVM Relay graph with `tvm.relay.frontend.from_mxnet()`
- Quantize the model with `tvm.relay.quantize.quantize()`
- Compile the graph with `tvm.relay.build()` at opt_level 0
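The steps above can be sketched as a short script. This is a hedged sketch, not a verified reproduction: the input shape, the quantization configuration, and the `"cuda"` target string are assumptions, and it requires MXNet, GluonCV, and a CUDA-enabled TVM build matching the commit above (on older TVM revisions the pass context is `relay.build_config()` rather than `tvm.transform.PassContext`).

```python
# Reproduction sketch (assumptions noted in comments).
import tvm
from tvm import relay
from gluoncv import model_zoo

# Download the pretrained model from the GluonCV model zoo
model = model_zoo.get_model("ssd_512_vgg16_atrous_voc", pretrained=True)

# Convert to a Relay module; the 512x512 input shape is an assumption
shape_dict = {"data": (1, 3, 512, 512)}
mod, params = relay.frontend.from_mxnet(model, shape_dict)

# Quantize with a default configuration (exact qconfig used is an assumption)
with relay.quantize.qconfig():
    mod = relay.quantize.quantize(mod, params=params)

# Compile for GPU at opt_level 0 -- this is where the AssertionError appears
with tvm.transform.PassContext(opt_level=0):
    graph, lib, new_params = relay.build(mod, target="cuda", params=params)
```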
What’s the expected result?
- Compilation succeeds
What’s the actual result?
- Compilation fails with the following AssertionError:

```
File "/usr/tvm/topi/python/topi/cuda/conv2d_int8.py", line 101, in conv2d_NCHWc_int8
    oc_block_factor)
AssertionError: Number of output channels should be multiple of 4
```
Additional details
- Confirmed for models `ssd_512_vgg16_atrous_voc` and `yolo3_darknet53_voc`
- Confirmed for TVM commits `43dcbc6b` and `0ea99698`
- The error occurs because the schedule `conv2d_NCHWc_int8` expects the size of the first dimension of a particular INT8 input tensor to be a multiple of 4, but the relevant tensor has shape (126, 1024, 3, 3)
- For opt_level > 0 this tensor has shape (128, 1024, 3, 3), hence passing the check in `conv2d_NCHWc_int8`
- The shape of the relevant tensor in the quantized network before compilation is (126, 1024, 3, 3)
- For TVM master commits later than `0ea99698`, quantization fails due to the issue “Quantization fails with recent master commits of TVM”
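The failing check can be illustrated with a minimal sketch. This is a simplified stand-in for the assertion in `conv2d_NCHWc_int8`, not the actual TVM code; the function name `check_out_channels` is hypothetical, and `oc_block_factor = 4` is taken from the error message:

```python
# Simplified stand-in for the output-channel check that
# topi/cuda/conv2d_int8.py::conv2d_NCHWc_int8 performs (per the traceback).
oc_block_factor = 4  # the INT8 schedule packs output channels in blocks of 4

def check_out_channels(weight_shape):
    # First dimension of the (O, I, H, W) weight tensor is the output-channel count
    out_channels = weight_shape[0]
    assert out_channels % oc_block_factor == 0, \
        "Number of output channels should be multiple of %d" % oc_block_factor

check_out_channels((128, 1024, 3, 3))  # opt_level > 0: padded to 128, passes

try:
    check_out_channels((126, 1024, 3, 3))  # opt_level 0: 126 % 4 != 0, fails
except AssertionError as e:
    print(e)  # prints "Number of output channels should be multiple of 4"
```

This shows why only the opt_level 0 path fails: with optimizations enabled the tensor is padded to 128 output channels before the check runs, while at opt_level 0 it stays at 126.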
Suggested solutions
- Fix compilation in TVM so as to support opt_level 0 on GPU for the quantized pretrained GluonCV models `ssd_512_vgg16_atrous_voc` and `yolo3_darknet53_voc`