How to implement Conv2d for ResNet-50 in TVM

I am trying to measure TVM's performance for the conv2d operator on the following 9 input sizes, as used in ResNet-50:

[batch, in_height, in_width, in_channels], [filter_height, filter_width, in_channels, out_channels]
[128    16         16        32]           [3              3             32           32]
[128    16         16        32]           [3              3             32           32]
[128    8          8         64]           [3              3             64           64]
[128    32         32        3]            [3              3             3            16]
[128    34         34        16]           [3              3             16           32]
[128    32         32        16]           [1              1             16           16]
[128    18         18        32]           [3              3             32           64]
[128    32         32        16]           [1              1             16           32]
[128    16         16        32]           [1              1             32           64]
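For convenience, the same nine workloads written as plain Python tuples, so they can be looped over when benchmarking. The field order (N, H, W, CI, KH, KW, CO) is my own choice; strides and padding are not included because they are not listed above.

```python
resnet50_workloads = [
    # (N,  H,  W, CI, KH, KW, CO)
    (128, 16, 16, 32, 3, 3, 32),
    (128, 16, 16, 32, 3, 3, 32),
    (128,  8,  8, 64, 3, 3, 64),
    (128, 32, 32,  3, 3, 3, 16),
    (128, 34, 34, 16, 3, 3, 32),
    (128, 32, 32, 16, 1, 1, 16),
    (128, 18, 18, 32, 3, 3, 64),
    (128, 32, 32, 16, 1, 1, 32),
    (128, 16, 16, 32, 1, 1, 64),
]
```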

I found the following tutorial on how to implement and tune conv2d for NVIDIA GPUs: https://docs.tvm.ai/tutorials/autotvm/tune_conv2d_cuda.html. Unfortunately, the tutorial code does not support batch sizes larger than 1, and it stores the input and filter buffers in the layout

[batch, in_channels, in_height, in_width], [out_channels, in_channels, filter_height, filter_width]

instead of

[batch, in_height, in_width, in_channels], [filter_height, filter_width, in_channels, out_channels]

Is it possible to adapt the tutorial code to match ResNet-50's batch size and buffer layout? Additionally, is this the best conv2d implementation TVM offers, i.e. the one that should be used when comparing other frameworks against TVM?
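For context, here is a minimal sketch of the NHWC compute definition I have in mind, written in TVM's tensor expression language with the batch dimension kept as a parameter. The stride and padding values and the helper name `conv2d_nhwc` are my own placeholders; each ResNet-50 layer would need its real stride/padding. (topi also ships an NHWC compute, `topi.nn.conv2d_nhwc`, which may be a better starting point.)

```python
import tvm
import topi

def conv2d_nhwc(N, H, W, CI, KH, KW, CO, stride=1, padding=1, dtype="float32"):
    """Direct NHWC conv2d compute definition (stride/padding are placeholders)."""
    data = tvm.placeholder((N, H, W, CI), name="data", dtype=dtype)
    kernel = tvm.placeholder((KH, KW, CI, CO), name="kernel", dtype=dtype)

    # Zero-pad only the two spatial dimensions (H and W).
    padded = topi.nn.pad(data, [0, padding, padding, 0], name="padded")

    OH = (H + 2 * padding - KH) // stride + 1
    OW = (W + 2 * padding - KW) // stride + 1

    rh = tvm.reduce_axis((0, KH), name="rh")
    rw = tvm.reduce_axis((0, KW), name="rw")
    rc = tvm.reduce_axis((0, CI), name="rc")

    conv = tvm.compute(
        (N, OH, OW, CO),
        lambda n, oh, ow, co: tvm.sum(
            padded[n, oh * stride + rh, ow * stride + rw, rc]
            * kernel[rh, rw, rc, co],
            axis=[rh, rw, rc]),
        name="conv2d_nhwc")
    return data, kernel, conv

# Example: the first workload from the table above, batch size 128.
data, kernel, conv = conv2d_nhwc(128, 16, 16, 32, 3, 3, 32)
```

The tutorial's `@autotvm.template` function would then need to split and bind the n/oh/ow/co axes of this compute instead of the NCHW ones, and `module.time_evaluator` could be used to time the tuned kernel for each of the nine shapes.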

@tqchen @thierry Do you have any ideas on how to solve this issue?

The best way to run a comparison would still be to run the end-to-end benchmarks; see e.g. https://github.com/dmlc/tvm/blob/master/apps/benchmark/gpu_imagenet_bench.py

@tqchen: thank you, but right now we are interested in TVM's performance at the operator level. Any suggestions?

The end-to-end code does use operator code generated by autotvm, so if you trace the code that is being run, you can likely find these operators.
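For example, here is a rough sketch of how the conv2d tasks (and thus the exact operator workloads) could be recovered from a network, assuming the Relay frontend and `relay.testing` are available in your TVM build. The argument order of `extract_from_program` has changed between TVM versions, so check your local docs; also note that `relay.testing`'s ResNet-50 uses NCHW layout and ImageNet-sized inputs, so its workloads will not necessarily match the table above.

```python
from tvm import relay, autotvm
from tvm.relay import testing

# Build a ResNet-50 workload; batch size 128 is taken from the table above.
mod, params = testing.resnet.get_workload(num_layers=50, batch_size=128)

# Extract one autotvm task per conv2d workload in the network; each task's
# `args` field records the concrete shapes, strides and padding.
tasks = autotvm.task.extract_from_program(
    mod, params=params, target="cuda", ops=(relay.op.nn.conv2d,))

for t in tasks:
    print(t.args)
```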