Does TVM support AutoTVM for the dilated depthwise_conv2d op using OpenCL?

dolphintear · September 25, 2019, 4:45am

Does TVM fully support autotuning the depthwise_conv2d op with dilation? Thanks a lot!

FrozenGene · September 25, 2019, 5:11am

You could refer to arm_cpu’s implementation to implement it: https://github.com/dmlc/tvm/blob/master/topi/python/topi/arm_cpu/depthwise_conv2d.py#L192

dolphintear · September 25, 2019, 6:08am

Thanks a lot!
TensorFlow decomposes a dilated depthwise conv2d into three steps: 1. SpaceToBatchND, 2. DepthWiseConv2d, 3. BatchToSpaceND. The TensorFlow frontend already implements steps 1 and 3, so as long as OpenCL can run the depthwise_conv2d op, it should be possible to autotune depthwise conv2d with different dilation rates.

Or should I refer to topi/python/topi/cuda/depthwise_conv2d.py?

FrozenGene · September 25, 2019, 6:12am

That is a different matter. Some frameworks combine SpaceToBatchND + depthwise + BatchToSpaceND into a single depthwise op (dilation > 1). If you see these three ops, depthwise shouldn’t be a problem; you should optimize SpaceToBatchND / BatchToSpaceND instead. But in my experience, a single depthwise op (dilation > 1) performs better, and some converter tools do this combination, for example the TF->CoreML converter.
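
For readers who want to check the equivalence discussed in this exchange, here is a minimal NumPy sketch. It is not TVM or TensorFlow code. It assumes NHWC layout, stride 1, no padding ("VALID") and a channel multiplier of 1, and all function names are made up for illustration. It compares a directly dilated depthwise convolution against the SpaceToBatchND + depthwise + BatchToSpaceND form.

```python
import numpy as np

def depthwise_conv2d_nhwc(x, k, dilation=1):
    """Reference depthwise conv: x is (N, H, W, C), k is (KH, KW, C)."""
    n, h, w, c = x.shape
    kh, kw, _ = k.shape
    out_h = h - (kh - 1) * dilation
    out_w = w - (kw - 1) * dilation
    y = np.zeros((n, out_h, out_w, c), dtype=x.dtype)
    for a in range(kh):
        for b in range(kw):
            r0, c0 = a * dilation, b * dilation
            # Each kernel tap reads a shifted window; channels stay separate.
            y += x[:, r0:r0 + out_h, c0:c0 + out_w, :] * k[a, b, :]
    return y

def space_to_batch(x, d):
    """Move each d x d spatial phase into the batch dimension (zero-padded)."""
    n, h, w, c = x.shape
    x = np.pad(x, ((0, 0), (0, (-h) % d), (0, (-w) % d), (0, 0)))
    hp, wp = x.shape[1] // d, x.shape[2] // d
    x = x.reshape(n, hp, d, wp, d, c)      # (n, i, p, j, q, c)
    x = x.transpose(2, 4, 0, 1, 3, 5)      # (p, q, n, i, j, c)
    return x.reshape(d * d * n, hp, wp, c)

def batch_to_space(y, d, out_h, out_w):
    """Inverse of space_to_batch, then crop rows/cols that touched padding."""
    b, hp, wp, c = y.shape
    n = b // (d * d)
    y = y.reshape(d, d, n, hp, wp, c)      # (p, q, n, i, j, c)
    y = y.transpose(2, 3, 0, 4, 1, 5)      # (n, i, p, j, q, c)
    y = y.reshape(n, hp * d, wp * d, c)
    return y[:, :out_h, :out_w, :]

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 7, 9, 3))
k = rng.standard_normal((3, 3, 3))
d = 2

# Single op with dilation > 1.
direct = depthwise_conv2d_nhwc(x, k, dilation=d)

# Three-step form: the middle op is an ordinary (dilation = 1) depthwise conv.
inner = depthwise_conv2d_nhwc(space_to_batch(x, d), k, dilation=1)
decomposed = batch_to_space(inner, d, direct.shape[1], direct.shape[2])

print(np.allclose(direct, decomposed))
```

In the three-step form the depthwise op itself never sees a dilation, which is why only SpaceToBatchND / BatchToSpaceND add extra work there.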

dolphintear · September 25, 2019, 7:11am

Thanks a lot! I got it!
There is another problem related to deployment on Windows. I generated the .log by autotuning on Ubuntu using two threads. But when I used this .log to deploy on Windows, I found that inference used only one thread. In this case, should I run the autotune again using two threads on Windows in order to double the inference speed?

FrozenGene · September 25, 2019, 7:17am

This is not related to the autotune log. You should dig into why Windows can only use one thread.
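
One possible starting point for that investigation is a small sketch that only inspects the process environment. It assumes the model runs on the CPU and that the TVM runtime reads the TVM_NUM_THREADS (and, as a fallback, OMP_NUM_THREADS) environment variables when it creates its thread pool; the helper names are hypothetical and nothing here calls into TVM.

```python
import os

def thread_report(env=None):
    """Collect settings that commonly influence the CPU thread count."""
    env = os.environ if env is None else env
    return {
        "logical_cpus": os.cpu_count(),
        "TVM_NUM_THREADS": env.get("TVM_NUM_THREADS"),
        "OMP_NUM_THREADS": env.get("OMP_NUM_THREADS"),
    }

def request_threads(n, env=None):
    """Ask for n worker threads; assumed to matter only if set before
    the runtime is loaded and starts its thread pool."""
    env = os.environ if env is None else env
    env["TVM_NUM_THREADS"] = str(n)
    return env["TVM_NUM_THREADS"]

# Print what the current process would see on this machine.
print(thread_report())
```

If the report looks the same on Ubuntu and Windows, the difference is more likely in how the runtime was built or how the module is loaded than in the tuning log.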