Hello there. I am experimenting with grouped convolution and am seeing a large performance penalty when I use it.
In the Relay interface, there are different implementations for conv layers when groups==1 and when groups>1.
Ideally, we would expect roughly a 2x speedup when switching from a normal convolutional layer to a grouped convolution with groups==2, since the number of multiply-accumulate operations is halved. However, on several platforms I have tried, I instead see a ~4x slowdown.
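For reference, here is a small sketch of where the expected 2x comes from: it counts multiply-accumulate operations (MACs) for a dense and a grouped 2D convolution. The layer shape is hypothetical, and the count covers arithmetic only. It ignores memory traffic and says nothing about how TVM schedules either case, so the ratio is an ideal bound, not a prediction.

```python
def conv2d_macs(n, c_in, h_out, w_out, c_out, kh, kw, groups=1):
    """Multiply-accumulate count for a 2D convolution (NCHW layout).

    In a grouped convolution each output channel only sees the
    c_in // groups input channels of its own group, so every output
    element costs kh * kw * (c_in // groups) MACs.
    """
    assert c_in % groups == 0 and c_out % groups == 0
    return n * c_out * h_out * w_out * (c_in // groups) * kh * kw


# Hypothetical layer shape, chosen only for illustration.
shape = dict(n=1, c_in=64, h_out=56, w_out=56, c_out=64, kh=3, kw=3)
dense = conv2d_macs(**shape, groups=1)
grouped = conv2d_macs(**shape, groups=2)
print(dense, grouped, dense / grouped)  # 115605504 57802752 2.0
```

So a ~4x slowdown means the grouped kernel is running roughly 8x below the throughput the dense kernel achieves per MAC.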
I have been looking into the TVM implementation, but am not yet familiar enough with its design.
Any insights into why the penalty is happening?
This notebook demonstrates the slowdown.