Hello there. I am looking at grouped convolution, and I am seeing a massive performance penalty when using it.

In the Relay interface, there are different implementations for conv layers when `groups==1` and when `groups>1`.

Ideally, we would hope for a 2x speedup if we switch from a normal convolutional layer to a grouped convolution with `groups==2`. However, on several platforms I have tried, there is a ~4x slowdown.
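For reference, the 2x expectation comes from counting multiply-accumulates (MACs): with `groups==2`, each output channel reads only half of the input channels. Here is a minimal sketch of that arithmetic in plain Python. The helper `conv2d_macs` and the layer shape are hypothetical, chosen only for illustration; they are not taken from TVM or from the notebook.

```python
def conv2d_macs(in_channels, out_channels, kernel_h, kernel_w, out_h, out_w, groups=1):
    """Count multiply-accumulates for one 2D convolution on a single image."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only reads in_channels // groups input channels.
    macs_per_output_element = (in_channels // groups) * kernel_h * kernel_w
    return out_channels * out_h * out_w * macs_per_output_element


# Hypothetical layer shape (assumption, for illustration only).
shape = dict(in_channels=64, out_channels=64, kernel_h=3, kernel_w=3, out_h=56, out_w=56)

dense_macs = conv2d_macs(groups=1, **shape)
grouped_macs = conv2d_macs(groups=2, **shape)
print("groups=1 MACs:", dense_macs)
print("groups=2 MACs:", grouped_macs)
print("theoretical speedup:", dense_macs / grouped_macs)
```

This only counts arithmetic. It says nothing about memory layout, scheduling, or tuning, so it is an upper bound on what to hope for rather than a prediction of measured time.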

I have been looking into the TVM implementation, but I am not yet familiar enough with its design to pinpoint the cause.

Any insights into why the penalty is happening?

You can see this notebook which demonstrates the slowdown.