OpenCL scheduler: kernel_vec related question

Hi @Laurawly, I am trying to understand the OpenCL scheduler. I’ve read the OpenCL AutoTVM questions but still cannot figure out a few things. I’d appreciate some answers:

  1. In the NCHW scheduler, do you convert the kernel to NCHW16c?
  2. Why is the kernel, in the kernel_vec operation, divided into blocks along the first axis (num_filter) rather than the second one (channel)?
  3. What is the purpose of channels in the convolution operation (in python/relay/op/nn.py)? When could it be useful?
  4. How was the if statement below created? Are these numbers tuned for a specific iGPU? If so, which one?
block_w = 1
block_h = 1
if stride_h == 2:
    if num_filter + kernel_h == 515:
        block_h = 4
        block_w = 4
    else:
        block_h = 4
        block_w = 5
elif kernel_h == 3:
    if num_filter == 512:
        block_h = 2
        block_w = 7
    else:
        block_h = 2
        block_w = 14
elif kernel_h == 7 and padding == 3 and stride == 1:
    block_h = 3
    block_w = 4
else:
    block_h = 1
    block_w = 16
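
If I read it right, the 515 check singles out 3x3, stride-2 layers with 512 filters, since 512 + 3 == 515. A quick trace of that case (my own illustration, not TVM code):

num_filter, kernel_h, stride_h = 512, 3, 2   # hypothetical ResNet-style layer
if stride_h == 2 and num_filter + kernel_h == 515:
    block_h, block_w = 4, 4                  # the first branch fires
print(block_h, block_w)                      # -> 4 4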

Thanks in advance.

Hi @Ajja,

  1. For the NCHW16c question: I used the TVM graph tuner here: https://github.com/dmlc/tvm/blob/master/topi/python/topi/intel_graphics/conv2d.py#L55 to pick the best NCHWxC layout.
  2. In kernel_vec, the first axis is the output channel.
  3. I’m not quite clear on your question. Do you mean the role channel plays in convolution, as explained here? (http://machinelearninguru.com/computer_vision/basics/convolution/convolution_layer.html)
  4. The statement was created for Intel HD Graphics 530, but we tested it and it turns out to work well on other Intel HD Graphics GPUs as well.

Thank you very much, @Laurawly, for the answer. Based on your post, I have a couple more questions:

  1. I think we were talking about different parts of the code. My concern about NCHW16c was in this part.

I thought that by dividing out_channel by nv you were trying to convert the data to the NCHWc format, but now I see that you are just splitting the output channels. Are you creating subgroups in this part? And why do you do that split in the compute definition rather than in the schedule using the split method? What difference does it make to divide it there? (I sketch the two options I have in mind right after this list.)

  2. I read your comment about the alter_layout function and don’t quite understand how it works. It isn’t used in AutoTVM because it is only enabled when opt_level = 3, is that right?

So, do you use it to replace conv2d’s compute (e.g. https://github.com/dmlc/tvm/blob/master/topi/python/topi/intel_graphics/conv2d.py#L323) with https://github.com/dmlc/tvm/blob/master/topi/python/topi/intel_graphics/conv2d.py#L55? If so, how exactly does this alter_layout method work? Does it contain some implicit conversion of the input from a given data layout to NCHWc and a conversion back to that layout?
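
To make sure I understand the first question, here is how I picture the two options, as a minimal sketch (using today’s tvm.te names and hypothetical 1-D shapes; nothing here is the actual conv2d code):

import tvm
from tvm import te

N, nv = 64, 16                      # hypothetical: 64 output channels, blocks of 16
x = te.placeholder((N,), name="x")

# (a) Blocking in the compute definition: the result tensor itself is 2-D,
# so every consumer sees the blocked layout.
y_blk = te.compute((N // nv, nv), lambda co, ci: x[co * nv + ci] + 1.0,
                   name="y_blk")

# (b) Blocking in the schedule: the tensor stays 1-D and only the loop
# nest is split; the stored layout does not change.
y = te.compute((N,), lambda c: x[c] + 1.0, name="y")
s = te.create_schedule(y.op)
co, ci = s[y].split(y.op.axis[0], factor=nv)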

  1. I’m not trying to create subgroups here. It’s better to reference https://github.com/dmlc/tvm/blob/master/topi/python/topi/intel_graphics/conv2d.py#L163, which uses alter_layout for graph tuning. The code you referenced is a default schedule that splits the output channel.
  2. Yes, it’s only enabled with opt_level >= 3. It uses an implicit conversion that you define, such as https://github.com/dmlc/tvm/blob/master/topi/python/topi/intel_graphics/conv2d.py#L75. Then it searches for the best combination. @yzhliu could give more detailed instructions on how this works.
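
Roughly, the conversion looks like this (a numpy sketch of the idea only, not the actual TVM code; shapes are hypothetical):

import numpy as np

n, c, h, w, bc = 1, 32, 8, 8, 16    # hypothetical shapes; c must be divisible by bc
x = np.random.rand(n, c, h, w).astype("float32")

# NCHW -> NCHW16c: split the channel axis into (c // bc, bc) blocks and
# move the inner block to the innermost position.
x_nchwc = x.reshape(n, c // bc, bc, h, w).transpose(0, 1, 3, 4, 2)

# NCHW16c -> NCHW: the inverse transform.
x_back = x_nchwc.transpose(0, 1, 4, 2, 3).reshape(n, c, h, w)
assert np.array_equal(x, x_back)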

Thank you very much for the answer, @Laurawly. Could you also explain why block_h and block_w are used to add more bottom and right padding?

Is it connected with the zero-padding explained here?
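
For context, here is my mental model of what that padding does, as a minimal sketch (numbers hypothetical):

out_h, out_w = 56, 56        # hypothetical conv output extent
block_h, block_w = 3, 4      # per-work-item tile, e.g. the 7x7 branch above

# Round each extent up to a multiple of the block size; the remainder is
# the extra bottom/right padding, so edge tiles need no bounds checks.
pad_h = (block_h - out_h % block_h) % block_h
pad_w = (block_w - out_w % block_w) % block_w
print(pad_h, pad_w)          # -> 1 0, i.e. pad 56x56 to 57x56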