Will "global" cache_read be precomputed for constant input?

For example, in the conv2d cases, I’d like to cache_read the kernel weights into another global buffer and do some tiling there before the computation. Will the cache_read stage be precomputed in this case?

Is this in the context of VTA? Or are you targeting another backend?

I am now targeting the arm_cpu backend.

I’m not sure I understand your question: could you be more specific about what you mean by pre-computing the cache_read?

Please see the usage in https://github.com/dmlc/tvm/blob/master/topi/python/topi/arm_cpu/depthwise_conv2d.py#L62
I am just curious whether the kernel packing can be computed in advance (it should be possible, since the kernel weights are constant), or whether it must be computed at runtime.

It won’t be computed in advance, because the packing is part of a single TVM op, while precompute is an NNVM pass that works on NNVM symbols.
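To illustrate the problem (a plain-NumPy sketch, not the TVM API; `conv_like_fused` and the transpose-based "packing" are hypothetical stand-ins): when the weight packing lives inside the same op as the computation, it is re-executed on every call, even though the weights never change.

```python
import numpy as np

def conv_like_fused(x, w):
    # "cache_read"-style packing of the constant weights, but because it is
    # part of the same function (op), it runs again on every invocation
    w_packed = np.ascontiguousarray(w.T)
    # the actual computation, reading from the packed buffer
    return x @ w_packed

x = np.ones((2, 3))
w = np.arange(12.0).reshape(4, 3)
y = conv_like_fused(x, w)  # packing cost is paid at runtime, per call
```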

So what we do is separate this conv into two NNVM symbols: one does the convolution and the other does the weight transform. The weight-transform symbol can then be pre-computed.

We do this by registering alter_op_layout in NNVM. See the related code for conv2d.
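The split described above can be sketched in plain NumPy (not NNVM's actual API; `weight_transform` and `conv_like_compute` are illustrative names, and the transpose stands in for the real packing): because the transform depends only on the constant weights, a graph-level precompute/constant-folding pass can evaluate it once ahead of time, leaving only the compute step at runtime.

```python
import numpy as np

def weight_transform(w):
    # hypothetical packing step, e.g. transposing into a friendlier layout;
    # its only input is the constant weight tensor
    return np.ascontiguousarray(w.T)

def conv_like_compute(x, w_packed):
    # the runtime op consumes the already-packed weights
    return x @ w_packed

x = np.ones((2, 3))
w = np.arange(12.0).reshape(4, 3)

w_packed = weight_transform(w)       # foldable at compile time: constant input
y = conv_like_compute(x, w_packed)   # only this runs at inference time
```

The key design point is that the graph pass never needs to understand the packing itself; it only needs to see that one symbol's inputs are all constants.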

@merrymercy The layout altering will not happen in the autotvm path, so we need to use debug_skip_region to exclude it from the tuning-time measurement. Is my understanding correct?

You are right.

Great, thanks a lot for the clarification.