Hello all,
After reading about TVM and VTA for quite a while, I decided to make my first real attempt at using them.
It seems I overestimated my knowledge, and I am now stuck on a couple of things,
so I would appreciate some help.
Background Info
I am trying to do something similar to the VTA examples. So assume that we have an accelerator for CNNs (FYI: the accelerator doesn't exist as anything other than a concept). Like in the VTA example (and most of the TVM literature), we want to fuse the conv2ds and the activations (which for now are only ReLUs). One thing we want to do differently is to also fuse popular pooling patterns (right now only 2x2 'max').
The TVM Schedule
Now, I know TVM's operator fusion will not do this fusion, so I have written by hand one example of a TVM task in which conv2d, ReLU and pooling are fused in the pattern that we expect.
I found it very easy to just compute_at() at the right levels to match our expected program flow.
from __future__ import absolute_import, print_function
import os
import tvm
import topi
#Description of Layer parameters
batch_size = 1
in_channel = 16
in_height = 8
in_width = 8
out_channel = 32
kernel_h = 3
kernel_w = 3
pad_h = 1
pad_w = 1
stride_h = 1
stride_w = 1
pool_h = 2
pool_w = 2
pool_type = 'max'
ofm_h_shape = (in_height + 2 * pad_h - kernel_h) // stride_h + 1
ofm_w_shape = (in_width + 2 * pad_w - kernel_w) // stride_w + 1
#Placeholders for IFM and Weights
ifm_t = tvm.placeholder((batch_size, in_channel, in_height, in_width), name='ifm_t')
# (simplified: the same kernel is reused for every output channel)
kernel_t = tvm.placeholder((in_channel, kernel_h, kernel_w), name='kernel_t')
#Conv2d Op
dy = tvm.reduce_axis((0, kernel_h), name='dy')
dx = tvm.reduce_axis((0, kernel_w), name='dx')
ic = tvm.reduce_axis((0, in_channel), name='ic')
# NOTE: pad_h/pad_w are used for the output shape above, but no padding
# is actually applied to ifm_t in this hand-written example
ofm_t = tvm.compute(
    (batch_size, out_channel, ofm_h_shape, ofm_w_shape),
    lambda b, co, h, w: tvm.sum(
        ifm_t[b, ic, h * stride_h + dy, w * stride_w + dx] *
        kernel_t[ic, dy, dx],
        axis=[ic, dy, dx]),
    name="ofm_t")
#Relu Op
act_t = topi.nn.relu(ofm_t)
#Pooling Op
pool_t = topi.nn.pool(act_t, [pool_h, pool_w], stride=[pool_h, pool_w], padding=[0, 0, 0, 0], pool_type=pool_type)
#Scheduling Rule
sch = tvm.create_schedule(pool_t.op)
act_t_axis = sch[act_t].op.axis
sch[ofm_t].compute_at(sch[act_t], act_t_axis[1])
pool_t_axis = sch[pool_t].op.axis
sch[act_t].compute_at(sch[pool_t], pool_t_axis[1])
print(tvm.lower(sch, [ifm_t, kernel_t, pool_t], simple_mode=True))
The Output
// attr [tensor] storage_scope = "global"
allocate tensor[float32 * 1 * 32 * 4 * 4]
// attr [ofm_t] storage_scope = "global"
allocate ofm_t[float32 * 1 * 1 * 1 * 1]
produce tensor {
for (ax1, 0, 32) {/*Output Channel dimension*/
for (ax2, 0, 4) {/*Output Height AFTER pooling*/
for (ax3, 0, 4) {/*Output Width AFTER pooling*/
produce compute {
for (i2, 0, 2) {/*Intermediate Height, which should be equal to pooling kernel*/
/*Question 5: BEGIN of tensorize() */
for (i3, 0, 2) {/*Intermediate Width, which should be equal to pooling kernel*/
produce ofm_t {
ofm_t[0] = 0.000000f
for (ic, 0, 16) {/*Input Channel*/
/*Question 4: BEGIN of tensorize()*/
for (dy, 0, 3) {/*Conv2D's Kernel Height*/
for (dx, 0, 3) {/*Conv2D's Kernel Width*/
ofm_t[0] = (ofm_t[0] + (ifm_t[((((((((ax2*8) + ax3) + (i2*4))*2) + i3) + (ic*64)) + (dy*8)) + dx)]*kernel_t[((((ic*3) + dy)*3) + dx)])) /*Convolution*/
}
}
/*Question 4: END of tensorize()*/
}
}
compute[(((((((ax1*4) + ax2)*8) + ax3) + (i2*4))*2) + i3)] = max(ofm_t[0], 0.000000f) /*Relu*/
}
}
}
tensor[((((ax1*4) + ax2)*4) + ax3)] = 340282346638528859811704183484516925440.000000f /*Initialization for max pool*/
for (rv, 0, 2) {/*Pooling Height Kernel*/
for (rv, 0, 2) {/*Pooling Width Kernel*/
tensor[((((ax1*4) + ax2)*4) + ax3)] = max(tensor[((((ax1*4) + ax2)*4) + ax3)], compute[(((((((ax1*4) + ax2)*8) + ax3) + (rv*4))*2) + rv)])/*Max Pool*/
}
}
/*Question 5: END of tensorize() */
}
}
}
}
The Questions

How can we control the names of the tensors produced by calling topi.nn operators?
I was somewhat confused by the automatic names some of the operators generate.
In the snippet, compute[] is the output of the activation, and I would like it to be called act_t[];
tensor[] is the output of the pooling, and I would like it to be called pool_t[].

Why is the array compute[] missing from the top part of the pseudocode, where all the other arrays are being allocated?
I understand that ifm_t and kernel_t are just placeholders and don't create allocations (is this correct?),
but compute[] should be a tensor and should require a prior allocation.
Also, I think compute[] should be at most of size (1,1,2,2), but that doesn't seem to be the case here... am I wrong?
(For the following questions I did some first trials and failed at tensorizing, but since it's my first time I just want to know whether it's possible and get some hints, rather than the full solution.)

If I wanted to tensorize everything between /*Question 4: BEGIN*/ and /*Question 4: END*/,
would I have to create an operation which slices ifm_t down to a size of (1,1,3,3)
and kernel_t down to a size of (1,3,3), or is this not necessary?
Also notice that the initialization ofm_t[0] = 0.000000f is outside of the "tensorization region". How do I handle this? (i.e. I don't want it inside the tensor intrinsic.)
Is it possible to tensorize the code between /*Question 5: BEGIN*/ and /*Question 5: END*/?
I ask because most of the tensorize examples I remember seeing have only one computing rule, which is no longer the case here.
Also, what are the known limitations of tensorize?
More specifically, are there examples where tensorize cannot be used and a direct manipulation of the AST is necessary? (I guess tensorize is already an AST manipulation; what I mean is the developer having to design the AST manipulation manually.)
Thanks