What exactly are the requirements for a CNN model built with a frontend (TensorFlow) to be compilable & runnable on the VTA?
My thought right now is that the input & output tensors at each convolution & pooling layer have to be tileable by a 16*16 matrix by default (env.BLOCK_OUT, i.e. 2^LOG_BLOCK_OUT = 16). So for example, for an image recognition model the input image has to have dimensions divisible by 16.
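To make that concrete, here is a rough sketch of the check I have in mind. It is plain Python with no TVM/VTA calls; the block size of 16 is hard-coded as my assumption about the default, the layer stack is made up for illustration, and `conv_out_size` / `check_layers` are just helper names I picked. It only encodes my guess about the constraint, not what VTA actually requires.

```python
BLOCK = 16  # assumed default tile size; my guess at what env.BLOCK_OUT evaluates to


def conv_out_size(size, kernel, stride, pad=0):
    """Standard output-size formula for a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1


def check_layers(size, layers, block=BLOCK):
    """Return (layer name, output size, divisible by block?) for each layer."""
    report = []
    for name, kernel, stride, pad in layers:
        size = conv_out_size(size, kernel, stride, pad)
        report.append((name, size, size % block == 0))
    return report


# Hypothetical layer stack: (name, kernel, stride, padding)
layers = [
    ("conv1", 3, 1, 1),
    ("pool1", 2, 2, 0),
    ("conv2", 3, 1, 1),
    ("pool2", 2, 2, 0),
]

for name, size, ok in check_layers(224, layers):
    status = "ok" if ok else f"not divisible by {BLOCK}"
    print(f"{name}: {size} ({status})")
```

With a 224x224 input, the first three layers stay divisible by 16 (224, 112, 112), but the second pooling layer drops to 56, which is not. That is the kind of case I keep running into.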
Is this correct? It’s just really hard to find kernel sizes & strides for each layer that maintain this requirement, so I just want to clarify. Thanks!