Hello there, I’m a Georgia Tech graduate student working in the Synergy Lab. We focus on discrete machine learning accelerators, and some time ago we released an ML accelerator architecture called MAERI.
We have implemented it on an FPGA and wish to add support for arbitrary CNN models.
I am fascinated by TVM, but I am wondering whether it supports a particular use case that we need.
In MAERI, a 3x3 * 128x128 convolution requires 3x3 = 9 multipliers (mults). So if a particular instantiation of MAERI has 64 mults, I can run convolutions with 3x3, 4x4, and 5x5 kernels all in parallel, since 9 + 16 + 25 = 50 ≤ 64.
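To make the multiplier arithmetic concrete, here is a minimal Python sketch that groups kernels into rounds that fit a fixed multiplier budget. It assumes each k x k kernel occupies exactly k*k mults (one per weight), as in the 3x3 = 9 example, and uses simple first-fit packing. `kernel_mults` and `pack_kernels` are hypothetical helpers, not part of MAERI or TVM.

```python
def kernel_mults(k: int) -> int:
    """Mults needed for one k x k kernel (assumption: one mult per weight)."""
    return k * k


def pack_kernels(kernel_sizes: list[int], num_mults: int) -> list[list[int]]:
    """Group kernels into rounds whose total mult demand fits num_mults.

    First-fit in the given order: each kernel goes into the first round
    with enough free mults, otherwise it opens a new round.
    """
    rounds: list[list[int]] = []
    used: list[int] = []  # mults already occupied in each round
    for k in kernel_sizes:
        need = kernel_mults(k)
        if need > num_mults:
            raise ValueError(f"{k}x{k} kernel needs {need} mults, only {num_mults} available")
        for i, u in enumerate(used):
            if u + need <= num_mults:
                rounds[i].append(k)
                used[i] += need
                break
        else:
            # No existing round had room: start a new one.
            rounds.append([k])
            used.append(need)
    return rounds


if __name__ == "__main__":
    # 9 + 16 + 25 = 50 <= 64, so all three kernels fit in a single round.
    print(pack_kernels([3, 4, 5], 64))  # [[3, 4, 5]]
    # 25 + 25 = 50, but a third 5x5 would need 75 > 64, so it spills to round 2.
    print(pack_kernels([5, 5, 5], 64))  # [[5, 5], [5]]
```

First-fit is used here only because it is easy to read; it is a heuristic and can leave mults idle that a smarter packing would use.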
Is there a way to make TVM aware of this? That is, can I write ordering rules that exploit it?
Often in CNNs, a particular layer can have multiple outputs. Consider an ML layer that has 4 outputs. The physical constraints of the MAERI accelerator might only support finishing 3 of those 4 outputs in parallel, which leaves 1 output to be computed by itself. While that 1 output is being computed, MAERI can begin evaluating outputs of the next layer (those that depend only on the 3 outputs we just computed) in parallel.
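The overlap described above is essentially resource-constrained list scheduling over the layer dependency graph. The following minimal Python sketch illustrates it under simplifying assumptions: every output is a unit-time task, at most `capacity` outputs can run in one step, and the task names and `list_schedule` function are hypothetical. It is not TVM functionality, only an illustration of what a "valid ordering" could mean.

```python
def list_schedule(deps: dict[str, list[str]], capacity: int) -> list[list[str]]:
    """Greedy list scheduling of unit-time tasks with a per-step capacity.

    deps maps each task to the tasks it depends on; dict insertion order
    is used as priority. A task is ready once all of its dependencies
    finished in an earlier step. Returns the tasks run in each step.
    """
    done: set[str] = set()
    remaining = list(deps)
    steps: list[list[str]] = []
    while remaining:
        ready = [t for t in remaining if all(d in done for d in deps[t])]
        if not ready:
            raise ValueError("cycle or missing dependency in task graph")
        step = ready[:capacity]  # the hardware limit caps parallel outputs
        steps.append(step)
        done.update(step)
        remaining = [t for t in remaining if t not in done]
    return steps


if __name__ == "__main__":
    # Hypothetical two-layer example: layer 1 has 4 outputs; layer 2's
    # out0 needs L1 outputs 0-2, and out1 needs L1 output 3.
    deps = {
        "L1.out0": [], "L1.out1": [], "L1.out2": [], "L1.out3": [],
        "L2.out0": ["L1.out0", "L1.out1", "L1.out2"],
        "L2.out1": ["L1.out3"],
    }
    for i, step in enumerate(list_schedule(deps, capacity=3)):
        print(i, step)
    # 0 ['L1.out0', 'L1.out1', 'L1.out2']
    # 1 ['L1.out3', 'L2.out0']   <- leftover output overlaps with the next layer
    # 2 ['L2.out1']
```

Greedy list scheduling always produces a dependency-respecting order, but it is a heuristic and does not guarantee the shortest schedule.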
Given a Keras model and the physical constraints of MAERI, can I have TVM determine a valid ordering of the CNN's computations? I have browsed through the tutorials, but I don’t think they answer this question.