Hello everybody, I have developed an instruction set architecture for my own (for now abstract) deep learning accelerator, and I now want to build a backend that uses TVM to translate and optimize a model description into my own assembler code.
My problem is that I am not sure where to start. I have read the PR and the information about “Bring Your Own Codegen”, but I am still confused about where to extend TVM.
The accelerator uses int8 inputs and weights, and has dedicated instructions for convolution, ReLU, matmul, elementwise operations, and a look-up-table activation function. The data for these operations needs to reside in internal SRAM and has to be transferred there by the DMA.
Should I just write my own compiler passes for Relay and lower the model myself into the custom assembler?
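To make it concrete what I mean by lowering into my custom assembler, here is a toy sketch in plain Python. Everything here is invented for illustration: the mnemonics (`DMA_IN`, `CONV`, `RELU`, `DMA_OUT`), the address labels, and the tuple-based “graph” format are my own placeholders, not TVM data structures or real hardware names.

```python
# Toy sketch: lowering a tiny, already-scheduled op graph into
# invented assembler mnemonics. Nothing here is TVM API; it only
# illustrates the shape of the translation step I have in mind.

# A "graph" as an ordered list of (op, args) tuples. The DMA ops model
# the constraint that operands must live in internal SRAM.
graph = [
    ("dma_load",  {"src": "ddr:input",   "dst": "sram:0"}),
    ("dma_load",  {"src": "ddr:weights", "dst": "sram:1"}),
    ("conv2d",    {"in": "sram:0", "weights": "sram:1", "out": "sram:2"}),
    ("relu",      {"in": "sram:2", "out": "sram:2"}),
    ("dma_store", {"src": "sram:2", "dst": "ddr:output"}),
]

# One emitter per op kind, mapping a graph node to an assembler line.
EMITTERS = {
    "dma_load":  lambda a: f"DMA_IN  {a['src']} -> {a['dst']}",
    "dma_store": lambda a: f"DMA_OUT {a['src']} -> {a['dst']}",
    "conv2d":    lambda a: f"CONV    {a['in']}, {a['weights']} -> {a['out']}",
    "relu":      lambda a: f"RELU    {a['in']} -> {a['out']}",
}

def lower(graph):
    """Translate the scheduled graph into assembler text, one line per op."""
    return "\n".join(EMITTERS[op](args) for op, args in graph)

print(lower(graph))
```

My open question is essentially where this kind of per-op emission should hook into TVM: as a BYOC external codegen for the partitioned subgraphs, or as my own lowering passes over Relay.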