Working on bitpacked operators would be super interesting on VTA, this is a direction I’m looking into enabling in hardware/software, but a lot of work will need to be done on training/quantization to enable it.
In terms of FPGA coverage, are there low-power FPGAs that @cbalint13 and @kevinyuan would be interested in providing preliminary support to other than the ice40? I think it might be interesting to see if we can instantiate a VTA design on an FPGA ~10x smaller than originally designed for. We could come up with interesting optimizations, or re-organizations.