Binarize a NN for Raspberry Pi

Hello!
I recently came across TVM as a potential solution for deploying a fast NN to a Raspberry Pi. However, I am unclear on how to use the reduced-bit-width optimizations (such as 2-bit binary convs).

I have noticed that a PR implementing this came in, but it is not clear to me how to use it.

I have successfully run the autotuner on an RPi using the default configuration, but I am not sure how to adapt this process to use fewer bits / more optimization.

Please advise!

Currently there are a few different ways to get to a low-precision network. The quantization pass you linked to is usually for “post-training” quantization, which attempts to reduce the precision of a floating-point network down to 16 or 8 bits. You can try this out by applying the quantization pass on a floating-point model and then using AutoTVM to tune the quantized model.
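As a rough sketch of that flow (the tiny one-conv model here is just a stand-in for whatever a Relay frontend would give you, and the `qconfig` option names can vary across TVM versions):

```python
import numpy as np
import tvm
from tvm import relay

# Stand-in model: a single float32 conv. In practice `mod`/`params`
# would come from a Relay frontend, e.g. relay.frontend.from_mxnet.
data = relay.var("data", shape=(1, 3, 224, 224), dtype="float32")
weight = relay.var("weight", shape=(16, 3, 3, 3), dtype="float32")
conv = relay.nn.conv2d(data, weight, kernel_size=(3, 3), padding=(1, 1))
mod = tvm.IRModule.from_expr(relay.Function([data, weight], conv))
params = {"weight": tvm.nd.array(
    np.random.uniform(-1, 1, (16, 3, 3, 3)).astype("float32"))}

# Post-training quantization down to 8-bit inputs/weights.
# skip_conv_layers=[] forces quantization of the first conv too
# (by default it is skipped); global_scale is a rough calibration knob.
with relay.quantize.qconfig(nbit_input=8,
                            nbit_weight=8,
                            global_scale=8.0,
                            skip_conv_layers=[]):
    qmod = relay.quantize.quantize(mod, params)
```

The quantized module `qmod` can then be tuned and compiled just like a float model, e.g. following the flow in the tune_relay_arm.py tutorial.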

However, below 8 bits, some kind of quantization-aware training is usually required. If you already have your own low-precision/ultra-low-precision model trained, then TVM does have high-performance bitserial operators for the Raspberry Pi: https://github.com/dmlc/tvm/blob/master/topi/python/topi/arm_cpu/bitserial_conv2d.py.
CC @cowanmeg
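
For reference, here is a rough, version-dependent sketch of invoking those bitserial operators directly through topi (outside of Relay), using the generic NHWC variant. All shapes, bit widths, and the target string are illustrative assumptions, not something prescribed by the thread:

```python
import tvm
import topi

# Illustrative NHWC input and HWIO kernel; real shapes depend on your model.
# The op bit-packs both tensors internally.
data = tvm.placeholder((1, 56, 56, 64), dtype="uint32", name="data")
kernel = tvm.placeholder((3, 3, 64, 64), dtype="uint32", name="kernel")

# Example target string for a 32-bit Raspberry Pi; adjust for your board.
target = tvm.target.create("llvm -device=arm_cpu -target=armv7l-linux-gnueabihf")

with target:
    # 2-bit activations, 1-bit weights, packed into uint8 lanes.
    out = topi.nn.bitserial_conv2d_nhwc(
        data, kernel, stride=1, padding=1,
        activation_bits=2, weight_bits=1,
        pack_dtype="uint8", out_dtype="int16", unipolar=True)
    s = topi.generic.schedule_bitserial_conv2d_nhwc([out])

func = tvm.build(s, [data, kernel, out], target=target)
```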

@eqy Thank you for this information. How does one actually apply this to a network? Is there a tutorial I missed that outlines how to incorporate topi into a pretrained neural network, such as in the example “tune_relay_arm.py”?

I really appreciate your patience as I am quite new to TVM and only understand the system at a very high level.

Right now we only have topi support for the bitserial convolution and dense operators, and we are working towards adding end-to-end support so that Relay can use these operators.

Most of the pretrained binary neural networks in other frameworks use custom operators to handle the binary convolution and dense operations, so deploying them on TVM requires a bit more engineering work to properly handle those custom operators.
