How do I build a quantized model like this ResNet tutorial from other deep learning frameworks (TensorFlow, PyTorch, etc.) and run it on the VTA?
I saw in the resnet18_qt8.json file that the graph for the ResNet tutorial contains many operations like these:
```json
{
  "op": "cast",
  "name": "resnetv20_conv0_weight_quantized_cast",
  "attrs": {"dtype": "int32"},
  "inputs": [[22, 0, 0]]
},
{
  "op": "conv2d",
  "name": "conv2d0",
  "attrs": {
    "channels": "64",
    "dilation": "(1, 1)",
    "groups": "1",
    "kernel_size": "[7, 7]",
    "layout": "NCHW",
    "out_dtype": "int32",
    "padding": "(3, 3)",
    "strides": "(2, 2)",
    "use_bias": "False"
  },
  "inputs": [[17, 0, 0], [23, 0, 0]]
},
```
But how is this graph built? When I convert a quantized model from another framework, the resulting graph contains nodes/operations that are not supported by the TVM compiler, such as QuantizeV2, QuantizedConv2D, and so on. So to summarize:
Questions
- What is the intended workflow for running a quantized model on the VTA?
- How can we build a quantized model for the VTA like the one in the ResNet tutorial?