[VTA] running inceptionv3 from gluon.model_zoo on VTA

Hi, all.

I’m interested in running a different model on VTA (not ResNet-18), so I took the pre-trained InceptionV3 model from mxnet.gluon.model_zoo and ran it on my Pynq board.

At first, I hit the following error:

tvm._ffi.base.TVMError: Except caught from RPC call: [13:43:50] /home/xilinx/tvm/vta/src/runtime.cc:301: Check failed: dram_buffer_ != nullptr

I thought it was a memory error, so I rebuilt the Linux kernel for the Pynq board with a larger CMA (contiguous memory allocator) size.

Finally, I got the following results:

// Prediction result:
No handlers could be found for logger "autotvm"
InceptionV3 Prediction #1: tiger cat
#2: Egyptian cat
#3: Pembroke, Pembroke Welsh corgi
#4: tabby, tabby cat
#5: wood rabbit, cottontail, cottontail rabbit
TVM prediction top-1: tiger cat

The result seemed close to the right answer.

However, when I searched the TVM community for information on running other models on VTA, it looks like VTA only supports int8-quantized models (maybe ResNet-18 with custom quantization passes applied?).

So I’m not sure whether the result I got is actually correct.
(I did see the VTA enhancements listed in the TVM v0.6 roadmap.)
I am also not sure whether the InceptionV3 model from mxnet.gluon.model_zoo is a quantized model.
Is this possible, and can I assume my result is correct?
One more thing: how can I extract a json file like resnet18_qt8.json?

For reference, I attach part of my code.
Thank you in advance.

//code: (sorry, I don’t know how to attach code T.T)

import numpy as np
import tvm
import vta
import nnvm
from tvm import rpc
from tvm.contrib import graph_runtime, util
from mxnet.gluon.model_zoo.vision import get_model
from mxnet.gluon.utils import download
from PIL import Image

assert tvm.module.enabled("rpc")

# Connect to the RPC server running on the Pynq board (address/port are examples)
remote = rpc.connect("192.168.2.99", 9091)
env = vta.get_env()

# Pull the pre-trained model from the Gluon model zoo and convert it to NNVM
block = get_model("inceptionv3", pretrained=True)
net, params_i = nnvm.frontend.from_mxnet(block)
net = nnvm.sym.softmax(net)

device = "vta"
ctx = remote.ext_dev(0) if device == "vta" else remote.cpu(0)
image_shape = (3, 299, 299)
batch_size = 1
num_class = 1000
data_shape = (batch_size,) + image_shape
out_shape = (batch_size, num_class)

target = tvm.target.create("llvm -device={}".format(device))
target_host = "llvm -mtriple=armv7-none-linux-gnueabihf -mcpu=cortex-a9 -mattr=+neon"

# Compile the graph for the VTA target
with nnvm.compiler.build_config(opt_level=1):
    with vta.build_config():
        graph, lib, params = nnvm.compiler.build(
            net, target, shape={"data": data_shape},
            params=params_i, target_host=target_host)

# Ship the compiled library to the board and load it back as a remote module
temp = util.tempdir()
lib.save(temp.relpath("graphlib.o"))
remote.upload(temp.relpath("graphlib.o"))
rlib = remote.load_module("graphlib.o")

# Create the runtime, load the parameters, and run one image through the model
m = graph_runtime.create(graph, rlib, ctx)
m.set_input(**params)
image = Image.open("vta/tutorials/cat.jpg").resize((299, 299))
image = process_image(image)  # my own preprocessing helper (NCHW float array)
m.set_input("data", image)
m.run()
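
The top-5 printout above came from code along these lines, continuing the script (synset is the usual ImageNet label list; loading it is omitted here):

# Copy the output tensor back from the board and print the top-5 classes.
# "synset" is the standard ImageNet label list; loading it is not shown.
tvm_output = m.get_output(0, tvm.nd.empty(out_shape, "float32", remote.cpu(0)))
top = np.argsort(tvm_output.asnumpy()[0])
for i in range(5):
    print("InceptionV3 Prediction #{}: {}".format(i + 1, synset[top[-i - 1]]))
print("TVM prediction top-1: {}".format(synset[top[-1]]))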

Right, the VTA example can run a simple ResNet-18 out of the box, but the model we use in that example has been massaged (quantization, bit packing).

As a result, it won’t work in a plug-and-play fashion if you substitute our model with another one from a model zoo, like Inception v3, unless you take measures to adapt the model the way we did for ResNet-18.
How much did you have to do to get the model to compile down to VTA? Did you have to apply any 8-bit quantization to the parameters?
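
To give a rough idea of the kind of massaging involved (this is only an illustration, not the actual pass we applied to ResNet-18, which also packs the weights into VTA’s tensorized layout), symmetric 8-bit quantization of a parameter tensor looks something like this:

import numpy as np

def quantize_int8(weights):
    # Naive symmetric per-tensor quantization of a float weight array to int8.
    scale = np.abs(weights).max() / 127.0   # map the largest magnitude onto 127
    q = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)
    return q, scale                         # dequantize later as q * scale

# e.g. quantize one parameter tensor from the converted model
# w_q, w_scale = quantize_int8(params_i["some_conv_weight"].asnumpy())  # hypothetical key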

We are in the process of releasing a more "push-button" front end in Relay, so we can run off-the-shelf models on VTA, InceptionV3 for instance.

Hi, thank you for your reply.

I just took InceptionV3 from the model zoo and ran it; I did not massage the model as you described.
How can I massage it the way you did (quantization, bit packing)?
And how will I know when the push-button front end is released?

Thank you.

Hi Jenny,

Our massaging pass in NNVM relied on hacks to apply quantization and bit packing; Relay will be the way forward. I’m working on it at the moment and will update when it’s ready.

Thierry