Same latency with tuned and untuned network, and latency doesn't match the RPC setup


#1

If, right after this line (https://github.com/dmlc/tvm/blob/cab5af26877184e93911de31b9abec2be09de49b/tutorials/autotvm/tune_relay_arm.py#L302),
I build the module using relay.build_module.build and export the three artifacts (the shared lib, the params dictionary, and the graph JSON) for C++ deployment to Android, that should give me the untuned model, right?
And if I don't do the above and let the code run the tuning, I will end up with the same three outputs, with the JSON and params being the same but the shared library being different (tuned, i.e. optimized).
After deploying both on arm_cpu, I found that the latency was the same in both cases, and the inference result was also the same (which is a good thing). My concern is this: when tuning through the RPC setup, the inference latency reported after tuning is 82 ms, and I expected the same in the deployed app, but there I got latencies fluctuating between 110-220 ms. Can anybody tell me where I am going wrong?
Model used: tflite, mobilenet_v1_224.


#2

This will depend on which ARM CPU you are using. There may be pretuned configurations downloaded from tophub (https://github.com/dmlc/tvm/blob/master/python/tvm/autotvm/tophub.py), which would cause the performance to be basically the same.

How are you doing the time measurement when you get 110-220ms? It is difficult to pinpoint without a code example here.


#3

I won't be able to share the exact code, but following this link that you provided (https://github.com/dmlc/nnvm/blob/master/docs/how_to/deploy.md),
I call clock() just before the run() function and compute the elapsed time after the run function returns. That is how I get times varying from 110-220 ms, whereas, as mentioned above, letting the whole auto-tuning run for 1000 trials gives a final latency of 82 ms with the RPC setup. I should be getting the same latencies with both setups if my method of measuring time is correct, right?
I have a Snapdragon 820 Android device (MSM8996).
And how do I disable the downloading of auto-tuned models from tophub?


#4

There may be discrepancies with the time reported by AutoTVM if you do the timing measurement differently. AutoTVM uses the time evaluator, which will do things like ignore the first run as it may include the cost of JIT compilation for backends like OpenCL.
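To make the difference concrete, here is a minimal sketch of the warm-up-then-average pattern in plain Python. It is not the AutoTVM time evaluator itself: `measure_latency_ms` and `fake_run` are hypothetical names, and `fake_run` only stands in for one inference call. One more thing to check in a C++ harness is that clock() reports processor time rather than elapsed time, so on a multi-threaded run it may read higher than the wall-clock latency; the sketch uses a wall-clock timer for that reason.

```python
import statistics
import time


def measure_latency_ms(run, warmup=1, repeat=10):
    """Time a zero-argument callable `run`, discarding warm-up calls.

    `run` is a hypothetical stand-in for the deployed module's run() call.
    """
    # Warm-up calls are executed but not timed, since the first run may
    # include one-off costs (JIT compilation, cold caches, allocation).
    for _ in range(warmup):
        run()

    wall_ms = []
    for _ in range(repeat):
        start = time.perf_counter()  # wall-clock timer, not CPU time
        run()
        wall_ms.append((time.perf_counter() - start) * 1e3)

    return {
        "mean_ms": statistics.mean(wall_ms),
        "min_ms": min(wall_ms),
        "max_ms": max(wall_ms),
    }


def fake_run():
    # Hypothetical workload standing in for a single inference.
    time.sleep(0.005)
```

Calling `measure_latency_ms(fake_run, warmup=1, repeat=10)` returns the mean, minimum, and maximum over the timed runs. If the spread between minimum and maximum stays large even after warm-up, the variation is more likely coming from the device (frequency scaling, thread placement) than from the measurement itself.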

To prevent any tophub packages from being used, you can delete any files in ~/.tvm and comment out the download section of the tophub module: https://github.com/dmlc/tvm/blob/eae76b3c3b189197f79b79fcccd6a2348640e6a0/python/tvm/autotvm/tophub.py#L121
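The cache cleanup can be scripted. The sketch below is only an illustration under the assumption that the cached files live under ~/.tvm as described above; `clear_cached_files` is a hypothetical helper, not part of TVM, and it defaults to a dry run so nothing is deleted until you ask for it.

```python
from pathlib import Path


def clear_cached_files(cache_dir, dry_run=True):
    """Return the regular files under `cache_dir`; delete them if dry_run is False."""
    root = Path(cache_dir).expanduser()
    if not root.is_dir():
        # Nothing cached (or the directory does not exist): nothing to do.
        return []

    # Collect every regular file below the cache directory, recursively.
    affected = sorted(p for p in root.rglob("*") if p.is_file())

    if not dry_run:
        for path in affected:
            path.unlink()
    return affected
```

`clear_cached_files("~/.tvm")` only lists what would be removed; `clear_cached_files("~/.tvm", dry_run=False)` actually deletes the files. Deleting alone is not enough if the download section is still active, since the packages may simply be fetched again on the next build.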


#5

Thanks for the information @eqy.
I checked the latency for the same model on arm_nn (libraries provided by ARM) and got more or less the same results: the average latency of the TVM-tuned model is 92 ms, and the latency with arm_nn is 88 ms. It seems like the optimization procedure in arm_nn and TVM is the same. Do you have any thoughts on that?


#6

Depending on the model, the performance can be similar.