I’m investigating a performance issue, so I want to open this thread to record my investigation results and to discuss them with the TVM experts here.
During auto-tuning, I observe excellent performance on my Android device, e.g. 8 ms inference latency once it stabilizes (the first several runs are a little slower).
However, when I use the Java API to load and run the model following the Android Deploy example, performance becomes much worse, e.g. 50 ms inference latency. Even when I repeatedly run inference with the same input, as below, the latency is still not ideal (about 25-30 ms):
for (int j = 0; j < 1000; j++) {
runFunc.invoke();
}
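For reference, a minimal timing sketch along these lines (assuming runFunc is the same "run" handle obtained from graphRuntimeModule; the warm-up and repeat counts are arbitrary) is how I would expect such steady-state numbers to be measured:

```java
// Minimal sketch: exclude the slower first runs, then average the rest.
// Assumes runFunc is the "run" packed function from graphRuntimeModule.
final int warmup = 10;   // arbitrary warm-up count
final int repeat = 100;  // arbitrary measured count

for (int j = 0; j < warmup; j++) {
    runFunc.invoke();    // discard the slow first runs
}

long start = System.nanoTime();
for (int j = 0; j < repeat; j++) {
    runFunc.invoke();
}
double avgMs = (System.nanoTime() - start) / (double) repeat / 1e6;
System.out.println("average latency: " + avgMs + " ms");
```

(If the model runs on a GPU target such as OpenCL, the run call may return before the kernels actually finish, so pulling the result out with get_output inside the timed region would give a more faithful number.)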
Currently I have 2 questions:
- In the current Android deploy application (in MainActivity.java), the following resources are created and released in every ModelRunAsyncTask. Can they be cached somewhere and reused across different image frames? (A sketch of what I mean follows this list.)
NDArray inputNdArray = NDArray.empty(new long[]{1, IMG_CHANNEL, MODEL_INPUT_SIZE, MODEL_INPUT_SIZE}, new TVMType("float32"));
NDArray outputNdArray = NDArray.empty(new long[]{1, 9}, new TVMType("float32"));
Function setInputFunc = graphRuntimeModule.getFunction("set_input");
Function runFunc = graphRuntimeModule.getFunction("run");
Function getOutputFunc = graphRuntimeModule.getFunction("get_output");
- Apart from this Java API, is there another way for me to run the compiled model on my Android device? (E.g. I’m currently reading the code to figure out why RPCGetTimeEvaluator, used at the last step of auto-tuning, is so fast and how it interacts with the compiled model. I really need to reproduce the measured inference latency of 8 ms, or something close to it, in my Android application.)
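To make the first question concrete, here is a minimal sketch of what I mean by caching: the NDArrays and packed functions are created once and reused for every frame instead of being recreated in each ModelRunAsyncTask. The class and method names are made up for illustration; it is meant as a nested class inside MainActivity so that graphRuntimeModule and the constants IMG_CHANNEL, MODEL_INPUT_SIZE, and INPUT_NAME are in scope, and the set_input/get_output calls follow the existing demo code.

```java
// Illustrative sketch only: long-lived TVM handles shared across frames.
class CachedModelRunner {
    private final NDArray inputNdArray;
    private final NDArray outputNdArray;
    private final Function setInputFunc;
    private final Function runFunc;
    private final Function getOutputFunc;

    CachedModelRunner(Module graphRuntimeModule) {
        // Allocated once instead of in every ModelRunAsyncTask.
        inputNdArray = NDArray.empty(
                new long[]{1, IMG_CHANNEL, MODEL_INPUT_SIZE, MODEL_INPUT_SIZE},
                new TVMType("float32"));
        outputNdArray = NDArray.empty(new long[]{1, 9}, new TVMType("float32"));
        setInputFunc = graphRuntimeModule.getFunction("set_input");
        runFunc = graphRuntimeModule.getFunction("run");
        getOutputFunc = graphRuntimeModule.getFunction("get_output");
    }

    // Called for each preprocessed image frame; only the data changes.
    float[] run(float[] frameData) {
        inputNdArray.copyFrom(frameData);
        setInputFunc.pushArg(INPUT_NAME).pushArg(inputNdArray).invoke();
        runFunc.invoke();
        getOutputFunc.pushArg(0).pushArg(outputNdArray).invoke();  // output index 0, as in the demo
        return outputNdArray.asFloatArray();
    }

    // Release everything once, e.g. when the activity is destroyed.
    void release() {
        inputNdArray.release();
        outputNdArray.release();
        setInputFunc.release();
        runFunc.release();
        getOutputFunc.release();
    }
}
```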