TVM's get_output function is time-consuming with Mali OpenCL on RK3399


Today I ran into very bad performance with the mobilenet_ssd_300 model (without the priorbox and detection layers): the “run” function takes about 1.9 s. I could not get the 230 ms I reported before, and I don’t know why.
Separately, I tried to reproduce your benchmark results, and here are mine:

  • mobilenet: ~80 ms
  • resnet18: ~198 ms (0.19822525763333332 s)
  • vgg16: ~988 ms (0.98832358445 s)

The resnet18 and vgg16 results differ a lot from the results here.
Here is my script. I don’t use RPC.
Is there something wrong with my implementation?


Can you check if you have the current configs for Mali from tophub?


@eqy I checked out the latest TVM code and I can reproduce the mobilenet_ssd_300 performance I reported before (250 ms).
I found that we don’t have pre-tuned parameters for this model. So will the performance be better if I use AutoTVM?




Yes, there are no guarantees about using fallback configs. The performance will very likely be better with AutoTVM.


@merrymercy Is this also the case when running on CPU?


On CPU, the time spent in “run” should be the “real” running time.
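
A rough sketch of why this matters when timing, in plain Python. It assumes, as the thread title suggests, that “run” on an asynchronous device can return before the work is finished, so the wait shows up in get_output instead. FakeAsyncDevice, its methods, and the 0.2 s kernel time are hypothetical stand-ins for illustration only, not TVM APIs or measured values.

```python
import threading
import time


class FakeAsyncDevice:
    """Toy stand-in for an asynchronous GPU queue (hypothetical, not a TVM API)."""

    def __init__(self, kernel_seconds):
        self.kernel_seconds = kernel_seconds
        self._worker = None

    def run(self):
        # Enqueue the work and return immediately, like an async kernel launch.
        self._worker = threading.Thread(target=time.sleep, args=(self.kernel_seconds,))
        self._worker.start()

    def sync(self):
        # Block until the enqueued work has finished.
        if self._worker is not None:
            self._worker.join()
            self._worker = None

    def get_output(self):
        # Reading a result has to wait for any pending work first.
        self.sync()
        return "output"


def timed(fn):
    """Return the wall-clock seconds spent inside fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def run_and_sync(device):
    device.run()
    device.sync()


device = FakeAsyncDevice(kernel_seconds=0.2)  # hypothetical kernel time

run_only = timed(device.run)                # looks cheap: work is only enqueued
get_output_time = timed(device.get_output)  # absorbs the real compute time
run_synced = timed(lambda: run_and_sync(device))  # honest end-to-end time

print(f"run only:     {run_only * 1000:.1f} ms")
print(f"get_output:   {get_output_time * 1000:.1f} ms")
print(f"run + sync:   {run_synced * 1000:.1f} ms")
```

In this toy setup, the time moves from “run” to get_output unless the timer stops only after a sync. On a synchronous device such as a CPU, “run” alone would already include the compute time.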



I have an RK3399 board and would like to reproduce your mobilenet-ssd result.
Can you show me the steps for doing this and the best time you have got?



Can we reduce the time cost by getting results back from the GPU asynchronously, instead of waiting for all outputs to be ready?
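
A rough sketch of the idea in plain Python, using only the standard library. fetch_output and the copy delays are hypothetical stand-ins, and whether TVM's runtime exposes per-output readiness like this is not confirmed anywhere in this thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_output(index, delay_seconds):
    """Hypothetical stand-in for copying one output tensor back from the device."""
    time.sleep(delay_seconds)
    return index


# Hypothetical per-output copy times in seconds; not measured on any device.
copy_delays = {0: 0.15, 1: 0.05, 2: 0.10}

arrival_order = []
with ThreadPoolExecutor(max_workers=len(copy_delays)) as pool:
    futures = [pool.submit(fetch_output, i, d) for i, d in copy_delays.items()]
    for future in as_completed(futures):
        # Each output can be post-processed as soon as it arrives,
        # instead of waiting for the slowest one first.
        arrival_order.append(future.result())

print("outputs arrived in order:", arrival_order)
```

This only overlaps post-processing with the remaining copies. It does not make the slowest output arrive any sooner.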